Edge Impulse announced its new object detection algorithm, dubbed Faster Objects, More Objects (FOMO), targeting extremely power and memory constrained computer vision applications.
Some quotes from their blog:
FOMO is a ground-breaking algorithm that brings real-time object detection, tracking and counting to microcontrollers for the first time. FOMO is 30x faster than MobileNet SSD and runs in <200K of RAM. To give you an idea, we have seen results around 30 fps on the Arduino Nicla Vision (Cortex-M7 MCU) using 245K RAM.
Since object detection models are making a more complex decision than object classification models they are often larger (in parameters) and require more data to train. This is why we hardly see any of these models running on microcontrollers.
The FOMO model provides a variant in between; a simplified version of object detection that is suitable for many use cases where the position of the objects in the image is needed but when a large or complex model cannot be used due to resource constraints on the device.
It's nice to see they are up-front about the limitations of their method:
Works better if the objects have a similar size
FOMO can be thought of as object detection where the bounding boxes are all square and, with the default configuration, 1/8th of the resolution of the input. This means it operates best when the objects are all of a similar size. For many use cases, for example, those with a fixed location of the camera, it isn't a problem.
Objects shouldn’t be too close to each other.
If your classes are “screw,” “nail,” “bolt,” each cell (or grid) will be either “screw,” “nail,” “bolt,” and "background.” It's thus not possible to detect distinct objects where their centroids occupy the same cell in the output. It is possible, though, to increase the resolution of the image (or to decrease the heat map factor) to reduce this limitation.
They shows various demo applications such as object detection, counting and tracking.