Friday, May 06, 2022

Will event-cameras dominate computer vision?

Dr. Ryad Benosman, a professor at the University of Pittsburgh, believes a huge shift is coming in how we capture and process images for computer vision. He predicts that event-based (or, more broadly, neuromorphic) vision sensors are going to dominate in the future.

Dr. Benosman will be a keynote speaker at this year's Embedded Vision Summit

EETimes published an interview with him; some excerpts below.

According to Benosman, as long as the image-sensing paradigm remains serviceable, it holds back innovation in alternative technologies. The effect has been prolonged by the development of high-performance processors such as GPUs, which delay the need to look for alternative solutions.

“Why are we using images for computer vision? That’s the million-dollar question to start with,” he said. “We have no reason to use images, it’s just because there’s the momentum from history. Before even having cameras, images had momentum.”

Benosman argues that conventional camera-based techniques for computer vision are hugely inefficient. His analogy is the defense system of a medieval castle: guards positioned around the ramparts look in every direction for approaching enemies. A drummer plays a steady beat, and on each drumbeat, every guard shouts out what they see. Among all the shouting, how easy is it to hear the one guard who spots an enemy at the edge of a distant forest?

“People are burning so much energy, it’s occupying the entire computation power of the castle to defend itself,” Benosman said. If an interesting event is spotted, represented by the enemy in this analogy, “you’d have to go around and collect useless information, with people screaming all over the place, so the bandwidth is huge… and now imagine you have a complicated castle. All those people have to be heard.”

“Pixels can decide on their own what information they should send. Instead of acquiring systematic information, they can look for meaningful information — features,” he said. “That’s what makes the difference.”

This event-based approach can save a huge amount of power and reduce latency compared with systematic acquisition at a fixed frequency.

“You want something more adaptive, and that’s what that relative change [in event-based vision] gives you, an adaptive acquisition frequency,” he said. “When you look at the amplitude change, if something moves really fast, we get lots of samples. If something doesn’t change, you’ll get almost zero, so you’re adapting your frequency of acquisition based on the dynamics of the scene. That’s what it brings to the table. That’s why it’s a good design.”
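
The adaptive-acquisition idea Benosman describes can be sketched in a few lines of Python. This is an illustrative toy model, not any vendor's actual pixel circuit: a DVS-style pixel emits an ON or OFF event whenever the log intensity moves more than a threshold away from the reference set at the last event, so a static scene produces no output while fast changes produce many samples. The function name `event_pixel` and the threshold value are assumptions for illustration.

```python
import math

def event_pixel(samples, threshold=0.2):
    """Toy model of a single DVS-style pixel: emit an event whenever the
    log intensity changes by more than `threshold` since the last event.
    Returns a list of (sample_index, polarity) tuples, polarity +1/-1."""
    events = []
    ref = math.log(samples[0])  # reference log intensity
    for i, intensity in enumerate(samples[1:], start=1):
        delta = math.log(intensity) - ref
        if abs(delta) > threshold:
            events.append((i, +1 if delta > 0 else -1))
            ref = math.log(intensity)  # reset the reference at each event
    return events

# A static scene produces no events; a fast brightness ramp produces many.
static = [100.0] * 10
ramp = [100.0 * (1.25 ** k) for k in range(10)]
print(event_pixel(static))  # []
print(event_pixel(ramp))    # nine ON events: each step's log change ~0.223 > 0.2
```

The sampling rate is thus driven entirely by scene dynamics, which is exactly the "adaptive acquisition frequency" in the quote above.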

He goes on to acknowledge some of the key challenges that need to be addressed before neuromorphic vision becomes the dominant paradigm, though he believes these challenges are surmountable.

“The problem is, once you increase the number of pixels, you get a deluge of data, because you’re still going super fast,” he said. “You can probably still process it in real time, but you’re getting too much relative change from too many pixels. That’s killing everybody right now, because they see the potential, but they don’t have the right processor to put behind it.” 

“[Today’s DVS] sensors are extremely fast, super low bandwidth, and have a high dynamic range so you can see indoors and outdoors,” Benosman said. “It’s the future. Will it take off? Absolutely!”

“Whoever can put the processor out there and offer the full stack will win, because it’ll be unbeatable,” he added. 

Read the full article here:



  1. Will the hammer dominate the saw in constructing houses? Well, I doubt it. The fact is there are tasks for both. Event-based sensors will find their niches, but the frame-based tasks will not disappear. In robotics there are tasks that require a global-shutter sensor, and this will also be the case in 10 years.

  2. Like the comment above, I think there will indeed probably be tasks for which event cameras become the best-suited imaging technology. The question I have is: given a certain problem, how do you choose the image sensor type you should use, and how do you compare the performance of the event-camera and regular-camera approaches?

  3. Disruption that implies the demise of incumbent technology (and players) usually occurs on the back of a new market; C. Christensen theorized this from his analysis of the hard disk drive market. Event-based image sensor technology is probably of such a nature, but it will have to grow from a massive transformation of the market, for which robotics and ambient intelligence (the metaverse) are possible candidates. P. Cambou, Yole Développement

  4. If your end goal is to form an image for human consumption, I dare say that you'll never beat the amount of money and engineering hours put into standard CMOS sensors.

    If you have a special application that is limited by CMOS, then an event camera could succeed. For example, an event-based camera requires much less sensor-to-ISP/computer bandwidth than a CMOS sensor, so it can record image data at a higher rate. Since CMOS sensors can already handle ~1000 fps pretty well (e.g., the Samsung Galaxy S20), I'd guess that event-based cameras could succeed in applications demanding >>1000 fps.

    Neural networks are an interesting application, but they may have to be re-architected somewhat, since my understanding is that current networks are designed to process entire image frames rather than just a set of updated pixels. If you end up generating a full-frame image *before* the neural network, then you are just using the event camera for data compression, which seems like a waste of clever technology.

  5. The event-camera and neuromorphic-computing gang needs to grow up beyond "it's sensible because it's like how human vision operates" to demonstrating actual applications of these technologies on real benchmarks. It's about time they started doing comparisons with existing methods and reporting real applications.

  6. The event-based image sensor is an architecture, not a new technology from a semiconductor-process point of view; it builds upon the legacy of the CMOS pinned-photodiode innovation and the decades of industrialization that followed. I observe that the discussion about why we need event-based image sensors is evolving favorably. Many people now recognize the temporal resolution, i.e., speed equivalent to ~1,000 fps, as one of the great benefits. The next step will be to recognize the native high dynamic range of ~120 dB, then the low-latency detection capability on the order of ~1 ms, then the easy addition of depth information, etc. The desire to pit frame against event on existing applications is probably not the right way to discuss this new technology; rather, the imaging community should ask what kind of new applications event-based imaging will enable. One great researcher to listen to on this matter is Dr. Scaramuzza. P. Cambou, Yole Développement

  7. It's true that current DVS development is in some kind of megapixel race that is stifling the development of smarter pixels in favor of smaller ones - for example, none of the current cameras have a surround like all real retinas (see the proposal for an economical center-surround event camera).
    What we have found is that DVS are very nicely matched to DNN inference hardware that exploits activation sparsity to reduce MACs and latency, e.g., EyeRiss, NullHop, and TwoNullHop (see Aimar's PhD thesis). These CNN accelerators, when driven by properly exposed, activity-driven "frames" of DVS events (e.g., exposed with the AreaEventCount method of the EDFLOW camera), provide an easy way to exploit the low latency, HDR, and sparse output of DVS with efficient hardware CNNs.


All comments are moderated to avoid spam and personal attacks.