Pelican Imaging started the new year with quite a bit of news. First, its website has been updated with special emphasis on depth-enabled features:
Second, the company has published its CTO and co-founder Kartik Venkataraman's keynote on the future of mobile imaging, presented at the International Conference on Consumer Electronics (ICCE) on January 12 in Las Vegas, NV. The keynote addresses some of the concerns about the array camera's resolution (click to enlarge):
The presentation also talks about the depth-enabled features, after-the-fact focus, background blurring and substitution, videoconferencing, and more. The depth map is not bad for such a small camera:
The third piece of news is that Pelican Imaging's technology has been nominated for a 2014 Edison Award - nominee #181 out of 309. The winners will be announced on April 30.
The super-resolution factor is around 1.4, not 2.4 as they claim. Here is why: each channel is 1000x750, so it should resolve 750 lines per image height, not the 450 they present.
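To make the arithmetic explicit, assuming the roughly 1050 LW/PH output figure quoted later in this thread is the measured resolution (that figure is my assumption here, not stated in this comment):

\[
\frac{1050}{750} \approx 1.4 \quad \text{(against the per-channel Nyquist limit)}, \qquad
\frac{1050}{450} \approx 2.3 \quad \text{(against their 450-line baseline)}
\]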
Too many lies. Just say you have an array camera with 1 Mpix that resolves some basic depth at 20-30 cm.
I don't know if these guys are stupid or think we are. Looking at their depth map of the person with the backpack, I see that the straps, which are literally on the shoulders of the person, are at a very different distance than the shoulders. This must be due to lousy post-processing of the maps. In the depth map of the 3 people, the closest person has two very different colors/depths on his eyes - clearly algorithm mistakes. They claim to be as good as the iPhone 5, but on the resolution chart they reach 1000 while the iPhone reaches 1800; that's (1800/1000)^2, about 3.2 times more resolved pixels from an 8 Mpix iPhone compared to a 16 Mpix array.
Nice performance from the iPhone 5.
That white sheet of paper on the floor would probably be a hole in the floor, because the Pelican measurement cannot be wrong. Honestly, if you want to lie about the quality of depth perception, use an image that looks more like a depth map and less like a thermal image.
The depth data seems pretty limited in its usefulness. The error seems more than 10x greater than ToF, and I am not sure how the total system power compares, given the computational requirements of the array camera. The depth images are clearly not yet useful for most applications. I wonder why RM is missing his forehead, for example. It seems premature to be highlighting depth acquisition at this performance level.
Then there remains the imaging question. There are only so many photons incident on the full sensor given the scene, optics and chip size. If you have to do computation, then the noise always gets worse (consider just adding two noisy signals together, or worse, subtracting them). It is just fundamental, and all the computational power you can apply is not really going to help get rid of the noise. The computational requirements and lower-quality image are a steep price to pay for a thinner package.
Offhand, I would say that by now, if this technology were going somewhere good, it would have gone there already. Things either kind of work or they don't, most of the time. I had my fingers crossed for Pelican for many reasons, but now I am less optimistic. Then again, getting into a few handsets could change the equation for the better.
Computation does not always make noise worse. While adding and subtracting increase noise, averaging can decrease it. For linear operations, noise amplification depends on the weights.
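A minimal numerical sketch of that point, assuming independent Gaussian noise on each frame (the frame count and noise level below are arbitrary illustrations, not Pelican's numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
signal = 100.0          # arbitrary "true" pixel value
sigma = 5.0             # per-frame noise std dev (assumed)
n_frames = 16           # e.g. one frame per array channel (assumed)

frames = signal + rng.normal(0.0, sigma, size=(n_frames, 100_000))

# Averaging N independent frames reduces noise by ~sqrt(N)
avg = frames.mean(axis=0)
print("average std:   ", avg.std())        # ~ sigma / sqrt(16) = 1.25

# Subtracting two frames increases noise by ~sqrt(2)
diff = frames[0] - frames[1]
print("difference std:", diff.std())       # ~ sigma * sqrt(2) = 7.07

# General linear combination y = sum(w_i * x_i): std(y) = sigma * sqrt(sum(w_i^2))
w = np.full(n_frames, 1.0 / n_frames)
print("predicted avg std:", sigma * np.sqrt((w**2).sum()))
```

So whether computation hurts or helps the noise depends entirely on the weights of the linear combination being computed.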
I am sorry, but the Pelican depth maps are not really depth maps, just rubbish. The presenter talks about discriminating between the person and the backpack. As I read the map, dark red represents long distance, so according to this the backpack is farther away than the person (darker), and the guy's pants are at a completely different distance as well. The Kinect maps look plausible. Pelican: poor results and even worse presentation.
It looks like their correspondence algorithm is not quite refined. This is surprising to me given the amount of money they have raised. On top of that, correspondence has been studied to death (and many papers published). They could do much better just by implementing existing algorithms.
With respect to what other people have said on here:
1 - The sheet of paper on the floor doesn't have texture, so correspondence can't be established between the sensors. However, this should have been handled by the regularization step they added to the algorithm: it should have filled the region in with approximate depth values from the surroundings, but it didn't. I guess there's a bug in their regularization code.
2 - RM is missing a forehead because his forehead is not wrinkly enough :p, i.e. it doesn't contain enough texture to register across sensors. Actually, on close inspection, it should have sufficient texture. The fact that they can't register the sensors based on what is available suggests that their "match window" is not large enough. I wonder what their match window size is? Maybe someone from Pelican can clue us in here...
The white scroll on the wall in the top-left corner of the picture also seems to cause problems for their depth extraction algorithm. The bottom edge of the scroll contains very high horizontal orientation energy, so it should have generated a very strong vertical-disparity signal. With 16 sensors, they should have been able to extract very good depth information there - this is exactly the advantage of 16 sensors over a side-by-side stereo pair. However, their algorithm failed to exploit it.
3 - It looks like their depth map is riddled with errors. On the female's head, the nose should have been the reddest part of her face, but it is not. The farthest male's forehead has a large spot that is redder than the female. RM's nose should have been the reddest part of the whole image, but it is not. And the list goes on and on... All of these are probably due to false matches that should have been flagged. Even so, there shouldn't have been so many false matches. I suspect it is primarily due to them using a very small match window and not all 16 sensors.
1, 2, and 3 all indicate that they're trying to cut corners, I suspect to reduce computational cost. Regularization (e.g. belief propagation on an MRF) is iterative and can be quite expensive (1). Using a large match window proportionally increases the correspondence search time (2), and using all 16 sensors to establish correspondence is also computationally demanding (2+3). Detecting false matches requires more computing time (3).
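For readers unfamiliar with the trade-off being described, here is a minimal sketch of generic window-based (SAD) block matching between a rectified reference view and one other view - not Pelican's algorithm, just an illustration of how the match window size enters the cost:

```python
import numpy as np

def sad_disparity(ref, other, max_disp=32, half_win=3):
    """Brute-force SAD block matching along horizontal epipolar lines.

    ref, other : rectified grayscale views of the same size (ref = left view)
    max_disp   : disparity search range in pixels
    half_win   : match window is (2*half_win + 1)^2 pixels
    """
    h, w = ref.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half_win, h - half_win):
        for x in range(half_win + max_disp, w - half_win):
            patch = ref[y - half_win:y + half_win + 1,
                        x - half_win:x + half_win + 1].astype(np.float32)
            best_cost, best_d = np.inf, 0
            for d in range(max_disp):
                cand = other[y - half_win:y + half_win + 1,
                             x - d - half_win:x - d + half_win + 1].astype(np.float32)
                cost = np.abs(patch - cand).sum()   # cost grows with window area
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Runtime scales with the window area, the disparity range, and the number of view pairs matched, which is presumably where the pressure to cut corners comes from.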
For these reasons, their results are disappointing. Given all the investor $, I would think that there may be a trick up their sleeves that they haven't revealed. If I were them, I'd keep my mouth shut until there is something reasonable to show. Marketing is not necessary until you actually have a product to sell....
I think the idea is good. People have been studying stereopsis since the 1800s (Wheatstone in 1838), and it's about time some interesting application came out of it for consumer devices. Unfortunately, it looks like these guys have all the relevant hardware, but their algorithms miss even the simplest issues that have been studied for decades.
The first thing I see is that the x-axis must have some unit. In the US we would normally take it as inches/feet? Is it mm/cm/m/km? :D
These slides are obviously very poorly done; I could spend hours picking at problems with them. With respect to the unit, given that all the units so far are metric, the x-axis of that plot is most likely in meters - cm or km would be ridiculous. I assume he stated it in the talk.
However, the more interesting point is that the plot is probably wrong, and the rest of the data is probably either made up or measured incorrectly.
Consider this, depth error as a function of depth is:
[disparity-error] * [depth]^2 / ([focal-length]*[baseline])
[disparity-error] is error from the correspondence algorithm. [baseline] is probably equivalent to what is called pitch in the slide.
In percent, it is:
100 * [disparity-error] * [depth] / ([focal-length]*[baseline])
where I've multiplied by 100 and divided by depth (to get percent). This shows that the depth error in percent is linear with respect to depth. The curves shown are quadratic. Why?
The curves are probably not empirical, because they're so smooth. What does this mean? If they are empirical, then there is something seriously wrong in their correspondence algorithm: the disparity error itself would have to grow linearly with depth to make the percentage error quadratic.
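To make the shape argument concrete, here is a small sketch evaluating the formula above; the focal length, baseline and disparity error are placeholders I made up for illustration, not Pelican's actual parameters:

```python
import numpy as np

# Placeholder parameters (assumptions, not Pelican's actual numbers)
focal_length_px = 800.0     # focal length in pixels
baseline_mm = 6.0           # baseline between outermost channels, mm
disparity_err_px = 0.25     # constant sub-pixel matching error

depth_mm = np.array([500, 1000, 2000, 3000, 5000], dtype=float)

# Absolute depth error: disparity_err * depth^2 / (focal_length * baseline)
depth_err_mm = disparity_err_px * depth_mm**2 / (focal_length_px * baseline_mm)

# Relative error: 100 * disparity_err * depth / (focal_length * baseline)
depth_err_pct = 100.0 * depth_err_mm / depth_mm

for z, e, p in zip(depth_mm, depth_err_mm, depth_err_pct):
    print(f"z = {z/1000:.1f} m  ->  error = {e:.0f} mm ({p:.1f} %)")
```

With a constant sub-pixel disparity error, the percentage column grows linearly with depth; a quadratic percentage curve would require the disparity error itself to grow with depth.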
That's not all. Here is the depth error in PrimeSense:
http://1.bp.blogspot.com/-bhjVtRpSH5Q/ToLbe_N7xNI/AAAAAAAAA9w/EWZhQiIL9zg/s1600/Primesense%2BDepth%2BResolution.JPG
Note that at 5m, PrimeSense error is about 1.5%. This is over an order of magnitude better than Pelican's solution. This makes sense because the projector and sensor are very far apart (probably over an order of magnitude farther than the separation of the farthest sensors in Pelican's array).
Now backtrack to the backpack slides. If PrimeSense is over an order of magnitude better in depth resolution and PrimeSense can't separate the backpack from the person, how in the world can Pelican? Something is not making sense here.
Next, let's look at the 3m case. At 3m, Pelican's error (based on the chart) is 20% or 600mm. With an error of 600mm, how can they get such good separation between backpack and person with such good SNR? That must be one big backpack! :p
Next, look at the 1.5m case. PrimeSense's error at 1.5m is < 1cm, so why isn't there any separation between backpack and person in their comparison? Is it a really small backpack? I've actually done a similar test myself, and you can certainly resolve a backpack from a person using PrimeSense at 1.5m.
What is likely happening is that they are using the OpenNI sample program that squeezes the depth image into 0...255 for display, and then reading the pixel value at the backpack from that. This is clearly wrong.
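If that guess is right, the display quantization alone could hide the backpack. A minimal sketch of the effect (the display range and backpack offset below are assumptions for illustration; no OpenNI API is used):

```python
# Raw depth in mm squeezed into 8 bits for display
max_range_mm = 10_000           # assumed display range of the sample viewer
person_mm = 1_500               # person at 1.5 m
backpack_mm = person_mm + 200   # backpack 20 cm behind (assumed)

def to_display(depth_mm):
    """Squeeze raw depth (mm) into an 8-bit display value."""
    return int(round(255 * depth_mm / max_range_mm))

step_mm = max_range_mm / 255
print(f"one 8-bit step = {step_mm:.0f} mm")
print("person   ->", to_display(person_mm))
print("backpack ->", to_display(backpack_mm))
# With ~39 mm per gray level, a 200 mm offset spans only ~5 levels, and smaller
# offsets can vanish entirely - even though the raw sensor error is < 1 cm.
```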
I prefer to think that they used a sub-cm (very small) backpack and super-m (very large) backpack to test the two cases. It's more entertaining.
Now here's something to ponder: if the depth maps are riddled with errors, how do they get such good super-resolved images?
What is disturbing to me is that the team doesn't seem to know what their value propositions are. What they should have emphasized to get an edge over PrimeSense is the fact that the spatial resolution of their depth map is substantially better where it matters (i.e. when the scene contains a substantial amount of high spatial frequency energy). And there you go - please PayPal me my consulting $. Thank you ;p
It's sad how companies start out nice. Get a marketing guy as CEO. Start selling nonsense. Drive all customers away. Spend all the money on nothing. Go bankrupt.
Even if they tell the truth, 1050 LW/PH means about 1.47 Mpix effective. From a 12 Mpix array, that's bad.
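For reference, the arithmetic behind that effective-pixel estimate, assuming a 4:3 aspect ratio:

\[
1050 \times \left(1050 \times \tfrac{4}{3}\right) = 1050 \times 1400 \approx 1.47\ \text{Mpix}
\]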