Tuesday, March 25, 2014

HTC One (M8) Duo Camera Features Depth Mapping

HTC has announced a Duo Camera in its new One (M8) smartphone. The camera is said to be able to capture depth information, as stated in the company's official video:

The main 4MP, 2um "Ultrapixel" sensor appears to be identical to the one in the previous-generation One smartphone (YouTube link):


  1. So, no Toshiba dual camera. Good for ST

  2. "Depth Mapping"? Depth estimation sounds more precise to me.
    Weak texture => No depth... You should expect Pelican-like results, no serious use (baseline--more cameras trade-off)

  3. I agree with Anon above. The generic term "depthmap" is being thrown around to mean things it isn't these days. While the second sensor adds data to do various (weak) tasks, I wouldn't call it a "depthmap".

  4. Although not accurate depth mapping, this is a good marketing stunt and will probably work for the average cell phone user.

    Who is supplying the second sensor, OV? is OV also supplying (or secondary supplying) the main sensor?

    This may take away the uniqueness of Pelican concept.

  5. Why wouldn't they be able to get accurate depth map?

    The baseline looks to be ~30mm. With this baseline, they should get < 20mm error within a meter (< 5mm within half a meter) without clever math, and the number may be (a lot) better based on their focal length. The thickness of your finger is ~20mm. This error level is enough for segmenting people from each other and segmenting people from the background. It's enough to do the Lytro focus effect. It's enough to do most things that the consumer would want to do.

    The main question is... does the consumer really want to do anything at all with depth? No one has addressed that question yet. Lytro tried to, and the answer is no (at least not for $399).

    Note that they should get substantially better depth than Pelican Imaging's depth data for a couple reasons: 1. longer focal length, 2. larger (~ 3x) baseline.

    Not having texture is a problem in theory, but practically, it is not.

    It seems people are using Pelican Imaging's published results as the gold standard for what can be done with stereo/array sensors. It is not.

  6. I would say the Pelican depth maps represent the worst case of bad processing. It's good to have boundaries. Structured light is best, Pelican is worst. Everything else is in between.

  7. maybe someone here can clarify, but how can large pixel size be such a big deal for low light? why not use high resolution sensor with small pixel size (and bin) for low light. do you get superlinear increase in effective well size with larger pixel size? or do you get better QE?

    it looks like the lens is an f2.0, so even if the effective well size is large, there's the issue of having enough light to fill it (in low light conditions).

    the problem in low light, i think, is not that the pixel size is small. it's the inherent trade off between SNR and sharpness from motion blur. you can get pretty good images with standard pixel size in low light... if you can hold steady and the scene doesn't move.

    increasing the pixel size doesn't help in low light at all...

    either there is something i am missing or HTC's marketing on lowlight capability is all BS.

    what you do get with large pixel size is better dynamic range, but you can get the same thing with binning. what really matters is the total sensor area, which is not mentioned at all anywhere, but it looks like the total sensor area doesn't change much because larger pixel size is accompanied by a decrease in resolution.

    1. and it looks like i was right. here's a comparison between m8 and sony's xperia z1:


      xperia z1 has a 1.1 um pixel size vs 2.8 um pixel size.

    2. If you divide the HTC pixel into 4 smaller pixels (1um each) and use digital binning, you reduce the noise by sqrt(4)=2, but you also reduce the signal by 4! Ideally, the SNR at low light of 4 binned small pixels is 3dB lower than that of the HTC One. To me, it makes sense to have somewhat larger pixels for good low light imaging. Also consider that with fewer pixels, the analog readout can run at a lower speed for the same frame rate, generating less thermal noise and consuming less power.

    3. Why does signal reduce by 4 in 4-way binning?

    4. In digital binning you average the signal of 4 pixels. Since those pixels are 4x smaller than a single big pixel, they ideally receive 4x less light, hence 4x less signal. If analog binning is possible, then the signals (charges) of the small pixels are added, so the combined signal is the same as that of the single big HTC pixel, meaning the same SNR.

    5. By adi's calculation, binning decreases signal by 4 and noise by 2, which means your SNR is reduced by a factor of 2. This makes absolutely no sense.

      Binning 4 small pixels together by averaging increases your SNR by 2x (in comparison to individual small pixels). This is because you are averaging 4 random variables together, and the RVs follow Poisson.

      What is the SNR increase with a 4x larger pixel (in comparison to individual small pixels)? Also 2x increase in the ideal scenario. By the way, this isn't necessarily true. In reality, the increase in SNR as size goes up is actually sub-linear, but let's give the M8 the benefit of the doubt

      There is no difference between large pixel vs small pixel (with binning) with respect to SNR assuming shot noise dominates.

      At low light, it is more likely that dark noise and readout noise dominate. Both of these noise sources are independent of the signal, and they are also (supposed to be) independent across pixels. This means that software binning after the fact can actually give you better data (by effectively reducing dark+readout noise).

      The only point that adi makes that is valid is potential heat from high speed ADC. I can't imagine that the heat from ADC is substantial compared to heat generated by the SOC and other components on the device. Nonetheless, it is a valid point, and I would like to see some data on this.

      Now there is an additional issue.

      With a larger effective well size (say 4x), you are now cramming 0 to 4*N (e-) into 8 bits. What does this mean? It means that you are losing local contrast -- in exchange for larger dynamic range. This is in fact visible in pictures taken with the M8. Images tend to look washed out.

      This is all theory, but the good thing is that reality agrees with theory:


      One thing to notice is that in low light, the M8 is roughly the same as (if not worse than) the Xperia z2. This is WITHOUT binning! Someone should certainly do a fair side-by-side comparison with binning, but I haven't seen it done yet.

      If you ask me, HTC screwed up here. It looks like someone at HTC is into photography, and the adage in photography is that increasing pixel size makes your signal better -- but this is typically accompanied by a faster lens and a larger sensor size.

      You can't just increase pixel size alone and expect to get better quality (for the reasons above).

      - td

    6. -td,

      I simply meant that, at low light, the analog binning of 4 pixels ideally improves the SNR by 4 if noise is limited by readout and not photon shot noise, DC etc. Analog binning of 4 pixels with 1um pitch is similar to having a bigger 2um pixel=HTC pixel.

      The digital binning of 4 pixels with 1um pitch (again noise limited by readout circuits) improves the SNR by 2 (see eq. (1) Alakarhu's paper at iisw '13). So, analog binning of 4 pixels with 1um pitch or using a single 2um pixel IDEALLY and at LOW LIGHT improves SNR by 4 while digitally binning 4 pixels with 1um pitch each improves SNR by 2.

      Add to that the possibility of a larger noise due to the faster readout needed to read more pixels and you might get even lower SNR for the digitally binned pixel.

      At higher light levels, where the noise is photon shot noise limited, more pixels are surely better and here I agree with you.

      I have seen a lot of comparisons of HTC's "ultrapixel" camera with Samsung and Sony flagships. Indeed, the pics from the Sony and Samsung look better, even at low light. However, I believe that is a software-related issue. Same story with the Google Nexus 5: very good Sony 8MP sensor but bad software.

      That said, the camera of the iPhone 5s is the best out there according to many phone reviewers. And it uses "large" 1.5um pixels... Again, I might be missing something, so please correct me if I'm wrong.

    7. So this is where the confusion lies. How do you get 4x SNR improvement with larger pixel size or analog binning? With analog binning or larger pixel size, you also get 2x SNR improvement. You can work out the math for yourself. What you end up with is the exact same equation as the one you quoted.

      What is puzzling to me is that HTC markets this ultrapixel as one of their core value propositions. In reality, it's not that good. Technicalities aside, it kinda sucks practically.... this kind of marketing plagues the imaging industry.

  8. These small but eye-catching artifacts will ruin HTC's plan to position the M8 as a premium, high-end product.

  9. I wouldn't say small. See the asphalt on the girl pic in Engadget.

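A rough check of the error estimate in comment 5, using the standard stereo triangulation approximation (depth error ≈ depth² × disparity error / (focal length in pixels × baseline)). This is only a sketch: the ~30mm baseline is the commenter's estimate, while the 2000 px focal length (roughly a 4mm lens over 2um pixels) and the 1 px matching error are assumptions made for illustration, not published HTC specifications.

```python
def depth_error_mm(depth_mm, baseline_mm, focal_px, disparity_error_px=1.0):
    """Approximate stereo depth uncertainty.

    Follows from z = f * B / d: a small disparity error dd maps to a
    depth error of about z**2 * dd / (f * B).
    """
    return depth_mm ** 2 * disparity_error_px / (focal_px * baseline_mm)


# Assumed values: 30 mm baseline (commenter's estimate),
# 2000 px focal length and 1 px disparity error (hypothetical).
BASELINE_MM = 30.0
FOCAL_PX = 2000.0

print(round(depth_error_mm(1000.0, BASELINE_MM, FOCAL_PX), 1))  # 16.7 (mm at 1 m)
print(round(depth_error_mm(500.0, BASELINE_MM, FOCAL_PX), 1))   # 4.2 (mm at 0.5 m)
```

With these assumed numbers the result lands close to the "< 20mm within a meter, < 5mm within half a meter" figures in comment 5. The error grows with the square of the distance, so halving the distance cuts it by a factor of four, and a shorter baseline or shorter focal length scales it up proportionally.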

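The binning disagreement in the replies under comment 7 seems to depend on which noise source dominates, and that can be checked with a very simple model. The sketch below assumes Poisson shot noise, an independent read noise added once per readout, no dark current, ideal charge (analog) binning, and the same read noise for a small and a large pixel. All of these are simplifying assumptions, and the function names are made up for this example.

```python
import math


def snr(total_signal_e, num_reads, read_noise_e):
    """SNR for a signal (in electrons) collected over num_reads readouts.

    Noise variance = shot noise (equal to the signal) plus one
    read-noise contribution per readout.
    """
    return total_signal_e / math.sqrt(total_signal_e + num_reads * read_noise_e ** 2)


def compare(small_pixel_signal_e, read_noise_e):
    """Compare one small pixel, 4 digitally binned small pixels, and one
    4x-area pixel (or ideal analog binning, which reads out only once)."""
    s = small_pixel_signal_e
    return {
        "single_small": snr(s, 1, read_noise_e),
        "digital_bin_4": snr(4 * s, 4, read_noise_e),
        "large_or_analog_bin": snr(4 * s, 1, read_noise_e),
    }


# Shot-noise limited (no read noise) vs. read-noise limited (very low light).
for label, signal_e, read_e in [("shot-limited", 100.0, 0.0), ("read-limited", 1.0, 10.0)]:
    r = compare(signal_e, read_e)
    base = r["single_small"]
    print(label,
          round(r["digital_bin_4"] / base, 2),
          round(r["large_or_analog_bin"] / base, 2))
```

Under this model, digital binning of 4 pixels gives a 2x SNR gain over a single small pixel in both regimes. A 4x-area pixel (or ideal analog binning) also gives 2x when shot noise dominates, but approaches 4x when read noise dominates, because the read noise is paid only once. Averaging instead of summing the four digital samples does not change the SNR, since signal and noise are scaled by the same factor.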