A team from UC Berkeley and Intel Labs has posted a new pre-print titled "Dancing under the stars: video denoising in starlight". They present a method for denoising videos captured under extremely low illumination, down to less than a thousandth of a lux.
Abstract:
Imaging in low light is extremely challenging due to low photon counts. Using sensitive CMOS cameras, it is currently possible to take videos at night under moonlight (0.05-0.3 lux illumination). In this paper, we demonstrate photorealistic video under starlight (no moon present, <0.001 lux) for the first time. To enable this, we develop a GAN-tuned physics-based noise model to more accurately represent camera noise at the lowest light levels. Using this noise model, we train a video denoiser using a combination of simulated noisy video clips and real noisy still images. We capture a 5-10 fps video dataset with significant motion at approximately 0.6-0.7 millilux with no active illumination. Comparing against alternative methods, we achieve improved video quality at the lowest light levels, demonstrating photorealistic video denoising in starlight for the first time.
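For readers less familiar with the approach: a physics-based noise model synthesizes realistic noisy frames from clean ones, and the GAN is used to tune the model's parameters so the synthetic noise matches the real sensor. Below is a minimal, hypothetical sketch of such a forward model in Python. It is not the authors' exact model (theirs includes additional components, such as fixed-pattern and periodic noise, with GAN-learned parameters), and all parameter values here are placeholders:

```python
import numpy as np

def simulate_low_light_noise(clean_pe, gain=2.0, read_sigma=1.5,
                             row_sigma=0.8, bit_depth=12, rng=None):
    """Simplified physics-based low-light noise model.

    clean_pe: clean image in photoelectrons, shape (H, W).
    All parameter values are hypothetical placeholders, not the paper's.
    """
    rng = rng or np.random.default_rng()
    # Shot noise: photon arrivals follow a Poisson distribution.
    shot = rng.poisson(np.clip(clean_pe, 0, None)).astype(np.float64)
    # Read noise: per-pixel Gaussian noise from the readout chain.
    read = rng.normal(0.0, read_sigma, shot.shape)
    # Row/banding noise: one Gaussian offset shared by each row.
    rows = rng.normal(0.0, row_sigma, (shot.shape[0], 1))
    # Amplify and quantize to the ADC's bit depth.
    dn = gain * (shot + read + rows)
    return np.clip(np.round(dn), 0, 2 ** bit_depth - 1)
```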
The data in this paper was captured using an NIR-enhanced RGB camera sensor optimized for low light imaging. The authors write:
We choose to use a Canon LI3030SAI Sensor, which is a 2160x1280 sensor with 19µm pixels, 16 channel analog output, and increased quantum efficiency in NIR. This camera has a Bayer pattern consisting of red, green, blue (RGB), and NIR channels (800-950nm). Each RGB channel has an additional transmittance peak overlapping with the NIR channel to increase light throughput at night. During daylight, the NIR channel can be subtracted from each RGB channel to produce a color image; however, at night when NIR is dominant, subtracting out the NIR channel will remove a large portion of the signal, resulting in muted colors. We pair this sensor with a ZEISS Otus 28mm f/1.4 ZF.2 lens, which we choose due to its large aperture and wide field-of-view.
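As an illustration of the daylight NIR-subtraction step the authors describe, here is a minimal sketch. The channel ordering and per-channel leak factors are assumptions; the real correction would come from a spectral calibration of the sensor's transmittance curves:

```python
import numpy as np

def daylight_color(rgbn, nir_leak=(0.9, 0.9, 0.9)):
    """Recover a daylight color image from a demosaicked RGBN frame
    by subtracting the NIR contribution from each color channel.

    rgbn: float array of shape (H, W, 4), channels ordered R, G, B, NIR
    (an assumed layout). nir_leak holds hypothetical per-channel
    NIR leak factors.
    """
    rgb, nir = rgbn[..., :3], rgbn[..., 3:4]
    return np.clip(rgb - np.asarray(nir_leak) * nir, 0.0, None)
```

At night, when NIR dominates the signal, this subtraction would remove most of it, which is consistent with the authors' decision to keep the NIR contribution and accept non-standard colors.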
Pre-print: https://arxiv.org/pdf/2204.04210.pdf
This is a cool preprint, though it could use a little more TLC to make it really nice. I wish these CV people would include some cost estimates for their DNNs... they seem to think the only figure of merit (FOM) that matters is accuracy, and that everything else, like energy, speed, and memory, doesn't matter at all.
They also use a super expensive Canon CIS with 19µm pixels and an RGBN Bayer pattern... this chip is only about 2.8 MP yet measures roughly 41 x 24 mm; it is HUGE. Plus, I'm somewhat skeptical of their fancy noise modeling: it's not clear it would work in mass production. Would each camera need its own fine-tuned denoiser? It's also not clear from their experiments how well the method uses Eric Warrant's ideas of "smart spatiotemporal summation" to reduce noise and motion blur. And after reading the paper, I have no idea whether this method could be practical to implement in real time, or whether it would need a 200W GPU attached to the camera.
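For what it's worth, the cost figures asked for above are straightforward to measure. Here is a minimal sketch in PyTorch, assuming some video-denoiser module and a guessed (batch, channels, frames, height, width) input layout, neither of which comes from the paper:

```python
import time
import torch

def profile_denoiser(model, shape=(1, 4, 16, 544, 960), device="cuda",
                     warmup=3, runs=10):
    """Report parameter count and average inference latency per clip.
    The input shape is an assumption; use whatever the model expects."""
    model = model.to(device).eval()
    params = sum(p.numel() for p in model.parameters())
    x = torch.randn(*shape, device=device)
    with torch.no_grad():
        for _ in range(warmup):          # warm-up runs for kernels/caches
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()     # wait for queued GPU work
        t0 = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    latency_ms = (time.perf_counter() - t0) / runs * 1e3
    print(f"{params / 1e6:.1f}M params, {latency_ms:.0f} ms per clip")
```

Reporting numbers like these alongside PSNR would make it much easier to judge whether real-time operation is plausible without a dedicated GPU.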
This image noise is dominated by vertical FPN (VFPN), which is hard to see on current CIS. I wonder whether the power needed to fix the FPN and horizontal noise will increase dramatically.
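For context, static column FPN by itself is cheap to correct with classical calibration; it is the signal-dependent and temporally varying components that need the heavier learned model. A minimal sketch of the classical step, assuming a stack of dark calibration frames is available:

```python
import numpy as np

def correct_column_fpn(frame, dark_frames):
    """Subtract per-column offsets (vertical FPN plus dark level)
    estimated from a stack of dark frames of shape (N, H, W)."""
    col_offsets = dark_frames.mean(axis=(0, 1))  # one offset per column, shape (W,)
    return frame - col_offsets
```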
A skeptic like me would say that yes, they clearly show perceptually superior denoising compared to other methods, but the lower light level is almost entirely due to the Canon sensor's high QE and huge pixels, which have roughly 100 times the area of economical megapixel automotive or surveillance CIS (see the quick check below). The superior denoising might be mostly due to a very specialized attack on the read noise, shot noise, and column FPN of this particular CIS. And the computational costs are missing, so it's not clear whether the method is practical. I hope they improve the published paper in these respects.
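A quick back-of-the-envelope check of that area claim, assuming a typical 2 µm pixel pitch for a low-cost sensor (the 19 µm pitch is from the paper):

```python
canon_pitch_um = 19.0   # Canon LI3030SAI pixel pitch, from the paper
small_pitch_um = 2.0    # assumed pitch of an economical automotive/surveillance CIS
print(f"pixel area ratio: {(canon_pitch_um / small_pitch_um) ** 2:.0f}x")  # -> 90x
```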
BTW, the project page is https://kristinamonakhova.com/starlight_denoising/ and the direct arXiv link is https://arxiv.org/abs/2204.04210. It is a CVPR 2022 paper; it's not clear whether it has been accepted or just submitted so far. One more thing in the paper is a bit confusing: the night videos are shot at only 5-10 fps, but the paper states the exposure times are 0.2 to 0.1 ms. I think this must be a mistake; at 5-10 fps the frame period is 200 to 100 ms, so the exposures are presumably 200 ms to 100 ms, and the substantial motion blur I see on fast-moving things in the videos is consistent with such long exposures. Still, the results are really cool IMO.
The paper was accepted at CVPR 2022 as an oral presentation and will be presented this week. [Friday, June 24, 2022]