Fast Bilateral-Space Stereo

A while ago I discovered the paper “Fast bilateral-space stereo for synthetic defocus” by Jon Barron et al. [1], and it kept intriguing me. I therefore decided to implement the dense stereo reconstruction method from this paper to gain a real understanding of it and some practical experience with it. A link to the source code can be found at the bottom of the page.

The goal of the paper is to generate disparity maps that can be used to render a synthetic shallow depth of field from a stereo pair captured with a deep depth of field. The algorithm must produce good depth maps, both in terms of accurate depth estimates and, more importantly, in terms of the localization of edges in the depth map. The disparity map must closely track edges in the image to avoid rendering artifacts.

The core idea of the paper is to avoid per-pixel inference by leveraging techniques for fast bilateral filtering to “resample” the dense stereo problem from pixel-space into a much smaller “bilateral-space”. Bilateral-space is a resampling of pixel-space such that small, simple blurs between adjacent vertices in bilateral-space are equivalent to large, edge-aware blurs in pixel-space. Because inference is done in this compact “bilateral-space” instead of pixel-space, the approach is fast and scalable despite solving a global optimization problem with non-local smoothness priors.
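To make the resampling idea more concrete, here is a minimal Python/NumPy sketch of the splat, blur, slice pattern. It is an illustration only, not the solver from the paper: it filters an existing signal instead of solving the stereo problem, it uses a dense grid over (x, y, luma) only, and it assigns each pixel to its nearest vertex. The function names and the parameters `sigma_xy` and `sigma_l` are assumptions made for this example, and `luma` is assumed to lie in [0, 1].

```python
import numpy as np

def blur_121(grid, axis):
    """Apply a [1, 2, 1] / 4 blur between adjacent vertices along one axis."""
    pad = [(1, 1) if a == axis else (0, 0) for a in range(grid.ndim)]
    padded = np.pad(grid, pad, mode="constant")
    n = grid.shape[axis]
    lo = np.take(padded, np.arange(0, n), axis=axis)
    mid = np.take(padded, np.arange(1, n + 1), axis=axis)
    hi = np.take(padded, np.arange(2, n + 2), axis=axis)
    return (lo + 2.0 * mid + hi) / 4.0

def bilateral_space_blur(values, luma, sigma_xy=8.0, sigma_l=0.1):
    """Edge-aware smoothing of `values` (HxW) guided by `luma` (HxW, in [0, 1])."""
    h, w = luma.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Hypothetical simplification: hard-assign each pixel to its nearest vertex.
    gy = np.round(ys / sigma_xy).astype(int)
    gx = np.round(xs / sigma_xy).astype(int)
    gl = np.round(luma / sigma_l).astype(int)
    shape = (int(gy.max()) + 1, int(gx.max()) + 1, int(gl.max()) + 1)
    vertex = np.ravel_multi_index((gy, gx, gl), shape).ravel()
    size = int(np.prod(shape))
    # Splat: accumulate pixel values and pixel counts per vertex.
    num = np.bincount(vertex, weights=values.ravel(), minlength=size).reshape(shape)
    den = np.bincount(vertex, minlength=size).astype(float).reshape(shape)
    # Blur: a small, simple blur between adjacent vertices in bilateral-space.
    for axis in range(3):
        num = blur_121(num, axis)
        den = blur_121(den, axis)
    # Slice: read the normalized result back at every pixel's vertex.
    return (num.ravel()[vertex] / den.ravel()[vertex]).reshape(h, w)
```

Because pixels on opposite sides of a strong luma edge land on vertices that are far apart along the luma axis, the small blur never mixes them, which is what makes the result edge-aware in pixel-space.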

Please check the paper [1] and its supplemental material [2] for more technical details, or watch the video that can be found on the personal page of Jon Barron. Ugo Capeto has also implemented this algorithm and written a technical report on it, which can be found here.

Below you can see the results of my implementation.

The following videos demonstrate the inference process of the disparity map.

 

The optimization process visualized in 3D.

 

The optimization process visualized in 3D. Post-processed using the domain transform, a fast edge-aware filtering technique.

 

Backstreet at dei Coronari

Some backstreet at dei Coronari street near the di San Salvatore in Lauro square, Rome, Italy. The original stereo-pair can be found here.

Left image of input stereo-pair

3D view of disparity map

3D view of post-processed disparity map

Different view of post-processed disparity map

Post-processed disparity map

Garibaldi Bridge

View from Garibaldi bridge over the Tiber river towards the Tiber island, Rome, Italy. The original stereo-pair can be found here.

Left image of stereo-pair

3D view of disparity map

The spikes are caused by vertices in bilateral space for which no disparity can be inferred. This can be resolved by lowering the resolution of the simplified bilateral grid.
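A plausible way to see why a coarser grid helps is to count how many pixels support each vertex: the larger the grid cells, the fewer vertices end up with almost no pixels behind them. The Python/NumPy sketch below is a hypothetical illustration on a synthetic random image, not the paper's grid construction. It uses an (x, y, luma) grid only, and `vertex_support`, `sigma_xy` and `sigma_l` are names assumed for this example.

```python
import numpy as np

def vertex_support(luma, sigma_xy, sigma_l):
    """Return the number of pixels assigned to each occupied grid vertex."""
    h, w = luma.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack(
        [np.round(ys / sigma_xy), np.round(xs / sigma_xy), np.round(luma / sigma_l)],
        axis=-1,
    ).reshape(-1, 3).astype(int)
    # Each unique (y, x, luma) triple is one occupied vertex.
    _, counts = np.unique(coords, axis=0, return_counts=True)
    return counts

rng = np.random.default_rng(0)
luma = rng.random((64, 64))  # synthetic, heavily textured test image
for sigma_xy, sigma_l in [(4, 0.05), (16, 0.2)]:
    counts = vertex_support(luma, sigma_xy, sigma_l)
    print(sigma_xy, sigma_l, len(counts), counts.mean(), np.mean(counts == 1))
```

For each grid resolution the loop prints the number of occupied vertices, the mean number of pixels per vertex, and the fraction of vertices supported by a single pixel.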

3D view of post-processed disparity map

The edge-aware domain transform filter can remove the spikes.
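As a rough illustration of this post-processing step, the following Python/NumPy sketch implements the recursive-filter variant of the domain transform (Gastal and Oliveira) for a single-channel guide image. It is a simplified sketch under stated assumptions and not necessarily identical to the filter used for the results on this page: the guide is assumed to be grayscale in [0, 1], and `domain_transform_filter`, `sigma_s`, `sigma_r` and `iterations` are names chosen for this example.

```python
import numpy as np

def _recursive_pass(img, feedback):
    """Filter rows in place, left to right and then right to left.

    feedback[:, x] is the weight that links pixel x to pixel x - 1.
    """
    width = img.shape[1]
    for x in range(1, width):
        img[:, x] += feedback[:, x] * (img[:, x - 1] - img[:, x])
    for x in range(width - 2, -1, -1):
        img[:, x] += feedback[:, x + 1] * (img[:, x + 1] - img[:, x])

def domain_transform_filter(disparity, guide, sigma_s=30.0, sigma_r=0.1, iterations=3):
    """Smooth `disparity` while respecting the edges of `guide` (both HxW)."""
    out = np.array(disparity, dtype=float)  # work on a copy
    guide = np.asarray(guide, dtype=float)
    ratio = sigma_s / sigma_r
    # Distance between neighbouring pixels in the transformed domain:
    # 1 in flat regions, much larger across strong edges of the guide.
    dist_x = np.ones_like(guide)
    dist_x[:, 1:] += ratio * np.abs(np.diff(guide, axis=1))
    dist_y = np.ones_like(guide)
    dist_y[1:, :] += ratio * np.abs(np.diff(guide, axis=0))
    for i in range(iterations):
        # Halve the filter width on every iteration.
        sigma_h = (sigma_s * np.sqrt(3.0) * 2.0 ** (iterations - i - 1)
                   / np.sqrt(4.0 ** iterations - 1.0))
        a = np.exp(-np.sqrt(2.0) / sigma_h)
        _recursive_pass(out, a ** dist_x)        # horizontal pass
        _recursive_pass(out.T, (a ** dist_y).T)  # vertical pass on a view
    return out
```

A single outlier in an otherwise flat region is spread over its neighbourhood and largely disappears, while a disparity step that coincides with an edge in the guide image is left almost untouched.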

Different view of post-processed disparity map

Post-processed disparity map

Mausoleum of Garibaldi

Mausoleum of Garibaldi, Rome, Italy. The original stereo-pair can be found here.

Left image of stereo-pair

3D view of disparity map

There are not enough stereo matches in the sky. That is why the sky is blended with the objects in the foreground.

3D view of post-processed disparity map

Different view of post-processed disparity map

Post-processed disparity map

Courtyard with fountains

Courtyard with fountains, Villa Giulia and the Etruscan Museum, Rome, Italy. The original stereo-pair can be found here.

Left image of stereo-pair

3D view of disparity map

3D view of post-processed disparity map

Different view of post-processed disparity map

Post-processed disparity map

Near the Organ Fountain

Near the Organ Fountain, villa D’Este, Tivoli, Italy. The original stereo-pair can be found here.

Left image of stereo-pair

3D view of disparity map

3D view of post-processed disparity map

Different view of post-processed disparity map

Post-processed disparity map

Vatican museum

Vatican museum, Rome, Italy. The original stereo-pair can be found here.

Left image of stereo-pair

3D view of disparity map

3D view of post-processed disparity map

Different view of post-processed disparity map

Post-processed disparity map

Fountain in park Villa Borghese

Fountain in park Villa Borghese, Rome, Italy. The original stereo-pair can be found here.

Left image of stereo-pair

3D view of disparity map

3D view of post-processed disparity map

Different view of post-processed disparity map

Post-processed disparity map

Near the Oval fountain

Near the Oval fountain, villa D’Este, Tivoli, Italy. The original stereo-pair can be found here.

Left image of stereo-pair

3D view of disparity map

Due to the lack of stereo matches in the sky, the sky is again blended with the objects that touch it.

3D view of post-processed disparity map

Different view of post-processed disparity map

Post-processed disparity map

Source Code

You can find the source code, without the 3D visualization, on my GitHub.
Writing performant code was not a criterion, and I did not extensively tune the algorithm.
Only the simplified bilateral grid method without the multiscale optimization is implemented.
The algorithm needs a stereo pair as input and will generate a disparity map.

References

[1] Barron, Jonathan T., et al. “Fast bilateral-space stereo for synthetic defocus.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

[2] Barron, Jonathan T., et al. “Fast bilateral-space stereo for synthetic defocus—Supplemental material.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.