Google built a ‘Frankenphone’ rig with 5 Pixel 3 smartphones to improve portrait shots

Google Pixel 3 and Pixel 3 XL, despite all the bugs and issues reported by users, have one of the best cameras on any smartphone right now. One of the curious elements of the Pixel 3’s camera system is that it achieves most of what competing devices are capable of with a single lens. While competitors like Samsung, Apple and Huawei use a secondary lens for depth effects, Google uses only a single lens to achieve the same result, and in some cases, portrait shots from the Pixel 3 are superior to those from rival devices.

As was the case last year with the Pixel 2 and Pixel 2 XL, Google uses software and computational photography to achieve this effect. Now, in a blog post, Google explains how it predicts depth using a single camera system on the Pixel 3 and Pixel 3 XL. Rahul Garg, Research Scientist, and Neal Wadhwa, Software Engineer, write that the Pixel 2 used phase-detection autofocus (PDAF) pixels (also known as dual-pixel autofocus) along with a traditional non-learned stereo algorithm to capture portrait mode images with a blurred background.

Since the dual-pixel PDAF system captures two slightly different views of the same scene, Google exploited this small parallax between the views to estimate a depth map, which is then used to achieve the bokeh effect in portraits. With the Pixel 3, however, Google says it wanted to improve on these fundamentals and produce even better pictures.
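To make the idea concrete, here is a rough, illustrative Python sketch of how two slightly offset views can be turned into a coarse depth map. The block-matching approach, window size and search range are assumptions for illustration only, not Google’s actual stereo algorithm.

```python
import numpy as np

def disparity_map(left, right, window=7, max_disp=8):
    """Estimate per-pixel horizontal disparity between two grayscale views.

    Larger disparity means the point shifted more between the views,
    which (roughly) means it is closer to the camera.
    """
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.float32)
            best_d, best_cost = 0, np.inf
            # Search a small horizontal range for the best-matching patch.
            for d in range(0, max_disp + 1):
                if x - half - d < 0:
                    break
                cand = right[y - half:y + half + 1,
                             x - half - d:x + half + 1 - d].astype(np.float32)
                cost = np.sum((patch - cand) ** 2)
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Disparity is inversely proportional to depth, so a crude relative
# depth map can be obtained as depth ~ 1 / (disparity + epsilon).
```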

In order to achieve better portrait shots on the Pixel 3, Google says it turned to machine learning for improved depth estimation. The machine learning algorithm is trained to understand that the parallax produced by the stereo algorithm is only one of many depth cues present in images. One new cue added this year is what Google calls the defocus depth cue: areas farther from the focal plane appear blurrier than the sharply focused subject, so the amount of blur itself hints at distance.
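As a loose illustration of that idea, the sketch below computes a simple per-pixel sharpness score; blurrier regions score lower and are likelier to be far from the focal plane. The gradient-variance measure and window size are illustrative choices, not details of Google’s model.

```python
import numpy as np

def defocus_cue(gray, window=15):
    """Per-pixel sharpness proxy: local variance of the gradient magnitude.

    Low values suggest defocus blur, i.e. a point that likely sits
    far from the focal plane (and thus from the in-focus subject).
    """
    gy, gx = np.gradient(gray.astype(np.float32))
    mag = np.hypot(gx, gy)
    h, w = mag.shape
    half = window // 2
    score = np.zeros_like(mag)
    for y in range(half, h - half):
        for x in range(half, w - half):
            score[y, x] = mag[y - half:y + half + 1, x - half:x + half + 1].var()
    return score
```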

Google also added other cues, including a semantic cue, which counts the number of pixels a person’s face occupies in an image to infer how far away that person is from the camera, since faraway faces appear smaller. With these cues in place, Google used machine learning to create an algorithm that combines all of them for a more accurate measurement of depth.
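The geometry behind that semantic cue is straightforward: under a simple pinhole camera model, an object of known real-world size that covers fewer pixels must be farther away. The focal length and average face width in the sketch below are illustrative assumptions, not values from Google’s pipeline.

```python
def distance_from_face_width(face_width_px: float,
                             focal_length_px: float = 1500.0,
                             real_face_width_m: float = 0.16) -> float:
    """Rough distance estimate (metres) from the pixel width of a detected face."""
    return focal_length_px * real_face_width_m / face_width_px

# Example: a face spanning 300 pixels is roughly 0.8 m from the camera,
# while one spanning only 100 pixels is roughly 2.4 m away.
print(distance_from_face_width(300))  # ~0.8
print(distance_from_face_width(100))  # ~2.4
```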

The most bonkers thing Google did is build a custom “Frankenphone” rig to capture training data for the neural network. The rig consisted of five Pixel 3 phones along with a Wi-Fi-based solution that allowed it to capture pictures from all of the phones simultaneously, within a tolerance of around two milliseconds. The photos from the rig were then used to compute high-quality ground-truth depth using structure from motion and multi-view stereo.
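Why do five synchronised cameras help? With several views of the same instant and known (or recovered) camera poses, each 2D observation of a scene point adds constraints, and the 3D point can be triangulated by linear least squares. The sketch below shows the standard direct linear transform (DLT) triangulation as a generic illustration of that multi-view principle; the camera matrices are stand-ins, not the rig’s actual calibration.

```python
import numpy as np

def triangulate(projection_matrices, pixel_points):
    """Triangulate one 3D point from N >= 2 calibrated views.

    projection_matrices: list of 3x4 camera projection matrices P_i.
    pixel_points: list of (u, v) observations of the same point in each view.
    Returns the 3D point in world coordinates.
    """
    rows = []
    for P, (u, v) in zip(projection_matrices, pixel_points):
        # Each view contributes two linear equations in the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

More views (and baselines in more directions) over-determine this system, which is what makes the recovered depth accurate enough to serve as training ground truth.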


Google says the five different viewpoints allowed it to measure parallax in five different directions and obtain more accurate depth information. While the industry has moved to as many as four rear cameras, Google is setting a strong example of how data and computational photography can achieve the same results with just one camera.