My Interpretation of CVPR 2020 Best Papers

The Conference on Computer Vision and Pattern Recognition (CVPR) is a premier annual conference in its field. According to Wikipedia,

CVPR considers a wide range of topics related to computer vision and pattern recognition—basically any topic that is extracting structures or answers from images or video or applying mathematical methods to data to extract or recognize patterns. Common topics include object recognition, image segmentation, motion estimation, 3D reconstruction, and deep learning.

CVPR 2020 was held from June 14 to 19. Because of COVID-19, the conference went fully virtual this year. In this article, I will interpret two award-winning papers. Please note that I don’t have a background in computer vision, or even in computer science, so my interpretation may not be as insightful as those from computer vision professionals. Feel free to let me know if I made any mistakes.

Best Paper Award

Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

The Core Idea of This Paper

Purpose

Reconstructing a 3D model from 2D images usually needs external supervision and photos from multiple viewpoints. This paper aims to learn 3D deformable objects from raw single-view images, without external supervision.

Method

A photo-geometric autoencoding system is adopted. The network decomposes each input image into four components: depth, albedo, viewpoint, and lighting. It then reconstructs the 3D model from those components, projects the reconstructed 3D model onto a plane to obtain the corresponding 2D RGB image, and compares that reconstructed image with the original input image. This process forms a closed loop. The training goal is to minimize the difference between the reconstructed image and the original image, and this goal can be achieved under unsupervised conditions.
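To make the closed loop more concrete, here is a minimal sketch in plain Python. It is my own simplification, not the authors’ code: it assumes a grayscale image, a frontal viewpoint (so the reprojection step is skipped), and a simple Lambertian shading model. The function names `normals_from_depth`, `render`, and `reconstruction_loss` and the `ambient` and `diffuse` parameters are hypothetical.

```python
import math

def normals_from_depth(depth):
    """Estimate a unit surface normal per pixel from a depth map."""
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            # Central differences, with neighbor indices clamped at the border.
            dzdx = (depth[y][min(x + 1, w - 1)] - depth[y][max(x - 1, 0)]) / 2.0
            dzdy = (depth[min(y + 1, h - 1)][x] - depth[max(y - 1, 0)][x]) / 2.0
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(c * c for c in n))
            row.append(tuple(c / length for c in n))
        normals.append(row)
    return normals

def render(depth, albedo, light_dir, ambient=0.5, diffuse=0.5):
    """Shade the albedo with an assumed Lambertian model, frontal view only."""
    length = math.sqrt(sum(c * c for c in light_dir))
    light = tuple(c / length for c in light_dir)
    image = []
    for n_row, a_row in zip(normals_from_depth(depth), albedo):
        out = []
        for n, a in zip(n_row, a_row):
            cos_angle = max(0.0, sum(nc * lc for nc, lc in zip(n, light)))
            out.append(a * (ambient + diffuse * cos_angle))
        image.append(out)
    return image

def reconstruction_loss(reconstructed, original):
    """Mean absolute pixel difference between two grayscale images."""
    diffs = [abs(r - o)
             for r_row, o_row in zip(reconstructed, original)
             for r, o in zip(r_row, o_row)]
    return sum(diffs) / len(diffs)
```

In a real system the depth, albedo, and lighting would be predicted by the network, and training would adjust the network so that `reconstruction_loss(render(...), input_image)` gets smaller.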

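However, without further constraints this decomposition is ill-posed, and the closed loop can converge to poor reconstructions. To keep it well behaved, the authors assume that the object is probably symmetric and have the network predict a confidence map, which marks the regions where that symmetry may not hold. The ablation studies in the paper show how much each of these components contributes to the final quality.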
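To make the confidence map idea more concrete, here is a minimal sketch of a confidence-weighted reconstruction loss. It assumes the per-pixel error is modeled with a Laplacian distribution whose scale `sigma` is given per pixel, which is my reading of the approach; the function name `confidence_weighted_loss` is hypothetical and the paper’s exact formulation may differ.

```python
import math

def confidence_weighted_loss(reconstructed, original, sigma):
    """Mean negative log-likelihood of the pixel errors under a Laplacian.

    sigma is a per-pixel uncertainty map: a large value means low confidence,
    so errors at that pixel are penalized less, at the cost of a log term.
    """
    total, count = 0.0, 0
    for r_row, o_row, s_row in zip(reconstructed, original, sigma):
        for r, o, s in zip(r_row, o_row, s_row):
            total += math.sqrt(2.0) * abs(r - o) / s + math.log(math.sqrt(2.0) * s)
            count += 1
    return total / count
```

For example, if one pixel is badly reconstructed, giving that pixel a larger `sigma` lowers the overall loss, so a network that predicts `sigma` can learn to flag the regions it cannot explain.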

Results

The authors’ method performs significantly better than the general unsupervised baseline and comes close to the fully supervised method.

My Reflection on This Paper

What’s good about this paper

This method can accurately recover the 3D shape of human and cat faces from single-view images without any supervision or prior shape information.

*Examples of 3D reconstruction in the paper*

What’s not so good about this paper

The reconstruction quality of the 3D model is lower under certain conditions, including extreme lighting, noisy texture, and extreme pose.

In addition, because the neural network takes advantage of the symmetry of object categories, this method is not well suited to strongly asymmetric objects.

What are some possible business applications

Apps: Create your own emoji (e.g. iPhone’s Animoji) from a single photo

Device security: Face unlock on inexpensive cellphones or other personal devices without a second front camera or other sensors

Best Student Paper Award

BSP-Net: Generating Compact Meshes via Binary Space Partitioning

The Core Idea of This Paper

Purpose

The authors drew on their prior experience with a classical spatial data structure from computer graphics and thought it could be used in 3D learning to generate polygonal meshes for the shape of an object.

Method

The authors first introduce two concepts. An implicit field can be used to determine whether a point is inside or outside a shape. Binary space partitioning (BSP) is a method for recursively subdividing a space into convex sets by using partitioning planes. The authors developed a generative neural network named BSP-Net that forms polygonal meshes by combining these partitions.
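For readers who have not met binary space partitioning before, here is a minimal sketch of a classical BSP tree in 2D. It is a generic illustration, not code from the paper; the names `BSPNode` and `locate` and the leaf labels are hypothetical. Each internal node splits the plane with a line, and each leaf is a convex cell.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class BSPNode:
    # Splitting line a*x + b*y + c = 0, or None when the node is a leaf.
    line: Optional[Tuple[float, float, float]] = None
    front: Optional["BSPNode"] = None  # side where a*x + b*y + c >= 0
    back: Optional["BSPNode"] = None   # side where a*x + b*y + c < 0
    label: Optional[str] = None        # name of the convex cell (leaves only)

def locate(node, point):
    """Walk down the tree and return the label of the cell containing point."""
    x, y = point
    while node.line is not None:
        a, b, c = node.line
        node = node.front if a * x + b * y + c >= 0 else node.back
    return node.label

# Split the plane at x = 0, then split the left half again at y = 0.
tree = BSPNode(
    line=(1.0, 0.0, 0.0),
    front=BSPNode(label="right"),
    back=BSPNode(
        line=(0.0, 1.0, 0.0),
        front=BSPNode(label="upper-left"),
        back=BSPNode(label="lower-left"),
    ),
)
```

Calling `locate(tree, (-1.0, 2.0))` walks two splits and ends in the `"upper-left"` cell.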

BSP-Net has 3 hidden layers. The first two hidden layers detect planes from the input, and the third finds convexes. In the output layer, shapes are produced by min-pooling over the convexes. The network does not need ground-truth convex decompositions as supervision.
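The way planes, convexes, and min-pooling fit together can be sketched without any learning. The example below is my own 2D illustration under stated assumptions: the planes are fixed by hand rather than predicted by a network, a negative value means "inside", and a convex is taken as the intersection of its half-planes (a maximum over plane values). The function names are hypothetical.

```python
def plane_value(plane, point):
    """Signed value of a half-plane a*x + b*y + c; negative or zero means inside."""
    a, b, c = plane
    x, y = point
    return a * x + b * y + c

def convex_value(planes, point):
    """A convex is the intersection of half-planes, so take the maximum."""
    return max(plane_value(p, point) for p in planes)

def shape_value(convexes, point):
    """The shape is the union of convexes, so take the minimum (min-pooling)."""
    return min(convex_value(planes, point) for planes in convexes)

def is_inside(convexes, point):
    return shape_value(convexes, point) <= 0.0

# An L-shaped (concave) region built from two overlapping boxes.
wide_box = [(-1.0, 0.0, 0.0), (1.0, 0.0, -2.0), (0.0, -1.0, 0.0), (0.0, 1.0, -1.0)]
tall_box = [(-1.0, 0.0, 0.0), (1.0, 0.0, -1.0), (0.0, -1.0, 0.0), (0.0, 1.0, -2.0)]
l_shape = [wide_box, tall_box]
```

Here `is_inside(l_shape, (1.5, 0.5))` is true, while `is_inside(l_shape, (1.5, 1.5))` is false because that point lies in the notch of the L. This also shows why a concave object has to be covered by several convexes.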

Results

The generated meshes are well suited to representing the surface geometry of an object. The reconstruction quality of BSP-Net is competitive with other leading methods while using far fewer primitives.

My Reflection on This Paper

What’s good about this paper

BSP-Net is an unsupervised method. Compared with existing state-of-the-art methods, BSP-Net uses significantly fewer polygons. It aims to synthesize 3D objects that are as faithful to the true shape as possible.

What’s not so good about this paper

The time needed to train BSP-Net is quite long.

BSP-Net can only decompose an object into a collection of convexes. Although it works for concave objects, it does so only by decomposing them into very small convexes.

What are some possible business applications

Home decoration: automatically produce a 3D sketch of a piece of furniture from a photo taken while shopping

Industrial design: automatically produce 3D meshes from the photo of a product, so that the meshes can be compared with the design model to spot defects

Hope you like my interpretation. All comments are welcome.

Image Attribution:

Featured Image from CVPR 2020 Official Site

Images in the text from the captioned papers
