The Conference on Computer Vision and Pattern Recognition (CVPR) is a premier annual conference in its field. According to Wikipedia,
CVPR considers a wide range of topics related to computer vision and pattern recognition—basically any topic that is extracting structures or answers from images or video or applying mathematical methods to data to extract or recognize patterns. Common topics include object recognition, image segmentation, motion estimation, 3D reconstruction, and deep learning.
CVPR 2020 was held from June 14 to 19. Because of COVID-19, the conference went fully virtual this year. In this article, I will interpret two award-winning papers. Please note that I don’t have a background in computer vision, or even in computer science, so my interpretation may not be as insightful as those from computer vision professionals. Feel free to let me know if I have made any mistakes.
Best Paper Award
Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild
The Core Idea of This Paper
Purpose
Reconstructing a 3D model from 2D images usually requires external supervision and photos taken from multiple viewpoints. This paper aims to learn deformable 3D objects from raw single-view images, without external supervision.
Method
A photo-geometric autoencoding system (shown below) is adopted. The network decomposes an input image into four components: depth, albedo, viewpoint, and lighting. It then reconstructs the 3D model from those components, projects the model onto the image plane to obtain a reconstructed 2D RGB image, and compares that image with the original input. This process forms a closed loop. The training goal is to minimize the difference between the reconstructed image and the original image, and because the only target is the input image itself, no external labels are needed.
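To make the closed loop more concrete, here is a minimal sketch of the "render and compare" half of it. It is my own toy illustration, not the authors' code: the function names `normals_from_depth`, `render`, and `l1_loss` are hypothetical, the image is grayscale, the viewpoint step is skipped (the object is assumed to be seen head-on), and simple Lambertian shading with made-up `ambient` and `diffuse` weights stands in for the lighting model. In the real system a neural network predicts the depth, albedo, and lighting; here they are typed in by hand.

```python
import math


def normals_from_depth(depth):
    """Estimate unit surface normals from a depth map with finite differences."""
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            # Differences between neighbouring pixels, clamped at the border.
            dzdx = (depth[y][min(x + 1, w - 1)] - depth[y][max(x - 1, 0)]) / 2.0
            dzdy = (depth[min(y + 1, h - 1)][x] - depth[max(y - 1, 0)][x]) / 2.0
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(c * c for c in n))
            row.append(tuple(c / length for c in n))
        normals.append(row)
    return normals


def render(albedo, depth, light_dir, ambient=0.3, diffuse=0.7):
    """Shade each pixel: albedo * (ambient + diffuse * max(0, normal . light))."""
    length = math.sqrt(sum(c * c for c in light_dir))
    light = tuple(c / length for c in light_dir)
    image = []
    for a_row, n_row in zip(albedo, normals_from_depth(depth)):
        row = []
        for a, n in zip(a_row, n_row):
            lambert = max(0.0, sum(nc * lc for nc, lc in zip(n, light)))
            row.append(a * (ambient + diffuse * lambert))
        image.append(row)
    return image


def l1_loss(image_a, image_b):
    """Mean absolute difference between two images of the same size."""
    diffs = [abs(a - b) for ra, rb in zip(image_a, image_b) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# The "input photo" is rendered from a flat surface lit from the front.
albedo = [[0.5, 0.5], [0.5, 0.5]]
target = render(albedo, [[1.0, 1.0], [1.0, 1.0]], light_dir=(0.0, 0.0, 1.0))

# A wrong depth guess (a tilted surface) gives a different image, so the loss is non-zero.
guess = render(albedo, [[1.0, 1.5], [1.0, 1.5]], light_dir=(0.0, 0.0, 1.0))
print(l1_loss(guess, target))
```

Training would adjust the predicted components until this loss becomes small, which is why the input image alone can act as the training signal.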
However, this closed loop can lead to degraded image quality. The authors adopt several measures, including a per-pixel confidence map, to maintain quality through the loop, and they use ablation studies to check how much each measure contributes.
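To give a feel for how a confidence map can enter the training objective, here is a small sketch. I am assuming a Laplacian-style negative log-likelihood in which the network predicts an uncertainty value `sigma` for every pixel. The function name `confidence_weighted_loss` is hypothetical and the paper's exact formulation may differ.

```python
import math


def confidence_weighted_loss(recon, target, sigma):
    """Per-pixel error scaled by a predicted uncertainty sigma (> 0).

    A large sigma down-weights the error at that pixel, but the log term
    penalises the network for claiming to be uncertain everywhere.
    """
    total, count = 0.0, 0
    for r_row, t_row, s_row in zip(recon, target, sigma):
        for r, t, s in zip(r_row, t_row, s_row):
            total += math.sqrt(2.0) * abs(r - t) / s + math.log(math.sqrt(2.0) * s)
            count += 1
    return total / count


recon = [[0.5, 0.5]]
target = [[0.5, 0.0]]  # the second pixel is reconstructed badly

uniform = confidence_weighted_loss(recon, target, [[0.1, 0.1]])
adaptive = confidence_weighted_loss(recon, target, [[0.1, 0.7]])
print(uniform, adaptive)  # the adaptive map gives the lower loss
```

The idea is that the network can mark the regions it cannot explain well as low confidence, so those pixels do not dominate the training.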
Results
The authors compare their method against a fully supervised method and a general unsupervised baseline. Their method is significantly better than the baseline and comes close to the fully supervised result.
My Reflection on This Paper
What’s good about this paper
This method can accurately recover the 3D shape of human and cat faces from single-view images without any supervision or prior shape information.
What’s not so good about this paper
The reconstruction quality of the 3D model is lower in certain conditions, including extreme lighting, noisy texture, and extreme pose.
In addition, because the neural network takes advantage of the symmetry of the object category, this method is not well suited to asymmetric objects.
What are some possible business applications
Apps: Create your own emoji (e.g., iPhone’s Animoji) from a single photo
Device security: Face unlock on low-cost cellphones or other personal devices that lack a second front camera or other sensors
Best Student Paper Award
BSP-Net: Generating Compact Meshes via Binary Space Partitioning
The Core Idea of This Paper
Purpose
The authors had prior experience with a classical spatial data structure from computer graphics and thought it could be used in 3D learning to generate polygonal meshes of an object's shape.
Method
The authors first introduce two concepts. An implicit field can be used to determine whether a point is inside or outside a shape. Binary space partitioning (BSP) is a method for recursively subdividing a space into convex sets by using planes as partitions. The authors developed a generative neural network named BSP-Net that forms polygonal meshes by combining these partitions.
BSP-Net has 3 hidden layers. The first two hidden layers detect planes from the input, and the third finds convex parts. In the output layer, shapes are produced by min-pooling. This network does not need supervision.
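The way planes, convex parts, and min-pooling fit together can be sketched with a hand-written 2D toy. This is not the learned network, only my illustration of the geometry: each plane is an inequality, a convex part is the intersection of several half-planes (I take the max of their values, which is my assumption), and the whole shape is the union of the convex parts (the min over them, matching the min-pooling mentioned above). The names `box`, `convex_value`, `shape_value`, and `is_inside` are hypothetical.

```python
def plane_value(plane, point):
    """Value of a*x + b*y + c; a value <= 0 means the point is on the inner side."""
    a, b, c = plane
    x, y = point
    return a * x + b * y + c


def convex_value(planes, point):
    """Intersection of half-planes: inside only if every plane value is <= 0."""
    return max(plane_value(p, point) for p in planes)


def shape_value(convexes, point):
    """Union of convex parts: inside if at least one part contains the point."""
    return min(convex_value(planes, point) for planes in convexes)


def is_inside(convexes, point):
    """Implicit-field style test: True when the point lies inside the shape."""
    return shape_value(convexes, point) <= 0.0


def box(x0, x1, y0, y1):
    """Axis-aligned rectangle written as four half-planes."""
    return [(1, 0, -x1), (-1, 0, x0), (0, 1, -y1), (0, -1, y0)]


# An L-shaped (concave) region built from two convex rectangles.
l_shape = [box(0, 2, 0, 1), box(0, 1, 0, 2)]

print(is_inside(l_shape, (1.5, 0.5)))  # in the horizontal bar
print(is_inside(l_shape, (0.5, 1.5)))  # in the vertical bar
print(is_inside(l_shape, (1.5, 1.5)))  # in the empty notch of the L
```

In this toy the planes are typed in by hand. As I understand it, the point of BSP-Net is that the network learns where to put the planes and how to group them.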
Results
The generated meshes represent the surface geometry of an object well. The reconstruction quality of BSP-Net is competitive with other leading methods while using far fewer primitives.
My Reflection on This Paper
What’s good about this paper
BSP-Net is an unsupervised method. Compared with existing state-of-the-art methods, BSP-Net uses significantly fewer polygons, and the 3D objects it generates stay close to the true shape.
What’s not so good about this paper
The time needed to train BSP-Net is quite long.
BSP-Net can only decompose an object into a collection of convexes. Although it works for concave objects, it has to break them into many very small convexes.
What are some possible business applications
Home decoration: automatically produce a 3D sketch of a piece of furniture from a photo taken while shopping
Industrial design: automatically produce 3D meshes from the photo of a product, so that the meshes can be compared with the design model to spot defects
Hope you like my interpretation. All comments are welcome.
Image Attribution:
Featured Image from CVPR 2020 Official Site
Images in the text from the captioned papers