Notes on NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF

Link to paper: https://arxiv.org/abs/2307.09112

Paper published on: 2023-07-18

Paper's authors: Stefan Lionar, Xiangyu Xu, Min Lin, Gim Hee Lee

GPT3 API Cost: $0.03

GPT4 API Cost: $0.10

Total Cost To Write This: $0.13

Time Savings: 13:1

The ELI5 TLDR:

NU-MCC is a new method for reconstructing 3D objects from a single picture with depth (RGB-D). It builds on the current best method, MCC, and aims to be faster and to recover finer detail. It does this with a neighborhood decoder, which lets each query point look only at a few nearby anchor points, and a repulsive unsigned distance function (UDF), which spreads points evenly over the surface so the reconstruction has fewer holes. An anchor predictor places the anchor points, and their positions are supervised during training. The method combines coarse anchor features with fine features taken from the input. The model was trained on a large dataset and achieved similar results to MCC while running 15 times faster. Overall, NU-MCC shows that even a state-of-the-art method can be made both faster and more detailed.

The Deeper Dive:

A Novel Approach to 3D Reconstruction: Unpacking NU-MCC

In the realm of 3D reconstruction from single-view RGB-D inputs, the current state-of-the-art method is MCC, a model that combines vision Transformers with large-scale training. While effective, MCC has its limitations, particularly in terms of efficiency and the ability to recover high-fidelity details. This is where the newly proposed approach, NU-MCC, steps in.

Overcoming the Limitations of MCC with NU-MCC

NU-MCC addresses the limitations of MCC with a Neighborhood decoder and a Repulsive Unsigned Distance Function (Repulsive UDF). These two innovations allow for faster inference speed and improved recovery of 3D textures, overcoming the inefficiency of the Transformer decoder in MCC.

The Neighborhood decoder lets each query point attend only to a small neighborhood of anchor features rather than to the full set of tokens. This lowers the cost per query, which speeds up inference, and keeps the decoder focused on local information, which helps it recover 3D textures.
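To make the idea of a "small neighborhood" concrete, here is a minimal NumPy sketch that picks the k closest anchors for every query point. The function name k_nearest_anchors, the value k=4, and the random points are illustrative assumptions; the notes do not say how NU-MCC selects neighbors internally.

```python
import numpy as np

def k_nearest_anchors(queries, anchors, k):
    """Return an index array of shape (Q, k): the k anchors closest to each query.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    # Pairwise squared Euclidean distances, shape (Q, A).
    diff = queries[:, None, :] - anchors[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Sort each row by distance and keep the first k columns.
    return np.argsort(dist2, axis=1)[:, :k]

rng = np.random.default_rng(0)
anchors = rng.uniform(-1.0, 1.0, size=(200, 3))   # 200 anchors, as in the training setup
queries = rng.uniform(-1.0, 1.0, size=(1000, 3))  # arbitrary query points
idx = k_nearest_anchors(queries, anchors, k=4)
print(idx.shape)
```

Each query then only needs the features of its k anchors, so the work per query depends on k rather than on the total number of input tokens.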

The Repulsive UDF, on the other hand, is an alternative to the occupancy field used in MCC. Where an occupancy field labels each query point as occupied or empty, an unsigned distance function gives the distance from the query point to the nearest surface. The repulsive variant improves the quality of 3D object reconstruction by producing more complete surfaces.

The Power of the Anchor Predictor

The research introduces a Transformer-based anchor predictor for object-level single-view reconstruction. This anchor predictor is initialized with learnable positional embeddings and incorporates a global token. It is computationally efficient for a moderate number of anchors, and the anchor locations are directly supervised during training so that the anchors form a meaningful representation of the object.

The anchor features themselves are not directly supervised. Instead, they are trained through the RGB and UDF prediction losses, so they are shaped by whatever helps the decoder predict color and distance accurately.

Feature Aggregation and Fine Features

The research proposes a feature aggregation process that aggregates anchor features in the neighborhood of a query point. This process uses a linear projection layer and aggregation weights based on displacement, features, and a global token.
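The aggregation step can be sketched as a softmax-weighted sum over the neighboring anchors. The notes only say that the weights depend on displacement, features, and a global token, so the exact scoring rule below (keys built from displacement and features, scored against the global token) and the matrices W_d, W_f, and W_v are assumptions made for illustration, not the paper's formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D array."""
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def aggregate(query_xyz, anchor_xyz, anchor_feat, global_tok, W_d, W_f, W_v):
    """Aggregate the features of k neighboring anchors for one query point.

    query_xyz: (3,), anchor_xyz: (k, 3), anchor_feat: (k, C), global_tok: (C,)
    W_d: (3, C), W_f: (C, C), W_v: (C, C) are hypothetical learned matrices.
    """
    disp = anchor_xyz - query_xyz                 # displacement to each anchor, (k, 3)
    keys = disp @ W_d + anchor_feat @ W_f         # per-anchor key, (k, C)
    scores = keys @ global_tok / np.sqrt(global_tok.shape[0])  # (k,)
    weights = softmax(scores)                     # aggregation weights, sum to 1
    values = anchor_feat @ W_v                    # linear projection of features, (k, C)
    return weights @ values, weights              # aggregated feature (C,), weights (k,)

rng = np.random.default_rng(0)
k, C = 4, 8
feat, w = aggregate(
    rng.normal(size=3), rng.normal(size=(k, 3)), rng.normal(size=(k, C)),
    rng.normal(size=C), rng.normal(size=(3, C)), rng.normal(size=(C, C)),
    rng.normal(size=(C, C)),
)
print(feat.shape, w.sum())
```

The point of the sketch is the structure: a linear projection produces the values, and a separate set of weights decides how much each nearby anchor contributes.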

The research also incorporates fine-scale information from the input by constructing fine features and integrating them with the anchor features. This gives the Neighborhood decoder a way to control reconstruction quality by adjusting the resolution of the fine input features and the number of features used during training.

Introducing the Repulsive UDF

The research introduces the Repulsive UDF to encourage a more uniform distribution of query points on the reconstructed surface. The Repulsive UDF uses a new potential field that adds a repulsive force between query points, and it restricts that force to each point's k nearest neighbors when computing the gradient field.

During optimization, anchor locations, UDF predictions, and colors are each supervised with their own loss term.
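The effect of a repulsive term can be illustrated with a toy example. The sketch below uses an analytic UDF of the unit sphere as a stand-in for the learned field, pushes each point away from its k nearest neighbors, and then moves it back onto the surface along the UDF gradient. The Gaussian weighting, the step size eta, and the width sigma are assumptions chosen for this illustration and are not the potential field defined in the paper.

```python
import numpy as np

def sphere_udf(p):
    """Unsigned distance to the unit sphere and its gradient (stand-in for a learned UDF)."""
    r = np.maximum(np.linalg.norm(p, axis=1, keepdims=True), 1e-9)
    return np.abs(r - 1.0), np.sign(r - 1.0) * p / r

def repulsion(p, k, sigma):
    """Sum of bounded push-away vectors from each point's k nearest neighbors."""
    diff = p[:, None, :] - p[None, :, :]          # diff[i, j] = p_i - p_j
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)                # a point never repels itself
    nn = np.argsort(dist, axis=1)[:, :k]          # indices of the k nearest neighbors
    rows = np.arange(p.shape[0])[:, None]
    d = dist[rows, nn]                            # (N, k) neighbor distances
    direction = diff[rows, nn] / (d[..., None] + 1e-9)
    weight = np.exp(-(d / sigma) ** 2)[..., None] # closer neighbors push harder
    return np.sum(weight * direction, axis=1)

def refine(p, steps=10, k=4, eta=0.05, sigma=0.2):
    """Alternate a repulsive move with a step along the UDF gradient onto the surface."""
    for _ in range(steps):
        p = p + eta * repulsion(p, k, sigma)
        d, g = sphere_udf(p)
        p = p - d * g                             # move each point by its distance to the surface
    return p

rng = np.random.default_rng(0)
points = refine(0.5 * rng.normal(size=(64, 3)))
print(np.linalg.norm(points, axis=1).round(6)[:5])
```

Without the repulsive move, points that start close together land close together on the surface, which leaves gaps elsewhere. The repulsive move spreads them out before each projection.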

Training and Evaluation of NU-MCC

The model is trained on the CO3D-v2 dataset using an effective batch size of 512 and 4 NVIDIA A100 GPUs for 100 epochs. The training follows the optimizer and 3D data augmentation of MCC, using the Adam optimizer with a base learning rate of 10^-4 and cosine schedule.
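For reference, a cosine schedule with the stated base learning rate can be written in a few lines. This is a generic sketch: the function name cosine_lr, the per-epoch granularity, the absence of warmup, and the final learning rate of zero are assumptions, since the notes give only the base rate and the schedule type.

```python
import math

def cosine_lr(epoch, total_epochs=100, base_lr=1e-4, min_lr=0.0):
    """Cosine-decayed learning rate for a given epoch (no warmup; assumed details)."""
    # Fraction of training completed, clamped to [0, 1].
    t = min(max(epoch / total_epochs, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))

for epoch in (0, 25, 50, 75, 100):
    print(epoch, cosine_lr(epoch))
```

The rate starts at the base value, falls slowly at first, fastest around the midpoint, and flattens out again near the end of training.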

The model employs 200 coarse anchor representations and uses the L1-RGB distance metric to evaluate colors. Results on the CO3D-v2 validation set show that NU-MCC achieves comparable metrics to MCC but with a 15x inference speedup.
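An L1 color metric on point clouds can be sketched as follows. The notes do not define how predicted and ground-truth points are paired, so this version assumes nearest-neighbor matching and averages the absolute color difference over points and channels. The function name l1_rgb and the matching rule are assumptions.

```python
import numpy as np

def l1_rgb(pred_xyz, pred_rgb, gt_xyz, gt_rgb):
    """Mean absolute color error between each predicted point and its nearest GT point."""
    # Squared distances between every predicted and ground-truth point, shape (P, G).
    diff = pred_xyz[:, None, :] - gt_xyz[None, :, :]
    nearest = np.argmin(np.sum(diff ** 2, axis=-1), axis=1)
    # Compare colors of matched pairs; average over points and RGB channels.
    return float(np.mean(np.abs(pred_rgb - gt_rgb[nearest])))

xyz = np.eye(3)                     # three toy points
gt_colors = np.zeros((3, 3))        # all black
pred_colors = np.full((3, 3), 0.5)  # all mid-gray
print(l1_rgb(xyz, pred_colors, xyz, gt_colors))
```

Lower is better: a value of 0 means every predicted point has exactly the color of its matched ground-truth point.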

Results and Implications

The incorporation of fine features in the feature aggregation process improves color accuracy and geometry. The use of the repulsive field in the UDF improves geometry and color by mitigating hole artifacts and creating surfaces with uniformly distributed points.

Qualitative comparisons with MCC show that NU-MCC captures finer detail on the visible part of the object and predicts occluded regions more accurately. NU-MCC also demonstrates zero-shot generalization in the wild, producing faithful reconstructions from RGB-D iPhone captures, AI-generated images, and ImageNet images.

However, the model's performance degrades when the input depth contains significant outliers and noise. Despite this, NU-MCC can reconstruct challenging object classes, such as the lawnmower in the ImageNet examples.

In summary, NU-MCC addresses the main limitations of MCC, its slow Transformer decoder and its loss of fine detail, by combining a Neighborhood decoder with a Repulsive UDF. The result is faster inference and more complete, more detailed reconstructions, which shows that even a state-of-the-art method leaves room for improvement.
