Commit

Merge branch 'main' into glomap-only

brian-xu authored Jan 9, 2025
2 parents f496734 + 5be513e commit 737fb46
Showing 15 changed files with 333 additions and 46 deletions.
1 change: 1 addition & 0 deletions docs/index.md
@@ -153,6 +153,7 @@ This documentation is organized into 3 parts:
- [SIGNeRF](nerfology/methods/signerf.md): Controlled Generative Editing of NeRF Scenes
- [K-Planes](nerfology/methods/kplanes.md): Unified 3D and 4D Radiance Fields
- [LERF](nerfology/methods/lerf.md): Language Embedded Radiance Fields
- [Feature Splatting](nerfology/methods/feature_splatting.md): Gaussian Feature Splatting based on GSplats
- [Nerfbusters](nerfology/methods/nerfbusters.md): Removing Ghostly Artifacts from Casually Captured NeRFs
- [NeRFPlayer](nerfology/methods/nerfplayer.md): 4D Radiance Fields by Streaming Feature Channels
- [Tetra-NeRF](nerfology/methods/tetranerf.md): Representing Neural Radiance Fields Using Tetrahedra
87 changes: 87 additions & 0 deletions docs/nerfology/methods/feature_splatting.md
@@ -0,0 +1,87 @@
# Feature Splatting

<h4>Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting</h4>

```{button-link} https://feature-splatting.github.io/
:color: primary
:outline:
Paper Website
```

```{button-link} https://github.com/vuer-ai/feature-splatting/
:color: primary
:outline:
Code
```

<video id="teaser" muted autoplay playsinline loop controls width="100%">
<source id="mp4" src="https://feature-splatting.github.io/resources/basic_ns_demo_feature_only.mp4" type="video/mp4">
</video>

**Feature Splatting distills SAM-enhanced CLIP features into 3DGS for segmentation and editing**

## Installation

First install nerfstudio dependencies. Then run:

```bash
pip install git+https://github.com/vuer-ai/feature-splatting
```

## Running Feature Splatting

Details for running Feature Splatting (built with Nerfstudio!) can be found [here](https://github.com/vuer-ai/feature-splatting).
Once installed, run:

```bash
ns-train feature-splatting --help
```

Currently, we provide the following variants:

| Method | Description | Memory | Quality |
| ----------- | ----------------------------------------------- | ------ | ------- |
| `feature-splatting` | Feature Splatting with MaskCLIP ViT-L/14@336px and MobileSAMv2 | ~8 GB | Good |

Note that the reference features used in this version differ from those used in the paper in two ways:

- The SAM-enhanced CLIP features are computed using MobileSAMv2, which is much faster than the original SAM but slightly less accurate.
- The CLIP features are computed only at the image level.

## Method

Feature Splatting distills CLIP features into 3DGS via view-independent rasterization, which enables open-vocabulary 2D segmentation as well as open-vocabulary 3D segmentation of Gaussians directly in 3D space. This implementation supports simple editing applications by directly manipulating Gaussians.
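
To make the distillation concrete, below is a minimal sketch of view-independent feature compositing for a single pixel, assuming each Gaussian carries a learnable feature vector; names and shapes are illustrative, not the actual implementation.

```python
import torch

def composite_features(feats: torch.Tensor, alphas: torch.Tensor, order) -> torch.Tensor:
    """Alpha-composite per-Gaussian features along one pixel's ray.

    feats:  (N, D) learnable per-Gaussian features; unlike SH colors they are
            view-independent, so the same vector is splatted from every view.
    alphas: (N,) effective opacity of each projected Gaussian at this pixel.
    order:  indices of Gaussians sorted front-to-back.
    """
    out = torch.zeros(feats.shape[1])
    transmittance = 1.0
    for i in order:
        weight = alphas[i] * transmittance            # standard alpha compositing
        out = out + weight * feats[i]
        transmittance = transmittance * (1.0 - alphas[i])
    return out  # compared against the 2D reference feature map during training
```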

### Reference feature computation and joint supervision

Feature Splatting computes high-quality SAM-enhanced CLIP features as reference features. Compared to coarse CLIP features (such as those used in LERF), Feature Splatting performs object-level masked average pooling of the features to refine object boundaries. While the original ECCV'24 paper uses SAM for part-level masks, this implementation uses MobileSAMv2 for much faster reference feature computation, which we hope will encourage downstream applications that require real-time performance.
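
As a rough illustration of the masked average pooling step, assuming a dense per-pixel CLIP feature map and boolean SAM masks (a sketch, not the repository's code):

```python
import torch

def masked_average_pooling(clip_feats: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """clip_feats: (H, W, D) dense CLIP features; masks: (M, H, W) bool object masks."""
    refined = clip_feats.clone()
    for mask in masks:
        if mask.any():
            # Average the coarse features over the object, then write the pooled
            # feature back so boundaries follow the mask instead of the CLIP grid.
            refined[mask] = clip_feats[mask].mean(dim=0)
    return refined
```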

In addition to SAM-enhanced features, we also found that using DINOv2 features as joint supervision helps regularize the internal structure of objects, similar to findings in existing work.
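
A sketch of what such a joint objective could look like (the MSE form and the weight are illustrative assumptions, not the exact losses used):

```python
import torch.nn.functional as F

def joint_feature_loss(pred_clip, ref_clip, pred_dino, ref_dino, dino_weight=0.1):
    # pred_*: per-pixel features rasterized from the Gaussians.
    # ref_*: reference feature maps for the same view.
    # The DINOv2 term regularizes internal object structure, where the
    # mask-pooled CLIP features are nearly constant.
    return F.mse_loss(pred_clip, ref_clip) + dino_weight * F.mse_loss(pred_dino, ref_dino)
```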

### Scene Editing

Thanks to the explicit representation of 3DGS, grouped Gaussians can be easily manipulated. While the original ECCV'24 paper proposes a series of editing primitives, this implementation supports a subset of them to avoid introducing excessive dependencies or hacks (see the sketch after the lists below):

**Rigid operations**

- Floor estimation (for intuitive rotation and gravity estimation)
- Translation
- Transparency (highlights the segmented object and turns background Gaussians transparent)
- Rotation (yaw only w.r.t. the estimated ground)

**Non-rigid operations**

- Sand-like melting (based on the Taichi MPM method)
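
Because the grouped Gaussians are explicit primitives, a rigid edit reduces to transforming their means and orientations. Below is a minimal sketch under assumed conventions (wxyz quaternions, +z as the estimated ground normal after floor estimation); the actual editing API may differ:

```python
import math
import torch

def quat_multiply(q: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    # Hamilton product q * r, wxyz convention; q: (4,), r: (..., 4).
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r.unbind(-1)
    return torch.stack(
        [
            w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
        ],
        dim=-1,
    )

def rigid_edit(means, quats, select, translation, yaw):
    # means: (N, 3) centers; quats: (N, 4) orientations; select: (N,) bool mask
    # of Gaussians grouped by open-vocabulary segmentation; yaw in radians.
    c, s = math.cos(yaw), math.sin(yaw)
    R = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    center = means[select].mean(dim=0)
    # Rotate the group about its centroid w.r.t. the ground normal, then translate.
    means[select] = (means[select] - center) @ R.T + center + translation
    q_yaw = torch.tensor([math.cos(yaw / 2), 0.0, 0.0, math.sin(yaw / 2)])
    quats[select] = quat_multiply(q_yaw, quats[select])
    return means, quats
```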

<video id="teaser" muted autoplay playsinline loop controls width="100%">
<source id="mp4" src="https://feature-splatting.github.io/resources/ns_editing_compressed.mp4" type="video/mp4">
</video>

If you find our work helpful for your research, please consider citing:

```none
@inproceedings{qiu-2024-featuresplatting,
title={Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting},
author={Ri-Zhao Qiu and Ge Yang and Weijia Zeng and Xiaolong Wang},
booktitle={European Conference on Computer Vision (ECCV)},
year={2024}
}
```
1 change: 1 addition & 0 deletions docs/nerfology/methods/index.md
@@ -34,6 +34,7 @@ The following methods are supported in nerfstudio:
SIGNeRF<signerf.md>
K-Planes<kplanes.md>
LERF<lerf.md>
Feature-Splatting<feature_splatting.md>
Mip-NeRF<mipnerf.md>
NeRF<nerf.md>
Nerfacto<nerfacto.md>
35 changes: 30 additions & 5 deletions docs/quickstart/custom_dataset.md
@@ -268,6 +268,31 @@ ns-process-data record3d --data {data directory} --output-dir {output directory}
ns-train nerfacto --data {output directory}
```

### Adding a Point Cloud

Adding a point cloud is useful for avoiding random initialization when training Gaussian splats. To add a point cloud using Record3D, follow these steps:

1. Export a zipped sequence of PLY point clouds from Record3D.

<img src="imgs/record_3d_video_example.png" width=150>
<img src="imgs/record_3d_export_button.png" width=150>
<img src="imgs/record_3d_ply_selection.png" width=150>


2. Move the exported zip file from your iPhone to your computer.


3. Unzip the file and move all extracted `.ply` files to a directory.


4. Convert the data to nerfstudio format with the `--ply` flag and the directory from step 3.

```bash
ns-process-data record3d --data {data directory} --ply {ply directory} --output-dir {output directory}
```

Additionally, you can specify `--voxel-size {float}`, which determines the level of sparsity when downsampling from the dense point clouds generated by Record3D to the sparse point cloud used in Nerfstudio. The default value is 0.8; lower values are less sparse, higher values are more sparse.
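
For intuition, this downsampling is a standard voxel-grid filter. A sketch of the equivalent operation with Open3D (paths and the voxel size are illustrative; `ns-process-data` does this for you):

```python
import glob
import open3d as o3d

# Merge the dense Record3D clouds, then keep roughly one point per voxel.
cloud = o3d.geometry.PointCloud()
for path in sorted(glob.glob("ply_dir/*.ply")):
    cloud += o3d.io.read_point_cloud(path)
sparse = cloud.voxel_down_sample(voxel_size=0.8)  # larger voxels => sparser cloud
o3d.io.write_point_cloud("sparse_pc.ply", sparse)
```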

(spectacularai)=

## Spectacular AI
@@ -292,13 +317,13 @@ pip install spectacularAI[full]
2. Install FFmpeg. Linux: `apt install ffmpeg` (or similar, if using another package manager). Windows: [see here](https://www.editframe.com/guides/how-to-install-and-start-using-ffmpeg-in-under-10-minutes). FFmpeg must be in your `PATH` so that `ffmpeg` works on the command line.

3. Data capture. See [here for specific instructions for each supported device](https://github.com/SpectacularAI/sdk-examples/tree/main/python/mapping#recording-data).

4. Process and export. Once you have recorded a dataset in Spectacular AI format and have it stored in `{data directory}` it can be converted into a Nerfstudio supported format with:

```bash
sai-cli process {data directory} --preview3d --key_frame_distance=0.05 {output directory}
```
The optional `--preview3d` flag shows a 3D preview of the point cloud and estimated trajectory live while VISLAM is running. The `--key_frame_distance` argument can be tuned based on the recorded scene size: 0.05 (5 cm) is good for small scans and 0.15 for room-sized scans. If the processing gets slow, you can also try adding a `--fast` flag to `sai-cli process` to trade off quality for speed.

5. Train. No separate `ns-process-data` step is needed. The data in `{output directory}` can now be trained with Nerfstudio:

@@ -453,7 +478,7 @@ If cropping only needs to be done from the bottom, you can use the `--crop-bottom

## 🥽 Render VR Video

Stereo equirectangular rendering for VR video is supported as VR180 and omni-directional stereo (360 VR) Nerfstudio camera types for video and image rendering.

### Omni-directional Stereo (360 VR)
This outputs two equirectangular renders vertically stacked, one for each eye. Omni-directional stereo (ODS) is a method for rendering 3D 360° VR video, and may introduce slight depth distortions for close objects. For additional information on how ODS works, refer to this [writeup](https://developers.google.com/vr/jump/rendering-ods-content.pdf).
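
For intuition, the stereo effect comes from offsetting each eye tangentially on a circle of diameter IPD, while the ray direction follows the usual equirectangular mapping. A rough sketch with assumed conventions (y-up, angles in radians; not Nerfstudio's actual camera code):

```python
import numpy as np

def ods_ray(u, v, width, height, eye, ipd=0.064):
    # eye: -1 for the left eye, +1 for the right; ipd in meters.
    theta = (u / width) * 2.0 * np.pi - np.pi       # longitude of this column
    phi = np.pi / 2.0 - (v / height) * np.pi        # latitude of this row
    direction = np.array(
        [np.cos(phi) * np.sin(theta), np.sin(phi), np.cos(phi) * np.cos(theta)]
    )
    # Eye origins sit on a circle of radius ipd/2, offset perpendicular to the
    # viewing direction in the horizontal plane: the per-column stereo baseline.
    origin = eye * (ipd / 2.0) * np.array([np.cos(theta), 0.0, -np.sin(theta)])
    return origin, direction
```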
@@ -464,7 +489,7 @@ This outputs two equirectangular renders vertically stacked, one for each eye. O


### VR180
This outputs two 180° equirectangular renders horizontally stacked, one for each eye. VR180 is a video format for 3D 180° VR video. Unlike omni-directional stereo, VR180 content only displays front-facing content.

<center>
<img img width="375" src="https://github-production-user-asset-6210df.s3.amazonaws.com/9502341/255379444-b90f5b3c-5021-4659-8732-17725669914e.jpeg">
@@ -524,4 +549,4 @@ If the depth of the scene is unviewable and looks too close or expanded when vie
- The IPD can be modified in the `cameras.py` script as the variable `vr_ipd` (default is 64 mm).
- Compositing with Blender Objects and VR180 or ODS Renders
- Configure the Blender camera as panoramic and equirectangular. For the VR180 Blender camera, set the panoramic longitude min and max to -90 and 90.
- Change the Stereoscopy mode to "Parallel" and set the Interocular Distance to 0.064 m.
Binary file added docs/quickstart/imgs/record_3d_export_button.png
Binary file added docs/quickstart/imgs/record_3d_ply_selection.png
Binary file added docs/quickstart/imgs/record_3d_video_example.png
17 changes: 16 additions & 1 deletion nerfstudio/configs/external_methods.py
@@ -93,6 +93,21 @@ class ExternalMethod:
)
)

# Feature Splatting
external_methods.append(
ExternalMethod(
"""[bold yellow]Feature-Splatting[/bold yellow]
For more information visit: https://docs.nerf.studio/nerfology/methods/feature_splatting.html
To enable Feature Splatting, you must install it first by running:
[grey]pip install git+https://github.com/vuer-ai/feature-splatting[/grey]""",
configurations=[
("feature-splatting", "Feature Splatting with MaskCLIP ViT-L/14@336px, DINOv2 ViT-S/14, and MobileSAMv2"),
],
pip_package="git+https://github.com/vuer-ai/feature-splatting",
)
)

# Tetra-NeRF
external_methods.append(
ExternalMethod(
@@ -213,7 +228,7 @@ class ExternalMethod:
For more information visit https://docs.nerf.studio/nerfology/methods/zipnerf.html
To enable Zip-NeRF, you must install it first by running:
[grey]pip install git+https://github.com/SuLvXiangXin/zipnerf-pytorch#subdirectory=extensions/cuda
and pip install git+https://github.com/SuLvXiangXin/zipnerf-pytorch[/grey]""",
configurations=[
("zipnerf", "A pytorch implementation of 'Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields'")
63 changes: 62 additions & 1 deletion nerfstudio/configs/method_configs.py
@@ -696,7 +696,68 @@
),
},
"bilateral_grid": {
"optimizer": AdamOptimizerConfig(lr=5e-3, eps=1e-15),
"optimizer": AdamOptimizerConfig(lr=2e-3, eps=1e-15),
"scheduler": ExponentialDecaySchedulerConfig(
lr_final=1e-4, max_steps=30000, warmup_steps=1000, lr_pre_warmup=0
),
},
},
viewer=ViewerConfig(num_rays_per_chunk=1 << 15),
vis="viewer",
)

method_configs["splatfacto-mcmc"] = TrainerConfig(
method_name="splatfacto",
steps_per_eval_image=100,
steps_per_eval_batch=0,
steps_per_save=2000,
steps_per_eval_all_images=1000,
max_num_iterations=30000,
mixed_precision=False,
pipeline=VanillaPipelineConfig(
datamanager=FullImageDatamanagerConfig(
dataparser=NerfstudioDataParserConfig(load_3D_points=True),
cache_images_type="uint8",
),
model=SplatfactoModelConfig(
strategy="mcmc",
cull_alpha_thresh=0.005,
stop_split_at=25000,
),
),
optimizers={
"means": {
"optimizer": AdamOptimizerConfig(lr=1.6e-4, eps=1e-15),
"scheduler": ExponentialDecaySchedulerConfig(
lr_final=1.6e-6,
max_steps=30000,
),
},
"features_dc": {
"optimizer": AdamOptimizerConfig(lr=0.0025, eps=1e-15),
"scheduler": None,
},
"features_rest": {
"optimizer": AdamOptimizerConfig(lr=0.0025 / 20, eps=1e-15),
"scheduler": None,
},
"opacities": {
"optimizer": AdamOptimizerConfig(lr=0.05, eps=1e-15),
"scheduler": None,
},
"scales": {
"optimizer": AdamOptimizerConfig(lr=0.005, eps=1e-15),
"scheduler": None,
},
"quats": {"optimizer": AdamOptimizerConfig(lr=0.001, eps=1e-15), "scheduler": None},
"camera_opt": {
"optimizer": AdamOptimizerConfig(lr=1e-4, eps=1e-15),
"scheduler": ExponentialDecaySchedulerConfig(
lr_final=5e-7, max_steps=30000, warmup_steps=1000, lr_pre_warmup=0
),
},
"bilateral_grid": {
"optimizer": AdamOptimizerConfig(lr=2e-3, eps=1e-15),
"scheduler": ExponentialDecaySchedulerConfig(
lr_final=1e-4, max_steps=30000, warmup_steps=1000, lr_pre_warmup=0
),
