-
If there isn't a Colab demo already, I will send a PR. Please let me know if there are any OOM issues or other technical issues that I may face.
-
No, we haven't looked into Colab at all, actually. It would be great to have, thank you! The CPU memory usage is relatively tame. GPU memory usage is on the order of the size of the dataset (plus a few GB for temporary training, inference, and render buffers). Unfortunately, the codebase is not particularly optimized for being memory-economical. We've been spoiled by the 3090's 24GB. Example GPU RAM usages:
-
With Colab, you might need to get lucky and get a V100 to get anywhere (might be Colab Pro only?)... the P100s and K80s don't have Tensor Cores, and somebody else found you can't seem to build tiny-cuda-nn with Pascal or Maxwell: NVlabs/tiny-cuda-nn#10. Tensor Cores were introduced with Volta, I believe? So you'd need a V100, Titan V, or RTX 20xx or better to try this project. What would be really cool is if tiny-cuda-nn and/or this project could provide a fused ops / network that does not require Tensor Cores and can work on the older GPU architectures -- it would be slower, but probably still faster than alternatives (PyTorch / TensorFlow etc.). TensorRT has fused ops for the older architectures, and these might provide easy drop-ins (at least, likely for inference).
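If it helps anyone checking what the Colab lottery handed them, here's a quick sketch (my own, not from the repo) using PyTorch, which Colab preinstalls, to read the allocated GPU's compute capability -- Tensor Cores need major version 7 or higher:

```python
import torch

# Query the compute capability of the allocated GPU.
# Tensor Cores require compute capability 7.0+ (Volta and newer).
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    print(f"{name}: compute capability {major}.{minor}")
    if major >= 7:
        print("Tensor Cores available -- the fully fused path should work.")
    else:
        print("No Tensor Cores -- expect slowdowns (or build failures on Pascal/Maxwell).")
else:
    print("No CUDA device allocated.")
```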
-
It should be possible to run on Colab now that lower compute capabilities are allowed, but I'm stuck at compilation with the following error:

```
[ 98%] Linking CXX executable testbed
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/libGL.so: undefined reference to `_glapi_tls_Current'
collect2: error: ld returned 1 exit status
CMakeFiles/testbed.dir/build.make:115: recipe for target 'testbed' failed
make[2]: *** [testbed] Error 1
CMakeFiles/Makefile2:199: recipe for target 'CMakeFiles/testbed.dir/all' failed
make[1]: *** [CMakeFiles/testbed.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[100%] Linking CXX shared library pyngp.cpython-37m-x86_64-linux-gnu.so
[100%] Built target pyngp
Makefile:90: recipe for target 'all' failed
make: *** [all] Error 2
```

Here is a link for reproducing it.
-
Progress! Thanks for reporting!
Edit: you can now build and run without GUI support to work around this (see the cmake flag further down).
-
FWIW, at least EGL works in Colab; see e.g. the pyrender demo notebook: https://colab.research.google.com/drive/1pcndwqeY8vker3bLKQNJKr3B-7-SYenE?usp=sharing There's no X11, though. It would be pretty nice to have imgui over websocket for Colab / Jupyter (e.g. via https://github.com/ggerganov/imgui-ws -- see the in-browser demos), but I don't see that anybody has tried that yet.
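For anyone who wants to sanity-check EGL on their Colab instance, a minimal sketch along the lines of that pyrender demo (assumes `pip install pyrender` first; not part of this repo):

```python
import os
# Tell PyOpenGL to use EGL for headless rendering (no X11 needed).
os.environ["PYOPENGL_PLATFORM"] = "egl"

import pyrender

# If EGL is working, creating an offscreen renderer succeeds without a display.
renderer = pyrender.OffscreenRenderer(viewport_width=640, viewport_height=480)
print("EGL offscreen context created OK")
renderer.delete()
```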
-
If someone with access to a K80 machine could check whether it runs now, that'd be appreciated. :)
-
🔥 🔥 🔥 Thanks @Tom94 !! 🔥 🔥 🔥

I had to remove transforms_val.json et al. to fit in memory (see the reply below).

Overall: the K80 is about 60x slower than a 30-series GPU, but also about 60x cheaper at the time of writing (YMMV, but check eBay). I've seen 100x slowdowns for PyTorch stuff, so 60x is pretty good.

(What about the K40? Note that the K40 seems to be compute 35, while the K80 is compute 37. A K80 is basically two K40s on the same card. At the time of writing, an AWS p2.xlarge with a single K80 (two separate devices, 11 GB memory each) is ~$0.90/hr, or $0.25/hr spot price. In the Google Colab free version or on Kaggle, you're likely to get a K80 or slightly better.)

Other than that one training change, here's what I see for the NeRF lego train out of the box on a K80:

Final train time (as reported) was 05:54, with nvidia-smi during training:

So the K80 is about 60x slower than a 30-series GPU (6 seconds -> 360 seconds). In my experience, PyTorch stuff (high I/O) is a 50x-100x lag, so this is pretty nice! Clearly the implementation helps a ton.

Once the model finishes training, I do get an OOM when rendering tries to start. For rendering, I did this:

I see moderate GPU memory usage:

Rendering is about 11 sec per frame. Most importantly, the render looks good, no different after 1000 iters than on other GPUs:
-
Awesome, thank you so much for testing! You don't actually need to delete transforms_val.json et al. You can directly pass a path to the training transforms to testbed -- then it will train from just that one .json file rather than all the ones it finds in the folder. In the above, I believe you ended up also training on the testing transforms, so there's more memory to be saved by not loading their respective images.
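For reference, a minimal sketch of what that looks like through the Python bindings (assuming the `pyngp` module built above; names based on the repo's example scripts, so double-check against scripts/run.py):

```python
import pyngp as ngp  # the module built into build/

# Create a NeRF-mode testbed.
testbed = ngp.Testbed(ngp.TestbedMode.Nerf)

# Point at the training transforms only, instead of the scene folder,
# so the validation/test images are never loaded into GPU memory.
testbed.load_training_data("data/nerf/lego/transforms_train.json")
```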
-
@Tom94 oh my bad! I have not been able to use the GUI yet, so I didn't know about that.
That does save memory, and, erm, results in more correct training too :) 🎉
-
Can confirm it works in Colab (link) (with a T4); the only downside is that it takes some 5-10 min of compile time, given that Colab allocates only 2 CPUs. Maybe an approach could be copying the compiled folder to the user's GDrive, so it could be reused in later runs and avoid recompilation, hoping you get the same GPU in the Colab lottery.
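Something like this could work for the caching idea -- a rough sketch for a Colab cell, keyed on the GPU name so a cached build is only reused on matching hardware (the paths and cache scheme are my own invention, not from the repo):

```python
import shutil
import subprocess
from pathlib import Path

from google.colab import drive
drive.mount("/content/drive")

# Key the cache on the GPU model, since the compiled CUDA arch must match.
gpu = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"], text=True
).strip().replace(" ", "_")

repo_dir = Path("/content/instant-ngp")  # assumes the repo is cloned here
build_dir = repo_dir / "build"
cache_dir = Path(f"/content/drive/MyDrive/ingp_build_cache/{gpu}")

if cache_dir.exists():
    # Cache hit: restore the previously compiled build folder.
    shutil.copytree(cache_dir, build_dir, dirs_exist_ok=True)
else:
    # Cache miss: compile as usual (5-10 min), then stash the result.
    subprocess.run(["cmake", ".", "-B", "build"], cwd=repo_dir, check=True)
    subprocess.run(["cmake", "--build", "build", "--config", "RelWithDebInfo",
                    "-j", "2"], cwd=repo_dir, check=True)
    shutil.copytree(build_dir, cache_dir)
```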
-
The repo builds & works in Docker. 5-10 mins isn't that bad, tho; there are many Colab notebooks, like Nerfies (https://colab.research.google.com/github/google/nerfies/blob/main/notebooks/Nerfies_Capture_Processing.ipynb), that can take 30 mins or more to set up or run. Hugging Face Spaces wouldn't offer the notebook environment, but since this project has its own nice GUI, it might be a better match: https://huggingface.co/spaces/launch
-
Met the exact same issue in Colab.
-
Hi there, you can avoid this error by compiling testbed without GUI support: `cmake -DNGP_BUILD_WITH_GUI=OFF <remaining params>`. This way, it won't try to link to OpenGL, which you presumably don't need when running in Colab. (You can still render out images as numpy arrays.)
-
How to see the rendering in Colab?
-
It's gonna be really hard to do that :( There might be a path through websockets (e.g. https://github.com/ggerganov/imgui-ws) or perhaps some way of standing up an X server / VNC on Colab. The GUI is pretty killer though, so it could be worth the hassle.
-
If rendering only a single image (or a handful) is desired, you can call the testbed's render function directly from Python and get the pixels back as a numpy array.
Note that the returned colors will be sRGB if you don't request linear output.
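A minimal sketch of that flow, assuming a trained `testbed` object as in the earlier snippets (the exact render signature may differ between versions, so treat this as a guess to check against scripts/run.py):

```python
import numpy as np
from PIL import Image

# Render a single 1080p frame with 8 samples per pixel.
# The last argument requests linear colors; False asks for sRGB,
# assuming the signature matches the one used in scripts/run.py.
image = testbed.render(1920, 1080, 8, False)

# 'image' comes back as an HxWx4 float numpy array; drop alpha, save as PNG.
rgb = (np.clip(image[..., :3], 0.0, 1.0) * 255).astype(np.uint8)
Image.fromarray(rgb).save("frame.png")
```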
-
You'll have to first instantiate a testbed object and train it (or load a snapshot) before rendering makes sense. I recommend consulting the example scripts in the repo (e.g. scripts/run.py).
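Roughly, the flow looks like this -- a sketch based on the example scripts, with placeholder paths:

```python
import pyngp as ngp

# Create a NeRF-mode testbed and load a previously trained snapshot.
testbed = ngp.Testbed(ngp.TestbedMode.Nerf)
testbed.load_snapshot("checkpoints/lego.msgpack")

# Alternatively: load training data and train from scratch.
# testbed.load_training_data("data/nerf/lego/transforms_train.json")
# testbed.shall_train = True
# while testbed.frame():
#     if testbed.training_step >= 1000:
#         break

# Only now does rendering make sense.
image = testbed.render(800, 800, 4, False)
```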
-
@myagues can you add a feature to stop at, say, 20k iterations and automatically save the .msgpack file?
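That can already be scripted against the Python bindings -- a sketch, assuming the same `pyngp` API as in the snippets above (check scripts/run.py for the exact names):

```python
# Train headlessly until 20k iterations, then save a snapshot.
testbed.shall_train = True
while testbed.frame():
    if testbed.training_step >= 20_000:
        break

# The second argument controls whether optimizer state is included;
# the repo's example scripts pass False here (verify against scripts/run.py).
testbed.save_snapshot("lego_20k.msgpack", False)
```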
-
I can train nicely on Colab (Linux machine). Because I want to use the GUI locally with the snapshot, I downloaded the *.msgpack to my Windows build, but the snapshot doesn't seem compatible. I needed to go through the code to understand that rendering required the width flag; it would be nice to have that in the help.
08:42:31 INFO Loading network config from: d:\pworkspace\seeingSpace\nerfs\candidates\selected__\instant-ngp\data\nerf\fox\chair\chair\chairabb8.msgpack
-
Is it possible to train the dataset and make those awesome fly-by videos right on Colab? If I'm not wrong, Colab can't support a GUI, right?
-
What about going completely headless, i.e.: upload an .mp4 to Colab, convert it to images, get annotations using COLMAP, get the .msgpack file, and then finally render it out in the browser using MobileNeRF?