RuntimeError: Cuda error: 700[cudaStreamSynchronize(stream);] when calling peeler.rasterize_next_layer()
#197
After further debugging, I identified the faulty input data that caused the CUDA error. Specifically, assuming mesh = network(input), I captured both the input that triggers the error and the network checkpoint saved closest to it. On inspection, the mesh had an extremely large number of vertices and faces: about 5 million vertices and 10 million faces. When debugging it outside the training loop, I saw nvdiffrast report that it was allocating a 4 GB buffer, so I suspect the issue is indeed related to GPU memory. Could you suggest any strategies for handling cases where the vertex and face counts are exceptionally high?
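For now I'm considering a simple size guard on my side, so that degenerate network outputs never reach the rasterizer at all. This is just a sketch: the thresholds are arbitrary and `safe_rasterize` is my own wrapper, not part of nvdiffrast.

```python
import nvdiffrast.torch as dr

# Arbitrary budgets; anything above this is almost certainly a degenerate
# prediction from the network rather than a real mesh.
MAX_VERTS, MAX_TRIS = 1_000_000, 2_000_000

def safe_rasterize(glctx, pos, tri, resolution):
    # pos: [minibatch, num_verts, 4] clip-space positions, tri: [num_tris, 3] int32
    n_verts, n_tris = pos.shape[1], tri.shape[0]
    if n_verts > MAX_VERTS or n_tris > MAX_TRIS:
        raise ValueError(f"skipping oversized mesh: {n_verts} verts / {n_tris} tris")
    return dr.rasterize(glctx, pos, tri, resolution)
```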
I created a synthetic merged mesh in a notebook by combining 5 meshes mentioned earlier. When I attempted to rasterize this merged mesh using nvdiffrast, I successfully reproduced the CUDA error. This confirms that the issue is indeed caused by the excessively large mesh, leading to GPU memory problems. Under normal network training conditions, such excessively large meshes wouldn't be generated, so this issue is likely more related to a bug in my network. I guess I need to focus more on network side. But would also be glad to see nvdiffrast handle extreme cases like this more gracefully (e.g., is there an example for just allocating a fixed size buffer at the beginning?). Anyway, thank you! |
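P.S. For reference, the merged mesh in the notebook was built essentially like this (a sketch; the helper name is mine, it just concatenates vertices and offsets the face indices):

```python
import torch

def merge_meshes(meshes):
    # meshes: list of (verts [V, 3] float32, faces [F, 3] int32) tuples.
    # Face indices are offset so they keep pointing at the right vertices
    # after the vertex tensors are concatenated.
    verts_out, faces_out, offset = [], [], 0
    for verts, faces in meshes:
        verts_out.append(verts)
        faces_out.append(faces + offset)
        offset += verts.shape[0]
    return torch.cat(verts_out, dim=0), torch.cat(faces_out, dim=0)
```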
Which of the buffers is the problem? The triangle/vertex buffers are reallocated to accommodate the incoming data if they're not large enough, so their size should always reflect the largest input seen thus far. The frame buffer is a bit different, as it's resized to accommodate the maximum over each dimension (width, height, minibatch) separately. The OpenGL/Cuda interop seems to run into problems when buffers are allocated and freed multiple times, leading to a gradual accumulation of resource usage (not necessarily GPU memory per se) and an eventual crash. Presumably it is running out of some driver-internal resource that isn't freed until the process terminates, so there isn't much that can be done on the application side except avoiding reallocations. To preallocate a buffer, all you need to do is call the rasterizer once with the largest input you expect to encounter. The buffer sizes are never reduced, so this should remove the need to expand them later on. The buffers are local to the rasterizer context. I would also suggest trying out the Cuda-based rasterizer (replace RasterizeGLContext with RasterizeCudaContext).
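In code, that preallocation amounts to something like the following sketch; the sizes are placeholders for whatever the largest input you expect to encounter is.

```python
import torch
import nvdiffrast.torch as dr

glctx = dr.RasterizeGLContext()   # or dr.RasterizeCudaContext()

# Placeholder upper bounds on what training is expected to produce.
MAX_BATCH, MAX_VERTS, MAX_TRIS = 4, 2_000_000, 4_000_000
MAX_RES = [1024, 1024]            # [height, width]

# One throwaway call with the largest expected input; the internal buffers
# grow to fit it and are never shrunk afterwards.
pos = torch.zeros(MAX_BATCH, MAX_VERTS, 4, device='cuda')
pos[..., 3] = 1.0                 # valid homogeneous w
tri = torch.zeros(MAX_TRIS, 3, dtype=torch.int32, device='cuda')
dr.rasterize(glctx, pos, tri, MAX_RES)
```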
Thank you! It seems that I can now initialize a larger preallocated buffer. I have already been using the Cuda rasterizer (RasterizeCudaContext), though.
Ah, sorry, I didn't realize you were using the Cuda rasterizer already. Its memory usage is quite complicated and hard to predict, as it depends on how the triangles overlap tiles and pixels on screen, how they clip against the view frustum, and so on. The code detects cases where the internal buffers aren't large enough, resizes them automatically, and retries the operation in question; that is also what produces the message about the buffer resize. I'm guessing the very large input causes some internal indexing arithmetic to overflow, which could easily lead to illegal memory accesses and Cuda error 700. The code wasn't designed to tolerate, or even detect, that situation, so in that sense this is a genuine bug/limitation, and for now the only workaround is to reduce the size of the input. That said, a …
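One crude way to "reduce the size of the input" when a runaway mesh comes out of the network is to cap the triangle count before rasterizing. This is only a sketch: the cap is arbitrary, and randomly subsampling faces obviously changes the rendered image.

```python
import torch

def cap_triangles(tri, max_tris=2_000_000):
    # tri: [num_tris, 3] int32. Randomly keep at most max_tris faces so the
    # Cuda rasterizer's internal buffers and indexing stay within range.
    if tri.shape[0] <= max_tris:
        return tri
    keep = torch.randperm(tri.shape[0], device=tri.device)[:max_tris]
    return tri[keep].contiguous()
```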
Hi, I encountered the same CUDA out-of-memory (OOM) error in my project. In my code I render a large mesh three times, and each pass generates numerous images without any backward operations. The error occurs for reasons I haven't been able to identify.

My code: …

Or: …
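A sketch of the kind of gradient-free, chunked rendering loop described above; the tensors are dummy placeholders and the chunk size is arbitrary, so this is not the actual project code.

```python
import torch
import nvdiffrast.torch as dr

glctx = dr.RasterizeCudaContext()

# Dummy stand-ins for the real mesh and per-view clip-space positions.
clip_pos = torch.rand(64, 10_000, 4, device='cuda')   # [views, verts, xyzw]
clip_pos[..., 3] = 1.0
tri = torch.randint(0, 10_000, (20_000, 3), device='cuda', dtype=torch.int32)

# Render a few views at a time with no autograd graph, so memory stays
# bounded even though many images are produced overall.
with torch.no_grad():
    for pos_chunk in clip_pos.split(4, dim=0):
        rast, _ = dr.rasterize(glctx, pos_chunk.contiguous(), tri, [512, 512])
        # ... interpolate / antialias / write the images out here ...
```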
Hi, I've encountered a strange bug when calling peeler.rasterize_next_layer(). The code, which is part of a training script, is running in a multi-GPU server environment. Initially everything was working fine, but as training progressed (around 3 hours in), the error suddenly appeared. I looked into similar issues, and some suggest that the problem might be related to the progressively growing internal buffers.

I added dr.set_log_level(0) to my code and observed that the internal buffer size gradually increased from 500 MB to 1700 MB (without triggering a CUDA error yet). I don't think it's a GPU memory issue, as the network itself uses around 60 GB of memory, leaving up to 20 GB available for nvdiffrast on an 80 GB H100. I also doubt it's related to invalid data, as I tried some test cases in a notebook, like zero-length vertices and data containing nan or inf, but none of these caused the error.

I'm currently really puzzled as to what could be causing this issue and would appreciate any insights. Thanks in advance!
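For reference, the rendering side of the setup boils down to a loop like the following; this is a minimal sketch with dummy tensors, not my actual training code.

```python
import torch
import nvdiffrast.torch as dr

dr.set_log_level(0)   # verbose: prints messages about internal buffer (re)allocation

glctx = dr.RasterizeCudaContext()

# Dummy clip-space positions and triangles standing in for the network output.
pos = torch.rand(1, 50_000, 4, device='cuda')
pos[..., 3] = 1.0
tri = torch.randint(0, 50_000, (100_000, 3), device='cuda', dtype=torch.int32)

with dr.DepthPeeler(glctx, pos, tri, resolution=[512, 512]) as peeler:
    for _ in range(4):                        # peel a few depth layers
        rast, rast_db = peeler.rasterize_next_layer()
        # ... interpolate / shade using rast here ...
```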
Following is the full log: