I recently did some profiling of GPU decoding, and I found that creating CUDA contexts is quite expensive because it requires device synchronization:
torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp, line 117 (commit f4065f1)

Attachment: trace_rank_0.json
I am actually not even sure this code is correct, because under the hood this call may create a new CUDA context, and it is illegal to access memory across CUDA contexts. Somehow the code still works, though.
For newer versions of FFmpeg, you can pass a flag (AV_CUDA_USE_PRIMARY_CONTEXT) when creating the device context so that FFmpeg reuses the CUDA primary context; memory can then be shared with PyTorch code (which initializes its own CUDA context).
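A minimal sketch of what this could look like, assuming a newer FFmpeg where libavutil/hwcontext_cuda.h defines AV_CUDA_USE_PRIMARY_CONTEXT (the function name and fallback behavior here are illustrative, not torchcodec's actual code):

```cpp
// Hedged sketch: create an FFmpeg CUDA hardware device context that, when
// supported, reuses the device's primary CUDA context (the one PyTorch uses)
// instead of creating a fresh context, avoiding a costly device sync.
#include <cstdio>

extern "C" {
#include <libavutil/hwcontext.h>
#include <libavutil/hwcontext_cuda.h>
}

AVBufferRef* createCudaDeviceContext(int deviceIndex) {
  char device[8];
  std::snprintf(device, sizeof(device), "%d", deviceIndex);

  int flags = 0;
#ifdef AV_CUDA_USE_PRIMARY_CONTEXT
  // Newer FFmpeg only (not available in 4.1): retain the primary context via
  // cuDevicePrimaryCtxRetain rather than calling cuCtxCreate.
  flags |= AV_CUDA_USE_PRIMARY_CONTEXT;
#endif

  AVBufferRef* hwDeviceCtx = nullptr;
  int err = av_hwdevice_ctx_create(
      &hwDeviceCtx, AV_HWDEVICE_TYPE_CUDA, device, /*opts=*/nullptr, flags);
  if (err < 0) {
    return nullptr;  // caller handles the error
  }
  return hwDeviceCtx;  // caller releases with av_buffer_unref()
}
```

On FFmpeg builds without the flag, this falls back to the old behavior (a new context per call), which is exactly the case where caching and reusing one context would pay off.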
But for older versions of FFmpeg, like 4.1, we should reuse an existing CUDA context where possible instead of creating a new one.