I clear the asset cache and then try to load the big scene. The model is processed and then served, which is where the BLAS is constructed. It schedules a bunch of transfers (for big meshes as well as textures that are loaded on the side), followed by the BLAS construction.
This is a giant BLAS, and constructing it on the GPU takes significant time. However, all the CPU threads are busy doing texture compression for the assets that haven't been cached yet, so the AMD power management can't allocate enough power for the GPU operations. On top of this, we are running on an integrated APU, which means the memory bandwidth is shared between CPU and GPU operations. It's easy to starve the GPU of bandwidth while loading heavy assets on many threads.
This is also affected by whether or not we run on battery, and by what other rendering is requested (the UI may need some texture updates as well, and there are other apps like WezTerm consuming GPU). The result is that the job takes too long and is considered to be hanging. The driver kills the job, and I get DEVICE_LOST. All of the textures in progress are dropped, meaning they will be converted again on the next run, repeating the cycle.
Workarounds
There might be a way to configure TDR? Probably locally only, which isn't going to help other users.
Mark texture loading as dependent on the model being served. This would mean there are fewer (or no) things running during BLAS construction.
Detect if the system has an integrated GPU and limit the number of worker threads more, e.g. 1/2 instead of 2/3 of the logical cores.
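The last workaround can be sketched roughly as follows. This is only an illustration under assumptions: `worker_count` and `default_worker_count` are hypothetical names, not functions from the project, and the `integrated_gpu` flag is assumed to come from the graphics backend (in Vulkan, for instance, a physical device reports `VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU` as its device type). The 2/3 and 1/2 ratios are the ones mentioned above.

```rust
use std::thread;

/// Hypothetical helper: choose how many asset worker threads to spawn.
/// Uses 2/3 of the logical cores by default and 1/2 on an integrated GPU,
/// so that more CPU power and memory bandwidth is left for the GPU.
pub fn worker_count(logical_cores: usize, integrated_gpu: bool) -> usize {
    let (num, den) = if integrated_gpu { (1, 2) } else { (2, 3) };
    // Integer division rounds down; always keep at least one worker.
    (logical_cores * num / den).max(1)
}

/// Hypothetical convenience wrapper that queries the logical core count
/// from the standard library, falling back to 1 if it is unavailable.
pub fn default_worker_count(integrated_gpu: bool) -> usize {
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    worker_count(cores, integrated_gpu)
}
```

On a 16-thread APU this would drop the pool from 10 workers to 8. Whether that is enough headroom to avoid the timeout is something that would need to be measured.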
Seeing them occasionally as VK_ERROR_DEVICE_LOST coming out of vkQueueSubmit.
Symptoms: