Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arcanization of wgpu core resources #3626

Merged
merged 204 commits into from
Nov 20, 2023
Merged

Arcanization of wgpu core resources #3626

merged 204 commits into from
Nov 20, 2023

Conversation

gents83
Copy link
Contributor

@gents83 gents83 commented Mar 30, 2023

Connections

Closes #2710
Closes #4139
Closes #3193
Closes #4686

Steps done:

  • Removed Token and LifeTime related management
  • Removed RefCount and MultiRefCount in favour of using only Arc internal reference count
  • Removing mut from resources and added instead internal members locks on demand or atomics operations
  • Resources now implement Drop and destroy stuff when last Arc resources is released
  • Resources hold an Arc in order to be able to implement Drop
  • Resources have an utility to retrieve the id of the resource itself
  • Remove all guards and just retrive the Arc needed on-demand to unlock registry of resources asap
  • Verify correct resources release when unused or not needed
  • Check Web and Metal compliation (thanks to @niklaskorz)
  • Fix tests on all platforms
  • Test a multithreaded scenario
  • Storage is now holding only user-land resources, but Arc is keeping refcount for resources
  • When user unregister a resource, it's not dropped if still in use due to refcount inside wgpu
  • IdentityManager is now unique and free is called on resource drop instead of storage unregister
  • Identity changes due to Arcanization and Registry being just the user reference
  • Added MemLeaks test and fixing mem leaks

STATUS:

  • Ready to be REVIEWED and INTEGRATED 👍

@gents83 gents83 requested a review from crowlKats as a code owner April 1, 2023 16:09
@Elabajaba
Copy link
Contributor

Not sure if you're aware, but I believe staging buffers aren't being released currently (run bunnymark, spawn some bunnies and watch the GPU's shared memory usage skyrocket (on Vulkan it spams Memory block wasn't deallocated in the terminal, on DX12 it eventually crashes)).

@gents83
Copy link
Contributor Author

gents83 commented Apr 2, 2023

@Elabajaba it should be fixed now :)

@gents83 gents83 marked this pull request as draft April 4, 2023 17:32
niklaskorz added a commit to niklaskorz/linon that referenced this pull request Apr 7, 2023
@niklaskorz
Copy link
Contributor

niklaskorz commented Apr 7, 2023

I've fixed the compiler errors for the Metal backend (should I open a pull request on your branch @gents83?) but there's a final issue to solve: when shutting down a wgpu application using the Metal backend, the following errors occur:

[2023-04-07T09:42:44Z ERROR wgpu_hal::metal::device] No active command buffers for fence value 294
thread 'main' panicked at 'Error in Device::drop: Validation Error

Caused by:
    Parent device is lost
', wgpu/src/backend/direct.rs:1411:34
stack backtrace:
   0: rust_begin_unwind
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/panicking.rs:142:14
   2: wgpu::backend::direct::Context::handle_error_fatal
             at ./src/backend/direct.rs:312:9
   3: <wgpu::backend::direct::Context as wgpu::context::Context>::device_drop
             at ./src/backend/direct.rs:1411:29
   4: <T as wgpu::context::DynContext>::device_drop
             at ./src/context.rs:2285:9
   5: <wgpu::Device as core::ops::drop::Drop>::drop
             at ./src/lib.rs:2290:13
   6: core::ptr::drop_in_place<wgpu::Device>
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487:1
   7: core::ptr::drop_in_place<water::framework::start<water::Example>::{{closure}}>
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487:1
   8: core::ptr::drop_in_place<core::cell::UnsafeCell<dyn core::ops::function::FnMut<(winit::event::Event<()>,&winit::event_loop::EventLoopWindowTarget<()>,&mut winit::event_loop::ControlFlow)>+Output = ()>>
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487:1
   9: core::ptr::drop_in_place<core::cell::RefCell<dyn core::ops::function::FnMut<(winit::event::Event<()>,&winit::event_loop::EventLoopWindowTarget<()>,&mut winit::event_loop::ControlFlow)>+Output = ()>>
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487:1
  10: <alloc::rc::Rc<T> as core::ops::drop::Drop>::drop
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/alloc/src/rc.rs:1559:17
  11: core::ptr::drop_in_place<alloc::rc::Rc<core::cell::RefCell<dyn core::ops::function::FnMut<(winit::event::Event<()>,&winit::event_loop::EventLoopWindowTarget<()>,&mut winit::event_loop::ControlFlow)>+Output = ()>>>
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487:1
  12: core::ptr::drop_in_place<core::option::Option<alloc::rc::Rc<core::cell::RefCell<dyn core::ops::function::FnMut<(winit::event::Event<()>,&winit::event_loop::EventLoopWindowTarget<()>,&mut winit::event_loop::ControlFlow)>+Output = ()>>>>
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ptr/mod.rs:487:1
  13: core::mem::drop
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/mem/mod.rs:988:24
  14: winit::platform_impl::platform::event_loop::EventLoop<T>::run_return
             at /Users/niklaskorz/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.27.5/src/platform_impl/macos/event_loop.rs:230:9
  15: winit::platform_impl::platform::event_loop::EventLoop<T>::run
             at /Users/niklaskorz/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.27.5/src/platform_impl/macos/event_loop.rs:191:25
  16: winit::event_loop::EventLoop<T>::run
             at /Users/niklaskorz/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.27.5/src/event_loop.rs:278:9
  17: water::framework::start
             at ./examples/water/../framework.rs:289:5
  18: water::framework::run
             at ./examples/water/../framework.rs:441:5
  19: water::main
             at ./examples/water/main.rs:820:5
  20: core::ops::function::FnOnce::call_once
             at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2023-04-07T09:42:44Z ERROR wgpu_hal::metal::device] No active command buffers for fence value 294
[2023-04-07T09:42:44Z ERROR wgpu_core::device::resource] failed to wait for the device: Lost

I have checked the current trunk/master branch of upstream wgpu and can confirm these errors do not occur there.

@gents83
Copy link
Contributor Author

gents83 commented Apr 7, 2023

I've fixed the compiler errors for the Metal backend (should I open a pull request on your branch @gents83?)

Lovely! Sure - let's do that, I'll review it and merge it if possible 👍

but there's a final issue to solve: when shutting down a wgpu application using the Metal backend, the following errors occur:

Ah! Oki - it's possible that there is some missing Arc between fence and command buffer from the message and the Command Buffer is getting released before the fence (or maybe it's just a wrong destruction order).
Is it something that you can verify?

@niklaskorz
Copy link
Contributor

niklaskorz commented Apr 7, 2023

Ah! Oki - it's possible that there is some missing Arc between fence and command buffer from the message and the Command Buffer is getting released before the fence (or maybe it's just a wrong destruction order). Is it something that you can verify?

Trying to debug it right now, and actually a command buffer associated with the wait value (or higher) is never added to the fence. The last command buffer to be added always has a value of wait_value - 1.

On upstream, the last (value, command_buffer) tuple in the fence always has value == wait_value.

Edit 1:

Maybe I'm missing something, but I've debug-logged all increments to device.active_submission_index and I can't seem to find where it is increased to the final value used by the failing wait...

[wgpu-core/src/device/queue.rs:556] device.active_submission_index.fetch_add(1, Ordering::Relaxed) = 224
[wgpu-core/src/device/queue.rs:1087] device.active_submission_index.fetch_add(1, Ordering::Relaxed) = 225
[2023-04-07T10:33:15Z INFO  wgpu_hal::metal] Adding command buffer with value 225 to fence
[wgpu-core/src/device/queue.rs:556] device.active_submission_index.fetch_add(1, Ordering::Relaxed) = 226
[wgpu-core/src/device/queue.rs:1087] device.active_submission_index.fetch_add(1, Ordering::Relaxed) = 227
[2023-04-07T10:33:15Z INFO  wgpu_hal::metal] Adding command buffer with value 227 to fence
[2023-04-07T10:33:15Z ERROR wgpu_hal::metal::device] No active command buffers for fence value 228

Edit 2:

From https://doc.rust-lang.org/std/sync/atomic/struct.AtomicU64.html#method.fetch_add:

Adds to the current value, returning the previous value.

Am I right to assume that all calls to .fetch_add(1, Ordering::Relaxed) should be replaced by .fetch_add(1, Ordering::Relaxed) + 1 then?

Edit 2:

Yup, this seems to fix it! I'll open another PR on your branch so you can review @gents83

@niklaskorz
Copy link
Contributor

Tested on my own project: with my proposed changes this PR nows builds on all platforms: https://github.com/niklaskorz/linon/actions/runs/4638085286

Unfortunatly, wgpu still does not run on the web, but that is tracked in rust-lang/cargo#3430.

gents83 added 3 commits April 7, 2023 17:50
Fix compiler errors for Metal backend
Fix usage of active_submission_index.fetch_add
@nical nical mentioned this pull request Nov 17, 2023
3 tasks
@cwfitzgerald cwfitzgerald mentioned this pull request Nov 17, 2023
@Wumpf Wumpf mentioned this pull request Nov 18, 2023
@SludgePhD
Copy link
Contributor

I ran into some deadlocks that happen on both trunk and this branch in #4686, so I submitted a fix against this branch here: gents83#17

@cwfitzgerald
Copy link
Member

@SludgePhD Nice! The plan is to merge this on monday, so I suspect we should wait post merge to fix a pre-existing issue.

Shorten lock durations to avoid deadlocks - thanks a lot @SludgePhD :)
@nical
Copy link
Contributor

nical commented Nov 20, 2023

Alright, brace yourselves folks! 🥁

@nical nical merged commit 6e21f7a into gfx-rs:trunk Nov 20, 2023
27 checks passed
@teoxoy
Copy link
Member

teoxoy commented Nov 20, 2023

@gents83 were the changes to tests/tests/zero_init_texture_after_discard.rs required for arcanization to work?

We seem to no longer test multiple command encoder submissions.

Also, assert_buffers_are_zero was already waiting.

device.poll(Maintain::Wait);

@gents83
Copy link
Contributor Author

gents83 commented Nov 20, 2023

@gents83 were the changes to tests/tests/zero_init_texture_after_discard.rs required for arcanization to work?

We seem to no longer test multiple command encoder submissions.

Also, assert_buffers_are_zero was already waiting.

device.poll(Maintain::Wait);

@teoxoy with some latest changes from trunk I had that test in particular failing if separated in different sumbissions.
But it's not really a need for Arcanization itself

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Nov 21, 2023
…6978ae2d251. r=webgpu-reviewers,supply-chain-reviewers,ErichDonGubler,teoxoy

Note: This revision contains the arcanization work

# Changelog

 * #4702 Add `WasmNotSendSync`
   By daxpedda in gfx-rs/wgpu#4702
 * #4707 Add more metal keywords
   By fornwall in gfx-rs/wgpu#4707
 * #4706 [naga] remove `span` and `validate` features
   By teoxoy in gfx-rs/wgpu#4706
 * #4709 [dx12] filter out haswell iGPUs
   By teoxoy in gfx-rs/wgpu#4709
 * #4712 Fix typo in pull request template.
   By jimblandy in gfx-rs/wgpu#4712
 * #4598 Add more lints
   By daxpedda in gfx-rs/wgpu#4598
 * #4713 [naga wgsl-in] Include base when printing pointer and array types.
   By jimblandy in gfx-rs/wgpu#4713
 * #4718 [vk] check that adapters are Vulkan compliant
   By teoxoy in gfx-rs/wgpu#4718
 * #4719 [naga] Let constant evaluation of `As` preserve `Splat` expressions.
   By jimblandy in gfx-rs/wgpu#4719
 * #4725 Corrects typo in examples FrameCounter
   By cantudo in gfx-rs/wgpu#4725
 * #3626 Arcanization of wgpu core resources
   By gents83 in gfx-rs/wgpu#3626

Differential Revision: https://phabricator.services.mozilla.com/D194048
vinnydiehl pushed a commit to vinnydiehl/mozilla-unified that referenced this pull request Nov 21, 2023
…6978ae2d251. r=webgpu-reviewers,supply-chain-reviewers,ErichDonGubler,teoxoy

Note: This revision contains the arcanization work

# Changelog

 * #4702 Add `WasmNotSendSync`
   By daxpedda in gfx-rs/wgpu#4702
 * #4707 Add more metal keywords
   By fornwall in gfx-rs/wgpu#4707
 * #4706 [naga] remove `span` and `validate` features
   By teoxoy in gfx-rs/wgpu#4706
 * #4709 [dx12] filter out haswell iGPUs
   By teoxoy in gfx-rs/wgpu#4709
 * #4712 Fix typo in pull request template.
   By jimblandy in gfx-rs/wgpu#4712
 * #4598 Add more lints
   By daxpedda in gfx-rs/wgpu#4598
 * #4713 [naga wgsl-in] Include base when printing pointer and array types.
   By jimblandy in gfx-rs/wgpu#4713
 * #4718 [vk] check that adapters are Vulkan compliant
   By teoxoy in gfx-rs/wgpu#4718
 * #4719 [naga] Let constant evaluation of `As` preserve `Splat` expressions.
   By jimblandy in gfx-rs/wgpu#4719
 * #4725 Corrects typo in examples FrameCounter
   By cantudo in gfx-rs/wgpu#4725
 * #3626 Arcanization of wgpu core resources
   By gents83 in gfx-rs/wgpu#3626

Differential Revision: https://phabricator.services.mozilla.com/D194048
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Nov 21, 2023
…6978ae2d251. r=webgpu-reviewers,supply-chain-reviewers,ErichDonGubler,teoxoy

Note: This revision contains the arcanization work

# Changelog

 * #4702 Add `WasmNotSendSync`
   By daxpedda in gfx-rs/wgpu#4702
 * #4707 Add more metal keywords
   By fornwall in gfx-rs/wgpu#4707
 * #4706 [naga] remove `span` and `validate` features
   By teoxoy in gfx-rs/wgpu#4706
 * #4709 [dx12] filter out haswell iGPUs
   By teoxoy in gfx-rs/wgpu#4709
 * #4712 Fix typo in pull request template.
   By jimblandy in gfx-rs/wgpu#4712
 * #4598 Add more lints
   By daxpedda in gfx-rs/wgpu#4598
 * #4713 [naga wgsl-in] Include base when printing pointer and array types.
   By jimblandy in gfx-rs/wgpu#4713
 * #4718 [vk] check that adapters are Vulkan compliant
   By teoxoy in gfx-rs/wgpu#4718
 * #4719 [naga] Let constant evaluation of `As` preserve `Splat` expressions.
   By jimblandy in gfx-rs/wgpu#4719
 * #4725 Corrects typo in examples FrameCounter
   By cantudo in gfx-rs/wgpu#4725
 * #3626 Arcanization of wgpu core resources
   By gents83 in gfx-rs/wgpu#3626

Differential Revision: https://phabricator.services.mozilla.com/D194048

UltraBlame original commit: eb839abb42abde594497e419517260997fa1fc4d
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Nov 21, 2023
…6978ae2d251. r=webgpu-reviewers,supply-chain-reviewers,ErichDonGubler,teoxoy

Note: This revision contains the arcanization work

# Changelog

 * #4702 Add `WasmNotSendSync`
   By daxpedda in gfx-rs/wgpu#4702
 * #4707 Add more metal keywords
   By fornwall in gfx-rs/wgpu#4707
 * #4706 [naga] remove `span` and `validate` features
   By teoxoy in gfx-rs/wgpu#4706
 * #4709 [dx12] filter out haswell iGPUs
   By teoxoy in gfx-rs/wgpu#4709
 * #4712 Fix typo in pull request template.
   By jimblandy in gfx-rs/wgpu#4712
 * #4598 Add more lints
   By daxpedda in gfx-rs/wgpu#4598
 * #4713 [naga wgsl-in] Include base when printing pointer and array types.
   By jimblandy in gfx-rs/wgpu#4713
 * #4718 [vk] check that adapters are Vulkan compliant
   By teoxoy in gfx-rs/wgpu#4718
 * #4719 [naga] Let constant evaluation of `As` preserve `Splat` expressions.
   By jimblandy in gfx-rs/wgpu#4719
 * #4725 Corrects typo in examples FrameCounter
   By cantudo in gfx-rs/wgpu#4725
 * #3626 Arcanization of wgpu core resources
   By gents83 in gfx-rs/wgpu#3626

Differential Revision: https://phabricator.services.mozilla.com/D194048

UltraBlame original commit: eb839abb42abde594497e419517260997fa1fc4d
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Nov 21, 2023
…6978ae2d251. r=webgpu-reviewers,supply-chain-reviewers,ErichDonGubler,teoxoy

Note: This revision contains the arcanization work

# Changelog

 * #4702 Add `WasmNotSendSync`
   By daxpedda in gfx-rs/wgpu#4702
 * #4707 Add more metal keywords
   By fornwall in gfx-rs/wgpu#4707
 * #4706 [naga] remove `span` and `validate` features
   By teoxoy in gfx-rs/wgpu#4706
 * #4709 [dx12] filter out haswell iGPUs
   By teoxoy in gfx-rs/wgpu#4709
 * #4712 Fix typo in pull request template.
   By jimblandy in gfx-rs/wgpu#4712
 * #4598 Add more lints
   By daxpedda in gfx-rs/wgpu#4598
 * #4713 [naga wgsl-in] Include base when printing pointer and array types.
   By jimblandy in gfx-rs/wgpu#4713
 * #4718 [vk] check that adapters are Vulkan compliant
   By teoxoy in gfx-rs/wgpu#4718
 * #4719 [naga] Let constant evaluation of `As` preserve `Splat` expressions.
   By jimblandy in gfx-rs/wgpu#4719
 * #4725 Corrects typo in examples FrameCounter
   By cantudo in gfx-rs/wgpu#4725
 * #3626 Arcanization of wgpu core resources
   By gents83 in gfx-rs/wgpu#3626

Differential Revision: https://phabricator.services.mozilla.com/D194048

UltraBlame original commit: eb839abb42abde594497e419517260997fa1fc4d
@xiaopengli89
Copy link
Contributor

@gents83 Surface::as_hal has break change, we can't modify surface anymore, such as present_with_transaction for ui apps, is there any other way we can set this option?

@gents83
Copy link
Contributor Author

gents83 commented Nov 29, 2023

@gents83 Surface::as_hal has break change, we can't modify surface anymore, such as present_with_transaction for ui apps, is there any other way we can set this option?

Oh! I am guessing that is because I removed the surface_as_hal_mut()? Is it the only case where it's needed?
Otherwise I think it would be preferrable to have a config parameter at initialization time and manage it internally, no?
Looking at some past discussions I do not know exactly how it ended up to be exposed as mut:
#2711 (comment)

@xiaopengli89
Copy link
Contributor

Oh! I am guessing that is because I removed the surface_as_hal_mut()? Is it the only case where it's needed? Otherwise I think it would be preferrable to have a config parameter at initialization time and manage it internally, no? Looking at some past discussions I do not know exactly how it ended up to be exposed as mut: #2711 (comment)

As it's a platform feature, I exposed it as mut: #3123, I don't know if it is appropriate to add surface initialization parameter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet