Buffers Leaked when Device Handle is Dropped Before Buffers #5529

jonmmease · 2024-04-13T14:32:50Z

Description
After repeated headless rendering with the metal backend, wgpu 0.19 crashes with a "Context leak detected, msgtracer returned -1" error where version 0.18.0 works without a problem.

My workflow is to perform headless rendering to PNG images following the approach in the Wgpu without a window learn-wgpu page.

Repro steps
I've created a repro repository in https://github.com/jonmmease/wgpu-memory-repro that reliable reproduces the crash on my M1 macbook pro. See the README for more details about the repro and full system details.

Expected vs observed behavior
In wgpu 0.18.0, everything works well. in 0.19 (I tried 0.19.0 through 0.19.3) I get this crash after ~100 iterations.

Context leak detected, msgtracer returned -1

thread 'main' panicked at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9:
assertion failed: !ptr.is_null()
stack backtrace:
   0: rust_begin_unwind
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:144:5
   3: foreign_types_shared::ForeignTypeRef::from_ptr
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
   4: <metal::commandqueue::CommandQueue as core::ops::deref::Deref>::deref
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/metal-0.27.0/src/commandqueue.rs:13:1
   5: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding::{{closure}}
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-hal-0.19.3/src/metal/command.rs:179:17
   6: objc::rc::autorelease::autoreleasepool
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/objc-0.2.7/src/rc/autorelease.rs:29:5
   7: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-hal-0.19.3/src/metal/command.rs:175:19
   8: wgpu_core::device::queue::PendingWrites<A>::activate
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/device/queue.rs:265:17
   9: wgpu_core::device::resource::Device<A>::new
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/device/resource.rs:230:9
  10: wgpu_core::instance::Adapter<A>::create_device_and_queue_from_hal
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:306:29
  11: wgpu_core::instance::Adapter<A>::create_device_and_queue
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:381:9
  12: wgpu_core::instance::<impl wgpu_core::global::Global<G>>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:1084:23
  13: <wgpu::backend::wgpu_core::ContextWgpuCore as wgpu::context::Context>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/backend/wgpu_core.rs:587:44
  14: <T as wgpu::context::DynContext>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/context.rs:2019:22
  15: wgpu::Adapter::request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/lib.rs:2119:22
  16: wgpu_memory_repro::State::new::{{closure}}
             at ./src/main.rs:103:31
  17: wgpu_memory_repro::run::{{closure}}
             at ./src/main.rs:300:34
  18: pollster::block_on
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pollster-0.3.0/src/lib.rs:128:15
  19: wgpu_memory_repro::main
             at ./src/main.rs:309:9
  20: core::ops::function::FnOnce::call_once
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Extra materials
I went through the Metal section of https://github.com/gfx-rs/wgpu/wiki/Debugging-wgpu-Applications, creating the XML file and updating the executable with the codesign command. Then I ran the executable with METAL_DEVICE_WRAPPER_TYPE=1 ./target/debug/wgpu-memory-repro. The console printed out wgpu-memory-repro[4512:4643030] Metal API Validation Enabled, but no additional information was displayed. Let me know if that's not the correct process and there's anything more I can do to get more info.

Thanks a lot!

Platform

Metal backend
Chip: Apple M1 Pro
macOS: Sonoma 14.4.1
Memory 32 GB

The text was updated successfully, but these errors were encountered:

jonmmease · 2024-04-13T14:48:16Z

After reading through #3056, I tried setting OBJC_DEBUG_MISSING_POOLS=YES and I get pages and pages of messages like this:

 % OBJC_DEBUG_MISSING_POOLS=YES METAL_DEVICE_WRAPPER_TYPE=1 ./target/debug/wgpu-memory-repro

2024-04-13 10:41:33.085 wgpu-memory-repro[58773:4772502] Metal API Validation Enabled
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a28060 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624e70 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624e40 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b30000 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a281d0 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624900 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624de0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b24320 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b24370 of class MTLCommandQueueDescriptorInternal autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a34020 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001634030 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001634300 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b340a0 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24b00 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24ac0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a80 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a40 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a00 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d249c0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24940 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24600 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d246c0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24680 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24740 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x6000016260d0 of class __NSArrayM autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
...
thread 'main' panicked at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9:
assertion failed: !ptr.is_null()

But it looks like similar messages are printed with wgpu 0.18.0 (when the program doesn't crash), so I'm not sure whether this is significant.

jonmmease · 2024-04-15T18:18:16Z

According to git-bisect, and my repro above, this bug was introduced in #3626.

Friendly ping @gents83 and @nical in case anything here rings a bell based on your work/review on that PR

jonmmease · 2024-04-17T19:06:33Z

I was able to simplify the repro a lot more. It turns out that the png export workflow isn't a relevant factor, and the crash can be reproduced by repeatedly allocating a vertex buffer. For example:

use wgpu::util::DeviceExt;

pub async fn run() {
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::METAL,
        ..Default::default()
    });

    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::default(),
            compatible_surface: None,
            force_fallback_adapter: false,
        })
        .await
        .unwrap();

    // // wgpu 0.18
    // let device_descriptor = wgpu::DeviceDescriptor {
    //     label: None,
    //     features: wgpu::Features::empty(),
    //     limits: wgpu::Limits::default(),
    // };

    // wgpu 0.19.3
    let device_descriptor = wgpu::DeviceDescriptor {
        label: None,
        required_features: wgpu::Features::empty(),
        required_limits: wgpu::Limits::default(),
    };

    let (device, _) = adapter
        .request_device(&device_descriptor, None)
        .await
        .unwrap();

    // Allocate a vertex buffer that should be dropped immediately.
    // Without this allocation, there is no leak/crash.
    let _ = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Vertex Buffer"),
        contents: bytemuck::cast_slice(&[0; 1024]),
        usage: wgpu::BufferUsages::VERTEX,
    });
}

fn main() {
    for i in 0..1000 {
        if i % 10 == 0 {
            println!("{i}");
        }
        pollster::block_on(run());
    }
}

0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
Context leak detected, msgtracer returned -1
150
160
170
180
190
Context leak detected, msgtracer returned -1
200
210
220
230
240
Context leak detected, msgtracer returned -1
250
260
270
280
290
Context leak detected, msgtracer returned -1
300
310
320
330
340
Context leak detected, msgtracer returned -1
350
360
370
380
390
Context leak detected, msgtracer returned -1
400
410
420
430
440
Context leak detected, msgtracer returned -1
450
460
470
480
490
Context leak detected, msgtracer returned -1
500
510
520
530
540
Context leak detected, msgtracer returned -1
550
560
570
580
590
Context leak detected, msgtracer returned -1
600
610
620
630
640
Context leak detected, msgtracer returned -1
650
660
670
680
690
Context leak detected, msgtracer returned -1
700
710
720
730
740
Context leak detected, msgtracer returned -1
750
760
770
780
790
thread 'main' panicked at 'assertion failed: !ptr.is_null()', /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
   3: foreign_types_shared::ForeignTypeRef::from_ptr
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
   4: <metal::commandqueue::CommandQueue as core::ops::deref::Deref>::deref
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/metal-0.27.0/src/commandqueue.rs:13:1
   5: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding::{{closure}}
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-hal/src/metal/command.rs:179:17
   6: objc::rc::autorelease::autoreleasepool
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/objc-0.2.7/src/rc/autorelease.rs:29:5
   7: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-hal/src/metal/command.rs:175:19
   8: wgpu_core::device::queue::PendingWrites<A>::activate
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/device/queue.rs:280:17
   9: wgpu_core::device::resource::Device<A>::new
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/device/resource.rs:226:9
  10: wgpu_core::instance::Adapter<A>::create_device_and_queue_from_hal
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:312:29
  11: wgpu_core::instance::Adapter<A>::create_device_and_queue
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:386:9
  12: wgpu_core::instance::<impl wgpu_core::global::Global<G>>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:1200:23
  13: <wgpu::backend::direct::Context as wgpu::context::Context>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/backend/direct.rs:635:44
  14: <T as wgpu::context::DynContext>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/context.rs:2124:22
  15: wgpu::Adapter::request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/lib.rs:2263:22
  16: wgpu_memory_repro2::run::{{closure}}
             at ./src/main.rs:32:23
  17: pollster::block_on
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pollster-0.3.0/src/lib.rs:128:15
  18: wgpu_memory_repro2::main
             at ./src/main.rs:49:9
  19: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5

grovesNL · 2024-04-17T19:09:33Z

Does the leak happen if you wrap the entire thing in an autorelease pool? I guess you might be noticing this because it's running headless and nothing provides a global autorelease pool (vs. a windowed application that would have an autorelease pool automatically added by the windowing library).

jonmmease · 2024-04-17T22:38:52Z

Thanks for the suggestion @grovesNL

I'm getting the same result when I update the above like this:

        objc::rc::autoreleasepool(|| {
            pollster::block_on(run());
        })

or if I wrap the create_buffer_init call like this:

    objc::rc::autoreleasepool(|| {
        let _ = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
            label: Some("Vertex Buffer"),
            contents: bytemuck::cast_slice(&[0; 1024]),
            usage: wgpu::BufferUsages::VERTEX,
        });
    })

Let me know if that's not what you had in mind. Thanks again!

jonmmease · 2024-04-18T00:16:27Z

Debugging a bit. I see that when the buffer is created this create_buffer function is called.

wgpu/wgpu-hal/src/metal/device.rs

Line 285 in d30255f

    
           unsafe fn create_buffer(&self, desc: &crate::BufferDescriptor) -> DeviceResult<super::Buffer> {

But I haven't been able to figure out whether there is metal-specific cleanup logic that is supposed to be called when a buffer is dropped, or if the autoreleasepool in the create_buffer function above is supposed to take care of that.

grovesNL · 2024-04-18T01:10:40Z

Yeah that seems right. We basically need an autorelease pool to be active whenever types like Metal buffers are created/cloned (retained)/dropped (releaseed)/etc., so if there is a place that does that without an autorelease pool active then it could leak. I think the one you used for the entire program would be enough, but maybe there's something else going on.

I haven't worked with the Metal backend for a while, but maybe something else (internally to wgpu) holds a strong reference to the buffer you're creating? The Xcode debugger might be able to show you if anything's referencing it.

jonmmease · 2024-04-18T16:32:52Z

Thanks @grovesNL, that's helpful.

I started playing with Xcode Instruments to analyze leaks. It's a little overwhelming. For the commit before #3626, no leaks are detected (though memory usage does climb a little over time).

But for #3626 and beyond, there are dozens of leaks reported and memory usage increases dramatically over time until the program crashes.

When I have more time I'll try to look at some of these stack traces more carefully and see if I can make more sense of it.

jonmmease · 2024-04-18T23:49:03Z

I found something potentially suspicious. In my repro, each iteration results in a device being created using the new associated function at

wgpu/wgpu-core/src/device/resource.rs

Line 211 in ca729cc

pub(crate) fn new(

But the corresponding drop associated function is never called

wgpu/wgpu-core/src/device/resource.rs

Lines 162 to 176 in ca729cc

    
           impl<A: HalApi> Drop for Device<A> { 
        
               fn drop(&mut self) { 
        
                   resource_log!("Destroy raw Device {:?}", self.info.label()); 
        
                   let raw = self.raw.take().unwrap(); 
        
                   let pending_writes = self.pending_writes.lock().take().unwrap(); 
        
                   pending_writes.dispose(&raw); 
        
                   self.command_allocator.dispose(&raw); 
        
                   unsafe { 
        
                       raw.destroy_buffer(self.zero_buffer.take().unwrap()); 
        
                       raw.destroy_fence(self.fence.write().take().unwrap()); 
        
                       let queue = self.queue_to_drop.take().unwrap(); 
        
                       raw.exit(queue); 
        
                   } 
        
               } 
        
           }

I think the leak of buffers may be associated with the fact that the device's buffers aren't getting cleaned up when the top-level handle to the device is dropped. This would suggest that this might not be specifically a metal issue, and perhaps I'm hitting it more than most because I end up creating and dropping devices repeatedly.

cwfitzgerald · 2024-04-18T23:59:48Z

The ownership is actually the other way around, buffers have owning refs on the device - the buffer leak is probably causing the device to stay alive.

See #5120 for the ideal ownership diagram.

jonmmease · 2024-04-19T11:59:40Z

Thanks @cwfitzgerald, that's helpful. I think I'm honing in on a circular reference.

In the repro above, the call to device.create_buffer_init unmaps the buffer after writing to it. This buffer.unmap eventually calls Buffer.ummap_inner:

wgpu/wgpu-core/src/resource.rs

Line 435 in 54740d5

    
           fn unmap_inner(self: &Arc<Self>) -> Result<Option<BufferMapPendingClosure>, BufferAccessError> {

This performs a match on the current value of map_state, and in my case this takes the resource::BufferMapState::Init branch.

wgpu/wgpu-core/src/resource.rs

Lines 446 to 450 in 54740d5

    
           resource::BufferMapState::Init { 
        
               ptr, 
        
               stage_buffer, 
        
               needs_flush, 
        
           } => {

The second to last line of this branch calls pending_writes.consume_temp, passing in the stage_buffer.

wgpu/wgpu-core/src/resource.rs

Line 503 in 54740d5

pending_writes.consume_temp(queue::TempResource::Buffer(stage_buffer));

The stage buffer is an instance of this Buffer struct, which has a strong reference to the Device.

wgpu/wgpu-core/src/resource.rs

Lines 384 to 387 in 54740d5

    
           #[derive(Debug)] 
        
           pub struct Buffer<A: HalApi> { 
        
               pub(crate) raw: Snatchable<A::Buffer>, 
        
               pub(crate) device: Arc<Device<A>>,

This pending_writes.consume_temp call adds the Buffer to the temp_resources Vec of PendingWrites.

wgpu/wgpu-core/src/device/queue.rs

Lines 197 to 207 in 54740d5

    
           pub(crate) struct PendingWrites<A: HalApi> { 
        
               pub command_encoder: A::CommandEncoder, 
        
               /// True if `command_encoder` is in the "recording" state, as 
        
               /// described in the docs for the [`wgpu_hal::CommandEncoder`] 
        
               /// trait. 
        
               /// 
        
               /// [`wgpu_hal::CommandEncoder`]: hal::CommandEncoder 
        
               pub is_recording: bool, 
        
               pub temp_resources: Vec<TempResource<A>>,

This Vec is never cleared because in this repro I'm not submitting anything to the queue, and the dispose function is never called...

wgpu/wgpu-core/src/device/queue.rs

Lines 227 to 238 in 54740d5

    
           pub fn dispose(mut self, device: &A::Device) { 
        
               unsafe { 
        
                   if self.is_recording { 
        
                       self.command_encoder.discard_encoding(); 
        
                   } 
        
                   self.command_encoder 
        
                       .reset_all(self.executing_command_buffers.into_iter()); 
        
                   device.destroy_command_encoder(self.command_encoder); 
        
               } 
        
               self.temp_resources.clear(); 
        
           }

because the Device is never dropped, because a reference to it is in pending_writes.

wgpu/wgpu-core/src/device/resource.rs

Lines 162 to 167 in 54740d5

    
           impl<A: HalApi> Drop for Device<A> { 
        
               fn drop(&mut self) { 
        
                   resource_log!("Destroy raw Device {:?}", self.info.label()); 
        
                   let raw = self.raw.take().unwrap(); 
        
                   let pending_writes = self.pending_writes.lock().take().unwrap(); 
        
                   pending_writes.dispose(&raw);

If I add

let commands: Vec<CommandBuffer> = Vec::new();
queue.submit(commands.into_iter());

to the end of the repro then the device does get dropped successfully, and there is no leak/crash.

Unfortunately, in the original repro (and my real project) I am submitting work the the queue, so this isn't quite what's going on there.

jonmmease · 2024-04-19T13:56:15Z

Ok, I've made a bit more progress. It looks like order in which wgpu objects are dropped determines whether there is a leak or not.

Here's another example where I've flattened everything into a single render function (rather than storing the device and other resources in the State struct.

use std::iter;
use wgpu::util::DeviceExt;
use wgpu::{
    CommandBuffer, CommandEncoderDescriptor, Extent3d, ImageCopyBuffer, ImageCopyTexture,
    ImageDataLayout, Origin3d, Texture, TextureAspect, TextureDescriptor, TextureDimension,
    TextureFormat, TextureUsages, TextureView,
};

#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

impl Vertex {
    fn desc() -> wgpu::VertexBufferLayout<'static> {
        wgpu::VertexBufferLayout {
            array_stride: std::mem::size_of::<Vertex>() as wgpu::BufferAddress,
            step_mode: wgpu::VertexStepMode::Vertex,
            attributes: &[
                wgpu::VertexAttribute {
                    offset: 0,
                    shader_location: 0,
                    format: wgpu::VertexFormat::Float32x3,
                },
                wgpu::VertexAttribute {
                    offset: std::mem::size_of::<[f32; 3]>() as wgpu::BufferAddress,
                    shader_location: 1,
                    format: wgpu::VertexFormat::Float32x3,
                },
            ],
        }
    }
}

const VERTICES: &[Vertex] = &[
    Vertex {
        position: [-0.0868241, 0.49240386, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // A
    Vertex {
        position: [-0.49513406, 0.06958647, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // B
    Vertex {
        position: [-0.21918549, -0.44939706, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // C
    Vertex {
        position: [0.35966998, -0.3473291, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // D
    Vertex {
        position: [0.44147372, 0.2347359, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // E
];

const INDICES: &[u16] = &[0, 1, 4, 1, 2, 4, 2, 3, 4, /* padding */ 0];

async fn render() -> Result<(), wgpu::SurfaceError> {
    // The instance is a handle to our GPU
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::METAL,
        ..Default::default()
    });

    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::default(),
            compatible_surface: None,
            force_fallback_adapter: false,
        })
        .await
        .unwrap();

    let device_descriptor = wgpu::DeviceDescriptor {
        label: None,
        required_features: wgpu::Features::empty(),
        required_limits: wgpu::Limits::default(),
    };

    let (device, queue) = adapter
        .request_device(&device_descriptor, None)
        .await
        .unwrap();

    // The instance is a handle to our GPU
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::METAL,
        ..Default::default()
    });

    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::default(),
            compatible_surface: None,
            force_fallback_adapter: false,
        })
        .await
        .unwrap();

    // wgpu 0.19.3
    let device_descriptor = wgpu::DeviceDescriptor {
        label: None,
        required_features: wgpu::Features::empty(),
        required_limits: wgpu::Limits::default(),
    };

    let (device, queue) = adapter
        .request_device(&device_descriptor, None)
        .await
        .unwrap();

    let texture_format = TextureFormat::Rgba8Unorm;
    let size = 256u32;
    let texture_desc = TextureDescriptor {
        size: Extent3d {
            width: size,
            height: size,
            depth_or_array_layers: 1,
        },
        mip_level_count: 1,
        sample_count: 1,
        dimension: TextureDimension::D2,
        format: texture_format,
        usage: TextureUsages::COPY_SRC | TextureUsages::RENDER_ATTACHMENT,
        label: None,
        view_formats: &[texture_format],
    };

    let texture = device.create_texture(&texture_desc);
    let texture_view = texture.create_view(&Default::default());

    let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: Some("Shader"),
        source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
    });

    let render_pipeline_layout = device.create_pipeline_layout(&wgpu::PipelineLayoutDescriptor {
        label: Some("Render Pipeline Layout"),
        bind_group_layouts: &[],
        push_constant_ranges: &[],
    });

    let render_pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
        label: Some("Render Pipeline"),
        layout: Some(&render_pipeline_layout),
        vertex: wgpu::VertexState {
            module: &shader,
            entry_point: "vs_main",
            buffers: &[Vertex::desc()],
        },
        fragment: Some(wgpu::FragmentState {
            module: &shader,
            entry_point: "fs_main",
            targets: &[Some(wgpu::ColorTargetState {
                format: texture_format,
                blend: Some(wgpu::BlendState {
                    color: wgpu::BlendComponent::REPLACE,
                    alpha: wgpu::BlendComponent::REPLACE,
                }),
                write_mask: wgpu::ColorWrites::ALL,
            })],
        }),
        primitive: wgpu::PrimitiveState {
            topology: wgpu::PrimitiveTopology::TriangleList,
            strip_index_format: None,
            front_face: wgpu::FrontFace::Ccw,
            cull_mode: Some(wgpu::Face::Back),
            polygon_mode: wgpu::PolygonMode::Fill,
            unclipped_depth: false,
            conservative: false,
        },
        depth_stencil: None,
        multisample: wgpu::MultisampleState {
            count: 1,
            mask: !0,
            alpha_to_coverage_enabled: false,
        },
        multiview: None,
    });

    let vertex_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Vertex Buffer"),
        contents: bytemuck::cast_slice(VERTICES),
        usage: wgpu::BufferUsages::VERTEX,
    });
    let index_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Index Buffer"),
        contents: bytemuck::cast_slice(INDICES),
        usage: wgpu::BufferUsages::INDEX,
    });
    let num_indices = INDICES.len() as u32;

    let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
        label: Some("Render Encoder"),
    });

    {
        let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("Render Pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: &texture_view,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color {
                        r: 0.1,
                        g: 0.2,
                        b: 0.3,
                        a: 1.0,
                    }),
                    store: wgpu::StoreOp::Store,
                },
            })],
            depth_stencil_attachment: None,
            occlusion_query_set: None,
            timestamp_writes: None,
        });

        render_pass.set_pipeline(&render_pipeline);
        render_pass.set_vertex_buffer(0, vertex_buffer.slice(..));
        render_pass.set_index_buffer(index_buffer.slice(..), wgpu::IndexFormat::Uint16);
        render_pass.draw_indexed(0..num_indices, 0, 0..1);
    }
    queue.submit(iter::once(encoder.finish()));
    
    // // Dropping device first triggers a memory leak
    // drop(device);

    Ok(())
}

pub async fn run() {
    render().await.unwrap();
}

fn main() {
    for i in 0..1000 {
        if i % 10 == 0 {
            println!("{i}");
        }
        pollster::block_on(run());
    }
}

This works without a leak/crash. But if you uncomment the drop(device); line at the bottom of the function (which causes the device to be dropped before the texture and buffers), then the leak and crash behavior is present. My original repro (and my project) triggered this because the device property was the first struct property of the State struct, and so it was getting dropped before the Texture and TextureView. When I reorder the struct properties in the repro like this then there is no leak:

struct State {
    size: u32,
    vertex_buffer: wgpu::Buffer,
    index_buffer: wgpu::Buffer,
    num_indices: u32,
    texture_size: Extent3d,
    render_pipeline: wgpu::RenderPipeline,
    texture_view: TextureView,
    texture: Texture,
    queue: wgpu::Queue,
    device: wgpu::Device,
}

Figuring out that this is how things currently work is enough to unblock me, but I am wondering if this should be considered a bug or just something that should be documented (and apologies if it is and I didn't come across it).

cwfitzgerald · 2024-04-19T15:17:19Z

Oh yes, that's definitely a bug. Well done tracking it down!

gents83 · 2024-04-19T15:51:48Z

Sorry for not being very reactive or present in this period :(
Thanks a lot for your investigation @jonmmease
Apparently it seems that dropping the device doesn't force the queue submit to be finished and the release of resources before destroying, isn't it?
It could be even beneficial to add it to the mem_leaks.rs test in order to keep track of this issue too - while fixing it.

Icemic · 2024-05-17T05:55:24Z

Confirmed on Windows dx12, and btw, it has a great impact under dx12.

I have a scenario where I need to create multiple windows in a process, each using a separate instance of wgpu. These windows will be created and destroyed dynamically.
This problem has troubled me for a long time: every time I close the window, the memory does not return to the original level, but always leaks 256MiB every time (it seems to be the size of the dx12 framebuffer?). For vulkan or opengl, the amount of memory leaked is much smaller than dx12, I think it may be to do with theirs difference on default buffer size.

Having just read the discussion here, I've tried to change the way I destroy my “window and wgpu instance struct”, let's say called “instance”. From drop(instance) to let device = instance.device.clone(); drop(instance); drop(device);

Problem solved perfectly!!!!1

So I guess that's a temporary workaround solution as well.

FL33TW00D · 2024-06-03T19:31:56Z

Context leak detected, msgtracer returned -1 had a few of these kicking around my logs on Metal.
Changed the order of my parent struct to have the wgpu::Device dropped last and they all vanished! Thanks for the deep dive @jonmmease

paxbun · 2024-07-06T09:20:56Z

I don't have a minimal example, but this also happens on iOS, both by wgpu 0.19 and 0.20. Downgrading wgpu to 0.18 solved the issue. On the other hand, this does not happen on Android with Vulkan.

Here's an event list made with Instruments:

I created a depth buffer texture, and resized the screen by rotating the device several times.

The deallocations happen in wgpu::Queue::submit (stacktrace below), which is weird because Queue::submit does not deallocate the one passed to the render pass but another one, which should have been destroyed in another function when creating a new depth buffer with the new screen size.

For example, in the event list, Depth Buffer 0x27397a4c has never been used by any render pass (I checked this by logging the global ID), because it is replaced by the new buffer 0x27397a4e due to the rotation, but it is destroyed in Queue::submit.

I also tried manually destroying the textures by calling Texture::destroy, but the result was the same.

Sorry for not giving you a concrete example, but if you need further information, feel free to ask me!

teoxoy · 2024-07-31T15:25:08Z

This has been fixed by 4255268 (#5910).

git bisect start '--term-new=fixed' '--term-old=unfixed'
# status: waiting for both good and bad commits
# fixed: [6351a75b0cd9ec26948ca4934a5316fd781c567e] remove implemented TODO
git bisect fixed 6351a75b0cd9ec26948ca4934a5316fd781c567e
# status: waiting for good commit(s), bad commit known
# unfixed: [87576b72b37c6b78b41104eb25fc31893af94092] Versions and Changelog
git bisect unfixed 87576b72b37c6b78b41104eb25fc31893af94092
# unfixed: [484457d95993b00b91905fae0e539a093423cc28] Fix wgpu-hal build
git bisect unfixed 484457d95993b00b91905fae0e539a093423cc28
# unfixed: [588950110af8aca278516ec15d33ef6b7b66588c] Remove lifetime dependency of `ComputePass` to its parent command encoder (#5620)
git bisect unfixed 588950110af8aca278516ec15d33ef6b7b66588c
# unfixed: [985968fd38924fabda8776e2dde8f65213a8f156] remove unused `Resource` bound
git bisect unfixed 985968fd38924fabda8776e2dde8f65213a8f156
# fixed: [a8b0f2f6a6abdb60cbd9ba58f41d07eaef502666] remove device arg from `StagingBuffer.flush()`
git bisect fixed a8b0f2f6a6abdb60cbd9ba58f41d07eaef502666
# fixed: [a8b0f2f6a6abdb60cbd9ba58f41d07eaef502666] remove device arg from `StagingBuffer.flush()`
git bisect fixed a8b0f2f6a6abdb60cbd9ba58f41d07eaef502666
# unfixed: [e26d2d776334d51f5c5f0d43a8d0b1ba17a8275e] move command buffer resolving in `Global`'s methods
git bisect unfixed e26d2d776334d51f5c5f0d43a8d0b1ba17a8275e
# unfixed: [140495006ed3f7a88a5609f965e668a226893d98] remove `device_maintain_ids`
git bisect unfixed 140495006ed3f7a88a5609f965e668a226893d98
# fixed: [b68966eea5aa6bb5514b2f4438321363b6811e7a] [wgpu-core] use the view's format not the texture's format
git bisect fixed b68966eea5aa6bb5514b2f4438321363b6811e7a
# unfixed: [0e1c1f7c07a36b0f77a1401eece652d637c2376e] replace the tracker in `Device` with a new `DeviceTracker`
git bisect unfixed 0e1c1f7c07a36b0f77a1401eece652d637c2376e
# unfixed: [ac88c738c0abe983a24810c05bd14a6435988133] remove unused `BufferTracker.get()`
git bisect unfixed ac88c738c0abe983a24810c05bd14a6435988133
# unfixed: [aa9cb71a541970f6c509a0936b47964a9d398b71] introduce `DeviceBufferTracker` which holds weak references to buffers
git bisect unfixed aa9cb71a541970f6c509a0936b47964a9d398b71
# fixed: [425526828f738c95ec50b016c6a761bc00d2fb25] introduce `DeviceTextureTracker` which holds weak references to textures
git bisect fixed 425526828f738c95ec50b016c6a761bc00d2fb25
# first fixed commit: [425526828f738c95ec50b016c6a761bc00d2fb25] introduce `DeviceTextureTracker` which holds weak references to textures

gents83 · 2024-07-31T16:20:30Z

@teoxoy you're doing an amazing work! Tnx a lot!!!

jonmmease mentioned this issue Apr 13, 2024

Memory leak on boids example #5394

Closed

teoxoy added this to WebGPU for Firefox Apr 15, 2024

jonmmease mentioned this issue Apr 18, 2024

Multiple leaks on Metal backend #5448

Closed

jonmmease mentioned this issue Apr 19, 2024

Change wgpu object drop order to fix memory leak jonmmease/avenger#67

Merged

cwfitzgerald changed the title ~~0.19 metal backend regression: Crashes with "Context leak detected, msgtracer returned -1"~~ Buffers Leaked when Device Handle is Dropped Before Buffers Apr 19, 2024

cwfitzgerald added type: bug Something isn't working help required We need community help to make this happen. area: correctness We're behaving incorrectly labels Apr 19, 2024

teoxoy mentioned this issue Jul 26, 2024

Fix leaks caused by mapped buffers #6053

Closed

teoxoy self-assigned this Jul 26, 2024

teoxoy moved this to Todo in WebGPU for Firefox Jul 27, 2024

teoxoy closed this as completed Jul 31, 2024

github-project-automation bot moved this from Todo to Done in WebGPU for Firefox Jul 31, 2024

teoxoy mentioned this issue Jul 31, 2024

Some buffer memory does not get freed after the device is dropped #5648

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Buffers Leaked when Device Handle is Dropped Before Buffers #5529

Buffers Leaked when Device Handle is Dropped Before Buffers #5529

jonmmease commented Apr 13, 2024

jonmmease commented Apr 13, 2024

jonmmease commented Apr 15, 2024

jonmmease commented Apr 17, 2024

grovesNL commented Apr 17, 2024

jonmmease commented Apr 17, 2024

jonmmease commented Apr 18, 2024

grovesNL commented Apr 18, 2024

jonmmease commented Apr 18, 2024

jonmmease commented Apr 18, 2024

cwfitzgerald commented Apr 18, 2024 •

edited

Loading

jonmmease commented Apr 19, 2024

jonmmease commented Apr 19, 2024

cwfitzgerald commented Apr 19, 2024

gents83 commented Apr 19, 2024

Icemic commented May 17, 2024

FL33TW00D commented Jun 3, 2024

paxbun commented Jul 6, 2024 •

edited

Loading

teoxoy commented Jul 31, 2024

gents83 commented Jul 31, 2024

Buffers Leaked when Device Handle is Dropped Before Buffers #5529

Buffers Leaked when Device Handle is Dropped Before Buffers #5529

Comments

jonmmease commented Apr 13, 2024

jonmmease commented Apr 13, 2024

jonmmease commented Apr 15, 2024

jonmmease commented Apr 17, 2024

grovesNL commented Apr 17, 2024

jonmmease commented Apr 17, 2024

jonmmease commented Apr 18, 2024

grovesNL commented Apr 18, 2024

jonmmease commented Apr 18, 2024

jonmmease commented Apr 18, 2024

cwfitzgerald commented Apr 18, 2024 • edited Loading

jonmmease commented Apr 19, 2024

jonmmease commented Apr 19, 2024

cwfitzgerald commented Apr 19, 2024

gents83 commented Apr 19, 2024

Icemic commented May 17, 2024

FL33TW00D commented Jun 3, 2024

paxbun commented Jul 6, 2024 • edited Loading

teoxoy commented Jul 31, 2024

gents83 commented Jul 31, 2024

cwfitzgerald commented Apr 18, 2024 •

edited

Loading

paxbun commented Jul 6, 2024 •

edited

Loading