
Buffers Leaked when Device Handle is Dropped Before Buffers #5529

Closed
jonmmease opened this issue Apr 13, 2024 · 19 comments
Assignees
Labels
area: correctness We're behaving incorrectly help required We need community help to make this happen. type: bug Something isn't working

Comments

@jonmmease

Description
After repeated headless rendering with the Metal backend, wgpu 0.19 crashes with a "Context leak detected, msgtracer returned -1" error, whereas version 0.18.0 works without a problem.

My workflow is to perform headless rendering to PNG images following the approach in the Wgpu without a window learn-wgpu page.

Repro steps
I've created a repro repository at https://github.com/jonmmease/wgpu-memory-repro that reliably reproduces the crash on my M1 MacBook Pro. See the README for more details about the repro and full system details.

Expected vs observed behavior
In wgpu 0.18.0, everything works well. In 0.19 (I tried 0.19.0 through 0.19.3) I get this crash after ~100 iterations.

Context leak detected, msgtracer returned -1

thread 'main' panicked at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9:
assertion failed: !ptr.is_null()
stack backtrace:
   0: rust_begin_unwind
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:144:5
   3: foreign_types_shared::ForeignTypeRef::from_ptr
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
   4: <metal::commandqueue::CommandQueue as core::ops::deref::Deref>::deref
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/metal-0.27.0/src/commandqueue.rs:13:1
   5: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding::{{closure}}
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-hal-0.19.3/src/metal/command.rs:179:17
   6: objc::rc::autorelease::autoreleasepool
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/objc-0.2.7/src/rc/autorelease.rs:29:5
   7: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-hal-0.19.3/src/metal/command.rs:175:19
   8: wgpu_core::device::queue::PendingWrites<A>::activate
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/device/queue.rs:265:17
   9: wgpu_core::device::resource::Device<A>::new
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/device/resource.rs:230:9
  10: wgpu_core::instance::Adapter<A>::create_device_and_queue_from_hal
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:306:29
  11: wgpu_core::instance::Adapter<A>::create_device_and_queue
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:381:9
  12: wgpu_core::instance::<impl wgpu_core::global::Global<G>>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.19.3/src/instance.rs:1084:23
  13: <wgpu::backend::wgpu_core::ContextWgpuCore as wgpu::context::Context>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/backend/wgpu_core.rs:587:44
  14: <T as wgpu::context::DynContext>::adapter_request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/context.rs:2019:22
  15: wgpu::Adapter::request_device
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.19.3/src/lib.rs:2119:22
  16: wgpu_memory_repro::State::new::{{closure}}
             at ./src/main.rs:103:31
  17: wgpu_memory_repro::run::{{closure}}
             at ./src/main.rs:300:34
  18: pollster::block_on
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pollster-0.3.0/src/lib.rs:128:15
  19: wgpu_memory_repro::main
             at ./src/main.rs:309:9
  20: core::ops::function::FnOnce::call_once
             at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Extra materials
I went through the Metal section of https://github.com/gfx-rs/wgpu/wiki/Debugging-wgpu-Applications, creating the XML file and updating the executable with the codesign command. Then I ran the executable with METAL_DEVICE_WRAPPER_TYPE=1 ./target/debug/wgpu-memory-repro. The console printed wgpu-memory-repro[4512:4643030] Metal API Validation Enabled, but no additional information was displayed. Let me know if that's not the correct process and whether there's anything more I can do to get more info.

Thanks a lot!

Platform

  • Metal backend
  • Chip: Apple M1 Pro
  • macOS: Sonoma 14.4.1
  • Memory 32 GB
@jonmmease
Author

After reading through #3056, I tried setting OBJC_DEBUG_MISSING_POOLS=YES and I get pages and pages of messages like this:

 % OBJC_DEBUG_MISSING_POOLS=YES METAL_DEVICE_WRAPPER_TYPE=1 ./target/debug/wgpu-memory-repro

2024-04-13 10:41:33.085 wgpu-memory-repro[58773:4772502] Metal API Validation Enabled
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a28060 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624e70 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624e40 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b30000 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a281d0 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624900 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001624de0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b24320 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b24370 of class MTLCommandQueueDescriptorInternal autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001a34020 of class __NSSingleObjectArrayI autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001634030 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600001634300 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600003b340a0 of class __NSCFData autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24b00 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24ac0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a80 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a40 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24a00 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d249c0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24940 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24600 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d246c0 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24680 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x600000d24740 of class __NSCFString autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
objc[58773]: MISSING POOLS: (0x207fb3ac0) Object 0x6000016260d0 of class __NSArrayM autoreleased with no pool in place - just leaking - break on objc_autoreleaseNoPool() to debug
...
thread 'main' panicked at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9:
assertion failed: !ptr.is_null()

But it looks like similar messages are printed with wgpu 0.18.0 (when the program doesn't crash), so I'm not sure whether this is significant.

@jonmmease
Author

According to git-bisect, and my repro above, this bug was introduced in #3626.

Friendly ping @gents83 and @nical in case anything here rings a bell based on your work/review on that PR

@jonmmease
Author

I was able to simplify the repro a lot more. It turns out that the PNG export workflow isn't a relevant factor, and the crash can be reproduced by repeatedly allocating a vertex buffer. For example:

use wgpu::util::DeviceExt;

pub async fn run() {
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::METAL,
        ..Default::default()
    });

    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::default(),
            compatible_surface: None,
            force_fallback_adapter: false,
        })
        .await
        .unwrap();

    // // wgpu 0.18
    // let device_descriptor = wgpu::DeviceDescriptor {
    //     label: None,
    //     features: wgpu::Features::empty(),
    //     limits: wgpu::Limits::default(),
    // };

    // wgpu 0.19.3
    let device_descriptor = wgpu::DeviceDescriptor {
        label: None,
        required_features: wgpu::Features::empty(),
        required_limits: wgpu::Limits::default(),
    };

    let (device, _) = adapter
        .request_device(&device_descriptor, None)
        .await
        .unwrap();

    // Allocate a vertex buffer that should be dropped immediately.
    // Without this allocation, there is no leak/crash.
    let _ = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Vertex Buffer"),
        contents: bytemuck::cast_slice(&[0; 1024]),
        usage: wgpu::BufferUsages::VERTEX,
    });
}

fn main() {
    for i in 0..1000 {
        if i % 10 == 0 {
            println!("{i}");
        }
        pollster::block_on(run());
    }
}
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
Context leak detected, msgtracer returned -1
150
160
170
180
190
Context leak detected, msgtracer returned -1
200
210
220
230
240
Context leak detected, msgtracer returned -1
250
260
270
280
290
Context leak detected, msgtracer returned -1
300
310
320
330
340
Context leak detected, msgtracer returned -1
350
360
370
380
390
Context leak detected, msgtracer returned -1
400
410
420
430
440
Context leak detected, msgtracer returned -1
450
460
470
480
490
Context leak detected, msgtracer returned -1
500
510
520
530
540
Context leak detected, msgtracer returned -1
550
560
570
580
590
Context leak detected, msgtracer returned -1
600
610
620
630
640
Context leak detected, msgtracer returned -1
650
660
670
680
690
Context leak detected, msgtracer returned -1
700
710
720
730
740
Context leak detected, msgtracer returned -1
750
760
770
780
790
thread 'main' panicked at 'assertion failed: !ptr.is_null()', /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
   3: foreign_types_shared::ForeignTypeRef::from_ptr
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/foreign-types-shared-0.3.1/src/lib.rs:72:9
   4: <metal::commandqueue::CommandQueue as core::ops::deref::Deref>::deref
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/metal-0.27.0/src/commandqueue.rs:13:1
   5: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding::{{closure}}
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-hal/src/metal/command.rs:179:17
   6: objc::rc::autorelease::autoreleasepool
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/objc-0.2.7/src/rc/autorelease.rs:29:5
   7: wgpu_hal::metal::command::<impl wgpu_hal::CommandEncoder<wgpu_hal::metal::Api> for wgpu_hal::metal::CommandEncoder>::begin_encoding
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-hal/src/metal/command.rs:175:19
   8: wgpu_core::device::queue::PendingWrites<A>::activate
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/device/queue.rs:280:17
   9: wgpu_core::device::resource::Device<A>::new
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/device/resource.rs:226:9
  10: wgpu_core::instance::Adapter<A>::create_device_and_queue_from_hal
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:312:29
  11: wgpu_core::instance::Adapter<A>::create_device_and_queue
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:386:9
  12: wgpu_core::instance::<impl wgpu_core::global::Global<G>>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu-core/src/instance.rs:1200:23
  13: <wgpu::backend::direct::Context as wgpu::context::Context>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/backend/direct.rs:635:44
  14: <T as wgpu::context::DynContext>::adapter_request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/context.rs:2124:22
  15: wgpu::Adapter::request_device
             at /Users/jonmmease/VegaFusion/repos/wgpu/wgpu/src/lib.rs:2263:22
  16: wgpu_memory_repro2::run::{{closure}}
             at ./src/main.rs:32:23
  17: pollster::block_on
             at /Users/jonmmease/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pollster-0.3.0/src/lib.rs:128:15
  18: wgpu_memory_repro2::main
             at ./src/main.rs:49:9
  19: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5

@grovesNL
Collaborator

Does the leak happen if you wrap the entire thing in an autorelease pool? I guess you might be noticing this because it's running headless and nothing provides a global autorelease pool (vs. a windowed application that would have an autorelease pool automatically added by the windowing library).

@jonmmease
Author

Thanks for the suggestion @grovesNL

I'm getting the same result when I update the above like this:

        objc::rc::autoreleasepool(|| {
            pollster::block_on(run());
        })

or if I wrap the create_buffer_init call like this:

    objc::rc::autoreleasepool(|| {
        let _ = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
            label: Some("Vertex Buffer"),
            contents: bytemuck::cast_slice(&[0; 1024]),
            usage: wgpu::BufferUsages::VERTEX,
        });
    })

Let me know if that's not what you had in mind. Thanks again!

@jonmmease
Author

Debugging a bit further: I see that when the buffer is created, this create_buffer function is called.

unsafe fn create_buffer(&self, desc: &crate::BufferDescriptor) -> DeviceResult<super::Buffer> {

But I haven't been able to figure out whether there is metal-specific cleanup logic that is supposed to be called when a buffer is dropped, or if the autoreleasepool in the create_buffer function above is supposed to take care of that.

@grovesNL
Collaborator

Yeah, that seems right. We basically need an autorelease pool to be active whenever types like Metal buffers are created/cloned (retained)/dropped (released)/etc., so if there is a place that does that without an autorelease pool active then it could leak. I think the one you used for the entire program would be enough, but maybe there's something else going on.

I haven't worked with the Metal backend for a while, but maybe something else (internally to wgpu) holds a strong reference to the buffer you're creating? The Xcode debugger might be able to show you if anything's referencing it.

@jonmmease
Author

Thanks @grovesNL, that's helpful.

I started playing with Xcode Instruments to analyze leaks. It's a little overwhelming. For the commit before #3626, no leaks are detected (though memory usage does climb a little over time).

[Screenshot (2024-04-18, 11:30 AM): Xcode Instruments showing no leaks detected]

But for #3626 and beyond, there are dozens of leaks reported and memory usage increases dramatically over time until the program crashes.

[Screenshot (2024-04-18, 11:29 AM): Xcode Instruments showing dozens of reported leaks]

When I have more time I'll try to look at some of these stack traces more carefully and see if I can make more sense of it.

@jonmmease
Author

I found something potentially suspicious. In my repro, each iteration results in a device being created using the new associated function at

pub(crate) fn new(

But the corresponding Drop implementation is never run:

impl<A: HalApi> Drop for Device<A> {
    fn drop(&mut self) {
        resource_log!("Destroy raw Device {:?}", self.info.label());
        let raw = self.raw.take().unwrap();
        let pending_writes = self.pending_writes.lock().take().unwrap();
        pending_writes.dispose(&raw);
        self.command_allocator.dispose(&raw);
        unsafe {
            raw.destroy_buffer(self.zero_buffer.take().unwrap());
            raw.destroy_fence(self.fence.write().take().unwrap());
            let queue = self.queue_to_drop.take().unwrap();
            raw.exit(queue);
        }
    }
}

I think the leak of buffers may be associated with the fact that the device's buffers aren't getting cleaned up when the top-level handle to the device is dropped. This would suggest that this might not be specifically a metal issue, and perhaps I'm hitting it more than most because I end up creating and dropping devices repeatedly.

@cwfitzgerald
Member

cwfitzgerald commented Apr 18, 2024

The ownership is actually the other way around: buffers have owning refs on the device, so the buffer leak is probably causing the device to stay alive.

See #5120 for the ideal ownership diagram.
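This ownership direction can be illustrated with plain `Arc`s (a minimal sketch; `Device` and `Buffer` here are stand-in types, not wgpu's actual structs):

```rust
use std::sync::{Arc, Weak};

// Stand-ins: in wgpu-core, each Buffer holds an Arc<Device>,
// so a live buffer handle keeps the device alive, not vice versa.
struct Device;
struct Buffer {
    _device: Arc<Device>,
}

fn main() {
    let device = Arc::new(Device);
    let weak: Weak<Device> = Arc::downgrade(&device);

    let buffer = Buffer { _device: Arc::clone(&device) };

    // Dropping our own device handle does not free the device...
    drop(device);
    assert!(weak.upgrade().is_some(), "buffer still keeps device alive");

    // ...only dropping the last buffer does.
    drop(buffer);
    assert!(weak.upgrade().is_none(), "device freed once buffers are gone");
}
```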

@jonmmease
Author

Thanks @cwfitzgerald, that's helpful. I think I'm homing in on a circular reference.

In the repro above, the call to device.create_buffer_init unmaps the buffer after writing to it. This buffer.unmap eventually calls Buffer::unmap_inner:

fn unmap_inner(self: &Arc<Self>) -> Result<Option<BufferMapPendingClosure>, BufferAccessError> {

This performs a match on the current value of map_state, and in my case it takes the resource::BufferMapState::Init branch:

resource::BufferMapState::Init {
    ptr,
    stage_buffer,
    needs_flush,
} => {

The second-to-last line of this branch calls pending_writes.consume_temp, passing in the stage_buffer.

pending_writes.consume_temp(queue::TempResource::Buffer(stage_buffer));

The stage buffer is an instance of this Buffer struct, which has a strong reference to the Device.

#[derive(Debug)]
pub struct Buffer<A: HalApi> {
    pub(crate) raw: Snatchable<A::Buffer>,
    pub(crate) device: Arc<Device<A>>,

This pending_writes.consume_temp call adds the Buffer to the temp_resources Vec of PendingWrites.

pub(crate) struct PendingWrites<A: HalApi> {
    pub command_encoder: A::CommandEncoder,
    /// True if `command_encoder` is in the "recording" state, as
    /// described in the docs for the [`wgpu_hal::CommandEncoder`]
    /// trait.
    ///
    /// [`wgpu_hal::CommandEncoder`]: hal::CommandEncoder
    pub is_recording: bool,
    pub temp_resources: Vec<TempResource<A>>,

This Vec is never cleared because in this repro I'm not submitting anything to the queue, and the dispose function is never called...

pub fn dispose(mut self, device: &A::Device) {
    unsafe {
        if self.is_recording {
            self.command_encoder.discard_encoding();
        }
        self.command_encoder
            .reset_all(self.executing_command_buffers.into_iter());
        device.destroy_command_encoder(self.command_encoder);
    }
    self.temp_resources.clear();
}

because the Device is never dropped, since a strong reference to it is held inside pending_writes:

impl<A: HalApi> Drop for Device<A> {
    fn drop(&mut self) {
        resource_log!("Destroy raw Device {:?}", self.info.label());
        let raw = self.raw.take().unwrap();
        let pending_writes = self.pending_writes.lock().take().unwrap();
        pending_writes.dispose(&raw);

If I add

let commands: Vec<CommandBuffer> = Vec::new();
queue.submit(commands.into_iter());

to the end of the repro then the device does get dropped successfully, and there is no leak/crash.

Unfortunately, in the original repro (and my real project) I am submitting work to the queue, so this isn't quite what's going on there.
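The cycle described above can be reproduced with plain `Arc`s (a sketch with made-up `Device`/`TempBuffer` stand-ins, not wgpu's actual types): once a temp buffer holding a strong device reference is stored inside the device's own pending list, neither can be freed until the list is cleared.

```rust
use std::sync::{Arc, Mutex, Weak};

// Stand-ins for wgpu-core's Device / staging Buffer.
struct Device {
    // Analogue of PendingWrites::temp_resources.
    pending: Mutex<Vec<TempBuffer>>,
}
struct TempBuffer {
    _device: Arc<Device>, // strong back-reference, like Buffer::device
}

fn main() {
    let device = Arc::new(Device { pending: Mutex::new(Vec::new()) });
    let weak: Weak<Device> = Arc::downgrade(&device);

    // Analogue of consume_temp: stash a buffer pointing back at the device.
    let staged = TempBuffer { _device: Arc::clone(&device) };
    device.pending.lock().unwrap().push(staged);

    // Dropping the external handle leaks: the cycle keeps the device alive.
    drop(device);
    assert!(weak.upgrade().is_some(), "cycle keeps device alive (leak)");

    // A queue submit (or dispose) clears temp_resources, breaking the cycle.
    if let Some(dev) = weak.upgrade() {
        dev.pending.lock().unwrap().clear();
    }
    assert!(weak.upgrade().is_none(), "device freed once cycle is broken");
}
```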

@jonmmease
Author

Ok, I've made a bit more progress. It looks like the order in which wgpu objects are dropped determines whether there is a leak or not.

Here's another example where I've flattened everything into a single render function (rather than storing the device and other resources in the State struct).

use std::iter;
use wgpu::util::DeviceExt;
use wgpu::{
    CommandBuffer, CommandEncoderDescriptor, Extent3d, ImageCopyBuffer, ImageCopyTexture,
    ImageDataLayout, Origin3d, Texture, TextureAspect, TextureDescriptor, TextureDimension,
    TextureFormat, TextureUsages, TextureView,
};

#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 3],
}

impl Vertex {
    fn desc() -> wgpu::VertexBufferLayout<'static> {
        wgpu::VertexBufferLayout {
            array_stride: std::mem::size_of::<Vertex>() as wgpu::BufferAddress,
            step_mode: wgpu::VertexStepMode::Vertex,
            attributes: &[
                wgpu::VertexAttribute {
                    offset: 0,
                    shader_location: 0,
                    format: wgpu::VertexFormat::Float32x3,
                },
                wgpu::VertexAttribute {
                    offset: std::mem::size_of::<[f32; 3]>() as wgpu::BufferAddress,
                    shader_location: 1,
                    format: wgpu::VertexFormat::Float32x3,
                },
            ],
        }
    }
}

const VERTICES: &[Vertex] = &[
    Vertex {
        position: [-0.0868241, 0.49240386, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // A
    Vertex {
        position: [-0.49513406, 0.06958647, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // B
    Vertex {
        position: [-0.21918549, -0.44939706, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // C
    Vertex {
        position: [0.35966998, -0.3473291, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // D
    Vertex {
        position: [0.44147372, 0.2347359, 0.0],
        color: [0.5, 0.0, 0.5],
    }, // E
];

const INDICES: &[u16] = &[0, 1, 4, 1, 2, 4, 2, 3, 4, /* padding */ 0];

async fn render() -> Result<(), wgpu::SurfaceError> {
    // The instance is a handle to our GPU
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::METAL,
        ..Default::default()
    });

    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::default(),
            compatible_surface: None,
            force_fallback_adapter: false,
        })
        .await
        .unwrap();

    let device_descriptor = wgpu::DeviceDescriptor {
        label: None,
        required_features: wgpu::Features::empty(),
        required_limits: wgpu::Limits::default(),
    };

    let (device, queue) = adapter
        .request_device(&device_descriptor, None)
        .await
        .unwrap();

    let texture_format = TextureFormat::Rgba8Unorm;
    let size = 256u32;
    let texture_desc = TextureDescriptor {
        size: Extent3d {
            width: size,
            height: size,
            depth_or_array_layers: 1,
        },
        mip_level_count: 1,
        sample_count: 1,
        dimension: TextureDimension::D2,
        format: texture_format,
        usage: TextureUsages::COPY_SRC | TextureUsages::RENDER_ATTACHMENT,
        label: None,
        view_formats: &[texture_format],
    };

    let texture = device.create_texture(&texture_desc);
    let texture_view = texture.create_view(&Default::default());

    let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: Some("Shader"),
        source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
    });

    let render_pipeline_layout = device.create_pipeline_layout(&wgpu::PipelineLayoutDescriptor {
        label: Some("Render Pipeline Layout"),
        bind_group_layouts: &[],
        push_constant_ranges: &[],
    });

    let render_pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
        label: Some("Render Pipeline"),
        layout: Some(&render_pipeline_layout),
        vertex: wgpu::VertexState {
            module: &shader,
            entry_point: "vs_main",
            buffers: &[Vertex::desc()],
        },
        fragment: Some(wgpu::FragmentState {
            module: &shader,
            entry_point: "fs_main",
            targets: &[Some(wgpu::ColorTargetState {
                format: texture_format,
                blend: Some(wgpu::BlendState {
                    color: wgpu::BlendComponent::REPLACE,
                    alpha: wgpu::BlendComponent::REPLACE,
                }),
                write_mask: wgpu::ColorWrites::ALL,
            })],
        }),
        primitive: wgpu::PrimitiveState {
            topology: wgpu::PrimitiveTopology::TriangleList,
            strip_index_format: None,
            front_face: wgpu::FrontFace::Ccw,
            cull_mode: Some(wgpu::Face::Back),
            polygon_mode: wgpu::PolygonMode::Fill,
            unclipped_depth: false,
            conservative: false,
        },
        depth_stencil: None,
        multisample: wgpu::MultisampleState {
            count: 1,
            mask: !0,
            alpha_to_coverage_enabled: false,
        },
        multiview: None,
    });

    let vertex_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Vertex Buffer"),
        contents: bytemuck::cast_slice(VERTICES),
        usage: wgpu::BufferUsages::VERTEX,
    });
    let index_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("Index Buffer"),
        contents: bytemuck::cast_slice(INDICES),
        usage: wgpu::BufferUsages::INDEX,
    });
    let num_indices = INDICES.len() as u32;

    let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
        label: Some("Render Encoder"),
    });

    {
        let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("Render Pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: &texture_view,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color {
                        r: 0.1,
                        g: 0.2,
                        b: 0.3,
                        a: 1.0,
                    }),
                    store: wgpu::StoreOp::Store,
                },
            })],
            depth_stencil_attachment: None,
            occlusion_query_set: None,
            timestamp_writes: None,
        });

        render_pass.set_pipeline(&render_pipeline);
        render_pass.set_vertex_buffer(0, vertex_buffer.slice(..));
        render_pass.set_index_buffer(index_buffer.slice(..), wgpu::IndexFormat::Uint16);
        render_pass.draw_indexed(0..num_indices, 0, 0..1);
    }
    queue.submit(iter::once(encoder.finish()));
    
    // // Dropping device first triggers a memory leak
    // drop(device);

    Ok(())
}

pub async fn run() {
    render().await.unwrap();
}

fn main() {
    for i in 0..1000 {
        if i % 10 == 0 {
            println!("{i}");
        }
        pollster::block_on(run());
    }
}

This works without a leak/crash. But if you uncomment the drop(device); line at the bottom of the function (which causes the device to be dropped before the texture and buffers), then the leak and crash behavior is present. My original repro (and my project) triggered this because device was the first field of the State struct, so it was dropped before the Texture and TextureView. When I reorder the struct fields in the repro like this, there is no leak:

struct State {
    size: u32,
    vertex_buffer: wgpu::Buffer,
    index_buffer: wgpu::Buffer,
    num_indices: u32,
    texture_size: Extent3d,
    render_pipeline: wgpu::RenderPipeline,
    texture_view: TextureView,
    texture: Texture,
    queue: wgpu::Queue,
    device: wgpu::Device,
}

Figuring out that this is how things currently work is enough to unblock me, but I am wondering if this should be considered a bug or just something that should be documented (and apologies if it is and I didn't come across it).

@cwfitzgerald
Member

Oh yes, that's definitely a bug. Well done tracking it down!

@cwfitzgerald cwfitzgerald changed the title 0.19 metal backend regression: Crashes with "Context leak detected, msgtracer returned -1" Buffers Leaked when Device Handle is Dropped Before Buffers Apr 19, 2024
@cwfitzgerald cwfitzgerald added type: bug Something isn't working help required We need community help to make this happen. area: correctness We're behaving incorrectly labels Apr 19, 2024
@gents83
Contributor

gents83 commented Apr 19, 2024

Sorry for not being very reactive or present in this period :(
Thanks a lot for your investigation @jonmmease
It seems that dropping the device doesn't force the queue submission to finish and release its resources before the device is destroyed, does it?
It could also be beneficial to add a case to the mem_leaks.rs test to keep track of this issue while fixing it.

@Icemic

Icemic commented May 17, 2024

Confirmed on Windows with dx12, and the impact under dx12 is significant.

I have a scenario where I need to create multiple windows in a single process, each using a separate wgpu instance. These windows are created and destroyed dynamically.
This problem has troubled me for a long time: every time I close a window, memory does not return to its original level but leaks 256 MiB each time (which seems to be the size of the dx12 framebuffer?). Under vulkan or opengl the amount leaked is much smaller than under dx12; I think it may be due to their different default buffer sizes.

Having just read the discussion here, I tried changing the way I destroy my "window and wgpu instance struct" (call it instance): from drop(instance) to let device = instance.device.clone(); drop(instance); drop(device);

Problem solved perfectly!

So I guess that's a temporary workaround solution as well.
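
The workaround relies on wgpu handles being internally reference-counted: cloning the device handle before dropping the struct keeps the device alive until everything else is gone. A sketch of the same pattern using `Arc` as a stand-in for the hypothetical `instance.device` handle:

```rust
use std::sync::Arc;

// Arc stands in for wgpu's reference-counted Device handle; the
// struct and field names here are illustrative, not wgpu's API.
struct Instance {
    texture: Arc<&'static str>,
    device: Arc<&'static str>,
}

fn main() {
    let instance = Instance {
        texture: Arc::new("texture"),
        device: Arc::new("device"),
    };
    // Workaround: keep an extra handle to the device...
    let device = instance.device.clone();
    // ...so the device survives the drop of everything else...
    drop(instance);
    assert_eq!(Arc::strong_count(&device), 1); // only our clone remains
    // ...and is dropped last.
    drop(device);
    println!("ok");
}
```

This reproduces the same ordering as reordering the struct fields, but without touching the struct definition.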

@FL33TW00D
Contributor

Context leak detected, msgtracer returned -1 — I had a few of these kicking around my logs on Metal.
Changing the order of my parent struct so that the wgpu::Device is dropped last made them all vanish! Thanks for the deep dive @jonmmease

@paxbun

paxbun commented Jul 6, 2024

I don't have a minimal example, but this also happens on iOS, with both wgpu 0.19 and 0.20. Downgrading wgpu to 0.18 solved the issue. On the other hand, this does not happen on Android with Vulkan.

Here's an event list made with Instruments:
[screenshot: Instruments event list]

I created a depth buffer texture, and resized the screen by rotating the device several times.

The deallocations happen in wgpu::Queue::submit (stack trace below), which is weird because Queue::submit does not deallocate the buffer passed to the render pass but a different one, which should have been destroyed in another function when a new depth buffer was created for the new screen size.

[screenshot: stack trace of the deallocation inside wgpu::Queue::submit]

For example, in the event list, Depth Buffer 0x27397a4c has never been used by any render pass (I checked this by logging the global ID), because it is replaced by the new buffer 0x27397a4e due to the rotation, but it is destroyed in Queue::submit.

I also tried manually destroying the textures by calling Texture::destroy, but the result was the same.

Sorry for not giving you a concrete example, but if you need further information, feel free to ask me!

@teoxoy
Member

teoxoy commented Jul 31, 2024

This has been fixed by 4255268 (#5910).

git bisect start '--term-new=fixed' '--term-old=unfixed'
# status: waiting for both good and bad commits
# fixed: [6351a75b0cd9ec26948ca4934a5316fd781c567e] remove implemented TODO
git bisect fixed 6351a75b0cd9ec26948ca4934a5316fd781c567e
# status: waiting for good commit(s), bad commit known
# unfixed: [87576b72b37c6b78b41104eb25fc31893af94092] Versions and Changelog
git bisect unfixed 87576b72b37c6b78b41104eb25fc31893af94092
# unfixed: [484457d95993b00b91905fae0e539a093423cc28] Fix wgpu-hal build
git bisect unfixed 484457d95993b00b91905fae0e539a093423cc28
# unfixed: [588950110af8aca278516ec15d33ef6b7b66588c] Remove lifetime dependency of `ComputePass` to its parent command encoder (#5620)
git bisect unfixed 588950110af8aca278516ec15d33ef6b7b66588c
# unfixed: [985968fd38924fabda8776e2dde8f65213a8f156] remove unused `Resource` bound
git bisect unfixed 985968fd38924fabda8776e2dde8f65213a8f156
# fixed: [a8b0f2f6a6abdb60cbd9ba58f41d07eaef502666] remove device arg from `StagingBuffer.flush()`
git bisect fixed a8b0f2f6a6abdb60cbd9ba58f41d07eaef502666
# unfixed: [e26d2d776334d51f5c5f0d43a8d0b1ba17a8275e] move command buffer resolving in `Global`'s methods
git bisect unfixed e26d2d776334d51f5c5f0d43a8d0b1ba17a8275e
# unfixed: [140495006ed3f7a88a5609f965e668a226893d98] remove `device_maintain_ids`
git bisect unfixed 140495006ed3f7a88a5609f965e668a226893d98
# fixed: [b68966eea5aa6bb5514b2f4438321363b6811e7a] [wgpu-core] use the view's format not the texture's format
git bisect fixed b68966eea5aa6bb5514b2f4438321363b6811e7a
# unfixed: [0e1c1f7c07a36b0f77a1401eece652d637c2376e] replace the tracker in `Device` with a new `DeviceTracker`
git bisect unfixed 0e1c1f7c07a36b0f77a1401eece652d637c2376e
# unfixed: [ac88c738c0abe983a24810c05bd14a6435988133] remove unused `BufferTracker.get()`
git bisect unfixed ac88c738c0abe983a24810c05bd14a6435988133
# unfixed: [aa9cb71a541970f6c509a0936b47964a9d398b71] introduce `DeviceBufferTracker` which holds weak references to buffers
git bisect unfixed aa9cb71a541970f6c509a0936b47964a9d398b71
# fixed: [425526828f738c95ec50b016c6a761bc00d2fb25] introduce `DeviceTextureTracker` which holds weak references to textures
git bisect fixed 425526828f738c95ec50b016c6a761bc00d2fb25
# first fixed commit: [425526828f738c95ec50b016c6a761bc00d2fb25] introduce `DeviceTextureTracker` which holds weak references to textures

@teoxoy teoxoy closed this as completed Jul 31, 2024
@gents83
Contributor

gents83 commented Jul 31, 2024

@teoxoy you're doing amazing work! Thanks a lot!!!
