Issue with timestamp queries with multi-pass #480

Open
alienself opened this issue Nov 23, 2024 · 7 comments

Comments

@alienself

Hi,

I encountered an issue with the TimestampQueryManager while working with the timestamp query example. It functions correctly when measuring a single pass. However, when multiple passes are measured from the same command encoder, the timing results become inconsistent.

For example, in my project, I have two passes:

  • One pass takes approximately 5ms when measured individually.
  • The other pass takes 0.3ms when measured individually.

However, when both passes are measured together, both report roughly 5ms. That is unexpected, unless I am missing something?

@greggman Any idea what could cause this?

@kainino0x
Collaborator

TimestampQueryManager as written only manages one pair of timestamps at a time. If you are using one manager for two passes, then the second pass overwrites the values in the timestampQuerySet that came from the first pass. Even if you call resolve() in between, that just causes the overwrite to happen in the timestampMapBuffer instead of the timestampQuerySet.

It's just sample code, not a library, so if you need it to do more than it does, then you should modify it accordingly :)
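A toy model of the overwrite described above may help. It is a TypeScript sketch with plain arrays standing in for the query set and made-up nanosecond values shaped like the 5ms/0.3ms passes from the original report; it makes no WebGPU calls, and `recordSharedPair`, `recordPairPerPass` and `pairDurationsMs` are hypothetical helpers, not part of the sample.

```ts
// Toy model of the overwrite (not real WebGPU calls): the "query set"
// is a plain array of timestamp slots holding nanosecond values.
interface PassTiming {
  beginNs: number; // value the GPU would write at the start of the pass
  endNs: number; // value the GPU would write at the end of the pass
}

// Every pass reuses slots 0 and 1, like a manager that tracks one pair.
function recordSharedPair(passes: PassTiming[]): number[] {
  const slots: number[] = [0, 0];
  for (const p of passes) {
    slots[0] = p.beginNs; // overwrites the previous pass's begin value
    slots[1] = p.endNs; // overwrites the previous pass's end value
  }
  return slots;
}

// Each pass gets its own pair: pass i writes slots 2*i and 2*i + 1.
function recordPairPerPass(passes: PassTiming[]): number[] {
  const slots: number[] = [];
  for (let i = 0; i < passes.length; i++) {
    slots[2 * i] = passes[i].beginNs;
    slots[2 * i + 1] = passes[i].endNs;
  }
  return slots;
}

// Durations in milliseconds, one per (begin, end) pair of slots.
function pairDurationsMs(slots: number[]): number[] {
  const out: number[] = [];
  for (let i = 0; i + 1 < slots.length; i += 2) {
    out.push((slots[i + 1] - slots[i]) / 1e6);
  }
  return out;
}

// Two passes shaped like the report: ~5 ms, then ~0.3 ms (made-up values).
const passes: PassTiming[] = [
  { beginNs: 0, endNs: 5e6 },
  { beginNs: 5e6, endNs: 5.3e6 },
];
const shared = pairDurationsMs(recordSharedPair(passes));
const perPass = pairDurationsMs(recordPairPerPass(passes));
console.log(shared, perPass);
```

With one shared pair only the last pass survives (`[0.3]`), while one pair per pass keeps both (`[5, 0.3]`).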

@alienself
Author

@kainino0x Ah, sorry, I should have mentioned that I am using the version from this commit by @greggman, which seems to support multiple passes. You are right that the current version only supports a single query.

Code from:
https://github.com/greggman/webgpu-samples/blob/20cf82deb2a10014532166f71ffa826c1eb55fec/sample/timestampQuery/TimestampQueryManager.ts

```ts
  // Add both a start and end timestamp.
  addTimestampWrite(
    renderPassDescriptor: GPURenderPassDescriptor,
    pairId: number
  ) {
    if (this.timestampSupported) {
      // We instruct the render pass to write to the timestamp query before/after
      const ndx = pairId * 2;
      renderPassDescriptor.timestampWrites = {
        querySet: this.timestampQuerySet,
        beginningOfPassWriteIndex: ndx,
        endOfPassWriteIndex: ndx + 1,
      };
    }
    return renderPassDescriptor;
  }
```
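In the snippet above, pair `pairId` uses query indices `2 * pairId` and `2 * pairId + 1`. For that to work, the query set presumably has to be created with a `count` of at least `2 * pairCount` (via `device.createQuerySet({ type: "timestamp", count })`), and `resolveQuerySet` has to resolve all of those queries into a buffer holding 8 bytes per timestamp. The sketch below covers only the CPU-side bookkeeping under those assumptions; `queryCountFor`, `resolveBufferSize` and `pairDurationsUs` are hypothetical helpers, and the input is whatever view of the mapped buffer you read back (for example a `BigInt64Array` over `getMappedRange()`).

```ts
// Sketch: bookkeeping for N timestamp pairs sharing one query set.
// Assumes resolveQuerySet(querySet, 0, 2 * pairCount, resolveBuffer, 0)
// resolved every query, each as a 64-bit nanosecond value.
const BYTES_PER_TIMESTAMP = 8;

// Each pair needs a begin slot and an end slot.
function queryCountFor(pairCount: number): number {
  return pairCount * 2;
}

// Size of the buffer that receives the resolved timestamps.
function resolveBufferSize(pairCount: number): number {
  return queryCountFor(pairCount) * BYTES_PER_TIMESTAMP;
}

// Convert resolved timestamps into per-pair durations in microseconds.
function pairDurationsUs(
  timestamps: ArrayLike<number | bigint>,
  pairCount: number,
): number[] {
  const durations: number[] = [];
  for (let pairId = 0; pairId < pairCount; pairId++) {
    const ndx = pairId * 2; // same indexing as addTimestampWrite
    const begin = timestamps[ndx];
    const end = timestamps[ndx + 1];
    // Subtract in BigInt when given BigInt values, so large absolute
    // timestamps do not lose precision before the difference is taken.
    const deltaNs =
      typeof begin === "bigint" && typeof end === "bigint"
        ? Number(end - begin)
        : Number(end) - Number(begin);
    // This sketch's choice: report NaN instead of a negative duration.
    durations.push(deltaNs >= 0 ? deltaNs / 1000 : NaN);
  }
  return durations;
}

// Example: two pairs of made-up nanosecond timestamps.
console.log(pairDurationsUs([0, 5e6, 5e6, 5.3e6], 2), resolveBufferSize(9));
```

Giving every pass its own pair of indices is what keeps one pass from overwriting another; the bookkeeping above just has to agree with the indices the passes were told to write.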

@kainino0x
Collaborator

That seems to be one of the intermediate states of #476; you might want to try the version before that instead:
https://github.com/webgpu/webgpu-samples/pull/472/files
(Not sure whether it will make a difference.)

@greggman
Collaborator

greggman commented Nov 23, 2024

I'm definitely seeing something strange on my M1.

Here are 9 pairs. Each pair times drawing a bunch of cube instances, all at exactly the same location, so you will only see 9 cubes even though up to 3000 are being drawn in the same place (there is also no depth buffer). The number of cubes per timing pair changes over time from 1 to 3000. The first pair is always fast and the last pair is always slow, even when the first pair is drawing 3000 instances and the last pair is drawing 1.

https://jsgist.org/?src=fa2aab1a359720cf1a57a97444657a5d

I don't see the same issue on my Windows 11 NVIDIA machine. There, the timings match the number of instances: more instances, longer time.

@greggman
Collaborator

Filed a bug: https://issues.chromium.org/issues/380508203 (with a different example: blending turned on and each instance rotated slightly, hoping that defeats any tiling optimizations).

@alienself
Author

alienself commented Nov 23, 2024

Thank you for the link and for the bug report! I am encountering the exact same issue on a Mac M2 using Chrome 131: the first pass is always fast and the last one is always slow.

Interestingly, I tested the same demo on my iPhone XS (a 6-year-old device) using Safari with the WebGPU flag enabled. Surprisingly, the iPhone delivered 4x better performance compared to my Mac, even with a comparable viewport resolution.

Also, instead of `(Math.sin(i + time) * 0.5 + 0.5) * 3000 + 1 | 0;` I tried `(10 * 0.5 + 0.5) * 3000 + 1 | 0;`. Mac: 3000μs per pass; iPhone: 700μs per pass.

I find this discrepancy puzzling. How could an older device outperform a newer one here? Unless the measurements returned by Chrome are incorrect or add a lot of overhead?
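A note on the arithmetic in alienself's comment above: the constant substitution does not pin the count at 3000. Assuming the expression is used directly as the per-pair instance count, as in the demo, `(10 * 0.5 + 0.5)` is 5.5, so each pair draws 5.5 × 3000 + 1 = 16501 instances, roughly 5.5 times the varying expression's maximum. The sketch below just evaluates both expressions; `varyingInstanceCount` and `constantInstanceCount` are hypothetical names wrapping the expressions quoted in the comment.

```ts
// The two instance-count expressions quoted in the comment above.
// `|` has lower precedence than `+`, so `| 0` truncates the whole sum.
function varyingInstanceCount(i: number, time: number): number {
  // sin(...) * 0.5 + 0.5 lies in [0, 1], so the result lies in [1, 3001].
  return (Math.sin(i + time) * 0.5 + 0.5) * 3000 + 1 | 0;
}

function constantInstanceCount(): number {
  // 10 * 0.5 + 0.5 = 5.5, so this is 5.5 * 3000 + 1 = 16501.
  return (10 * 0.5 + 0.5) * 3000 + 1 | 0;
}

// Sweep the varying expression over a range of phases to see its bounds.
let lowest = Infinity;
let highest = -Infinity;
for (let step = 0; step < 10000; step++) {
  const n = varyingInstanceCount(0, step * 0.001);
  lowest = Math.min(lowest, n);
  highest = Math.max(highest, n);
}
console.log(lowest, highest, constantInstanceCount());
```

So the 3000μs (Mac) and 700μs (iPhone) per-pass figures correspond to about 16501 instances per pass, not 3000.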

@greggman
Collaborator

greggman commented Nov 23, 2024

There have been other reports of timestamp-query issues on M1/M2/M3.

I notice Safari has the same issue, so it seems unlikely to be a Chrome bug.
