Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Meshlet single pass depth downsampling (SPD) #13003

Merged
merged 18 commits into from
Jun 3, 2024

Conversation

JMS55
Copy link
Contributor

@JMS55 JMS55 commented Apr 17, 2024

Objective

  • Using multiple raster passes to generate the depth pyramid is extremely slow
  • Pulling data from the source image is the largest bottleneck, it's important to sample in a cache-aware pattern
  • Barriers and pipeline drain between the raster passes is the second largest bottleneck
  • Each separate RenderPass on the CPU is really expensive

Solution

  • Port FidelityFX SPD to WGSL, replacing meshlet's existing multiple raster passes with a single two compute dispatches. Lack of coherent buffers means we have to do the the last 64x64 tile from mip 7+ in a separate dispatch to ensure the mip 6 writes were flushed :(
  • Workgroup shared memory version only at the moment, as the subgroup operation is blocked by our upgrade to wgpu 0.20 Wgpu 0.20 #13186
  • Don't enforce a power-of-2 depth pyramid texture size, simply scaling by 0.5 is fine

@JMS55 JMS55 added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Apr 17, 2024
@JMS55 JMS55 added this to the 0.14 milestone Apr 17, 2024
@JMS55 JMS55 requested a review from pcwalton April 17, 2024 01:54
@JMS55 JMS55 changed the title Meshlet single pass depth generation (SPD) Meshlet single pass depth downsampling (SPD) Apr 17, 2024
@pcwalton
Copy link
Contributor

Which parts do you want me to review? Presumably downsample_depth.wgsl? Anything else?

@JMS55
Copy link
Contributor Author

JMS55 commented Apr 17, 2024

The downsample shader, yeah. I can't post a link ATM, but you can find the full PR diff by changing the GitHub diff to compare against my meshlet-previous-frame-depth-pyramid branch.

Copy link
Contributor

@pcwalton pcwalton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall a straightforward port of FidelityFX, but I'm concerned about the lack of texture barrier and some more comments explaining the approach would be nice. Seems fine other than that.

crates/bevy_pbr/src/meshlet/downsample_depth.wgsl Outdated Show resolved Hide resolved
tex = vec2(workgroup_id * 64u) + vec2(x * 2u + 32u, y * 2u + 32u);
pix = vec2(workgroup_id * 32u) + vec2(x + 16u, y + 16u);
v[3] = reduce_load_mip_0(tex);
textureStore(mip_1, pix, vec4(v[3]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of the way there's so much repetition here, but I see that you're copying from FidelityFX, so I'm OK with it.

for (var i = 0u; i < 4u; i++) {
intermediate_memory[x][y] = v[i];
workgroupBarrier();
if local_invocation_index < 64u {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I had to stare at this a while. I assume what you're doing here is to work around the lack of subgroupQuadSwapX intrinsics. That's why you do local_invocation_index < 64 instead of local_invocation_index % 16 == 0 like FidelityFX does.

Can you add a comment saying that this is a workaround for lack of subgroup quad swap operations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I guess this is similar to SpdDownsampleMips_0_1_LDS. OK, I see. Maybe add a TODO here so we can remember to update it when we have subgroup ops.

@group(0) @binding(0) var input_depth: texture_2d<f32>;
@group(0) @binding(1) var samplr: sampler;
/// Generates a hierarchal depth buffer.
/// Based on FidelityFX SPD https://github.com/GPUOpen-LibrariesAndSDKs/FidelityFX-SDK/blob/d7531ae47d8b36a5d4025663e731a47a38be882f/sdk/include/FidelityFX/gpu/spd/ffx_spd.h#L528
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a brief comment here explaining the overall approach. Perhaps:

Every thread we dispatch is responsible for four 2x2 quads in the original depth buffer. Each workgroup has a shared 16x16 texel tile to work with. Different approaches are used at each mip level:
Mip level 0 (64x64): Use texture gather instructions to sample at the center of each 2x2 quad. Do this 4 times for the 4 quads we're responsible for. Save each downsampled depth value, effectively slicing each 32x32 piece into four 16x16 pieces.
Mip level 1 (32x32): For each of the four 16x16 pieces we have, copy it into the tile memory, then downsample it and barrier. This results in a 16x16 tile, which we store in our tile memory.
Mip levels 2-5 (16x16 to 2x2): The entire tile fits in memory, so we just downsample it in place.
Mip levels 6+: Load from the global texture and downsample 4 pixels at a time. Only one workgroup remains at this point.

@JMS55 JMS55 marked this pull request as draft April 24, 2024 01:11
@JMS55
Copy link
Contributor Author

JMS55 commented Apr 24, 2024

Downscale is broken, will need to debug. Might be the barrier issue.

@JMS55 JMS55 marked this pull request as ready for review May 4, 2024 21:29
@JMS55 JMS55 requested a review from pcwalton May 4, 2024 21:31
@alice-i-cecile alice-i-cecile added S-Needs-Review Needs reviewer attention (from anyone!) to move forward S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels May 16, 2024
@pcwalton pcwalton self-requested a review May 16, 2024 22:30
Copy link
Contributor

@pcwalton pcwalton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now. Looking forward to this!

pcwalton added a commit to pcwalton/bevy that referenced this pull request May 22, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight` or `PointLight`. This shadow size value represents
the size of the light and should be tuned as appropriate for your scene.
Higher values result in a wider penumbra (i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, non-temporal operation, and temporal operation. The assets are
my original work.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 22, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight` or `PointLight`. This shadow size value represents
the size of the light and should be tuned as appropriate for your scene.
Higher values result in a wider penumbra (i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, non-temporal operation, and temporal operation. The assets are
my original work.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 22, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight` or `PointLight`. This shadow size value represents
the size of the light and should be tuned as appropriate for your scene.
Higher values result in a wider penumbra (i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, non-temporal operation, and temporal operation. The assets are
my original work.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 22, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight` or `PointLight`. This shadow size value represents
the size of the light and should be tuned as appropriate for your scene.
Higher values result in a wider penumbra (i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, non-temporal operation, and temporal operation. The assets are
my original work.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 24, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight` or `PointLight`. This shadow size value represents
the size of the light and should be tuned as appropriate for your scene.
Higher values result in a wider penumbra (i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, spot lights, non-temporal operation, and temporal operation. The
assets are my original work.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 24, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight` or `PointLight`. This shadow size value represents
the size of the light and should be tuned as appropriate for your scene.
Higher values result in a wider penumbra (i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

In addition to PCSS, this commit makes the near Z plane for the shadow
map configurable on a per-light basis. Previously, it had been hardcoded
to 0.1. This was necessary to make the point light shadow map in the
example look reasonable, as otherwise the shadows appeared far too
aliased.

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, spot lights, non-temporal operation, and temporal operation. The
assets are my original work.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 24, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This
shadow size value represents the size of the light and should be tuned
as appropriate for your scene.  Higher values result in a wider penumbra
(i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

In addition to PCSS, this commit makes the near Z plane for the shadow
map configurable on a per-light basis. Previously, it had been hardcoded
to 0.1 meters. This change was necessary to make the point light shadow
map in the example look reasonable, as otherwise the shadows appeared
far too aliased.

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, spot lights, non-temporal operation, and temporal operation. The
assets are my original work.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 24, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This
shadow size value represents the size of the light and should be tuned
as appropriate for your scene.  Higher values result in a wider penumbra
(i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

In addition to PCSS, this commit makes the near Z plane for the shadow
map configurable on a per-light basis. Previously, it had been hardcoded
to 0.1 meters. This change was necessary to make the point light shadow
map in the example look reasonable, as otherwise the shadows appeared
far too aliased.

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, spot lights, non-temporal operation, and temporal operation. The
assets are my original work.

Both temporal and non-temporal shadows are rather noisy in the example,
and, as mentioned before, this is unavoidable without downsampling the
depth buffer, which we can't do yet. Note also that the shadows don't
look particularly great for point lights; the example simply isn't an
ideal scene for them. Nevertheless, I felt that the benefits of the
ability to do a side-by-side comparison of directional and point lights
outweighed the unsightliness of the point light shadows in that example,
so I kept the point light feature in.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 24, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This
shadow size value represents the size of the light and should be tuned
as appropriate for your scene.  Higher values result in a wider penumbra
(i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

In addition to PCSS, this commit makes the near Z plane for the shadow
map configurable on a per-light basis. Previously, it had been hardcoded
to 0.1 meters. This change was necessary to make the point light shadow
map in the example look reasonable, as otherwise the shadows appeared
far too aliased.

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, spot lights, non-temporal operation, and temporal operation. The
assets are my original work.

Both temporal and non-temporal shadows are rather noisy in the example,
and, as mentioned before, this is unavoidable without downsampling the
depth buffer, which we can't do yet. Note also that the shadows don't
look particularly great for point lights; the example simply isn't an
ideal scene for them. Nevertheless, I felt that the benefits of the
ability to do a side-by-side comparison of directional and point lights
outweighed the unsightliness of the point light shadows in that example,
so I kept the point light feature in.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 24, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This
shadow size value represents the size of the light and should be tuned
as appropriate for your scene.  Higher values result in a wider penumbra
(i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

In addition to PCSS, this commit makes the near Z plane for the shadow
map configurable on a per-light basis. Previously, it had been hardcoded
to 0.1 meters. This change was necessary to make the point light shadow
map in the example look reasonable, as otherwise the shadows appeared
far too aliased.

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, spot lights, non-temporal operation, and temporal operation. The
assets are my original work.

Both temporal and non-temporal shadows are rather noisy in the example,
and, as mentioned before, this is unavoidable without downsampling the
depth buffer, which we can't do yet. Note also that the shadows don't
look particularly great for point lights; the example simply isn't an
ideal scene for them. Nevertheless, I felt that the benefits of the
ability to do a side-by-side comparison of directional and point lights
outweighed the unsightliness of the point light shadows in that example,
so I kept the point light feature in.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 24, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This
shadow size value represents the size of the light and should be tuned
as appropriate for your scene.  Higher values result in a wider penumbra
(i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

In addition to PCSS, this commit makes the near Z plane for the shadow
map configurable on a per-light basis. Previously, it had been hardcoded
to 0.1 meters. This change was necessary to make the point light shadow
map in the example look reasonable, as otherwise the shadows appeared
far too aliased.

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, spot lights, non-temporal operation, and temporal operation. The
assets are my original work.

Both temporal and non-temporal shadows are rather noisy in the example,
and, as mentioned before, this is unavoidable without downsampling the
depth buffer, which we can't do yet. Note also that the shadows don't
look particularly great for point lights; the example simply isn't an
ideal scene for them. Nevertheless, I felt that the benefits of the
ability to do a side-by-side comparison of directional and point lights
outweighed the unsightliness of the point light shadows in that example,
so I kept the point light feature in.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 25, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This
shadow size value represents the size of the light and should be tuned
as appropriate for your scene.  Higher values result in a wider penumbra
(i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

In addition to PCSS, this commit makes the near Z plane for the shadow
map configurable on a per-light basis. Previously, it had been hardcoded
to 0.1 meters. This change was necessary to make the point light shadow
map in the example look reasonable, as otherwise the shadows appeared
far too aliased.

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, spot lights, non-temporal operation, and temporal operation. The
assets are my original work.

Both temporal and non-temporal shadows are rather noisy in the example,
and, as mentioned before, this is unavoidable without downsampling the
depth buffer, which we can't do yet. Note also that the shadows don't
look particularly great for point lights; the example simply isn't an
ideal scene for them. Nevertheless, I felt that the benefits of the
ability to do a side-by-side comparison of directional and point lights
outweighed the unsightliness of the point light shadows in that example,
so I kept the point light feature in.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
pcwalton added a commit to pcwalton/bevy that referenced this pull request May 25, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This
shadow size value represents the size of the light and should be tuned
as appropriate for your scene.  Higher values result in a wider penumbra
(i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR bevyengine#13003 to
land first).

In addition to PCSS, this commit makes the near Z plane for the shadow
map configurable on a per-light basis. Previously, it had been hardcoded
to 0.1 meters. This change was necessary to make the point light shadow
map in the example look reasonable, as otherwise the shadows appeared
far too aliased.

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, spot lights, non-temporal operation, and temporal operation. The
assets are my original work.

Both temporal and non-temporal shadows are rather noisy in the example,
and, as mentioned before, this is unavoidable without downsampling the
depth buffer, which we can't do yet. Note also that the shadows don't
look particularly great for point lights; the example simply isn't an
ideal scene for them. Nevertheless, I felt that the benefits of the
ability to do a side-by-side comparison of directional and point lights
outweighed the unsightliness of the point light shadows in that example,
so I kept the point light feature in.

Fixes bevyengine#3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf
@alice-i-cecile alice-i-cecile removed the S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged label May 30, 2024
Copy link
Contributor

@atlv24 atlv24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. i've already looked through this code a few times before, just never left an approve

@IceSentry IceSentry added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Jun 2, 2024
Comment on lines +798 to +813
texture_depth_2d(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
texture_storage_2d(
TextureFormat::R32Float,
StorageTextureAccess::ReadWrite,
),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this could help a bit

Suggested change
texture_depth_2d(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
texture_storage_2d(
TextureFormat::R32Float,
StorageTextureAccess::ReadWrite,
),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
write_only_r32float(),
// view depth
texture_depth_2d(),
// mip 1
write_only_r32float(),
// mip 2
write_only_r32float(),
// mip 3
write_only_r32float(),
// mip 4
write_only_r32float(),
// mip 5
write_only_r32float(),
// mip 6
texture_storage_2d(
TextureFormat::R32Float,
StorageTextureAccess::ReadWrite,
),
// mip 7
write_only_r32float(),
// mip 8
write_only_r32float(),
// mip 9
write_only_r32float(),
// mip 10
write_only_r32float(),
// mip 11
write_only_r32float(),
// mip 12
write_only_r32float(),

Copy link
Contributor

@IceSentry IceSentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't go over the shader code, but I trust the other reviewers on that.

@alice-i-cecile alice-i-cecile added this pull request to the merge queue Jun 3, 2024
Merged via the queue into bevyengine:main with commit 5536079 Jun 3, 2024
32 checks passed
@otoomey
Copy link

otoomey commented Jun 26, 2024

I believe that there might be an improvement that can be made here:

fn reduce_load_mip_0(tex: vec2u) -> f32 {
    let uv = (vec2f(tex) + 0.5) / vec2f(textureDimensions(mip_0));
    return reduce_4(textureGather(mip_0, samplr, uv));
}

From what I understand from the spec, textureGather already computes the four component minimum and stores it in the w channel. So there is no need to reduce, just do the following:

fn reduce_load_mip_0(tex: vec2u) -> f32 {
    let uv = (vec2f(tex) + 0.5) / vec2f(textureDimensions(mip_0));
    return textureGather(mip_0, samplr, uv).w;
}

It is very possible that the compiler spots this optimization already.

Frankly I'm not sure how textureGather works exactly, in Vulkan you needed a VK_SAMPLER_REDUCTION_MODE_MIN sampler and corresponding extension to do this.

@JMS55
Copy link
Contributor Author

JMS55 commented Jun 27, 2024

textureGather does not return the minimum of the 4 values. The w component is the (u_min, v_min) value of the sample footprint. I.e. given 4 texels (the sample footprint) arranged in a 2x2 quad, the w component is the value at location (u_min, v_min).

Yes, ideally I would be able to use VK_SAMPLER_REDUCTION_MODE_MIN, but unfortunately wgpu does not support it.

github-merge-queue bot pushed a commit that referenced this pull request Sep 18, 2024
[*Percentage-closer soft shadows*] are a technique from 2004 that allow
shadows to become blurrier farther from the objects that cast them. It
works by introducing a *blocker search* step that runs before the normal
shadow map sampling. The blocker search step detects the difference
between the depth of the fragment being rasterized and the depth of the
nearby samples in the depth buffer. Larger depth differences result in a
larger penumbra and therefore a blurrier shadow.

To enable PCSS, fill in the `soft_shadow_size` value in
`DirectionalLight`, `PointLight`, or `SpotLight`, as appropriate. This
shadow size value represents the size of the light and should be tuned
as appropriate for your scene. Higher values result in a wider penumbra
(i.e. blurrier shadows).

When using PCSS, temporal shadow maps
(`ShadowFilteringMethod::Temporal`) are recommended. If you don't use
`ShadowFilteringMethod::Temporal` and instead use
`ShadowFilteringMethod::Gaussian`, Bevy will use the same technique as
`Temporal`, but the result won't vary over time. This produces a rather
noisy result. Doing better would likely require downsampling the shadow
map, which would be complex and slower (and would require PR #13003 to
land first).

In addition to PCSS, this commit makes the near Z plane for the shadow
map configurable on a per-light basis. Previously, it had been hardcoded
to 0.1 meters. This change was necessary to make the point light shadow
map in the example look reasonable, as otherwise the shadows appeared
far too aliased.

A new example, `pcss`, has been added. It demonstrates the
percentage-closer soft shadow technique with directional lights, point
lights, spot lights, non-temporal operation, and temporal operation. The
assets are my original work.

Both temporal and non-temporal shadows are rather noisy in the example,
and, as mentioned before, this is unavoidable without downsampling the
depth buffer, which we can't do yet. Note also that the shadows don't
look particularly great for point lights; the example simply isn't an
ideal scene for them. Nevertheless, I felt that the benefits of the
ability to do a side-by-side comparison of directional and point lights
outweighed the unsightliness of the point light shadows in that example,
so I kept the point light feature in.

Fixes #3631.

[*Percentage-closer soft shadows*]:
https://developer.download.nvidia.com/shaderlibrary/docs/shadow_PCSS.pdf

## Changelog

### Added

* Percentage-closer soft shadows (PCSS) are now supported, allowing
shadows to become blurrier as they stretch away from objects. To use
them, set the `soft_shadow_size` field in `DirectionalLight`,
`PointLight`, or `SpotLight`, as applicable.

* The near Z value for shadow maps is now customizable via the
`shadow_map_near_z` field in `DirectionalLight`, `PointLight`, and
`SpotLight`.

## Screenshots

PCSS off:
![Screenshot 2024-05-24
120012](https://github.com/bevyengine/bevy/assets/157897/0d35fe98-245b-44fb-8a43-8d0272a73b86)

PCSS on:
![Screenshot 2024-05-24
115959](https://github.com/bevyengine/bevy/assets/157897/83397ef8-1317-49dd-bfb3-f8286d7610cd)

---------

Co-authored-by: Alice Cecile <[email protected]>
Co-authored-by: Torstein Grindvik <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it X-Uncontroversial This work is generally agreed upon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants