Add FFT implementation for SIMD backend #602

andrewmilson · 2024-05-02T18:49:10Z


andrewmilson · 2024-05-02T18:49:38Z

This stack of pull requests is managed by Graphite.

codecov-commenter · 2024-05-02T20:51:32Z

Codecov Report

Attention: Patch coverage is 98.76325%, with 14 lines in your changes missing coverage. Please review.

Project coverage is 91.98%. Comparing base (64fd01c) to head (1ade01d).

Files	Patch %	Lines
crates/prover/src/core/backend/simd/fft/ifft.rs	99.04%	5 Missing ⚠️
crates/prover/src/core/backend/simd/fft/rfft.rs	99.05%	5 Missing ⚠️
crates/prover/src/core/backend/simd/m31.rs	33.33%	4 Missing ⚠️

Additional details and impacted files

@@                                  Coverage Diff                                  @@
##           04-30-Add_blake2s_implementation_for_SIMD_backend     #602      +/-   ##
=====================================================================================
+ Coverage                                              91.09%   91.98%   +0.89%     
=====================================================================================
  Files                                                     77       80       +3     
  Lines                                                   9790    10916    +1126     
  Branches                                                9790    10916    +1126     
=====================================================================================
+ Hits                                                    8918    10041    +1123     
- Misses                                                   803      806       +3     
  Partials                                                  69       69


spapinistarkware

Reviewed 1 of 3 files at r2.
Reviewable status: 1 of 4 files reviewed, 8 unresolved discussions (waiting on @andrewmilson)

crates/prover/src/core/backend/simd/fft/ifft.rs line 156 at r2 (raw file):

    for index_l in 0..(1 << loop_bits) {
        let index = (index_h << loop_bits) + index_l;
        let mut val0 = PackedBaseField::load(values.add(index * 32).cast_const() as *const u32);

Perhaps it's better to change the type to u32 in the callers as well?
I see there's a lot of "as" casting here, which we probably want to avoid if possible.
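
A minimal sketch of this suggestion, assuming the buffer pointer in the callers is currently typed `*mut i32` (the excerpt does not show its declaration). `load16`, `load_vec_i32` and `load_vec_u32` are hypothetical names; `load16` only stands in for `PackedBaseField::load`. With a `*mut u32` parameter, `cast_const()` is the only conversion left at the call site:

```rust
/// Hypothetical stand-in for `PackedBaseField::load`: reads 16 consecutive `u32` lanes.
unsafe fn load16(ptr: *const u32) -> [u32; 16] {
    // SAFETY: the caller guarantees 16 readable `u32` values starting at `ptr`.
    unsafe { ptr.cast::<[u32; 16]>().read_unaligned() }
}

/// Caller that keeps the buffer as `*mut i32`: every load needs an `as` cast.
unsafe fn load_vec_i32(values: *mut i32, vec_index: usize) -> [u32; 16] {
    unsafe { load16(values.add(vec_index * 16).cast_const() as *const u32) }
}

/// Caller that takes `*mut u32`: only `cast_const()` remains.
unsafe fn load_vec_u32(values: *mut u32, vec_index: usize) -> [u32; 16] {
    unsafe { load16(values.add(vec_index * 16).cast_const()) }
}
```

Both callers read the same lanes; the second one simply moves the type decision to the function signature.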

crates/prover/src/core/backend/simd/fft/ifft.rs line 291 at r2 (raw file):

                super::_mul_twiddle_simd(r1, twiddle_dbl)
            }
        }

Can you extract this block to a function mul_twiddle?

Code quote:

        cfg_if::cfg_if! {
            if #[cfg(all(target_feature = "neon", target_arch = "aarch64"))] {
                super::_mul_twiddle_neon(r1, twiddle_dbl)
            } else if #[cfg(all(target_feature = "simd128", target_arch = "wasm32"))] {
                super::_mul_twiddle_wasm(r1, twiddle_dbl)
            } else if #[cfg(all(target_arch = "x86_64", target_feature = "avx512f"))] {
                super::_mul_twiddle_avx512(r1, twiddle_dbl)
            } else if #[cfg(all(target_arch = "x86_64", target_feature = "avx2f"))] {
                super::_mul_twiddle_avx2(r1, twiddle_dbl)
            } else {
                super::_mul_twiddle_simd(r1, twiddle_dbl)
            }
        }
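
One possible shape for the extracted function, as a sketch only. To stay dependency-free it uses plain `#[cfg]` attributes instead of `cfg_if`, shows just the NEON arm plus the portable fallback (the other arms would follow the same pattern), and replaces `PackedBaseField` and `u32x16` with a hypothetical `Lanes` array. The backend bodies are placeholders that all call one scalar reference routine:

```rust
const P: u64 = (1 << 31) - 1;

/// Hypothetical stand-in for the packed vector types used in the PR.
type Lanes = [u32; 16];

/// Reference lane-wise product a * (twiddle_dbl / 2) mod P, used by every placeholder backend.
fn mul_lanes(a: Lanes, twiddle_dbl: Lanes) -> Lanes {
    std::array::from_fn(|i| ((u64::from(a[i]) * (u64::from(twiddle_dbl[i]) / 2)) % P) as u32)
}

/// Placeholder for the NEON-specific routine; only compiled where NEON is available.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
fn mul_twiddle_neon(a: Lanes, twiddle_dbl: Lanes) -> Lanes {
    mul_lanes(a, twiddle_dbl)
}

/// Placeholder for the portable fallback.
#[allow(dead_code)]
fn mul_twiddle_portable(a: Lanes, twiddle_dbl: Lanes) -> Lanes {
    mul_lanes(a, twiddle_dbl)
}

/// Single dispatch point: butterfly code calls this and never repeats the cfg ladder.
#[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
#[inline]
fn mul_twiddle(a: Lanes, twiddle_dbl: Lanes) -> Lanes {
    mul_twiddle_neon(a, twiddle_dbl)
}

#[cfg(not(all(target_arch = "aarch64", target_feature = "neon")))]
#[inline]
fn mul_twiddle(a: Lanes, twiddle_dbl: Lanes) -> Lanes {
    mul_twiddle_portable(a, twiddle_dbl)
}
```

Call sites in `ifft.rs` and `rfft.rs` would then reduce to `mul_twiddle(r1, twiddle_dbl)`.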

crates/prover/src/core/backend/simd/fft/ifft.rs line 346 at r2 (raw file):

    // The twiddles for layer 2 are replicated in the following pattern:
    //   0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3

The explicit numbers below make this comment redundant. Remove it.

crates/prover/src/core/backend/simd/fft/mod.rs line 65 at r2 (raw file):

}

unsafe fn _mm512_load_epi32(mem_addr: *const i32) -> u32x16 {

I think this name should change, as it is the name of a specific Intel intrinsic. Perhaps just inline it?

Code quote:

_mm512_load_epi32

crates/prover/src/core/backend/simd/fft/mod.rs line 79 at r2 (raw file):

    // Start by loading the twiddles for the second layer (layer 1):
    // The twiddles for layer 1 are replicated in the following pattern:
    //   0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7

Same here. Remove line.

crates/prover/src/core/backend/simd/fft/mod.rs line 117 at r2 (raw file):

        ]))
    };
    let t0 = IndicesFromT1::swizzle(t1) ^ NEGATION_MASK;

Why doesn't simd_swizzle! work here?

crates/prover/src/core/backend/simd/fft/mod.rs line 122 at r2 (raw file):

#[cfg(target_arch = "aarch64")]
fn _mul_twiddle_neon(a: PackedBaseField, twiddle_dbl: u32x16) -> PackedBaseField {

I don't think we should implement these again.
For architectures whose regular mul needs to double one argument, the mul should be refactored and the part after the doubling extracted.
For other architectures, there's no need to store the twiddle doubled at all. That was only done to save ops.
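
A scalar sketch of the refactor described above, assuming the field is M31 (P = 2^31 - 1) and that doubling is used the way this model uses it. The PR's code works on `u32x16` vectors; `reduce`, `mul_doubled` and `mul` are hypothetical names. The point is that the regular multiplication becomes "double one operand, then run the shared tail", so a pre-doubled twiddle can enter the tail directly:

```rust
const P: u32 = (1 << 31) - 1;

/// Reduces a value in [0, 2P) to [0, P).
fn reduce(x: u32) -> u32 {
    if x >= P { x - P } else { x }
}

/// Shared tail: multiplies `a` by an operand that is already doubled (`b_dbl = 2 * b`).
/// Inputs are assumed to satisfy a <= P and b <= P.
fn mul_doubled(a: u32, b_dbl: u32) -> u32 {
    // 2ab as a 64-bit product.
    let prod = u64::from(a) * u64::from(b_dbl);
    // Because of the doubling, the high 32-bit word is floor(ab / 2^31).
    let hi = (prod >> 32) as u32;
    // The low 32-bit word is 2 * (ab mod 2^31), so one shift recovers ab mod 2^31.
    let lo = (prod as u32) >> 1;
    // 2^31 is congruent to 1 mod P, hence ab is congruent to hi + lo.
    reduce(hi + lo)
}

/// Regular multiplication: double one argument, then reuse the shared tail.
fn mul(a: u32, b: u32) -> u32 {
    mul_doubled(a, b << 1)
}
```

Under these assumptions a twiddle multiplication is just `mul_doubled(a, twiddle_dbl)`, and an architecture that gains nothing from the doubled form can store plain twiddles and call `mul`.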

crates/prover/src/core/backend/simd/fft/rfft.rs line 312 at r2 (raw file):

    twiddle_dbl: u32x16,
) -> (PackedBaseField, PackedBaseField) {
    let prod = {

Same

andrewmilson

Reviewable status: 1 of 4 files reviewed, 8 unresolved discussions (waiting on @spapinistarkware)

crates/prover/src/core/backend/simd/fft/ifft.rs line 156 at r2 (raw file):