Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FFT implementation for SIMD backend #602

Merged
merged 2 commits into from
May 16, 2024

Conversation

andrewmilson
Copy link
Contributor

@andrewmilson andrewmilson commented May 2, 2024

This change is Reviewable

@andrewmilson andrewmilson force-pushed the 04-30-Add_blake2s_implementation_for_SIMD_backend branch from 5a46ff8 to d52a58c Compare May 2, 2024 19:09
@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch from 7f4ba87 to 930295e Compare May 2, 2024 19:09
@andrewmilson andrewmilson force-pushed the 04-30-Add_blake2s_implementation_for_SIMD_backend branch from d52a58c to 1bc944f Compare May 2, 2024 19:24
@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch from 930295e to cb6c004 Compare May 2, 2024 19:24
@andrewmilson andrewmilson force-pushed the 04-30-Add_blake2s_implementation_for_SIMD_backend branch from 1bc944f to dbca87e Compare May 2, 2024 20:09
@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch from cb6c004 to 73877f6 Compare May 2, 2024 20:09
@andrewmilson andrewmilson force-pushed the 04-30-Add_blake2s_implementation_for_SIMD_backend branch from dbca87e to eb5afee Compare May 2, 2024 20:34
@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch from 73877f6 to cf887ea Compare May 2, 2024 20:34
@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch from cf887ea to b4c6ec9 Compare May 2, 2024 20:48
@codecov-commenter
Copy link

codecov-commenter commented May 2, 2024

Codecov Report

Attention: Patch coverage is 98.76325% with 14 lines in your changes are missing coverage. Please review.

Project coverage is 91.98%. Comparing base (64fd01c) to head (1ade01d).

Files Patch % Lines
crates/prover/src/core/backend/simd/fft/ifft.rs 99.04% 5 Missing ⚠️
crates/prover/src/core/backend/simd/fft/rfft.rs 99.05% 5 Missing ⚠️
crates/prover/src/core/backend/simd/m31.rs 33.33% 4 Missing ⚠️
Additional details and impacted files
@@                                  Coverage Diff                                  @@
##           04-30-Add_blake2s_implementation_for_SIMD_backend     #602      +/-   ##
=====================================================================================
+ Coverage                                              91.09%   91.98%   +0.89%     
=====================================================================================
  Files                                                     77       80       +3     
  Lines                                                   9790    10916    +1126     
  Branches                                                9790    10916    +1126     
=====================================================================================
+ Hits                                                    8918    10041    +1123     
- Misses                                                   803      806       +3     
  Partials                                                  69       69              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@spapinistarkware spapinistarkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r2.
Reviewable status: 1 of 4 files reviewed, 8 unresolved discussions (waiting on @andrewmilson)


crates/prover/src/core/backend/simd/fft/ifft.rs line 156 at r2 (raw file):

    for index_l in 0..(1 << loop_bits) {
        let index = (index_h << loop_bits) + index_l;
        let mut val0 = PackedBaseField::load(values.add(index * 32).cast_const() as *const u32);

Perhaps it's better to change the type to u32 in the callers as well?
I see there's a lot of "as" casting here, which we probably want to avoid if possible.


crates/prover/src/core/backend/simd/fft/ifft.rs line 291 at r2 (raw file):

                super::_mul_twiddle_simd(r1, twiddle_dbl)
            }
        }

Can you extract this block to a function mul_twiddle?

Code quote:

        cfg_if::cfg_if! {
            if #[cfg(all(target_feature = "neon", target_arch = "aarch64"))] {
                super::_mul_twiddle_neon(r1, twiddle_dbl)
            } else if #[cfg(all(target_feature = "simd128", target_arch = "wasm32"))] {
                super::_mul_twiddle_wasm(r1, twiddle_dbl)
            } else if #[cfg(all(target_arch = "x86_64", target_feature = "avx512f"))] {
                super::_mul_twiddle_avx512(r1, twiddle_dbl)
            } else if #[cfg(all(target_arch = "x86_64", target_feature = "avx2f"))] {
                super::_mul_twiddle_avx2(r1, twiddle_dbl)
            } else {
                super::_mul_twiddle_simd(r1, twiddle_dbl)
            }
        }

crates/prover/src/core/backend/simd/fft/ifft.rs line 346 at r2 (raw file):

    // The twiddles for layer 2 are replicated in the following pattern:
    //   0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3

The explicit number below make this comment redundant. Remove it.


crates/prover/src/core/backend/simd/fft/mod.rs line 65 at r2 (raw file):

}

unsafe fn _mm512_load_epi32(mem_addr: *const i32) -> u32x16 {

I think this name should change, as it is a specific intel intrinsic. Perhaps just inline it?

Code quote:

_mm512_load_epi32

crates/prover/src/core/backend/simd/fft/mod.rs line 79 at r2 (raw file):

    // Start by loading the twiddles for the second layer (layer 1):
    // The twiddles for layer 1 are replicated in the following pattern:
    //   0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7

Same here. Remove line.


crates/prover/src/core/backend/simd/fft/mod.rs line 117 at r2 (raw file):

        ]))
    };
    let t0 = IndicesFromT1::swizzle(t1) ^ NEGATION_MASK;

why simd_swizzle! doesnt work here?


crates/prover/src/core/backend/simd/fft/mod.rs line 122 at r2 (raw file):

#[cfg(target_arch = "aarch64")]
fn _mul_twiddle_neon(a: PackedBaseField, twiddle_dbl: u32x16) -> PackedBaseField {

I don't think we should implement these again.
For architectures that in regular mul, need to double one argument, they should be refactored and the part after doubling extracted.
For other architectures, there's no need to save the twiddle as soubled at all. This was just done to save ops.


crates/prover/src/core/backend/simd/fft/rfft.rs line 312 at r2 (raw file):

    twiddle_dbl: u32x16,
) -> (PackedBaseField, PackedBaseField) {
    let prod = {

Same

Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 4 files reviewed, 8 unresolved discussions (waiting on @spapinistarkware)


crates/prover/src/core/backend/simd/fft/ifft.rs line 156 at r2 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

Perhaps it's better to change the type to u32 in the callers as well?
I see there's a lot of "as" casting here, which we probably want to avoid if possible.

These are changed in the next PR.
https://reviewable.io/reviews/starkware-libs/stwo/596


crates/prover/src/core/backend/simd/fft/ifft.rs line 291 at r2 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

Can you extract this block to a function mul_twiddle?

Done.


crates/prover/src/core/backend/simd/fft/ifft.rs line 346 at r2 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

The explicit number below make this comment redundant. Remove it.

Done.


crates/prover/src/core/backend/simd/fft/mod.rs line 65 at r2 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

I think this name should change, as it is a specific intel intrinsic. Perhaps just inline it?

renamed in the next PR https://reviewable.io/reviews/starkware-libs/stwo/596


crates/prover/src/core/backend/simd/fft/mod.rs line 117 at r2 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

why simd_swizzle! doesnt work here?

Updated.


crates/prover/src/core/backend/simd/fft/mod.rs line 122 at r2 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

I don't think we should implement these again.
For architectures that in regular mul, need to double one argument, they should be refactored and the part after doubling extracted.
For other architectures, there's no need to save the twiddle as soubled at all. This was just done to save ops.

Extracting the mul part after the doubling has removed a lot of the code duplication but WDYT about keeping the twiddles double for all platforms at the moment? Pros: All docs and naming is already for double twiddles, No performance cost to any platform leaving as is. Cons: we have two very similar looking mul implementations for Neon (as AFAIK this is the only architechture that has the doubling mul instruction).


crates/prover/src/core/backend/simd/fft/rfft.rs line 312 at r2 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

Same

Done.

Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 4 files reviewed, 8 unresolved discussions (waiting on @spapinistarkware)


crates/prover/src/core/backend/simd/fft/mod.rs line 79 at r2 (raw file):

Previously, spapinistarkware (Shahar Papini) wrote…

Same here. Remove line.

Done.

@andrewmilson andrewmilson force-pushed the 04-30-Add_blake2s_implementation_for_SIMD_backend branch from 6cbca64 to f6c5046 Compare May 10, 2024 21:12
@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch 2 times, most recently from 941402f to 0fdf4c7 Compare May 11, 2024 00:35
@andrewmilson andrewmilson force-pushed the 04-30-Add_blake2s_implementation_for_SIMD_backend branch from f6c5046 to 6a263ed Compare May 11, 2024 02:41
@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch from 0fdf4c7 to d68543b Compare May 11, 2024 02:41
@andrewmilson andrewmilson force-pushed the 04-30-Add_blake2s_implementation_for_SIMD_backend branch from 6a263ed to dc11de3 Compare May 15, 2024 04:34
@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch from d68543b to 3cb07a8 Compare May 15, 2024 04:34
@andrewmilson andrewmilson force-pushed the 04-30-Add_blake2s_implementation_for_SIMD_backend branch 2 times, most recently from 1b95f6d to aa6e648 Compare May 15, 2024 19:37
@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch from 3cb07a8 to 16f412d Compare May 15, 2024 19:37
Copy link
Contributor Author

@andrewmilson andrewmilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 8 unresolved discussions (waiting on @spapinistarkware)


crates/prover/src/core/backend/simd/fft/mod.rs line 122 at r2 (raw file):

Previously, andrewmilson (Andrew Milson) wrote…

Extracting the mul part after the doubling has removed a lot of the code duplication but WDYT about keeping the twiddles double for all platforms at the moment? Pros: All docs and naming is already for double twiddles, No performance cost to any platform leaving as is. Cons: we have two very similar looking mul implementations for Neon (as AFAIK this is the only architechture that has the doubling mul instruction).

Added as a TODO. Will follow up in another PR.

@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch from 16f412d to 83fd68d Compare May 16, 2024 02:53
@andrewmilson andrewmilson force-pushed the 04-30-Add_blake2s_implementation_for_SIMD_backend branch from aa6e648 to 64fd01c Compare May 16, 2024 03:56
@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch from 83fd68d to 1ade01d Compare May 16, 2024 03:56
Copy link
Contributor

@spapinistarkware spapinistarkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 4 files at r1, 1 of 4 files at r5, 1 of 4 files at r6.
Reviewable status: 2 of 5 files reviewed, all discussions resolved

Base automatically changed from 04-30-Add_blake2s_implementation_for_SIMD_backend to dev May 16, 2024 12:17
@andrewmilson andrewmilson force-pushed the 05-02-Add_FFT_implementation_for_SIMD_backend branch from 1ade01d to 42df366 Compare May 16, 2024 12:18
@andrewmilson andrewmilson merged commit 25bf368 into dev May 16, 2024
21 of 22 checks passed
@andrewmilson andrewmilson deleted the 05-02-Add_FFT_implementation_for_SIMD_backend branch May 16, 2024 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants