-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add FFT implementation for SIMD backend #602
Add FFT implementation for SIMD backend #602
Conversation
5a46ff8
to
d52a58c
Compare
7f4ba87
to
930295e
Compare
d52a58c
to
1bc944f
Compare
930295e
to
cb6c004
Compare
1bc944f
to
dbca87e
Compare
cb6c004
to
73877f6
Compare
dbca87e
to
eb5afee
Compare
73877f6
to
cf887ea
Compare
cf887ea
to
b4c6ec9
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 04-30-Add_blake2s_implementation_for_SIMD_backend #602 +/- ##
=====================================================================================
+ Coverage 91.09% 91.98% +0.89%
=====================================================================================
Files 77 80 +3
Lines 9790 10916 +1126
Branches 9790 10916 +1126
=====================================================================================
+ Hits 8918 10041 +1123
- Misses 803 806 +3
Partials 69 69 ☔ View full report in Codecov by Sentry. |
df21476
to
4979fb3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r2.
Reviewable status: 1 of 4 files reviewed, 8 unresolved discussions (waiting on @andrewmilson)
crates/prover/src/core/backend/simd/fft/ifft.rs
line 156 at r2 (raw file):
for index_l in 0..(1 << loop_bits) { let index = (index_h << loop_bits) + index_l; let mut val0 = PackedBaseField::load(values.add(index * 32).cast_const() as *const u32);
Perhaps it's better to change the type to u32 in the callers as well?
I see there's a lot of "as" casting here, which we probably want to avoid if possible.
crates/prover/src/core/backend/simd/fft/ifft.rs
line 291 at r2 (raw file):
super::_mul_twiddle_simd(r1, twiddle_dbl) } }
Can you extract this block to a function mul_twiddle?
Code quote:
cfg_if::cfg_if! {
if #[cfg(all(target_feature = "neon", target_arch = "aarch64"))] {
super::_mul_twiddle_neon(r1, twiddle_dbl)
} else if #[cfg(all(target_feature = "simd128", target_arch = "wasm32"))] {
super::_mul_twiddle_wasm(r1, twiddle_dbl)
} else if #[cfg(all(target_arch = "x86_64", target_feature = "avx512f"))] {
super::_mul_twiddle_avx512(r1, twiddle_dbl)
} else if #[cfg(all(target_arch = "x86_64", target_feature = "avx2f"))] {
super::_mul_twiddle_avx2(r1, twiddle_dbl)
} else {
super::_mul_twiddle_simd(r1, twiddle_dbl)
}
}
crates/prover/src/core/backend/simd/fft/ifft.rs
line 346 at r2 (raw file):
// The twiddles for layer 2 are replicated in the following pattern: // 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
The explicit number below make this comment redundant. Remove it.
crates/prover/src/core/backend/simd/fft/mod.rs
line 65 at r2 (raw file):
} unsafe fn _mm512_load_epi32(mem_addr: *const i32) -> u32x16 {
I think this name should change, as it is a specific intel intrinsic. Perhaps just inline it?
Code quote:
_mm512_load_epi32
crates/prover/src/core/backend/simd/fft/mod.rs
line 79 at r2 (raw file):
// Start by loading the twiddles for the second layer (layer 1): // The twiddles for layer 1 are replicated in the following pattern: // 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
Same here. Remove line.
crates/prover/src/core/backend/simd/fft/mod.rs
line 117 at r2 (raw file):
])) }; let t0 = IndicesFromT1::swizzle(t1) ^ NEGATION_MASK;
why simd_swizzle! doesnt work here?
crates/prover/src/core/backend/simd/fft/mod.rs
line 122 at r2 (raw file):
#[cfg(target_arch = "aarch64")] fn _mul_twiddle_neon(a: PackedBaseField, twiddle_dbl: u32x16) -> PackedBaseField {
I don't think we should implement these again.
For architectures that in regular mul, need to double one argument, they should be refactored and the part after doubling extracted.
For other architectures, there's no need to save the twiddle as soubled at all. This was just done to save ops.
crates/prover/src/core/backend/simd/fft/rfft.rs
line 312 at r2 (raw file):
twiddle_dbl: u32x16, ) -> (PackedBaseField, PackedBaseField) { let prod = {
Same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 4 files reviewed, 8 unresolved discussions (waiting on @spapinistarkware)
crates/prover/src/core/backend/simd/fft/ifft.rs
line 156 at r2 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
Perhaps it's better to change the type to u32 in the callers as well?
I see there's a lot of "as" casting here, which we probably want to avoid if possible.
These are changed in the next PR.
https://reviewable.io/reviews/starkware-libs/stwo/596
crates/prover/src/core/backend/simd/fft/ifft.rs
line 291 at r2 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
Can you extract this block to a function mul_twiddle?
Done.
crates/prover/src/core/backend/simd/fft/ifft.rs
line 346 at r2 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
The explicit number below make this comment redundant. Remove it.
Done.
crates/prover/src/core/backend/simd/fft/mod.rs
line 65 at r2 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
I think this name should change, as it is a specific intel intrinsic. Perhaps just inline it?
renamed in the next PR https://reviewable.io/reviews/starkware-libs/stwo/596
crates/prover/src/core/backend/simd/fft/mod.rs
line 117 at r2 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
why simd_swizzle! doesnt work here?
Updated.
crates/prover/src/core/backend/simd/fft/mod.rs
line 122 at r2 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
I don't think we should implement these again.
For architectures that in regular mul, need to double one argument, they should be refactored and the part after doubling extracted.
For other architectures, there's no need to save the twiddle as soubled at all. This was just done to save ops.
Extracting the mul part after the doubling has removed a lot of the code duplication but WDYT about keeping the twiddles double for all platforms at the moment? Pros: All docs and naming is already for double twiddles, No performance cost to any platform leaving as is. Cons: we have two very similar looking mul implementations for Neon (as AFAIK this is the only architechture that has the doubling mul instruction).
crates/prover/src/core/backend/simd/fft/rfft.rs
line 312 at r2 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
Same
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 4 files reviewed, 8 unresolved discussions (waiting on @spapinistarkware)
crates/prover/src/core/backend/simd/fft/mod.rs
line 79 at r2 (raw file):
Previously, spapinistarkware (Shahar Papini) wrote…
Same here. Remove line.
Done.
6cbca64
to
f6c5046
Compare
941402f
to
0fdf4c7
Compare
f6c5046
to
6a263ed
Compare
0fdf4c7
to
d68543b
Compare
6a263ed
to
dc11de3
Compare
d68543b
to
3cb07a8
Compare
1b95f6d
to
aa6e648
Compare
3cb07a8
to
16f412d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 8 unresolved discussions (waiting on @spapinistarkware)
crates/prover/src/core/backend/simd/fft/mod.rs
line 122 at r2 (raw file):
Previously, andrewmilson (Andrew Milson) wrote…
Extracting the mul part after the doubling has removed a lot of the code duplication but WDYT about keeping the twiddles double for all platforms at the moment? Pros: All docs and naming is already for double twiddles, No performance cost to any platform leaving as is. Cons: we have two very similar looking mul implementations for Neon (as AFAIK this is the only architechture that has the doubling mul instruction).
Added as a TODO. Will follow up in another PR.
16f412d
to
83fd68d
Compare
aa6e648
to
64fd01c
Compare
83fd68d
to
1ade01d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r1, 1 of 4 files at r5, 1 of 4 files at r6.
Reviewable status: 2 of 5 files reviewed, all discussions resolved
1ade01d
to
42df366
Compare
This change is