some performance optimizations in decoders #369
Conversation
(force-pushed 87a6ace → 4300372, then 4300372 → 1ac7557)
There is some improvement for most of the Update
(force-pushed 7341b00 → 637cd3a)
I think this is ready for review. I hope that these improvement results for the decoder evaluation (#369 (comment)) are plausible and sensible. Thank you!
Thanks, this is a good idea!
Currently I do not believe that most of these changes do much, though.
I like the
row_x = @view syndrome_samples[:,1:d.nx]
row_z = @view syndrome_samples[:,d.nx+1:end]
change and the improvement to `evaluate_guesses`, but I do not believe the other changes help and would prefer to have them reverted.
Could you do microbenchmarks to prove what works here?
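A microbenchmark of the kind requested here is usually done with BenchmarkTools.jl, interpolating the arguments with `$` so that setup cost is excluded from the measurement. A minimal sketch (the functions and sizes below are hypothetical, chosen only to illustrate the copy-vs-view question):

```julia
using BenchmarkTools

# Hypothetical input; size chosen only for illustration.
A = rand(0:1, 100, 100)

copy_slice(A) = A[:, 1:50]          # allocates a new matrix
view_slice(A) = @view A[:, 1:50]    # no copy, just a lightweight view

# `$` interpolates the argument so the benchmark measures the call only,
# not the cost of looking up the global variable.
@btime copy_slice($A);
@btime view_slice($A);
```

The allocation count reported by `@btime` is usually the clearest signal of whether a change like this actually helps.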
Resolved (outdated) review threads on:
- ext/QuantumCliffordLDPCDecodersExt/QuantumCliffordLDPCDecodersExt.jl (×2)
- ext/QuantumCliffordPyQDecodersExt/QuantumCliffordPyQDecodersExt.jl (×3)
- src/ecc/decoder_pipeline.jl, on the quoted lines:
```julia
for i in 1:nsamples
    is_decoded = true
    for j in 1:size(faults_matrix, 1)
        sum_mod = 0
        @inbounds @simd for k in 1:size(faults_matrix, 2)
            sum_mod += faults_matrix[j, k] * guesses[i, k]
        end
        sum_mod %= 2
        if sum_mod != measured_faults[i, j]
            is_decoded = false
            break
        end
    end
    decoded += is_decoded
end
```
This seems promising, but is it actually faster on its own? It probably needs to be benchmarked separately
Most of the performance improvement came from `evaluate_guesses`. I noticed that the in-place approach to `vcat` and `hcat` actually slowed things down and increased the memory/time ratio, which is why the improvement was not that visible. I have reverted it as per your comments. Now it looks better: #369 (comment)
# Microbenchmarks: evaluate_guesses
julia> using BenchmarkTools
julia> n_faults = 500;
julia> n_guesses = 100;
julia> measured_faults = rand(0:1, n_guesses, n_faults);
julia> guesses = rand(0:1, n_guesses, n_faults);
julia> faults_matrix = rand(0:1, n_faults, n_faults);
julia> function evaluate_guesses_view(measured_faults, guesses, faults_matrix)
nsamples = size(guesses, 1)
guess_faults = (faults_matrix * guesses') .% 2
decoded = 0
for i in 1:nsamples
(@view guess_faults[:,i]) == (@view measured_faults[i,:]) && (decoded += 1)
end
return (nsamples - decoded) / nsamples
end
julia> function evaluate_guesses_loop(measured_faults, guesses, faults_matrix)
nsamples = size(guesses, 1)
decoded = 0
for i in 1:nsamples
is_decoded = true
for j in 1:size(faults_matrix, 1)
sum_mod = 0
@inbounds @simd for k in 1:size(faults_matrix, 2)
sum_mod += faults_matrix[j, k] * guesses[i, k]
end
sum_mod %= 2
if sum_mod != measured_faults[i, j]
is_decoded = false
break
end
end
decoded += is_decoded
end
return (nsamples - decoded) / nsamples
end
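Before comparing timings, it is worth confirming that the two implementations agree; reusing the inputs and definitions from the session above:

```julia
julia> r_view = evaluate_guesses_view(measured_faults, guesses, faults_matrix);

julia> r_loop = evaluate_guesses_loop(measured_faults, guesses, faults_matrix);

julia> @assert r_view == r_loop  # identical logical-error rates
```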
julia> @benchmark evaluate_guesses_view($measured_faults, $guesses, $faults_matrix)
BenchmarkTools.Trial: 423 samples with 1 evaluation.
Range (min … max): 9.462 ms … 100.313 ms ┊ GC (min … max): 0.00% … 87.81%
Time (median): 11.613 ms ┊ GC (median): 0.00%
Time (mean ± σ): 11.808 ms ± 4.717 ms ┊ GC (mean ± σ): 2.29% ± 5.55%
▂▂ ▅█▇▄ ▂
██▅████▅▄▂▃▄▃▄▄▃▄▄▅▇▆▇▃▇▇▄█▇▇█▆▅█▆▇▆▇▆▅▇▄▃▂▅▂▃▃▃▂▂▃▂▃▁▂▁▁▁▂▂ ▄
9.46 ms Histogram: frequency by time 15.5 ms <
Memory estimate: 812.11 KiB, allocs estimate: 8.
julia> @benchmark evaluate_guesses_loop($measured_faults, $guesses, $faults_matrix)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 119.402 μs … 1.440 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 127.033 μs ┊ GC (median): 0.00%
Time (mean ± σ): 135.949 μs ± 49.806 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▇▃▄▃▂▁▁ ▁▁▁▂▁ ▁ ▁
█▆▄███████████████████████████▇▇██▇▇▇▆▆▇▇▆▆▅▅▅▄▆▇▆▄▆▄▄▄▄▅▃▃▅ █
119 μs Histogram: log(frequency) by time 246 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
# Microbenchmarks: slicing with and without @view
julia> syndrome_samples = rand(1000, 1000) ;
julia> d = Dict(:nx => 500);
julia> function with_view(syndrome_samples, d)
row_x = @view syndrome_samples[:, 1:d[:nx]]
row_z = @view syndrome_samples[:, d[:nx]+1:end]
return row_x, row_z
end
julia> function without_view(syndrome_samples, d)
row_x = syndrome_samples[:, 1:d[:nx]]
row_z = syndrome_samples[:, d[:nx]+1:end]
return row_x, row_z
end
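One caveat worth keeping in mind when swapping copies for `@view`: a view aliases the parent array, so mutations of the parent are visible through the view (and vice versa), whereas a sliced copy is independent. A minimal illustration, independent of the decoder code:

```julia
A = collect(reshape(1.0:9.0, 3, 3))

v = @view A[:, 1]   # aliases column 1 of A
c = A[:, 1]         # independent copy

A[1, 1] = 42.0
# The view sees the mutation; the copy does not.
@assert v[1] == 42.0
@assert c[1] == 1.0
```

This is harmless in read-only code like the slicing above, but it matters if a downstream function mutates its input.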
julia> @benchmark with_view($syndrome_samples, $d)
# Benchmarking with @view
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
Range (min … max): 24.653 ns … 225.071 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 24.912 ns ┊ GC (median): 0.00%
Time (mean ± σ): 27.309 ns ± 7.016 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▇▄▄▄ ▁ ▂
█▁█████▄▄▃▃▄▅▄▃▃▁▃▁▃▄▇▇▇▆▄▇▆▆█▇▇▇██▇█▆▆▅▅▅▄▄▆▆▇▆▅▅▆▇▇▄▄▆▅▆▆▇ █
24.7 ns Histogram: log(frequency) by time 60 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark without_view($syndrome_samples, $d)
BenchmarkTools.Trial: 3337 samples with 1 evaluation.
Range (min … max): 995.098 μs … 7.976 ms ┊ GC (min … max): 0.00% … 85.61%
Time (median): 1.064 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.491 ms ± 795.193 μs ┊ GC (mean ± σ): 6.96% ± 13.21%
██▆▄▂ ▁▃▃▂ ▂▃▂▁ ▁▂▃▂ ▁
██████▆▆▅▆▃▅▃▃▁▁▅▁▁▃▄▄▃▃▁▆█████▇▄▄▄▃▅▃▁▁▁▁▁▁▁▇█████▆▅▆█████▇▆ █
995 μs Histogram: log(frequency) by time 3.33 ms <
Memory estimate: 7.63 MiB, allocs estimate: 4.
Thanks for your comments. As requested, the microbenchmarks are attached here: #369 (comment). I have removed all the unnecessary changes. The results table is taken from: https://github.com/QuantumSavory/QuantumClifford.jl/actions/runs/11060472698/job/30731074834?pr=369#step:6:1140
This PR is about removing the performance detriments caused by allocations introduced by `vcat` and `hcat` in the decoders. In addition, `evaluate_guesses` is also made allocation-free. An attempt is made to improve `decode` and `batchdecode` as well. Instead of unnecessary copying, `@view` has been used. I have also checked that no new JET errors are introduced. Resource: the Performance Tips guide, for a better understanding of the context.
- Remove `vcat` and `hcat` from decoders, in accordance with "In-place and batch `decode` functions" #217
- Make `evaluate_guesses` allocation-free using a loop, in accordance with "none of the decoders and evaluators should allocate" #219...
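The `vcat`/`hcat` removal can be illustrated in isolation: instead of concatenating (which allocates a fresh array on every call), one can write into a buffer allocated once. A sketch with made-up sizes and hypothetical helper names, not the actual decoder code:

```julia
# Allocating version: builds a new vector on every call.
concat_alloc(x, z) = vcat(x, z)

# Preallocated version: reuses `out` across calls.
function concat_inplace!(out, x, z)
    nx = length(x)
    copyto!(out, 1, x, 1, nx)            # out[1:nx]      .= x
    copyto!(out, nx + 1, z, 1, length(z)) # out[nx+1:end] .= z
    return out
end

x = rand(500); z = rand(500)
out = similar(x, 1000)   # allocated once, reused on every subsequent call
@assert concat_inplace!(out, x, z) == concat_alloc(x, z)
```

In a hot loop such as batch decoding, the preallocated version amortizes the allocation to zero per iteration, which is the effect this PR is after.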