added vectorization to generate_n #6215
@Johan511 could you create graphs that use the same y-axis limits, please?
The code you proposed touches on sequential operations only. Could you measure the sequential speedup as well?
The change works for par_unseq too, because the parallel version of generate_n works by calling the sequential generate_n on chunks. I will post speedups for unseq soon.
Please note that the performance gains are actually not very significant.
[Two benchmark tables, one without the -O3 flag and one with the -O3 flag; columns: Benchmark, Time, CPU, Iterations]
You should always enable all optimizations for performance measurements.
The -O3 flag seems to try to vectorize most loops. Should I try compiling HPX with the -O2 flag and compare the performance of the vectorized and non-vectorized versions? Often the gains from explicit vectorization seem to be minimal, because -O3 already vectorizes the loops.
@Johan511 could you please rebase this onto master, now that the release is out?
Force-pushed from 1234486 to 5e4f001
Performance test report: HPX Performance Comparison
inspect was reporting:
Please rebase one more time to pull in all changes from master.
Signed-off-by: Johan511 <[email protected]>
Force-pushed from 5e4f001 to c0d05c9
@hkaiser I have rebased and added unit tests to ensure that the generate_n algorithm works with the unseq and par_unseq execution policies. As for performance, seq and unseq generate almost the same assembly in Release mode (which uses -O3). Because the compiler is able to vectorize the loops in seq mode as well, there seem to be no extra gains from using unseq.
retest lsu
Can we construct test cases where the compiler is not able to vectorize things on its own?
added vectorization to generate_n
hpx::generate_n calls std::generate_n; in parallel mode it splits the work into chunks and calls generate_n on each chunk.
Previously, no execution policy was specified for std::generate_n (so it defaulted to seq). This PR changes that and passes seq or unseq, depending on the hpx::execution policy chosen by the user.
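To make the chunking idea concrete, here is a minimal sketch in plain standard C++. It is not the HPX implementation: `chunked_generate_n` and its `chunk_size` parameter are hypothetical names chosen for illustration, and the chunks run one after another rather than on separate tasks.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of HPX: fills `count` elements starting at
// `first` by splitting the range into chunks of at most `chunk_size`
// elements and calling std::generate_n on each chunk.
template <typename OutIt, typename Gen>
OutIt chunked_generate_n(
    OutIt first, std::size_t count, std::size_t chunk_size, Gen gen)
{
    if (chunk_size == 0)
        chunk_size = 1;    // avoid looping forever on an empty chunk

    while (count != 0)
    {
        std::size_t const n = std::min(count, chunk_size);

        // A parallel implementation would hand each chunk to its own task;
        // in this sketch the chunks are processed sequentially.
        // std::generate_n takes the generator by value, so generator state
        // is not carried over from one chunk to the next.
        first = std::generate_n(first, n, gen);
        count -= n;
    }
    return first;
}
```

For example, calling `chunked_generate_n(v.begin(), v.size(), 3, [] { return 7; })` on a `std::vector<int>` of ten elements fills it in chunks of 3, 3, 3 and 1. Because each chunk is an independent call, the per-chunk call is the natural place to choose between a sequential and an unsequenced (vectorizable) loop, which is the choice this PR describes.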
[Benchmark graphs: par_unseq, par]