Sorry for the long delay, and thanks for getting in touch with us.
The warmup phase should be limited by invalid_entries > 0, which becomes 0 once the cache is fully initialized and so indirectly takes the number of accessed elements into account. You can try decreasing warmup_increment, but I wouldn't expect much improvement. A more likely target for improvements would be the subsequently called functions:
Hello,
I am applying your nice tool to typical stencil applications, and I am observing very long simulation runtimes on high-dimensional stencils (several orders of magnitude longer than the execution time of the kernel itself). Most of the time is spent in the "warmup phase", and I am wondering about this:
kerncraft/kerncraft/cacheprediction.py, line 563 in b5a302d
Does it assume that only one element is loaded/stored to the cache per iteration? On higher-dimensional stencils, I easily read 100-1000 elements per iteration.
So could something like this be used instead of element_size:
kerncraft/kerncraft/cacheprediction.py, line 548 in b5a302d
but estimated from the number of elements read per iteration? If that introduces some inaccuracy, would the result still be reasonably accurate?
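To make the question concrete, here is a toy sketch of a warmup loop that advances in steps of warmup_increment until no cache entry is invalid. It is not kerncraft's implementation: the function `warmup_iterations`, its parameters, the fully associative cache model, and the 1D element indexing are all assumptions made for illustration. Under these assumptions, the warmup length depends on how many new cache lines each iteration touches, not on the raw number of elements read.

```python
def warmup_iterations(num_lines, line_elems, offsets, warmup_increment=16):
    """Toy model (hypothetical, not kerncraft code).

    Iteration i accesses the elements i + off for every off in `offsets`.
    The loop advances in steps of `warmup_increment` iterations until the
    number of distinct memory lines touched reaches `num_lines`, i.e. until
    a fully associative cache would have no invalid entries left.
    """
    filled = set()  # IDs of memory lines that have been loaded at least once
    iteration = 0
    while len(filled) < num_lines:  # corresponds to invalid_entries > 0
        for i in range(iteration, iteration + warmup_increment):
            for off in offsets:
                # Floor division maps an element index to its cache line ID.
                filled.add((i + off) // line_elems)
        iteration += warmup_increment
    return iteration


# Assumed toy cache: 64 lines holding 8 elements each.
narrow_1d = warmup_iterations(64, 8, [-1, 0, 1])             # 3 reads/iteration
wide_1d = warmup_iterations(64, 8, list(range(-4, 5)))       # 9 reads/iteration
rows_2d = warmup_iterations(64, 8, [-1024, -1, 0, 1, 1024])  # 3 separate rows
```

In this model the 3-point and 9-point 1D stencils need the same number of warmup iterations, because the extra reads fall into lines that neighbouring iterations load anyway, while the stencil spanning three rows fills the cache roughly three times faster. An estimate based purely on elements read per iteration would therefore likely overstate how quickly the cache fills.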
I would have researched this in the related publications, but I couldn't find those details.
Thanks in advance!