[BUG] Error in likwid-accuracy.py due to non-matching REGEX_BENCH #652

Open
aproeme opened this issue Nov 28, 2024 · 1 comment
aproeme commented Nov 28, 2024

Dear LIKWID devs,

Describe the bug

When I use the accuracy test tool likwid-accuracy.py with either of the predefined BRANCH or DATA test files, I get the following error:

Group BRANCH Test load
Size 12kB:  *
Traceback (most recent call last):
  File "./likwid-accuracy.py", line 637, in <module>
    fp_ref.write("%d %d %f %f\n" % (counter, get_bytes(size), float(match.group(1)), float(value)))
AttributeError: 'NoneType' object has no attribute 'group'

This happens because match is None here: REGEX_BENCH is defined as NOTHING in BRANCH.txt and DATA.txt, so nothing is extracted from the output of running likwid-bench-marker with likwid-perfctr.

I assume line 637 should actually be as follows (where value has been set to 0 just above):

fp_ref.write("%d %d %f %f\n" % (counter, get_bytes(size), float(value), float(value)))
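
A more defensive variant could fall back to a default whenever the regex yields nothing, instead of assuming a match exists. The following is only a minimal sketch: the helper name reference_value and the sample output string are hypothetical and not part of likwid-accuracy.py, and it assumes the reference is always the first capture group.

```python
import re


def reference_value(output, regex_bench, fallback=0.0):
    """Return the first captured group of regex_bench as a float.

    Falls back to `fallback` when the pattern does not match or has no
    capture group (for example a placeholder such as NOTHING).
    Hypothetical helper, not taken from likwid-accuracy.py.
    """
    match = re.search(regex_bench, output)
    if match is None or match.re.groups < 1:
        return float(fallback)
    return float(match.group(1))


# Hypothetical benchmark output line, for illustration only.
sample_output = "MByte/s: 1234.5"
with_regex = reference_value(sample_output, r"MByte/s:\s+([0-9.]+)")
without_regex = reference_value(sample_output, "NOTHING", fallback=0)
```

With such a helper, the write on line 637 would no longer raise AttributeError for tests that define no usable REGEX_BENCH.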

This does raise a question though, namely where to find the relevant reference metrics - "instructions per branch" for BRANCH and "load to store ratio" for DATA - and get them plotted to compare likwid-perfctr output to, as you have done in many of the example accuracy test results for various architectures shared on the wiki. You mention the following on the wiki at https://github.com/RRZE-HPC/likwid/wiki/TestAccuracy:

For some tests, likwid-bench does not calculate and print the appropriate metrics, like 'Instructions per branch', but they are commonly constant, hence we can define the result in the test input files.

Could I ask you to please clarify this and thereby help me understand

  • Where I can obtain appropriate reference values (if they are not already defined in the .ptt test input files)?
  • How I can include these reference values in plots?

To Reproduce
I'm using LIKWID release 5.3.0. After building likwid-bench-marker and likwid-bench-plain in /test/accuracy, I run the predefined BRANCH tests as follows:

./likwid-accuracy.py -s BRANCH --pgf --gnuplot

Additional context
Although I checked that I encountered the error above with the unmodified 5.3.0 release run with Python2, I am also updating the accuracy tests to Python3 for convenience. For the tests that did not fail as above, I also seemed to need to make various fixes to get things to work (see master...aproeme:likwid:master), including:

  • some small fixes to REGEX syntax in the predefined test file definitions (not yet committed)
  • fixes to pgfplot syntax (possibly default pgfplot behaviour has changed since likwid-accuracy.py was last used/developed)
  • some seemingly useful additions to the accuracy test Makefile.

If this is of interest for integration I could submit a pull request for review once done.

FYI the system I'm verifying on is AMD Zen2 architecture (this is on ARCHER2: https://www.archer2.ac.uk)

aproeme added the bug label Nov 28, 2024

TomTheBear (Member) commented

Thanks for opening the issue and already working on likwid-accuracy.py.

I have multiple versions of this file, maybe with some of the changes you made, but I never thought about committing them. That is mostly because I started doing these measurements with ReFrame, but that setup is not publishable yet.

My approach was to add the reference values for BRANCH (instructions per branch) and DATA (load/store ratio) to the files in TESTS. The value of X (instructions per branch or load/store ratio) is constant for all test variants independent of the selected work group size. Since both of these groups do not require the WA_FACTOR (only data transfer related groups require it), I encoded the constant value there. That's why the load kernel measured with BRANCH has WA_FACTOR=11.0: https://github.com/RRZE-HPC/likwid/blob/master/bench/x86-64/load.ptt#L9
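
The idea of reusing WA_FACTOR as a constant reference could be sketched roughly as follows. This is an illustration under assumptions, not the code from likwid-accuracy.py: it assumes the value appears on a line such as "WA_FACTOR 11.0" or "WA_FACTOR=11.0", and the function names and sample input are hypothetical. The row layout mirrors the "%d %d %f %f" format from line 637, with the constant in the reference column.

```python
import re


def read_constant_reference(text, key="WA_FACTOR"):
    """Extract the numeric value following `key` from a definition text.

    Assumes a line like 'WA_FACTOR 11.0' or 'WA_FACTOR=11.0'; the real
    file layout may differ.
    """
    pattern = r"^\s*" + re.escape(key) + r"[\s=]+(\S+)"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError("no %s entry found" % key)
    return float(match.group(1))


def reference_rows(measurements, reference):
    """Format (counter, size_bytes, measured) tuples as data-file rows.

    The same constant reference is written for every variant, since it
    does not depend on the working set size.
    """
    return [
        "%d %d %f %f\n" % (counter, size_bytes, reference, measured)
        for counter, size_bytes, measured in measurements
    ]


# Hypothetical test definition and measurements, for illustration only.
definition = "TEST load\nRUNS 5\nWA_FACTOR 11.0\n"
reference = read_constant_reference(definition)
rows = reference_rows([(1, 12000, 10.9), (2, 24000, 11.0)], reference)
```

Because the reference is identical for all variants, it would show up in a plot as a horizontal line against which the likwid-perfctr measurements can be compared.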

I will look at your changes and comment on them tomorrow.
