Collect CUDA/CPU profiling info into result sheets. #5921
Conversation
This PR:
0. Adds CUDA/CPU collection capabilities to the script.
1. Modifies result_analyzer.py to analyze the newly collected results.
2. Moves CUDA synchronize/XLA device synchronize into the profiler.
3. Fixes list typing for Python 3.8+.

Tested with:
python3 xla/benchmarks/experiment_runner.py --dynamo=openxla --xla=PJRT --test=train --filter=basic_gnn_gcn$ --suite-name=torchbench --accelerator=cuda --progress-bar --output-dirname=/tmp/output --repeat=2 --print-subprocess --no-resume --profile-cuda-cpu-collect --profile-cuda
python3 xla/benchmarks/result_analyzer.py --output-dir=/tmp/output
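For context, a minimal sketch of the kind of CUDA/CPU collection this adds, assuming the standard torch.profiler API; the helper name profile_step and its arguments are illustrative, not the PR's actual code.

```python
# Minimal sketch, assuming torch.profiler; names here are illustrative only.
import torch
from torch.profiler import profile, ProfilerActivity

def profile_step(step_fn, device):
    """Run one benchmark iteration under the profiler, syncing inside the profiled region."""
    activities = [ProfilerActivity.CPU]
    if device.type == "cuda":
        activities.append(ProfilerActivity.CUDA)
    with profile(activities=activities) as prof:
        step_fn()
        if device.type == "cuda":
            # Synchronize inside the profiler so queued CUDA work is attributed to this step
            # (an XLA run would similarly wait for device ops here).
            torch.cuda.synchronize(device)
    return prof
```

The returned profile can then be reduced to aggregate CPU/CUDA times (see the total_average() discussion below) and written next to the other per-run metrics that result_analyzer.py reads.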
Nice, ty!
LGTM, one comment.
Force-pushed from 53b80a7 to dde4ea4.
Force-pushed from dde4ea4 to ac84ed1.
    )
    return

kernel_dump = prof.profiler.total_average()
Curious, where is this total_average() defined?
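For reference, a minimal sketch assuming this is the torch.autograd.profiler API: prof.profiler would be the underlying torch.autograd.profiler.profile object, and total_average() folds every recorded FunctionEvent into a single FunctionEventAvg (times reported in microseconds). The metric name below is illustrative, not the PR's actual field name.

```python
# Minimal sketch, assuming the torch.autograd.profiler API; metric names are illustrative.
import torch
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU]) as prof:
    torch.ones(512, 512) @ torch.ones(512, 512)

# total_average() aggregates all recorded events into one FunctionEventAvg.
totals = prof.profiler.total_average()
total_cpu_time_s = totals.self_cpu_time_total / 1e6  # profiler reports microseconds
print(total_cpu_time_s)
```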
* Collect CUDA/CPU profiling info into result sheets (details in the PR description above)
* Lint, and add _s suffix to metrics
Co-authored-by: root <[email protected]>
Thanks. cc @zpcore to take advantage of this feature in future benchmarking automation work.