Merge benchmark scripts into the main branch (#5913)
* Adding the benchmarking with TorchBench (#5788)

* Initial commit with dummy model benchmark

* add XRT support

* Add torchbench benchmark models

* add randomize_input

* add model set up for torchbench model

* update ExperimentLoader

* Add saving results

* minor args update

* update style

* add experiment name

* add grad context for eval and train

* minor user config update

* fix train() return item

* minor refactor

* add dynamo options

* add column in result for dynamo setting

* using  to capture output and error

* Fix some failure cases for dynamo

* reduce eval result size by returning eval loss

* minor refactor

* revert eval result change

* minor fix

* Change output format to jsonl

* Add accelerator model name

* add skipping finished experiments

* main process needs to remove PJRT_DEVICE env var that is automatically added

* Add a simple result analyzer

* Result analyzer save to database csv with historical data

* Handle detectron2 models

* minor update

* add deny list

* Create run_benchmark

* Rename run_benchmark to run_benchmark.sh

* Fix device names and dynamo backend names in benchmark runner (#5806)

* update optimizer for openxla

* Add benchmark selection by tier 1-3 (#5808)

* Apply Pytorch/XLA formatting style (#5816)

* Add top tier benchmark runner (#5809)

* Add profiling capabilities to experiment_runner.py script (#5812)

* update run model config call interface, optimizer and result analyze script

* update dependency error

* Add profiling capabilities

---------

Co-authored-by: zpcore <[email protected]>

* benchmarks: add script to aggregate results from result_analyzer (#5829)

* benchmarks: extract tiers into their own file

So that they can be reused in other files. The second user is coming
next.

* benchmarks: add aggregate.py

This script processes output CSV files from results_analyzer to
generate CSV/plots. Example:

$ for fmt in csv png; do \
    for acc in v100 a6000; do \
      for report in latest histogram speedup; do \
        for test in training inference; do \
          FILENAME=/tmp/png/$acc-$test-$report.$fmt; \
          python3 aggregate.py \
            --accelerator=$acc \
            --test=$test \
            -i /tmp/csv-depot \
            --report=$report \
            --title="All benchmarks" \
            --format=$fmt > $FILENAME || break; \
          chmod 644 $FILENAME; \
        done; \
      done; \
    done; \
  done

This generates plots and CSV files to summarize the latest
performance vs. Inductor, as well as a histogram and a geomean
speedup over time for all the input CSV data in /tmp/csv-depot.
Results are broken down per accelerator and either inference or
training.

To generate results per tier, we just have to pass --filter-by-tier
to the above and update the title to --title="Tier 1".
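
For reference, the geometric-mean speedup mentioned above is just the n-th
root of the product of the per-benchmark speedups. A minimal sketch, assuming
a plain list of speedup values rather than the actual CSV schema produced by
results_analyzer:

```
import math

def geomean(speedups):
  # Geometric mean of per-benchmark speedups; all values must be > 0.
  return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# Hypothetical XLA-vs-Inductor speedups for three benchmarks.
print(geomean([1.25, 0.9, 1.6]))  # ~1.22
```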

* Fix syntax in experiment_runner.py (#5827)

* Add flag to forward XLA flags and allow for experiment expansion (#5828)

* Add hide-errors flag to result analyzer (#5836)

* Add readme and linting

* Fix ClusterResolver

---------

Co-authored-by: Liyang90 <[email protected]>
Co-authored-by: Manfei <[email protected]>
Co-authored-by: zpcore <[email protected]>
Co-authored-by: Grzegorz Olechwierowicz <[email protected]>
Co-authored-by: Emilio Cota <[email protected]>
6 people committed Jan 12, 2024
1 parent 880a31c commit ada54e5
Showing 13 changed files with 2,020 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/lintercheck.yml
@@ -66,7 +66,7 @@ jobs:
 exit 1
 fi
-yapf -i -r *.py test/ scripts/ torch_xla/
+yapf -i -r *.py test/ scripts/ torch_xla/ benchmarks/
 git_status=$(git status --porcelain)
 if [[ $git_status ]]; then
 git diff
2 changes: 1 addition & 1 deletion .vscode/settings.json
@@ -19,4 +19,4 @@
 ],
 "python.formatting.provider": "yapf",
 "editor.formatOnSave": true
-}
+}
55 changes: 55 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,55 @@
# Benchmarking

The two main benchmarking scripts are
- `experiment_runner.py` to run benchmark experiments, and
- `result_analyzer.py` to aggregate the benchmark results in CSV form.


## Experiment runner

Run `experiment_runner.py` from the `pytorch` directory, which should be the
parent of the `xla` directory.

The following example runs the alexnet benchmark on GPU through the
PyTorch/XLA-dynamo path and through the Inductor-dynamo path, with 5
repetitions each. The results are stored in a JSON file in
`experiment_results`.

```
cd pytorch
python xla/benchmarks/experiment_runner.py \
--dynamo=openxla_eval --dynamo=openxla --dynamo=inductor \
--xla=PJRT --xla=None \
--test=eval --test=train \
--suite-name=torchbench \
--accelerator=cuda \
--output-dirname=experiment_results \
--repeat=5 \
--print-subprocess \
--no-resume \
--filter="^alexnet$"
```
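
To take a quick look at the raw output before aggregating it, a minimal
sketch along these lines can help; it assumes one JSON record per line (the
JSON Lines layout) and does not rely on any particular record schema:

```
import glob
import json

# Each line of the output is assumed to be one JSON record describing a
# single experiment run; adjust the glob pattern to the actual file name.
for path in glob.glob("experiment_results/*.json*"):
  with open(path) as f:
    for line in f:
      record = json.loads(line)
      print(path, sorted(record.keys()))
```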

You can change the flags to add the configurations you are interested in.
`experiment_runner.py` expands the options into all supported configurations.
For example, in the case above it considers all possible combinations of the
flags `--dynamo`, `--xla`, and `--test`, of which 4 are supported (see the
sketch after the list below):

- `dynamo=openxla_eval`, `xla=PJRT`, `test=eval`
- `dynamo=openxla`, `xla=PJRT`, `test=train`
- `dynamo=inductor`, `xla=None`, `test=eval`
- `dynamo=inductor`, `xla=None`, `test=train`
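
Conceptually, the expansion is a Cartesian product of the flag values with
unsupported combinations filtered out. The snippet below is only an
illustrative sketch of that idea, not the actual logic in
`experiment_runner.py`; the `supported` check simply mirrors the four
combinations listed above:

```
import itertools

dynamo_opts = ["openxla_eval", "openxla", "inductor"]
xla_opts = ["PJRT", None]
test_opts = ["eval", "train"]

def supported(dynamo, xla, test):
  # Mirrors the four supported combinations listed above.
  if dynamo == "inductor":
    return xla is None
  if dynamo == "openxla_eval":
    return xla == "PJRT" and test == "eval"
  if dynamo == "openxla":
    return xla == "PJRT" and test == "train"
  return False

configs = [
    c for c in itertools.product(dynamo_opts, xla_opts, test_opts)
    if supported(*c)
]
for dynamo, xla, test in configs:
  print(f"dynamo={dynamo}, xla={xla}, test={test}")
```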


## Result analyzer

Run `result_analyzer.py` from the `pytorch` directory, which should be the
parent of the `xla` directory.

The following example analyzes the results generated by the above invocation of
`experiment_runner.py`. The aggregates are saved in CSV format in
`experiment_results/metric_report.csv`.

```
cd pytorch
python xla/benchmarks/result_analyzer.py --output-dirname=experiment_results
```
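
The resulting CSV can be inspected with any tooling you prefer; for example,
a minimal pandas sketch (the exact columns depend on `result_analyzer.py` and
are not documented here):

```
import pandas as pd

# Load the aggregated metrics; column names depend on result_analyzer.py.
df = pd.read_csv("experiment_results/metric_report.csv")
print(df.shape)
print(df.columns.tolist())
print(df.head())
```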