
Adding the benchmarking with TorchBench #5788

Merged 36 commits into pytorch:benchmark on Nov 13, 2023

Conversation

Liyang90 (Collaborator)

Merging the benchmark fork

@JackCaoG (Collaborator)

Thanks @Liyang90 @zpcore, can you verify that this script works on today's nightly before we merge this one?

@zpcore (Collaborator) commented Nov 10, 2023

Thanks @Liyang90 @zpcore, can you verify that this script works on today's nightly before we merge this one?

I don't think so. There are several things we want to update, e.g.:

  1. zpcore@835b5ea: Dynamo backend naming

  2. zpcore@b146f0a: Update to the newest TorchBench call API.

  3. zpcore@b853e85: Enable the Dynamo optimizer.

@RissyRan may have other updates she wants to add.

Should we check in this version first and make the updates next? We can add the CI once we are done updating for the latest build.
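
For item 1 above, the rename mostly comes down to which Dynamo backend string the runner passes to torch.compile. A minimal sketch, assuming the new name is the "openxla" backend that torch_xla registers; the exact strings used in experiment_runner.py are an assumption here:

import torch
import torch_xla.core.xla_model as xm

def compile_with_openxla(model):
    # Move the model to the XLA device, then compile it with the "openxla"
    # Dynamo backend registered by torch_xla (replacing the older backend
    # names the benchmark scripts used).
    device = xm.xla_device()
    model = model.to(device)
    return torch.compile(model, backend="openxla")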

@Liyang90 (Collaborator, Author) commented Nov 10, 2023

Thanks @Liyang90 @zpcore, can you verify that this script works on today's nightly before we merge this one?

I don't think so. There are several things we want to update, e.g.:

  1. zpcore@835b5ea: Dynamo backend naming
  2. zpcore@b146f0a: Update to the newest TorchBench call API.
  3. zpcore@b853e85: Enable the Dynamo optimizer.

Should we check in this version first and make the updates next? We can add the CI once we are done updating for the latest build.

Right, I think this PR requires a follow-up PR before its functionality can be tested.

This is okay since it is not targeting the main branch.

@JackCaoG (Collaborator)

Yeah, I am OK with merging it as is while we work on follow-up PRs.

@zpcore merged commit 1b905cc into pytorch:benchmark on Nov 13, 2023
1 check passed
@frgossen (Collaborator)

I ran into a few issues with the runner and think it needs a few minor changes.
https://github.com/pytorch/xla/pull/5806/files

frgossen pushed a commit to frgossen/pytorch-xla that referenced this pull request Nov 22, 2023
* Initial commit with dummy model benchmark

* add XRT support

* Add torchbench benchmark models

* add randomize_input

* add model set up for torchbench model

* update ExperimentLoader

* Add saving results

* minor args update

* update style

* add experiment name

* add grad context for eval and train

* minor user config update

* fix train() return item

* minor refactor

* add dynamo options

* add column in result for dynamo setting

* using  to capture output and error

* Fix some failure cases for dynamo

* reduce eval result size by returning eval loss

* minor refactor

* revert eval result change

* minor fix

* Change output format to jsonl

* Add accelerator model name

* add skipping finished experiments

* main process needs to remove PJRT_DEVICE env var that is automatically added

* Add a simple result analyzer

* Result analyzer save to database csv with historical data

* Handle detectron2 models

* minor update

* add deny list
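
As a side note on the bullet above about removing the automatically added PJRT_DEVICE variable: a minimal sketch of how a parent process can drop it before spawning each experiment (the wrapper function and command handling are hypothetical, not the actual experiment_runner.py code):

import os
import subprocess

def run_experiment_in_subprocess(cmd):
    # Copy the environment and drop the PJRT_DEVICE value that torch_xla
    # adds automatically, so each child process can select its own device
    # rather than inheriting the parent's.
    env = os.environ.copy()
    env.pop("PJRT_DEVICE", None)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)
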
frgossen added a commit that referenced this pull request Nov 22, 2023
* Adding the benchmarking with TorchBench (#5788)

* Initial commit with dummy model benchmark

* add XRT support

* Add torchbench benchmark models

* add randomize_input

* add model set up for torchbench model

* update ExperimentLoader

* Add saving results

* minor args update

* update style

* add experiment name

* add grad context for eval and train

* minor user config update

* fix train() return item

* minor refactor

* add dynamo options

* add column in result for dynamo setting

* using  to capture output and error

* Fix some failure cases for dynamo

* reduce eval result size by returning eval loss

* minor refactor

* revert eval result change

* minor fix

* Change output format to jsonl

* Add accelerator model name

* add skipping finished experiments

* main process needs to remove PJRT_DEVICE env var that is automatically added

* Add a simple result analyzer

* Result analyzer save to database csv with historical data

* Handle detectron2 models

* minor update

* add deny list

* Create run_benchmark

* Rename run_benchmark to run_benchmark.sh

* Fix device names and dynamo backend names in benchmark runner (#5806)

* update optimizer for openxla

* Add benchmark selection by tier 1-3 (#5808)

* Apply Pytorch/XLA formatting style (#5816)

* Add top tier benchmark runner (#5809)

* Add profiling capabilities to experiment_runner.py script (#5812)

* update run model config call interface, optimizer and result analyze script

* update dependency error

* Add profiling capabilities

---------

Co-authored-by: zpcore <[email protected]>

* benchmarks: add script to aggregate results from result_analyzer (#5829)

* benchmarks: extract tiers into their own file

So that they can be reused in other files. The second user is coming
next.

* benchmarks: add aggregate.py

This script processes output CSV files from results_analyzer to
generate CSV/plots. Example:

$ for fmt in csv png; do \
    for acc in v100 a6000; do \
      for report in latest histogram speedup; do \
        for test in training inference; do \
          FILENAME=/tmp/png/$acc-$test-$report.$fmt; \
          python3 aggregate.py \
            --accelerator=$acc \
            --test=$test \
            -i /tmp/csv-depot \
            --report=$report \
            --title="All benchmarks" \
            --format=$fmt > $FILENAME || break; \
          chmod 644 $FILENAME; \
        done; \
      done; \
    done; \
  done

This generates plots and CSV files to summarize the latest
performance vs. Inductor, as well as a histogram and a geomean
speedup over time for all the input CSV data in /tmp/csv-depot.
Results are broken down per accelerator and either inference or
training.

To generate results per tier, we just have to pass --filter-by-tier
to the above and update the title to --title="Tier 1".

* Fix syntax in experiment_runner.py (#5827)

* Add flag to forward XLA flags and allow for experiment expansion (#5828)

* Add hide-errors flag to result analyzer (#5836)

* Add readme and linting

* Fix ClusterResolver

---------

Co-authored-by: Liyang90 <[email protected]>
Co-authored-by: Manfei <[email protected]>
Co-authored-by: zpcore <[email protected]>
Co-authored-by: Grzegorz Olechwierowicz <[email protected]>
Co-authored-by: Emilio Cota <[email protected]>
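
Regarding the per-tier note in the commit message above: driving aggregate.py for a single tier could look like the sketch below. The flags come from the shell loop and the --filter-by-tier flag mentioned there; the output path and the wrapper itself are hypothetical:

import subprocess

# Regenerate only the Tier 1 speedup plot for one accelerator, reusing the
# aggregate.py flags shown in the commit message above.
cmd = [
    "python3", "aggregate.py",
    "--accelerator=v100",
    "--test=inference",
    "-i", "/tmp/csv-depot",
    "--report=speedup",
    "--filter-by-tier",
    "--title=Tier 1",
    "--format=png",
]
with open("/tmp/png/v100-inference-speedup-tier1.png", "wb") as out:
    subprocess.run(cmd, stdout=out, check=True)
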
lsy323 pushed a commit to lsy323/xla that referenced this pull request Nov 28, 2023
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
golechwierowicz added a commit that referenced this pull request Jan 12, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024