
Add hide-errors flag to result analyzer #5836

Merged
merged 1 commit into pytorch:benchmark from the hide-errors branch
Nov 21, 2023

Conversation

frgossen
Collaborator

No description provided.
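Since this pull request carries no description, here is a brief usage sketch. Only the --hide-errors flag name is taken from the PR title; the script path below is an assumption for illustration, and the flag's exact behavior is not documented in this PR.

$ # Hypothetical invocation of the result analyzer with the new flag
$ # (benchmarks/result_analyzer.py is an assumed entry point).
$ python3 benchmarks/result_analyzer.py --hide-errors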


@vanbasten23 vanbasten23 left a comment

LGTM

@vanbasten23 vanbasten23 merged commit a64f4ca into pytorch:benchmark Nov 21, 2023
1 check passed
zpcore pushed a commit that referenced this pull request Nov 21, 2023
@frgossen frgossen deleted the hide-errors branch November 21, 2023 23:13
frgossen added a commit to frgossen/pytorch-xla that referenced this pull request Nov 22, 2023
frgossen added a commit that referenced this pull request Nov 22, 2023
* Adding the benchmarking with TorchBench (#5788)

* Initial commit with dummy model benchmark

* add XRT support

* Add torchbench benchmark models

* add randomize_input

* add model set up for torchbench model

* update ExperimentLoader

* Add saving results

* minor args update

* update style

* add experiment name

* add grad context for eval and train

* minor user config update

* fix train() return item

* minor refactor

* add dynamo options

* add column in result for dynamo setting

* using  to capture output and error

* Fix some failure cases for dynamo

* reduce eval result size by returning eval loss

* minor refactor

* revert eval result change

* minor fix

* Change output format to jsonl

* Add accelerator model name

* add skipping finished experiments

* The main process needs to remove the PJRT_DEVICE env var that is automatically added (a shell sketch follows at the end of this commit message)

* Add a simple result analyzer

* Result analyzer saves to a database CSV with historical data

* Handle detectron2 models

* minor update

* add deny list

* Create run_benchmark

* Rename run_benchmark to run_benchmark.sh

* Fix device names and dynamo backend names in benchmark runner (#5806)

* update optimizer for openxla

* Add benchmark selection by tier 1-3 (#5808)

* Apply Pytorch/XLA formatting style (#5816)

* Add top tier benchmark runner (#5809)

* Add profiling capabilities to experiment_runner.py script (#5812)

* update run model config call interface, optimizer and result analyze script

* update dependency error

* Add profiling capabilities

---------

Co-authored-by: zpcore <[email protected]>

* benchmarks: add script to aggregate results from result_analyzer (#5829)

* benchmarks: extract tiers into their own file

So that they can be reused in other files; the second consumer comes in the
next commit.

* benchmarks: add aggregate.py

This script processes output CSV files from results_analyzer to
generate CSV/plots. Example:

$ for fmt in csv png; do \
    for acc in v100 a6000; do \
      for report in latest histogram speedup; do \
        for test in training inference; do \
          FILENAME=/tmp/png/$acc-$test-$report.$fmt; \
          python3 aggregate.py \
            --accelerator=$acc \
            --test=$test \
            -i /tmp/csv-depot \
            --report=$report \
            --title="All benchmarks" \
            --format=$fmt > $FILENAME || break; \
          chmod 644 $FILENAME; \
        done; \
      done; \
    done; \
  done

This generates plots and CSV files to summarize the latest
performance vs. Inductor, as well as a histogram and a geomean
speedup over time for all the input CSV data in /tmp/csv-depot.
Results are broken down per accelerator and either inference or
training.

To generate results per tier, we just have to pass --filter-by-tier
to the above and update the title to --title="Tier 1".
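A concrete per-tier invocation, sketched from the description above (the flags come from the text and the earlier example; the output filename is arbitrary):

$ python3 aggregate.py \
    --accelerator=v100 \
    --test=inference \
    -i /tmp/csv-depot \
    --report=speedup \
    --filter-by-tier \
    --title="Tier 1" \
    --format=png > /tmp/png/v100-inference-speedup-tier1.png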

* Fix syntax in experiment_runner.py (#5827)

* Add flag to forward XLA flags and allow for experiment expansion (#5828)

* Add hide-errors flag to result analyzer (#5836)

* Add readme and linting

* Fix ClusterResolver

---------

Co-authored-by: Liyang90 <[email protected]>
Co-authored-by: Manfei <[email protected]>
Co-authored-by: zpcore <[email protected]>
Co-authored-by: Grzegorz Olechwierowicz <[email protected]>
Co-authored-by: Emilio Cota <[email protected]>
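A minimal shell sketch of the PJRT_DEVICE point noted in the commit bullets above, assuming the runner is invoked as benchmarks/experiment_runner.py (the path and the env -u approach are assumptions for illustration; the actual runner clears the variable programmatically):

$ # Launch the experiment runner without inheriting PJRT_DEVICE from the
$ # parent environment, so it is not silently passed on to benchmark subprocesses.
$ env -u PJRT_DEVICE python3 benchmarks/experiment_runner.py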
lsy323 pushed a commit to lsy323/xla that referenced this pull request Nov 28, 2023
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
golechwierowicz added a commit that referenced this pull request Jan 12, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024