From aad30d5d6adfe2aa3a88ce46de524deb22d649d7 Mon Sep 17 00:00:00 2001 From: tapadipti <32855442+tapadipti@users.noreply.github.com> Date: Thu, 24 Aug 2023 15:39:42 +0530 Subject: [PATCH 1/4] Studio update images (#4758) * Update text and image for project action buttons * Studio: Explore exps - Update screenshots and adjust text accordingly * Document nested branches and explain when the commits in branch filter is useful * Address PR comment * Address PR comment * Small clarification * Address Oded's comments in the PR and make 'nested branches' and 'commits on branch' filter clearer * Committing one of the changed files after yarn fix-all * Revert "Committing one of the changed files after yarn fix-all" This reverts commit 5479bd76d38d466e4800b7126fc010d6605bdddb. --- .../explore-ml-experiments.md | 162 ++++++++++++------ 1 file changed, 111 insertions(+), 51 deletions(-) diff --git a/content/docs/studio/user-guide/projects-and-experiments/explore-ml-experiments.md b/content/docs/studio/user-guide/projects-and-experiments/explore-ml-experiments.md index af6e3c2472..329f6bf71a 100644 --- a/content/docs/studio/user-guide/projects-and-experiments/explore-ml-experiments.md +++ b/content/docs/studio/user-guide/projects-and-experiments/explore-ml-experiments.md @@ -1,34 +1,24 @@ # Explore ML Experiments -The projects dashboard in Iterative Studio contains all your projects. Open a -project by clicking on its name. An experiments table for the project will be -generated as shown below. This includes metrics, hyperparameters, and -information about datasets and models. +The projects dashboard in Iterative Studio contains all your projects. 
Click on +a project name to open the project table, which contains: -![](https://static.iterative.ai/img/studio/view_components.png) - -The major components of a project table are: - -- [Git history and live experiments](#git-history-and-live-metrics) that show - you the complete experimentation history as well as live metrics of running - experiments. -- [Display preferences](#display-preferences) that let you show/hide branches, - commits and columns, and re-arrange the table. +- [Git history and live experiments](#git-history-and-live-metrics) of the + project +- [Display preferences](#display-preferences) - Buttons to [visualize, compare, and run experiments](#visualize-compare-and-run-experiments). - Button to [export project data](#export-project-data). ## Git history and live experiments -The branches and commits in your Git repository are displayed along with the +Branches and commits in your Git repository are displayed along with the corresponding models, metrics, hyperparameters, and DVC-tracked files. -[New experiments submitted from Iterative Studio][run experiments] appear as -experiment commits, which are eventually pushed to Git. Experiments that you -push using the `dvc exp push` command as well as any live experiments that you -send using [DVCLive] are displayed in a special experiment row nested under the -parent Git commit. More details of how live experiments are displayed can be -found +Experiments that you push using the `dvc exp push` command as well as any live +experiments that you send using [DVCLive] are displayed in a special experiment +row nested under the parent Git commit. More details of how live experiments are +displayed can be found [here](/doc/studio/user-guide/projects-and-experiments/live-metrics-and-plots#view-live-metrics-and-plots). To manually check for updates in your repository, use the `Reload` button 🔄 @@ -38,43 +28,95 @@ located above the project table. 
![](https://static.iterative.ai/img/studio/view_components_1.gif) +### Nested branches + +When a Git branch (e.g., `feature-branch-1`) is merged into another branch +(e.g., `main`), two possibilities exist: + +- `feature-branch-1` is still active. That is, the user continues to push more + commits to this branch. Since the branch now contains new unique commits, the + project table will display both `main` and `feature-branch-1` separately. + `feature-branch-1` will show the new commits that are not part of `main` while + all the merged commits will be shown inside `main`. + +- `feature-branch-1` is inactive. That is, the user does NOT push any more + commits to this branch. Since the branch does not contain any new unique + commits, Iterative Studio considers `feature-branch-1` as **"nested"** within + `main` and does not display it as a separate branch. This helps to keep the + project table concise and reduce clutter that can accumulate over time when + inactive branches are not cleaned from the Git repository. After all, those + inactive branches usually carry no new information for the purpose of managing + experiments. If you would like to display all commits of such an inactive + branch, use the + [`Commits on branch = feature-branch-1` display filter](#filters). + ## Display preferences The table contains buttons to specify filters and other preferences regarding which commits and columns to display. -![](https://static.iterative.ai/img/studio/view_components_2.gif) - ### Filters: -You can filter the commits that you want to display by the following fields: - -- **Branch:** The Git branch -- **Tag:** The Git tag -- **Author:** Author of the Git commit -- **Metric:** Values of different metrics. For instance, you can display only - those experiments for which the value of `avg_prec` is greater than `0.9`. -- **Metric delta:** Change in the value of the metric. 
For instance, you can use - this filter to only display those experiments for which the value of - `avg_prec` changed by more than `0.1` compared to the baseline experiment. -- **Param:** Values of different parameters -- **File size:** Size of the data, model and other files corresponding to your - experiments -- **File changed:** Whether or not any given file changed in the experiment +Click on the `Filters` button to specify which rows you want to show in the +project table. + +![Project filters](https://static.iterative.ai/img/studio/project_filters.png) + +There are two types of filters: + +- **Quick filters** (highlighted in orange above): Use the quick filter buttons + to + + - Show only DVC experiments + - Show only selected experiments + - Toggle hidden commits (include or exclude hidden commits in the project + table) + +- **Custom filters** (highlighted in purple above): Filter commits by one or + more of the following fields: + + - Column values (values of metrics, hyperparameters, etc.) and their deltas + - Git related fields such as Git branch, commit message, tag and author + + + + The `Branch` filter displays only the specified branch and its commits. + + On the other hand, the `Commits on branch` filter will also display branches + [inside which the specified branch is nested](#nested-branches). + + When a Git branch is nested inside another branch, the project table + [does not display the nested branch](#nested-branches). If + `feature-branch-1` is nested within `main`, `feature-branch-1` is NOT + displayed in the project table even if you apply the + `Branch = feature-branch-1` filter. + + In this case, if you would like to filter for commits in `feature-branch-1`, + you should use the `Commits on branch = feature-branch-1` filter. This will + display the `main` branch with commits that were merged from + `feature-branch-1` into `main`. 
A hint is present to indicate that even + though the commits appear inside `main`, they are part of the nested branch + `feature-branch-1`. + + ![Result of commits on branch filter](https://static.iterative.ai/img/studio/commits_on_branch_filter.png) + + ### Columns: Select the columns you want to display and hide the rest. ![Showing and hiding columns](https://static.iterative.ai/img/studio/show_hide_columns.gif) -You can also click and drag the columns in the table to rearrange them. - If your project is missing some required columns or includes columns that you do not want, refer to the following troubleshooting sections: - [Project does not contain the columns that I want](/doc/studio/troubleshooting#project-does-not-contain-the-columns-that-i-want) - [Project contains columns that I did not import](/doc/studio/troubleshooting#project-contains-columns-that-i-did-not-import) +To reorder the columns, click and drag them in the table or from the Columns +dropdown. +![Showing and hiding columns](https://static.iterative.ai/img/studio/reorder_columns.gif) + ### Hide commits: Commits can be hidden from the project table in the following ways: @@ -98,39 +140,56 @@ Commits can be hidden from the project table in the following ways: commits that do not add much value in your project. To hide a commit or branch, click on the 3-dot menu next to the commit or branch name and click on `Hide commit` or `Hide branch`. + + ![Hide commit](https://static.iterative.ai/img/studio/hide_commit.png) + - **Unhide commits:** You can unhide commits as needed, so that you don't lose any experimentation history. To display all hidden commits, click on the - `Show hidden commits` toggle (refer [the above gif](#display-preferences)). - This will display all hidden commits, with a `hidden` (closed eye) indicator. + `Show hidden commits` toggle (refer [filters](#filters)). This will display + all hidden commits, with a `hidden` (closed eye) indicator. 
+ + ![Hidden commit indicator](https://static.iterative.ai/img/studio/hidden_commit_indicator.png) + To unhide any commit, click on the 3-dot menu for that commit and click on `Show commit`. -### Selected only: + ![Show hidden commit](https://static.iterative.ai/img/studio/show_hidden_commit.png) -Toggle between showing and hiding experiments that you have not selected. +### Delta mode -### Delta mode: +For metrics, models and files columns with numeric values, you can display +either the absolute values or their delta (difference) from the baseline row. To +toggle between these two options, use the `Delta mode` button. -Toggle between absolute values and difference from the baseline row. +![Delta mode](https://static.iterative.ai/img/studio/delta_mode.png) ### Save changes: -Save your filters or column display preferences so that these preferences remain -intact even after you log out of Iterative Studio and log back in later. +Whenever you make any changes to your project's columns, commits or filters, a +notification to save or discard your changes is displayed at the top of the +project table. Saved changes remain intact even after you log out of Iterative +Studio and log back in later. + +![Save or discard changes](https://static.iterative.ai/img/studio/save_discard_changes.png) ## Visualize, compare and run experiments Use the following buttons to visualize, compare and run experiments: -- **Show plots:** Open the `Plots` pane and [display plots] for the selected - commits. +- **Plots:** Open the `Plots` pane and [display plots] for the selected commits. +- **Trends:** [Generate trend charts] to see how the metrics have changed over + time. - **Compare:** [Compare experiments] side by side. - **Run:** [Run experiments] and [track results in real time][live-metrics-and-plots]. -- **Trends:** [Generate trend charts] to see how the metrics have changed over - time. 
-![](https://static.iterative.ai/img/studio/view_components_3.gif) +These buttons appear above your project table as shown below. +![example export to csv](https://static.iterative.ai/img/studio/project_action_buttons_big_screen.png) + +On smaller screens, the buttons might appear without text labels, as shown +below. + +![example export to csv](https://static.iterative.ai/img/studio/project_action_buttons_small_screen.png) ## Export project data @@ -143,6 +202,7 @@ Below is an example of the downloaded CSV file. ![example export to csv](https://static.iterative.ai/img/studio/project_export_to_csv_example.png) +[DVCLive]: /doc/dvclive [display plots]: /doc/studio/user-guide/projects-and-experiments/visualize-and-compare#display-plots-and-images [Compare experiments]: From 9b8b48d30e3f7563b4f547fa33062a2c8d5766f7 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Thu, 24 Aug 2023 11:21:55 -0700 Subject: [PATCH 2/4] dvc 3.16.0 (#4797) Co-authored-by: Olivaw[bot] --- src/components/DownloadButton/index.tsx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/components/DownloadButton/index.tsx b/src/components/DownloadButton/index.tsx index 7fb6a42cdd..85c37e494e 100644 --- a/src/components/DownloadButton/index.tsx +++ b/src/components/DownloadButton/index.tsx @@ -9,7 +9,7 @@ import { logEvent } from '@dvcorg/gatsby-theme-iterative/src/utils/front/plausib import * as styles from './styles.module.css' import { OS, useUserOS } from '../../utils/front/useUserOS' -const VERSION = `3.15.3` +const VERSION = `3.16.0` const dropdownItems = [ OS.UNKNOWN, From b07857f304cfcf0fe3e0fde38c25fd9df7ccd885 Mon Sep 17 00:00:00 2001 From: Dave Berenbaum Date: Thu, 24 Aug 2023 18:41:13 -0400 Subject: [PATCH 3/4] dvclive-first metrics and plots (#4795) * dvclive-first metrics and plots * fix linting issues * make clear that adding metrics/plots outs is optional * minor updates --- 
.../docs/command-reference/metrics/diff.md | 51 ++++++----- .../docs/command-reference/metrics/index.md | 84 +++++++++---------- content/docs/dvclive/how-it-works.md | 13 +-- content/docs/dvclive/index.md | 18 ++++ .../docs/user-guide/integrations/sagemaker.md | 4 +- .../pipelines/defining-pipelines.md | 26 ++---- .../project-structure/dvcyaml-files.md | 17 +++- 7 files changed, 116 insertions(+), 97 deletions(-) diff --git a/content/docs/command-reference/metrics/diff.md b/content/docs/command-reference/metrics/diff.md index deee9d71b0..328d8d4ebb 100644 --- a/content/docs/command-reference/metrics/diff.md +++ b/content/docs/command-reference/metrics/diff.md @@ -88,31 +88,38 @@ all the current metrics (without comparisons). ## Examples -Start by creating a metrics file and commit it (see the `-M` option of -`dvc stage add` for more details): +Start with a simple Python script to generate metrics: -```cli -$ dvc stage add -n eval -M metrics.json \ - 'echo {"AUC": 0.9643, "TP": 527} > metrics.json' +```python +# train.py +import random +from dvclive import Live -$ dvc repro +with Live() as live: + live.log_metric("AUC", random.random()) + live.log_metric("TP", random.randint(0, 1000)) +``` -$ cat metrics.json -{"AUC": 0.9643, "TP": 527} +Run the script and commit it: -$ git add dvc.* metrics.json -$ git commit -m "Add metrics file" +```cli +$ python train.py +$ git add train.py dvclive +$ git commit -m "Add metrics" ``` Now let's simulate a change in our AUC metric: ```cli -$ echo '{"AUC":0.9671, "TP":531}' > metrics.json - -$ git diff -... --{"AUC":0.9643, "TP":527} -+{"AUC":0.9671, "TP":531} +$ python train.py + +$ git diff -- dvclive/metrics.json + { +- "AUC": 0.7891189181402177, +- "TP": 215 ++ "AUC": 0.18113944203594523, ++ "TP": 768 + } ``` To see the change, let's run `dvc metrics diff`. 
This compares our current @@ -121,9 +128,9 @@ had in the latest commit (`HEAD`): ```cli $ dvc metrics diff -Path Metric HEAD workspace Change -metrics.json AUC 0.9643 0.9671 0.0028 -metrics.json TP 527 531 4 +Path Metric HEAD workspace Change +dvclive/metrics.json AUC 0.78912 0.18114 -0.60798 +dvclive/metrics.json TP 215 768 553 ``` ## Example: compare metrics among specific versions @@ -133,7 +140,7 @@ two [revisions](https://git-scm.com/docs/revisions)): ```cli $ dvc metrics diff --targets metrics.json -- 305fb8b c7bef55 -Path Metric 305fb8b c7bef55 Change -metrics.json AUC 0.9643 0.9743 0.0100 -metrics.json TP 527 516 -11 +Path Metric 305fb8b c7bef55 Change +dvclive/metrics.json AUC 0.9643 0.9743 0.0100 +dvclive/metrics.json TP 527 516 -11 ``` diff --git a/content/docs/command-reference/metrics/index.md b/content/docs/command-reference/metrics/index.md index 87b26b69b8..ea7d743b78 100644 --- a/content/docs/command-reference/metrics/index.md +++ b/content/docs/command-reference/metrics/index.md @@ -18,16 +18,13 @@ positional arguments: ## Description In order to follow the performance of machine learning experiments, DVC has the -ability to mark stage outputs or other files as metrics. These -metrics are project-specific floating-point or integer values e.g. AUC, ROC, -false positives, etc. +ability to mark [structured files](#supported-file-formats) containing key/value +pairs as metrics. These metrics are project-specific floating-point, integer, or +string values e.g. AUC, ROC, false positives, etc. -In pipelines, metrics files are typically generated by user data -processing code, and are tracked using the `-m` (`--metrics`) and `-M` -(`--metrics-no-cache`) options of `dvc stage add`. If using -[DVCLive](/doc/dvclive/live/log_metric), the files are generated and tracked -automatically. Metrics files may also may be manually added to -[`dvc.yaml`](/doc/user-guide/project-structure/dvcyaml-files). 
+If using [DVCLive](/doc/dvclive/live/log_metric), the files are generated and +metrics are configured automatically. Metrics files also may be manually added +to [`dvc.yaml`](/doc/user-guide/project-structure/dvcyaml-files). In contrast to `dvc plots`, these metrics should be stored in hierarchical files. Unlike its `dvc plots` counterpart, `dvc metrics diff` can report the @@ -42,26 +39,14 @@ metrics.json AUC 0.763981 0.801807 0.037826 `dvc metrics` subcommands can be used on any [valid metrics files](#supported-file-formats). By default they use the ones -specified in `dvc.yaml` (if any), for example `summary.json` below: +specified in `dvc.yaml` (if any), including those added automatically by +DVCLive. For example, `summary.json` below: ```yaml -stages: - train: - cmd: python train.py - deps: - - users.csv - outs: - - model.pkl - metrics: - - summary.json: - cache: false +metrics: + - summary.json ``` -> `cache: false` above specifies that `summary.json` is not tracked or -> cached by DVC (`-M` option of `dvc stage add`). These metrics -> files are normally committed with Git instead. See `dvc.yaml` for more -> information on the file format above. - ### Supported file formats Metrics can be organized as tree hierarchies in JSON, TOML 1.0, or YAML 1.2 @@ -96,29 +81,44 @@ to compare and pick the best performing experiment. ## Examples -First, let's imagine we have a simple [stage](/doc/command-reference/run) that -produces an `eval.json` metrics file: +First, let's imagine we have a simple Python script using DVCLive to log some +metrics: -```cli -$ dvc stage add -n evaluate -d code/evaluate.py -M eval.json \ - python code/evaluate.py +```python +from dvclive import Live -$ dvc repro +with Live() as live: + ... + live.log_metric("AUC", auc) + live.log_metric("error", error) + live.log_metric("TP", tp) ``` -> `-M` (`--metrics-no-cache`) tells DVC to mark `eval.json` as a metrics file, -> without tracking it directly (You can track it with Git). 
See `dvc stage add` -> for more info. +This will generate some log files, including `dvclive/metrics.json`, which looks +like: + +```json +{ + "AUC": 0.66729, + "error": 0.16982, + "TP": 516 +} +``` + +It will also generate `dvclive/dvc.yaml`, which includes: + +```yaml +metrics: + - metrics.json +``` Now let's print metrics values that we are tracking in this project, using `dvc metrics show`: ```cli $ dvc metrics show - eval.json: - AUC: 0.66729 - error: 0.16982 - TP: 516 +Path AUC TP error +dvclive/metrics.json 0.66729 516 0.16982 ``` When there are metrics file changes (before committing them with Git), the @@ -127,8 +127,8 @@ When there are metrics file changes (before committing them with Git), the ```cli $ dvc metrics diff -Path Metric HEAD workspace Change -eval.json AUC 0.65115 0.66729 0.01614 -eval.json error 0.1666 0.16982 0.00322 -eval.json TP 528 516 -12 +Path Metric HEAD workspace Change +dvclive/metrics.json AUC 0.65115 0.66729 0.01614 +dvclive/metrics.json error 0.1666 0.16982 0.00322 +dvclive/metrics.json TP 528 516 -12 ``` diff --git a/content/docs/dvclive/how-it-works.md b/content/docs/dvclive/how-it-works.md index 9c3146c8dc..0cc41557c9 100644 --- a/content/docs/dvclive/how-it-works.md +++ b/content/docs/dvclive/how-it-works.md @@ -107,19 +107,12 @@ Using `Live.log_image()` to log multiple images may also grow too large to track with Git, in which case you can use [`Live(cache_images=True)`](/doc/dvclive/live#parameters) to cache them. -## Run with DVC - -Experimenting in Python interactively (like in notebooks) is great for -exploration, but eventually you may need a more structured way to run -reproducible experiments (for example, running a multi-step pipeline or queueing -multiple experiments). By configuring DVC [pipelines], you can -[run experiments](/doc/user-guide/experiment-management/running-experiments) -with `dvc exp run`. 
This will track the inputs and outputs of your code, and -also enable features like queuing, parameter tuning, and grid searches. +## Setup to Run with DVC DVCLive by default [generates] its own `dvc.yaml` file to configure the experiment results, but you can create your own `dvc.yaml` file at the base of -your repository (or elsewhere) to define a [pipeline](#run-with-dvc) or +your repository (or elsewhere) to define a [pipeline](#setup-to-run-with-dvc) to +run experiments with DVC or [customize plots](/doc/user-guide/experiment-management/visualizing-plots#defining-plots). Do not reuse the DVCLive `dvc.yaml` file since it gets overwritten during each experiment run. A pipeline stage for model training might look like: diff --git a/content/docs/dvclive/index.md b/content/docs/dvclive/index.md index 175ccd1b0a..f7018df03b 100644 --- a/content/docs/dvclive/index.md +++ b/content/docs/dvclive/index.md @@ -154,3 +154,21 @@ with Live(save_dvc_exp=True) as live: After you run your training code, all the logged data will be stored in the `dvclive` directory. Check the [DVCLive outputs](/doc/dvclive/how-it-works) page for more details. + +## Run with DVC + +Experimenting in Python interactively (like in notebooks) is great for +exploration, but eventually you may need a more structured way to run +reproducible experiments. By configuring DVC [pipelines], you can [run +experiments] with `dvc exp run`. This will track the inputs and outputs of code, +and enable more advanced workflows like multi-step pipelines and queueing +multiple experiments or even an entire grid search. See examples of how to [add +DVCLive to a pipeline] or [add a pipeline to DVCLive code], or get more +information about how to [setup a pipeline] to work with DVCLive. 
+ +[run experiments]: + /doc/user-guide/experiment-management/running-experiments +[pipelines]: /doc/user-guide/pipelines +[add DVCLive to a pipeline]: /doc/start/data-management/metrics-parameters-plots +[add a pipeline to DVCLive code]: /doc/start/experiments/experiment-pipelines +[setup a pipeline]: /doc/dvclive/how-it-works#setup-to-run-with-dvc diff --git a/content/docs/user-guide/integrations/sagemaker.md b/content/docs/user-guide/integrations/sagemaker.md index 6dd697f49f..ca2c90081a 100644 --- a/content/docs/user-guide/integrations/sagemaker.md +++ b/content/docs/user-guide/integrations/sagemaker.md @@ -64,7 +64,9 @@ modified easily. The DVC pipeline stage is defined in `dvc.yaml` like this: ```yaml prepare: cmd: - - wget https://sagemaker-sample-data-us-west-2.s3-us-west-2.amazonaws.com/autopilot/direct_marketing/bank-additional.zip -O bank-additional.zip + - wget + https://sagemaker-sample-data-us-west-2.s3-us-west-2.amazonaws.com/autopilot/direct_marketing/bank-additional.zip + -O bank-additional.zip - python sm_prepare.py --bucket ${bucket} --prefix ${prefix} deps: - sm_prepare.py diff --git a/content/docs/user-guide/pipelines/defining-pipelines.md b/content/docs/user-guide/pipelines/defining-pipelines.md index f6bfdcbccc..eb24ea4615 100644 --- a/content/docs/user-guide/pipelines/defining-pipelines.md +++ b/content/docs/user-guide/pipelines/defining-pipelines.md @@ -209,31 +209,17 @@ Use `dvc params diff` to compare parameters across project versions. ## Outputs Stage outputs are files (or directories) written by pipelines, for -example machine learning models, intermediate artifacts, as well as data [plots] -and performance [metrics]. These files are cached by DVC -automatically, and tracked with the help of `dvc.lock` files (or `.dvc` files, -see `dvc add`). +example machine learning models and intermediate artifacts. These files are +cached by DVC automatically, and tracked with the help of +`dvc.lock` files (or `.dvc` files, see `dvc add`). 
Outputs can be dependencies of subsequent stages (as explained earlier). So when they change, DVC may need to reproduce downstream stages as well (handled automatically). -The types of outputs are: - -- Files and directories: Typically data to feed to intermediate stages, as well - as the final results of a pipeline (e.g. a dataset or an ML model). - -- [Metrics]: DVC supports small text files that usually contain model - performance metrics from the evaluation, validation, or testing phases of the - ML lifecycle. DVC allows to compare produced metrics with one another using - `dvc metrics diff` and presents the results as a table with `dvc metrics show` - or `dvc exp show`. - -- [Plots]: Different kinds of data that can be visually graphed. For example - contrast ML performance statistics or continuous metrics from multiple - experiments. `dvc plots show` can generate charts for certain data files or - render custom image files for you, or you can compare different ones with - `dvc plots diff`. +DVC can also track [metrics] and [plots] files, which can optionally be added as +stage outputs, or even added with `cache: false` in `dvc.yaml` since they are +often small enough to store in Git. diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index bcb8c1e4fa..c833c65c9a 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -57,7 +57,7 @@ metrics: Metrics are key/value pairs saved in structured files that map a metric name to a numeric value. See `dvc metrics` for more information and how to compare among -experiments. +experiments, or [DVCLive] for a helper to log metrics. ## Params @@ -90,7 +90,8 @@ DVC will create separate rendering for each type. -Refer to [Visualizing Plots] and `dvc plots show` for more examples. 
+Refer to [Visualizing Plots] and `dvc plots show` for more examples, and refer +to [DVCLive] for a helper to log plots. [visualizing plots]: /doc/user-guide/experiment-management/visualizing-plots @@ -353,6 +354,16 @@ See also `dvc params diff` to compare params across project version. ### Metrics and Plots outputs + + +Metrics and plots outputs described below come from earlier versions of DVC and +remain as a convenience. You can instead define metrics and plots separate from +your pipeline with [DVCLive] or add "top-level" [metrics](#metrics) and +[plots](#plots). You can optionally include them as regular `outs` in the +pipeline. + + + Like common outputs, metrics and plots files are produced by the stage `cmd`. However, their purpose is different. Typically they contain metadata to evaluate pipeline processes. Example: @@ -898,3 +909,5 @@ Full parameter dependencies (both key and value) are listed too `dvc.lock` (no `${}` expression). As for [`foreach` stages](#foreach-stages) and [`matrix` stages](#matrix-stages), individual stages are expanded (no `foreach` or `matrix` structures are preserved). 
+ +[DVCLive]: /doc/dvclive From 9ad0b97ad07d1baab74cabb70be221955004f85b Mon Sep 17 00:00:00 2001 From: Dave Berenbaum Date: Fri, 25 Aug 2023 10:10:54 -0400 Subject: [PATCH 4/4] drop metrics/plots stage outputs (#4798) --- content/docs/command-reference/plots/diff.md | 3 - content/docs/command-reference/plots/index.md | 7 +- .../docs/command-reference/plots/modify.md | 167 ---------------- content/docs/command-reference/plots/show.md | 9 +- content/docs/sidebar.json | 4 - .../visualizing-plots.md | 47 ----- .../project-structure/dvcyaml-files.md | 187 ++++++------------ 7 files changed, 66 insertions(+), 358 deletions(-) delete mode 100644 content/docs/command-reference/plots/modify.md diff --git a/content/docs/command-reference/plots/diff.md b/content/docs/command-reference/plots/diff.md index 5c99f3750c..5bc0023102 100644 --- a/content/docs/command-reference/plots/diff.md +++ b/content/docs/command-reference/plots/diff.md @@ -41,9 +41,6 @@ specified with the `--targets` option (any valid plots file is accepted). The plot style can be customized with [plot templates], using the `--template` option. See `dvc plots` to learn more about plots files and templates. -> Note that the default behavior of this command can be modified per metrics -> file with `dvc plots modify`. - Another way to display plots is the `dvc plots show` command, which just lists all the current plots, without comparisons. 
diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md index 6e2e436de9..30c581f083 100644 --- a/content/docs/command-reference/plots/index.md +++ b/content/docs/command-reference/plots/index.md @@ -2,14 +2,13 @@ A set of commands to visualize and compare data series or images from ML projects: [show](/doc/command-reference/plots/show), -[diff](/doc/command-reference/plots/diff), -[modify](/doc/command-reference/plots/modify) and +[diff](/doc/command-reference/plots/diff), and [templates](/doc/command-reference/plots/templates). ## Synopsis ```usage -usage: dvc plots [-h] [-q | -v] {show,diff,modify,templates} ... +usage: dvc plots [-h] [-q | -v] {show,diff,templates} ... positional arguments: COMMAND @@ -17,8 +16,6 @@ positional arguments: definitions in `dvc.yaml`. diff Show multiple versions of a plot by overlaying them in a single image. - modify Modify display properties of data-series plots - defined in stages (has no effect on image plots). templates List built-in plots templates or show JSON specification for one. ``` diff --git a/content/docs/command-reference/plots/modify.md b/content/docs/command-reference/plots/modify.md deleted file mode 100644 index 1fe72071e6..0000000000 --- a/content/docs/command-reference/plots/modify.md +++ /dev/null @@ -1,167 +0,0 @@ -# plots modify - -Modify display properties of data-series [plots](/doc/command-reference/plots) -defined in stages. - -> ⚠️ Note that this command can modify only data-series plots. It has no effect -> on image-type plots or any [top-level plot] definitions. 
- -[top-level plot]: /doc/user-guide/project-structure/dvcyaml-files#plots - -## Synopsis - -```usage -usage: dvc plots modify [-h] [-q | -v] [-t ] [-x ] - [-y ] [--no-header] [--title ] - [--x-label ] [--y-label ] - [--unset [ [ ...]]] - target - -positional arguments: - target Plots file to set properties for - (defined at the stage level) -``` - -## Description - -It might be not convenient for users or automation systems to specify all the -_display properties_ (such as `y-label`, `template`, `title`, etc.) each time -plots are generated with `dvc plots show` or `dvc plots diff`. This command sets -(or unsets) default display properties for a specific plots file. - -The path to the plots file `target` is required. It must be listed in a -`dvc.yaml` file (see the `--plots` option of `dvc stage add`). -`dvc plots modify` adds the display properties to `dvc.yaml`. - -Property names are passed as [options](#options) to this command (prefixed with -`--`). These are based on the [Vega-Lite](https://vega.github.io/vega-lite/) -specification. - -Note that a secondary use of this command is to convert output or simple -`dvc metrics` file into a plots file (see an -[example](#example-convert-any-output-into-a-plot)). - -## Options - -- `-t , --template ` - set a default - [plot template](/doc/user-guide/experiment-management/visualizing-plots#plot-templates-data-series-only). - -- `-x ` - set a default field or column name (or number) from which the X - axis data comes from. - -- `-y ` - set a default field or column name (or number) from which the Y - axis data comes from. - -- `--x-label ` - set a default title for the X axis. - -- `--y-label ` - set a default title for the Y axis. - -- `--title ` - set a default plot title. - -- `--unset [ [ ...]]` - unset one or more display - properties. Use the property name(s) without `--` in the argument sent to this - option. - -- `--no-header` - lets DVC know that the `target` CSV or TSV does not have a - header. 
A 0-based numeric index can be used to identify each column instead of
-  names.
-
-- `-h`, `--help` - prints the usage/help message, and exit.
-
-- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
-  problems arise, otherwise 1.
-
-- `-v`, `--verbose` - displays detailed tracing information.
-
-## Examples
-
-The initial plot was showing the last column of CSV file by default which is
-_loss_ metrics while _accuracy_ is expected as Y axis:
-
-```
-epoch,accuracy,loss
-0,0.9403833150863647,0.2019129991531372
-1,0.9733833074569702,0.08973673731088638
-2,0.9815833568572998,0.06529958546161652
-3,0.9861999750137329,0.04984375461935997
-4,0.9882333278656006,0.041892342269420624
-```
-
-```cli
-$ dvc plots show logs.csv
-file:///Users/usr/src/myclassifier/logs.html
-```
-
-![](/img/plots_mod_loss.svg)
-
-Changing the y-axis to _accuracy_:
-
-```cli
-$ dvc plots modify logs.csv -y accuracy
-$ dvc plots show logs.csv
-file:///Users/usr/src/myclassifier/logs.html
-```
-
-![](/img/plots_mod_acc.svg)
-
-Note that a new field _y_ was added to `dvc.yaml` file for the plot. Make sure
-to commit the change in Git if the modification needs to be preserved.
-
-```yaml
-plots:
-  - logs.csv:
-      cache: false
-      y: accuracy
-```
-
-Changing the plot `title` and `x-label`:
-
-```cli
-$ dvc plots modify logs.csv --title Accuracy -x epoch --x-label Epoch
-$ dvc plots show logs.csv
-file:///Users/usr/src/myclassifier/logs.html
-```
-
-![](/img/plots_mod_acc_titles.svg)
-
-Two new fields were added to `dvc.yaml`: `x-label` and `title`:
-
-```yaml
-plots:
-  - plots.csv:
-      cache: false
-      y: accuracy
-      x_label: epoch
-      title: Accuracy
-```
-
-## Example: Template change
-
-Something like `dvc stage add --plots file.csv ...` assigns the default
-template, which needs to be changed in many cases.
This command can do so:
-
-```cli
-$ dvc plots modify classes.csv --template confusion
-```
-
-## Example: Convert any output into a plot
-
-Let's take an example `evaluate` stage which has `logs.csv` as an output. We can
-use `dvc plots modify` to convert the `logs.csv` output file into a plots file,
-and then confirm the changes that happened in `dvc.yaml`:
-
-```cli
-$ dvc plots modify logs.csv
-```
-
-```git
- evaluate:
-   cmd: python src/evaluate.py
-   deps:
-     - src/evaluate.py
--  outs:
--    - logs.csv
-   plots:
-     - scores.json
-+    - logs.csv
-```
diff --git a/content/docs/command-reference/plots/show.md b/content/docs/command-reference/plots/show.md
index defee7b828..099bb44dcf 100644
--- a/content/docs/command-reference/plots/show.md
+++ b/content/docs/command-reference/plots/show.md
@@ -30,13 +30,6 @@ All plots defined in `dvc.yaml` are used by default, but you can specify any
 The plot style can be customized with [plot templates], using the `--template`
 option. To learn more about plots file formats and templates, see `dvc plots`.

-
-
-The default behavior of this command can be modified per [stage plot] file with
-`dvc plots modify`.
-
-
 [certain data]:
   /doc/user-guide/experiment-management/visualizing-plots#supported-plot-file-formats
 [plot templates]:
@@ -205,7 +198,7 @@ $ dvc plots show --no-header logs.csv -y 2
 file:///Users/usr/src/dvc_plots/index.html
 ```

-## Example: Top-level plots
+## Example: `dvc.yaml` plots

 ### Simple plot definition

diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json
index 5cbd5b3928..fda47b7801 100644
--- a/content/docs/sidebar.json
+++ b/content/docs/sidebar.json
@@ -429,10 +429,6 @@
               "label": "plots diff",
               "slug": "diff"
             },
-            {
-              "label": "plots modify",
-              "slug": "modify"
-            },
             {
               "label": "plots templates",
               "slug": "templates"
diff --git a/content/docs/user-guide/experiment-management/visualizing-plots.md b/content/docs/user-guide/experiment-management/visualizing-plots.md
index c5a724bc8c..817e31a426 100644
--- a/content/docs/user-guide/experiment-management/visualizing-plots.md
+++ b/content/docs/user-guide/experiment-management/visualizing-plots.md
@@ -188,53 +188,6 @@ Refer to the [full format specification] and `dvc plots show` for more details.

-### Plot outputs
-
-Plots can use any file defined in the project, including outputs of
-[pipelines]:
-
-```yaml
-plots:
-  - logs.csv:
-      x: epoch
-      y: loss
-stages:
-  build:
-    cmd: python train.py
-    outs:
-      - logs.csv
-    ...
-```
-
-Alternatively, when defining [pipelines], some outputs (both files
-and directories) can be placed under a `plots` list for the corresponding stage
-in `dvc.yaml`. This will tell DVC that they are intended for visualization.
-
-
-When using `dvc stage add`, use `--plots/--plots-no-cache` instead of
-`--outs/--outs-no-cache`.
-
-
-```yaml
-stages:
-  build:
-    cmd: python train.py
-    plots:
-      - logs.csv:
-          x: epoch
-          y: loss
-    ...
-```
-
-Marking stage outputs as plots is convenient for working with plots at the stage
-level, without having to write top-level `plots` definitions in `dvc.yaml`.
-However, stage-level plots do not support custom plot IDs or multiple data
-sources.
-
-[pipelines]: /doc/start/data-management/data-pipelines
-
 ## Plot templates (data-series only)

 DVC uses [Vega-Lite](https://vega.github.io/vega-lite/) JSON specifications to
diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md
index c833c65c9a..20880c399d 100644
--- a/content/docs/user-guide/project-structure/dvcyaml-files.md
+++ b/content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -82,12 +82,6 @@ directory path (relative to the location of `dvc.yaml`) or an arbitrary string.
 If the ID is an arbitrary string, a file path must be provided in the `y` field
 (`x` file path is always optional and cannot be the only path provided).

-In addition to these "top-level plots," users can mark specific stage
-outputs as [plot outputs](#metrics-and-plots-outputs). DVC will
-collect both types and display everything conforming to each plot configuration.
-If any stage plot files or directories are also used in a top-level definition,
-DVC will create separate rendering for each type.
-

 Refer to [Visualizing Plots] and `dvc plots show` for more examples, and refer
@@ -99,75 +93,66 @@ to [DVCLive] for a helper to log plots.

 ### Available configuration fields

-- `y` - source for the Y axis data:
-
-  - **Top-level plots** (_string, list, dict_):
-
-    If plot ID is a path, one or more column/field names is expected. For
-    example:
-
-    ```yaml
-    plots:
-      - regression_hist.csv:
-          y: mean_squared_error
-      - classifier_hist.csv:
-          y: [acc, loss]
-    ```
-
-    If plot ID is an arbitrary string, a dictionary of file paths mapped to
-    column/field names is expected. For example:
-
-    ```yaml
-    plots:
-      - train_val_test:
-          y:
-            train.csv: [train_acc, val_acc]
-            test.csv: test_acc
-    ```
-
-  - **Plot outputs** (_string_): one column/field name.
-
-- `x` - source for the X axis data.
An auto-generated _step_ field is used by
-  default.
-
-  - **Top-level plots** (_string, dict_):
-
-    If plot ID is a path, one column/field name is expected. For example:
-
-    ```yaml
-    plots:
-      - classifier_hist.csv:
-          y: [acc, loss]
-          x: epoch
-    ```
-
-    If plot ID is an arbitrary string, `x` may either be one column/field name,
-    or a dictionary of file paths each mapped to one column/field name (the
-    number of column/field names must match the number in `y`).
-
-    ```yaml
-    plots:
-      - train_val_test: # single x
-          y:
-            train.csv: [train_acc, val_acc]
-            test.csv: test_acc
-          x: epoch
-      - roc_vs_prc: # x dict
-          y:
-            precision_recall.json: precision
-            roc.json: tpr
-          x:
-            precision_recall.json: recall
-            roc.json: fpr
-      - confusion: # different x and y paths
-          y:
-            dir/preds.csv: predicted
-          x:
-            dir/actual.csv: actual
-          template: confusion
-    ```
-
-  - **Plot outputs** (_string_): one column/field name.
+- `y` (_string, list, dict_) - source for the Y axis data:
+
+  If plot ID is a path, one or more column/field names is expected. For example:
+
+  ```yaml
+  plots:
+    - regression_hist.csv:
+        y: mean_squared_error
+    - classifier_hist.csv:
+        y: [acc, loss]
+  ```
+
+  If plot ID is an arbitrary string, a dictionary of file paths mapped to
+  column/field names is expected. For example:
+
+  ```yaml
+  plots:
+    - train_val_test:
+        y:
+          train.csv: [train_acc, val_acc]
+          test.csv: test_acc
+  ```
+
+- `x` (_string, dict_) - source for the X axis data. An auto-generated _step_
+  field is used by default.
+
+  If plot ID is a path, one column/field name is expected. For example:
+
+  ```yaml
+  plots:
+    - classifier_hist.csv:
+        y: [acc, loss]
+        x: epoch
+  ```
+
+  If plot ID is an arbitrary string, `x` may either be one column/field name, or
+  a dictionary of file paths each mapped to one column/field name (the number of
+  column/field names must match the number in `y`).
+
+  ```yaml
+  plots:
+    - train_val_test: # single x
+        y:
+          train.csv: [train_acc, val_acc]
+          test.csv: test_acc
+        x: epoch
+    - roc_vs_prc: # x dict
+        y:
+          precision_recall.json: precision
+          roc.json: tpr
+        x:
+          precision_recall.json: recall
+          roc.json: fpr
+    - confusion: # different x and y paths
+        y:
+          dir/preds.csv: predicted
+        x:
+          dir/actual.csv: actual
+        template: confusion
+  ```

 - `y_label` (_string_) - Y axis label. If all `y` data sources have the same
   field name, that will be the default. Otherwise, it's "y".
@@ -175,10 +160,8 @@ to [DVCLive] for a helper to log plots.
 - `x_label` (_string_) - X axis label. If all `y` data sources have the same
   field name, that will be the default. Otherwise, it's "x".

-- `title` (_string_) - header for the plot(s). Defaults:
-
-  - **Top-level plots**: `path/to/dvc.yaml::plot_id`
-  - **Plot outputs**: `path/to/data.csv`
+- `title` (_string_) - header for the plot(s). Defaults to
+  `path/to/dvc.yaml::plot_id`.

 - `template` (_string_) - [plot template]. Defaults to `linear`.

@@ -235,7 +218,7 @@ them).

-Output files may be viable data sources for [top-level plots](#plots).
+Output files may be viable data sources for [plots](#plots).

@@ -352,48 +335,6 @@ See also `dvc params diff` to compare params across project version.

-### Metrics and Plots outputs
-
-
-
-Metrics and plots outputs described below come from earlier versions of DVC and
-remain as a convenience. You can instead define metrics and plots separate from
-your pipeline with [DVCLive] or add "top-level" [metrics](#metrics) and
-[plots](#plots). You can optionally include them as regular `outs` in the
-pipeline.
-
-
-
-Like common outputs, metrics and plots files are
-produced by the stage `cmd`. However, their purpose is different. Typically they
-contain metadata to evaluate pipeline processes.
Example:
-
-```yaml
-stages:
-  build:
-    cmd: python train.py
-    deps:
-      - features.csv
-    outs:
-      - model.pt
-    metrics:
-      - accuracy.json:
-          cache: false
-    plots:
-      - auc.json:
-          cache: false
-```
-
-
-
-`cache: false` is typical here, since they're small enough for Git to store
-directly.
-
-
-
-The commands in `dvc metrics` and `dvc plots` help you display and compare
-metrics and plots.
-
 ## Stage entries

 These are the fields that are accepted in each stage:
@@ -405,8 +346,6 @@ These are the fields that are accepted in each stage:
 | `deps` | List of dependency paths (relative to `wdir`). |
 | `outs` | List of output paths (relative to `wdir`). These can contain certain optional [subfields](#output-subfields). |
 | `params` | List of parameter dependency keys (field names) to track from `params.yaml` (in `wdir`). The list may also contain other parameters file names, with a sub-list of the param names to track in them. |
-| `metrics` | List of [metrics files](/doc/command-reference/metrics), and optionally, whether or not this metrics file is cached (`true` by default). See the `--metrics-no-cache` (`-M`) option of `dvc stage add`. |
-| `plots` | List of [plot metrics](/doc/command-reference/plots), and optionally, their default configuration (subfields matching the options of `dvc plots modify`), and whether or not this plots file is cached ( `true` by default). See the `--plots-no-cache` option of `dvc stage add`. |
 | `frozen` | Whether or not this stage is frozen (prevented from execution during reproduction) |
 | `always_changed` | Causes this stage to be always considered as [changed] by commands such as `dvc status` and `dvc repro`. `false` by default |
 | `meta` | (Optional) arbitrary metadata can be added manually with this field. Any YAML content is supported. `meta` contents are ignored by DVC, but they can be meaningful for user processes that read or write `.dvc` files directly. |