
Setting Up an Experiment

TeresaEsch edited this page Feb 15, 2024 · 16 revisions

Each time you run an experiment on DaCapo, you have to tell it where to store the output, where to find the data,

Data Storage

What data are stored

DaCapo has the following main data storage components:

  • Loss stats: The loss for each training iteration is stored along with a couple of other statistics, such as how long that iteration took to compute. These will be stored in MongoDB if available.
  • Validation scores: The scores computed at each validation step. These will be stored in MongoDB if available.
  • Validation volumes: The results of validation (images with presumptive organelles labeled) are stored in zarr datasets so you can visually inspect the best predictions on your validation data according to the validation metric of your choice. This data will be stored on disk.
  • Checkpoints: Copies of your model are stored at various intervals during training. This lets you retrieve the best performing model according to the validation metric of your choice. This data will be stored on disk.
  • Training Snapshots: Every n iterations (where n corresponds to the snapshot_interval defined in the Training configuration) a snapshot that includes the inputs and outputs of the model at that iteration is stored along with some extra results that can be very helpful for debugging. The saved arrays include: Ground Truth, Target (Ground Truth transformed by Task), Raw, Prediction, Gradient, and Weights (for modifying the loss). This data will be stored on disk.
  • Configs: To make runs easily reproducible, the configuration files used to execute experiments are saved. This way other people can use the exact same configuration files, or change single parameters, and get comparable results. This data will be stored in MongoDB if available.
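
As a quick summary of the list above, the following sketch maps each component to the place it ends up. It is illustrative only: storage_backend and the two sets are hypothetical names, not part of DaCapo, and the fallback to disk when MongoDB is not configured is an assumption based on the disk-only (type: files) option described under Defining Storage Locations.

```python
# Illustrative summary of the storage list; these names are NOT DaCapo APIs.
MONGO_IF_AVAILABLE = {"loss stats", "validation scores", "configs"}
ALWAYS_ON_DISK = {"validation volumes", "checkpoints", "training snapshots"}


def storage_backend(component: str, mongo_configured: bool) -> str:
    """Return "mongodb" or "disk" for a storage component named in the list."""
    name = component.strip().lower()
    if name in ALWAYS_ON_DISK:
        return "disk"
    if name in MONGO_IF_AVAILABLE:
        # Assumption: without MongoDB, these fall back to files on disk.
        return "mongodb" if mongo_configured else "disk"
    raise KeyError(f"unknown storage component: {component!r}")


if __name__ == "__main__":
    for item in sorted(MONGO_IF_AVAILABLE | ALWAYS_ON_DISK):
        print(f"{item}: {storage_backend(item, mongo_configured=True)}")
```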

Defining Storage Locations

To define where this data goes, create a dacapo.yaml configuration file. Here is a template:

mongodbhost: mongodb://dbuser:dbpass@dburl:dbport/
mongodbname: dacapo
runs_base_dir: /path/to/my/data/storage
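
If you prefer to generate this file from a script, a minimal sketch might look like the following. render_dacapo_yaml is a hypothetical helper, not part of DaCapo; it only uses the three keys from the template plus the disk-only type: files entry explained below. Values are written unquoted, so anything containing YAML special characters would need quoting by hand.

```python
from typing import Optional


def render_dacapo_yaml(
    runs_base_dir: str,
    mongodbhost: Optional[str] = None,
    mongodbname: Optional[str] = None,
) -> str:
    """Return the text of a dacapo.yaml file (hypothetical helper)."""
    # The two MongoDB keys only make sense as a pair.
    if (mongodbhost is None) != (mongodbname is None):
        raise ValueError("mongodbhost and mongodbname must be given together")
    if mongodbhost is None:
        # Disk-only variant: a single `type: files` entry replaces both keys.
        lines = ["type: files"]
    else:
        lines = [f"mongodbhost: {mongodbhost}", f"mongodbname: {mongodbname}"]
    lines.append(f"runs_base_dir: {runs_base_dir}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the disk-only variant; redirect to dacapo.yaml to save it.
    print(render_dacapo_yaml("/path/to/my/data/storage"), end="")
```

Calling the helper with only runs_base_dir yields the disk-only file; passing both MongoDB values reproduces the template above.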

The runs_base_dir defines where your on-disk data will be stored. The mongodbhost and mongodbname define the MongoDB host and database that will store your cloud data. If you want to store everything on disk, replace mongodbhost and mongodbname with a single type: files entry.

Configs

Next, create the configuration files for your experiments. This can all be done in Python. There is also a web-based GUI, the dacapo-dashboard (https://github.com/funkelab/dacapo-dashboard); see the Simple Experiment us
