This repository has been archived by the owner on May 15, 2024. It is now read-only.

Prototyping experiment tracking system for Firefox Translations


vrigal/translations-experiment-tracking

 
 


translations-experiment-tracking

Track and extract data from the training system of Firefox Translations.

Logs are extracted from Marian training tasks running in Taskcluster.

This POC works offline, using a text log sample within the samples directory. It outputs an instance of the TrainingLog dataclass with the following attributes:

  • info: Marian information as a dict
  • configuration: Runtime configuration as a dict
  • training: List of Training dataclass instances:
    • epoch
    • up
    • sen
    • cost
    • time
    • rate
    • gnorm
  • validation: List of Validation dataclass instances:
    • epoch
    • up
    • chrf
    • ce_mean_words
    • bleu_detok
  • logs: Log lines as a dict, indexed by their header (e.g. marian, data, memory)
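The structure listed above can be sketched as plain Python dataclasses. This is a minimal illustration, not the package's actual definition: the attribute names come from the list above, while the type annotations, the default values and the sample numbers are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Training:
    # Field names follow the README; the types are assumptions.
    epoch: int
    up: int
    sen: int
    cost: float
    time: float
    rate: float
    gnorm: float


@dataclass
class Validation:
    # Field names follow the README; the types are assumptions.
    epoch: int
    up: int
    chrf: float
    ce_mean_words: float
    bleu_detok: float


@dataclass
class TrainingLog:
    # Empty defaults are an assumption made so the sketch is easy to instantiate.
    info: dict = field(default_factory=dict)
    configuration: dict = field(default_factory=dict)
    training: list[Training] = field(default_factory=list)
    validation: list[Validation] = field(default_factory=list)
    logs: dict[str, list[str]] = field(default_factory=dict)


# Made-up values, only to show how the pieces fit together.
log = TrainingLog()
log.training.append(
    Training(epoch=1, up=1000, sen=234567, cost=3.45, time=120.5, rate=9876.5, gnorm=1.23)
)
log.logs.setdefault("marian", []).append("example log line")
```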

Install and run the package

In a virtual environment, you can install the package using pip:

$ pip install .

Run the parser with the local sample:

$ parse_tc_logs -i samples/<log_file>
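To give an idea of what parsing a training line can involve, here is a minimal sketch using a regular expression. It is not the package's implementation: the line layout, the sample line and the `parse_training_line` helper are all assumptions made for illustration, and real Marian output may differ between versions.

```python
import re
from typing import Optional

# Assumed layout of a Marian training line (not taken from this repository).
TRAINING_RE = re.compile(
    r"Ep\.\s*(?P<epoch>\d+)\s*:\s*"
    r"Up\.\s*(?P<up>\d+)\s*:\s*"
    r"Sen\.\s*(?P<sen>[\d,]+)\s*:\s*"
    r"Cost\s*(?P<cost>\d+\.\d+).*?:\s*"
    r"Time\s*(?P<time>\d+\.\d+)s\s*:\s*"
    r"(?P<rate>\d+\.\d+)\s*words/s\s*:\s*"
    r"gNorm\s*(?P<gnorm>\d+\.\d+)"
)


def parse_training_line(line: str) -> Optional[dict]:
    """Return the training metrics found in a line, or None if it does not match."""
    match = TRAINING_RE.search(line)
    if match is None:
        return None
    values = match.groupdict()
    return {
        "epoch": int(values["epoch"]),
        "up": int(values["up"]),
        # Sentence counts are assumed to use thousands separators.
        "sen": int(values["sen"].replace(",", "")),
        "cost": float(values["cost"]),
        "time": float(values["time"]),
        "rate": float(values["rate"]),
        "gnorm": float(values["gnorm"]),
    }


# Hypothetical sample line, written by hand for this sketch.
sample = (
    "[2024-01-01 00:00:00] Ep. 1 : Up. 1000 : Sen. 234,567 : "
    "Cost 3.45678 : Time 120.50s : 9876.54 words/s : gNorm 1.2345"
)
parsed = parse_training_line(sample)
```

Lines that do not match the pattern return None, so unrelated log lines can simply be skipped.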

Publish data to Weights & Biases:

$ parse_tc_logs -i samples/<log_file> --wandb-project <project> --wandb-group=<group> --wandb-run-name=<run>
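For readers unfamiliar with Weights & Biases, the sketch below shows roughly how parsed training records could be logged with the `wandb` client. It is an illustration under assumptions, not what `parse_tc_logs` does internally: the `training_to_steps` and `publish` helpers, the `training/` metric prefix and the use of `up` as the step are all hypothetical choices.

```python
def training_to_steps(training):
    """Turn training records (plain dicts here) into (step, metrics) pairs.

    Using the update count `up` as the step is an assumption of this sketch.
    """
    steps = []
    for record in training:
        metrics = {f"training/{key}": value for key, value in record.items() if key != "up"}
        steps.append((record["up"], metrics))
    return steps


def publish(training, project, group, run_name):
    """Log the records to Weights & Biases (requires the wandb package and a login)."""
    import wandb  # imported lazily so the helper above works without wandb installed

    run = wandb.init(project=project, group=group, name=run_name)
    for step, metrics in training_to_steps(training):
        run.log(metrics, step=step)
    run.finish()


# Made-up record, only to show the shape of the data passed to wandb.
steps = training_to_steps([{"up": 1000, "epoch": 1, "cost": 3.5}])
```

Keeping the conversion separate from the upload makes it possible to inspect the payload offline before anything is sent.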

Run the parser on a directory containing experiments and publish to Weights & Biases:

$ parse_experiment_dir -d models

Development

For development, you may want to install the package in editable mode (i.e. directly from the local path):

$ pip install -e .

Pre-commit rules run automatically once the pre-commit hooks have been installed:

$ pip install pre-commit
$ pre-commit install
$ pre-commit run -a # Run pre-commit once
