From 6354afdcb2d52c986f3f0051d03132f474944c4d Mon Sep 17 00:00:00 2001
From: Sebastian Hoffmann
Date: Tue, 2 Apr 2024 15:22:34 +0200
Subject: [PATCH] chore: readme

---
 README.md | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 9e65de4..f0e9db0 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,20 @@
 # dmlcloud
 [![](https://img.shields.io/pypi/v/dmlcloud)](https://pypi.org/project/dmlcloud/)
-[![](https://img.shields.io/github/actions/workflow/status/sehoffmann/dmlcloud/run_tests.yml?logo=github)](https://github.com/sehoffmann/dmlcloud/actions/workflows/run_tests.yml)
+[![](https://img.shields.io/github/actions/workflow/status/sehoffmann/dmlcloud/run_tests.yml?label=tests&logo=github)](https://github.com/sehoffmann/dmlcloud/actions/workflows/run_tests.yml)
 [![](https://img.shields.io/github/actions/workflow/status/sehoffmann/dmlcloud/run_linting.yml?label=lint&logo=github)](https://github.com/sehoffmann/dmlcloud/actions/workflows/run_linting.yml)
 
-Flexibel, easy-to-use, opinionated
+*Flexible, easy-to-use, opinionated*
 
-**dmlcloud** is a library for distributed training of deep learning models with torch. Its main aim is to do all these tiny little tedious things that everybody just copy pastes over and over again, while still giving you full control over the training loop and maximum flexibility.
+*dmlcloud* is a library for **distributed training** of deep learning models with *torch*. Unlike other similar frameworks, dmlcloud adds as little additional complexity and abstraction as possible. It is tailored towards a carefully selected set of libraries and workflows.
 
-Unlike other similar frameworks, such as *lightning*, dmcloud tries to add as little additional complexity and abstraction as possible. Instead, it is tailored towards a careful selected set of libraries and workflows and sticks with them.
+## Installation
+```
+pip install dmlcloud
+```
+
+## Why dmlcloud?
+- Easy initialization of `torch.distributed` (supports *slurm* and *MPI*).
+- Simple, yet powerful, API. No unnecessary abstractions and complications.
+- Checkpointing and metric tracking (distributed).
+- Extensive logging and diagnostics out-of-the-box. Greatly improves reproducibility and traceability.
+- A wealth of useful utility functions required for distributed training (e.g. for data set sharding).
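
Note on the last bullet: for readers unfamiliar with the term, the following is a minimal sketch of what data set sharding means in distributed training, written in plain Python. `shard_indices` is a hypothetical helper made up for this note. It is not part of dmlcloud's API, and the library's own utilities may work differently.

```python
def shard_indices(num_samples: int, rank: int, world_size: int) -> list[int]:
    """Return the sample indices handled by `rank` using a strided split.

    Hypothetical helper for illustration only (not a dmlcloud function).
    Rank r receives indices r, r + world_size, r + 2 * world_size, ...
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return list(range(rank, num_samples, world_size))


if __name__ == "__main__":
    # Split 10 samples across 3 workers and show each worker's share.
    for rank in range(3):
        print(rank, shard_indices(10, rank, 3))
```

With a strided split like this, shards can differ in length by one sample. Since collective operations usually require every rank to run the same number of steps, real code typically pads or drops the remainder.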