Commit

Merge branch 'main' into dbfs-hf
dakinggg authored Jun 5, 2024
2 parents 3adc789 + ac56dc5 commit e5a0719
Showing 27 changed files with 1,724 additions and 272 deletions.
4 changes: 4 additions & 0 deletions .github/CODEOWNERS
@@ -2,6 +2,10 @@
# This includes setup.py, the README, and the CODEOWNERS file itself!
/* @mosaicml/composer-team-admins

+ # Require team approval for code changes
+ /llmfoundry/ @mosaicml/composer-team-eng
+ /scripts/ @mosaicml/composer-team-eng
+
# Require admin approval to change the CI build configuration
# All CI Changes should be reviewed for security
/.ci/ @mosaicml/composer-team-admins
2 changes: 1 addition & 1 deletion Dockerfile
@@ -13,7 +13,7 @@ ADD https://raw.githubusercontent.com/mosaicml/llm-foundry/$BRANCH_NAME/setup.py
RUN rm setup.py

# Install TransformerEngine
- RUN NVTE_FRAMEWORK=pytorch CMAKE_BUILD_PARALLEL_LEVEL=4 MAX_JOBS=4 pip install git+https://github.com/NVIDIA/TransformerEngine.git@05eb6deb31c1b48e9f4380d18fe95f3c38e84335
+ RUN NVTE_FRAMEWORK=pytorch CMAKE_BUILD_PARALLEL_LEVEL=3 MAX_JOBS=3 pip install git+https://github.com/cli99/TransformerEngine.git@6b21f606f2459d49c2113d69236d68d334edeb4c

# Install and uninstall foundry to cache foundry requirements
RUN git clone -b $BRANCH_NAME https://github.com/mosaicml/llm-foundry.git
6 changes: 3 additions & 3 deletions README.md
@@ -169,11 +169,11 @@ pip install -e ".[gpu]" # or `pip install -e .` if no NVIDIA GPU.
```

### TransformerEngine and amp_fp8 support
- NVIDIA H100 GPUs have FP8 support; this additionally requires the following installations:
+ NVIDIA H100 GPUs have FP8 support; we have already installed Flash Attention and TransformerEngine in our Docker images (see above). If you are not using our Docker images, you can install these packages with:
<!--pytest.mark.skip-->
```bash
- pip install flash-attn==1.0.7 --no-build-isolation
- pip install git+https://github.com/NVIDIA/TransformerEngine.git@v0.10
+ pip install flash-attn --no-build-isolation
+ pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
```

See [here](https://github.com/mosaicml/llm-foundry/blob/main/TUTORIAL.md#TransformerEngine-and-amp_fp8-support) for more details on enabling TransformerEngine layers and amp_fp8.
5 changes: 5 additions & 0 deletions llmfoundry/callbacks/__init__.py
@@ -22,6 +22,8 @@
from llmfoundry.callbacks.log_mbmoe_tok_per_expert_callback import (
    MegaBlocksMoE_TokPerExpert,
)
+ from llmfoundry.callbacks.loss_perp_v_len_callback import \
+     LossPerpVsContextLengthLogger
from llmfoundry.callbacks.monolithic_ckpt_callback import (
    MonolithicCheckpointSaver,
)
@@ -52,6 +54,8 @@
callbacks.register('mbmoe_tok_per_expert', func=MegaBlocksMoE_TokPerExpert)
callbacks.register('run_timeout', func=RunTimeoutCallback)

+ callbacks.register('loss_perp_v_len', func=LossPerpVsContextLengthLogger)
+
callbacks_with_config.register('async_eval', func=AsyncEval)
callbacks_with_config.register('curriculum_learning', func=CurriculumLearning)

@@ -66,4 +70,5 @@
    'MegaBlocksMoE_TokPerExpert',
    'AsyncEval',
    'CurriculumLearning',
+   'LossPerpVsContextLengthLogger',
]
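
As context for the registration above, here is a minimal sketch of how the newly registered name can be looked back up through llm-foundry's callbacks registry. It assumes `llmfoundry.registry.callbacks` follows the catalogue-style registry interface (with a `get()` method that returns the registered class); it is an illustration, not part of this commit.

```python
# Minimal sketch (assumption: llmfoundry.registry.callbacks is a
# catalogue-style registry whose get() returns the registered class).
from llmfoundry.registry import callbacks

# Resolve the class registered above under the name 'loss_perp_v_len'.
callback_cls = callbacks.get('loss_perp_v_len')
print(callback_cls.__name__)  # expected: LossPerpVsContextLengthLogger
```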