Commit: minor changes
ShashankMosaicML committed Oct 31, 2023
1 parent fa46318 commit 5b10164
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions TUTORIAL.md
@@ -344,7 +344,7 @@ The majority of our training setups use `triton`. -->
What is the result of this? Although sm89+ is not **formally** supported until LLVM15, our testing on H100 GPUs shows that `attn_impl=triton` still works well and runs fast. The only issue is that when the network starts running, LLVM might throw a warning like: `'sm_90' is not a recognized processor for this target (ignoring processor)`. This warning does not seem to affect performance.
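
For orientation, here is a minimal, illustrative sketch of where `attn_impl` is typically set in a training YAML; the `mpt_causal_lm` model name and the exact field layout are assumptions, not an excerpt from the repo:

```yaml
# Illustrative only: the model name and surrounding fields are assumptions.
model:
  name: mpt_causal_lm
  attn_config:
    attn_impl: triton  # runs on H100s despite the LLVM 'sm_90' warning above
```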

#### Support for FlashAttention-2
-- [FlashAttention-2](https://arxiv.org/pdf/2307.08691.pdf) improves upon FlashAttention to get even faster attention computation. LLM-Foundry now supports FlashAttention-2 by simply using the [new MosaicML Docker image](https://github.com/mosaicml/llm-foundry#mosaicml-docker-images), then [following the instructions here](https://github.com/mosaicml/llm-foundry#with-docker-recommended), and then running <code>pip install -e ".[gpu-flash2]"</code>. Then setting <code>attn_impl: flash</code> uses FlashAttention-2. This will also install the [flash-attn library](https://github.com/Dao-AILab/flash-attention) v2.3.2.
+- [FlashAttention-2](https://arxiv.org/pdf/2307.08691.pdf) improves upon FlashAttention to get even faster attention computation. LLM-Foundry now supports FlashAttention-2; please follow the instructions [here](https://github.com/mosaicml/llm-foundry/tree/main/scripts/train#flashattention). A config sketch follows below.
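
The fragment below is a minimal sketch, assuming the flash-attn v2 extras are already installed, of what enabling FlashAttention-2 in a model config could look like; it is illustrative rather than an official example:

```yaml
# Sketch only: assumes flash-attn v2 is installed
# (e.g. via `pip install -e ".[gpu-flash2]"` in the MosaicML Docker image).
model:
  attn_config:
    attn_impl: flash  # selects the FlashAttention-2 kernels
```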

### What kinds of positional embeddings does LLM Foundry support?
Currently we support [Learned Positional Embeddings](https://arxiv.org/pdf/1706.03762.pdf), [Attention with Linear Biases (ALiBi)](https://arxiv.org/pdf/2108.12409.pdf), and [Rotary Positional Embeddings (RoPE)](https://arxiv.org/pdf/2104.09864.pdf). There is also an option to switch off all of these embeddings to get [No Positional Embedding](https://arxiv.org/pdf/2203.16634.pdf).
@@ -353,7 +353,7 @@ Currently we support [Learned Positional Embeddings](https://arxiv.org/pdf/1706.
|:-----------------------------------|:------------------------------------------------------------------|:---------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Learned Positional Embeddings | <pre>model:<br> learned_pos_emb:&nbsp;True</pre>| 65.7 | |
| ALiBi | <pre>model:<br> attn_config:<br> alibi:&nbsp;True</pre>| 64.5 | Requires Triton or Torch attention. |
-| RoPE (Dao-AILab Implementation) | <pre>model:<br> attn_config:<br> rope:&nbsp;True<br> rope_imp:&nbsp;dail</pre>| 64.5 | Requires a CUDA GPU and [the flash-attn library](https://github.com/Dao-AILab/flash-attention) (v2.0.1 or higher) to be installed. Please see the instructions in the section above on how to install the flash-attn library v2.3.2. Note that the attention implementation can still be torch, triton, or flash; this option only needs the flash-attn library (v2.0.1 or higher) because we import their RotaryEmbedding class. |
+| RoPE (Dao-AILab Implementation) | <pre>model:<br> attn_config:<br> rope:&nbsp;True<br> rope_imp:&nbsp;dail</pre>| 64.5 | Requires a CUDA GPU and [the flash-attn library](https://github.com/Dao-AILab/flash-attention) (v2.0.1 or higher) to be installed. Please see the instructions in the [paragraph above](#support-for-flashattention-2) on how to install the flash-attn library v2. Note that the attention implementation can still be torch, triton, or flash; this option only needs the flash-attn library (v2.0.1 or higher) because we import their RotaryEmbedding class. |
| RoPE (Hugging Face Implementation) | <pre>model:<br> attn_config:<br> rope:&nbsp;True<br> rope_imp:&nbsp;hf</pre>| 62.3 | |
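
As a worked example, the sketch below shows how the RoPE (Dao-AILab Implementation) row above might look inside a full model config; the key names (including `rope_imp`) are taken from the table as written, so verify them against your LLM Foundry version:

```yaml
# Sketch based on the table above; key names may differ between releases.
model:
  attn_config:
    attn_impl: triton  # torch, triton, or flash all work with this RoPE option
    rope: True
    rope_imp: dail     # Dao-AILab rotary implementation; needs flash-attn >= 2.0.1
```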

### Can I finetune using PEFT / LoRA?
