Commit: minor changes
ShashankMosaicML committed Oct 31, 2023
1 parent fa46318 commit 5b10164
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions TUTORIAL.md
@@ -344,7 +344,7 @@ The majority of our training setups use `triton`. -->
What is the result of this? Although sm89+ is not **formally** supported until LLVM15, our testing on H100 GPUs shows that `attn_impl=triton` still works well and runs fast. The only issue is that when the network starts running, LLVM might throw a warning like: `'sm_90' is not a recognized processor for this target (ignoring processor)`. This warning does not seem to affect performance.
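
For orientation, here is a minimal, illustrative sketch of where `attn_impl` is typically set in a training YAML; the `mpt_causal_lm` model name and the exact field layout are assumptions, not an excerpt from the repo:

```yaml
# Illustrative only: the model name and surrounding fields are assumptions.
model:
  name: mpt_causal_lm
  attn_config:
    attn_impl: triton  # runs on H100s despite the LLVM 'sm_90' warning above
```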

#### Support for FlashAttention-2
-- [FlashAttention-2](https://arxiv.org/pdf/2307.08691.pdf) improves upon FlashAttention to get even faster attention computation. LLM-Foundry now supports FlashAttention-2 by simply using the [new MosaicML Docker image](https://github.com/mosaicml/llm-foundry#mosaicml-docker-images), then [following the instructions here](https://github.com/mosaicml/llm-foundry#with-docker-recommended), and then running <code>pip install -e ".[gpu-flash2]"</code>. Then setting <code>attn_impl: flash</code> uses FlashAttention-2. This will also install the [flash-attn library](https://github.com/Dao-AILab/flash-attention) v2.3.2.
+- [FlashAttention-2](https://arxiv.org/pdf/2307.08691.pdf) improves upon FlashAttention to get even faster attention computation. LLM-Foundry now supports FlashAttention-2; please follow the instructions [here](https://github.com/mosaicml/llm-foundry/tree/main/scripts/train#flashattention). A config sketch follows below.
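
The fragment below is a minimal sketch, assuming the flash-attn v2 extras are already installed, of what enabling FlashAttention-2 in a model config could look like; it is illustrative rather than an official example:

```yaml
# Sketch only: assumes flash-attn v2 is installed
# (e.g. via `pip install -e ".[gpu-flash2]"` in the MosaicML Docker image).
model:
  attn_config:
    attn_impl: flash  # selects the FlashAttention-2 kernels
```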

### What kinds of positional embeddings does LLM Foundry support?
Currently we support [Learned Positional Embeddings](https://arxiv.org/pdf/1706.03762.pdf), [Attention with Linear Biases (ALiBi)](https://arxiv.org/pdf/2108.12409.pdf), and [Rotary Positional Embeddings (RoPE)](https://arxiv.org/pdf/2104.09864.pdf). There is also an option to switch off all of these embeddings to get [No Positional Embedding](https://arxiv.org/pdf/2203.16634.pdf).
@@ -353,7 +353,7 @@ Currently we support [Learned Positional Embeddings](https://arxiv.org/pdf/1706.
|:-----------------------------------|:------------------------------------------------------------------|:---------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Learned Positional Embeddings | <pre>model:<br> learned_pos_emb:&nbsp;True</pre>| 65.7 | |
| ALiBi | <pre>model:<br> attn_config:<br> alibi:&nbsp;True</pre>| 64.5 | Requires Triton or Torch attention. |
-| RoPE (Dao-AILab Implementation) | <pre>model:<br> attn_config:<br> rope:&nbsp;True<br> rope_imp:&nbsp;dail</pre>| 64.5 | Requires a CUDA GPU and [the flash-attn library](https://github.com/Dao-AILab/flash-attention) (v2.0.1 or higher) to be installed. Please see the instructions in the section above on how to install the flash-attn library v2.3.2. Note that the attention implementation can still be torch, triton, or flash; this option only needs the flash-attn library (v2.0.1 or higher) because we import their RotaryEmbedding class. |
+| RoPE (Dao-AILab Implementation) | <pre>model:<br> attn_config:<br> rope:&nbsp;True<br> rope_imp:&nbsp;dail</pre>| 64.5 | Requires a CUDA GPU and [the flash-attn library](https://github.com/Dao-AILab/flash-attention) (v2.0.1 or higher) to be installed. Please see the instructions in the [paragraph above](#support-for-flashattention-2) on how to install the flash-attn library v2. Note that the attention implementation can still be torch, triton, or flash; this option only needs the flash-attn library (v2.0.1 or higher) because we import their RotaryEmbedding class. |
| RoPE (Hugging Face Implementation) | <pre>model:<br> attn_config:<br> rope:&nbsp;True<br> rope_imp:&nbsp;hf</pre>| 62.3 | |
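
As a worked example, the sketch below shows how the RoPE (Dao-AILab Implementation) row above might look inside a full model config; the key names (including `rope_imp`) are taken from the table as written, so verify them against your LLM Foundry version:

```yaml
# Sketch based on the table above; key names may differ between releases.
model:
  attn_config:
    attn_impl: triton  # torch, triton, or flash all work with this RoPE option
    rope: True
    rope_imp: dail     # Dao-AILab rotary implementation; needs flash-attn >= 2.0.1
```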

### Can I finetune using PEFT / LoRA?
