diff --git a/TUTORIAL.md b/TUTORIAL.md
index 8cbf484e09..c0eeb078ca 100644
--- a/TUTORIAL.md
+++ b/TUTORIAL.md
@@ -344,7 +344,7 @@ The majority of our training setups use `triton`. -->
What is the result of this? Although sm89+ is not **formally** supported until LLVM15, our testing on H100 GPUs shows that `attn_impl=triton` still works well and still runs fast. The only issue is that when the network is starting to run, LLVM might throw a warning like: `'sm_90' is not a recognized processor for this target (ignoring processor)`. This warning does not seem to affect performance.
#### Support for FlashAttention-2
-- [FlashAttention-2](https://arxiv.org/pdf/2307.08691.pdf) improves upon FlashAttention to get even faster attention computation. LLM-Foundry now supports FlashAttention-2 by simply using the [new MosaicML Docker image](https://github.com/mosaicml/llm-foundry#mosaicml-docker-images), then [following the instructions here](https://github.com/mosaicml/llm-foundry#with-docker-recommended), and then running `pip install -e ".[gpu-flash2]"`. Then setting `attn_impl: flash` uses FlashAttention-2. This will also install the [flash-attn library](https://github.com/Dao-AILab/flash-attention) v2.3.2.
+- [FlashAttention-2](https://arxiv.org/pdf/2307.08691.pdf) improves upon FlashAttention to get even faster attention computation. LLM-Foundry now supports FlashAttention-2; please follow the instructions [here](https://github.com/mosaicml/llm-foundry/tree/main/scripts/train#flashattention). A minimal config sketch is shown below.
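+
+Once the flash-attn library is installed, FlashAttention-2 is selected through the model's attention config. The YAML fragment below is only a sketch (the `mpt_causal_lm` model name and surrounding fields are illustrative, not a complete training config):
+
+```yaml
+model:
+  name: mpt_causal_lm
+  attn_config:
+    attn_impl: flash  # selects the flash-attn (FlashAttention-2) kernels
+```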
### What kinds of positional embeddings does LLM Foundry support?
Currently we support [Learned Positional Embeddings](https://arxiv.org/pdf/1706.03762.pdf), [Attention with Linear Biases (ALiBi)](https://arxiv.org/pdf/2108.12409.pdf), and [Rotary Positional Embeddings (RoPE)](https://arxiv.org/pdf/2104.09864.pdf). There is also an option to switch off all of these embeddings to get [No Positional Embedding](https://arxiv.org/pdf/2203.16634.pdf).
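+
+Each of these options is selected through the model config, as summarized in the table below. As a rough sketch (the field names follow the MPT-style `attn_config` keys shown in the table and may not apply to every model class), enabling RoPE with the Dao-AILab implementation would look like this:
+
+```yaml
+model:
+  attn_config:
+    rope: True       # turn on rotary positional embeddings
+    rope_impl: dail  # use the Dao-AILab flash-attn RotaryEmbedding implementation
+```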
@@ -353,7 +353,7 @@ Currently we support [Learned Positional Embeddings](https://arxiv.org/pdf/1706.
|:-----------------------------------|:------------------------------------------------------------------|:---------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Learned Positional Embeddings | <pre>model:<br>  learned_pos_emb: True</pre>| 65.7 | |
| ALiBi | <pre>model:<br>  attn_config:<br>    alibi: True</pre>| 64.5 | Requires Triton or Torch attention. |
-| RoPE (Dao-AILab Implementation) | <pre>model:<br>  attn_config:<br>    rope: True<br>    rope_impl: dail</pre>| 64.5 | Requires a CUDA GPU and [the flash-attn library](https://github.com/Dao-AILab/flash-attention) (v2.0.1 or higher) to be installed. Please see the instructions in the section above on how to install the flash-attn library v2.3.2. Note that attention implementation can still be torch, triton, or flash, just that this needs the the flash-attn library (v2.0.1 or higher) since we import their RotaryEmbedding class. |
+| RoPE (Dao-AILab Implementation) | <pre>model:<br>  attn_config:<br>    rope: True<br>    rope_impl: dail</pre>| 64.5 | Requires a CUDA GPU and [the flash-attn library](https://github.com/Dao-AILab/flash-attention) (v2.0.1 or higher) to be installed. Please see the instructions in the [paragraph above](#support-for-flashattention-2) on how to install the flash-attn library v2. Note that the attention implementation can still be torch, triton, or flash, just that this needs the flash-attn library (v2.0.1 or higher) since we import their RotaryEmbedding class. |
| RoPE (Hugging Face Implementation) | <pre>model:<br>  attn_config:<br>    rope: True<br>    rope_impl: hf</pre>| 62.3 | |

### Can I finetune using PEFT / LoRA?