
Adding support for Rotary Position Embeddings #675

Merged: 119 commits merged into mosaicml:main from rotary_hf_imp on Nov 6, 2023

Conversation

@ShashankMosaicML (Contributor) commented Oct 13, 2023

Adding support for Rotary Positional Embeddings (RoPE).

tl;dr: Advantages of RoPE: the embedding is applied to the query and key matrices, and hence (unlike ALiBi) it is agnostic to the attention implementation and works out of the box with any of them (see the sketch after this comment). Recent work has also shown that some variants of RoPE extrapolate well beyond the training sequence length.

Design doc

Experiments: 125M model, 1B model.
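For readers unfamiliar with RoPE, below is a minimal, self-contained PyTorch sketch of what "applied to the query and key matrices" means in practice: the rotary transform rotates q and k before attention, so any attention kernel can consume the rotated tensors unchanged. The function names, shapes, and default base here are illustrative and are not the exact implementation used in llm-foundry.

```python
# Minimal sketch of rotary position embeddings (RoPE).
# The rotation is applied to q and k only; the attention computation itself
# (torch, triton, flash, ...) is unchanged, which is the point made above.
import torch


def build_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin tables of shape (seq_len, head_dim)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    freqs = torch.outer(positions, inv_freq)   # (seq_len, head_dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)    # (seq_len, head_dim)
    return emb.cos(), emb.sin()


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    """Rotate pairs of features: (x1, x2) -> (-x2, x1)."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rope(q, k, cos, sin):
    """Apply the rotary transform to q and k of shape
    (batch, n_heads, seq_len, head_dim); values are left untouched."""
    q_rot = q * cos + rotate_half(q) * sin
    k_rot = k * cos + rotate_half(k) * sin
    return q_rot, k_rot


# Usage: rotate q/k, then feed them to any attention implementation.
batch, n_heads, seq_len, head_dim = 2, 4, 16, 64
q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_heads, seq_len, head_dim)
v = torch.randn(batch, n_heads, seq_len, head_dim)
cos, sin = build_rope_cache(seq_len, head_dim)
q, k = apply_rope(q, k, cos, sin)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
```

Because the rotation happens before the attention call, no per-implementation changes (attention-bias plumbing, kernel modifications, etc.) are needed, unlike ALiBi.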

@ShashankMosaicML (Contributor, Author) commented:

@dakinggg will this merge cause any problems when merging the MPT code into the Hugging Face GitHub codebase?

Files with review threads (all resolved):

llmfoundry/models/layers/attention.py
llmfoundry/models/layers/blocks.py
llmfoundry/models/mpt/configuration_mpt.py
llmfoundry/models/mpt/modeling_mpt.py
tests/test_flash_triton_torch.py
tests/test_rope_dail_vs_hf.py
tests/test_model.py
TUTORIAL.md
@ShashankMosaicML enabled auto-merge (squash) November 6, 2023 22:33
@ShashankMosaicML merged commit 1d504c8 into mosaicml:main on Nov 6, 2023
12 checks passed
@ShashankMosaicML deleted the rotary_hf_imp branch November 6, 2023 23:00