
BetterTransformer optimizations can't be applied to Falcon #1543

Closed

pcuenca opened this issue Nov 16, 2023 · 1 comment
Labels
bug Something isn't working

Comments

pcuenca (Member) commented Nov 16, 2023

System Info

Python 3.10, optimum @ main, transformers @ main

Who can help?

@fxmarty

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

Reproduction:

from transformers import AutoTokenizer, AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer
import torch

model_id = "tiiuae/falcon-rw-1b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model = BetterTransformer.transform(model)  # fails here with current transformers main

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)

Falcon attention was refactored in huggingface/transformers@05ea7b7#diff-81c616a9db6f569c579ccf03c30c2f69aa7b65fa40959ac7e882fb8d541891d7. The refactor removed the maybe_rotary attribute and adopted the Llama conventions for rotary embeddings.

We could modify the use of maybe_rotary here by using something like:

        submodules = ["query_key_value", "dense", "attention_dropout"]
        if not config.alibi:
            submodules.append("rotary_emb")

And then we'd need to adapt the code here, applying rotary embeddings when alibi is not in use.
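For illustration, a rough sketch of what that adaptation might look like (not the actual optimum code; it assumes the refactored FalconAttention exposes a Llama-style rotary_emb returning (cos, sin) and that apply_rotary_pos_emb is importable from transformers.models.falcon.modeling_falcon):

        # Hypothetical sketch inside the BetterTransformer Falcon attention forward.
        # Names and signatures follow the post-refactor, Llama-style convention and
        # may not match the final implementation.
        if not self.config.alibi:
            cos, sin = self.rotary_emb(value_layer, seq_len=kv_length)
            query_layer, key_layer = apply_rotary_pos_emb(
                query_layer, key_layer, cos, sin, position_ids
            )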

Expected behavior

Transformation would work.

pcuenca added the bug label Nov 16, 2023
fxmarty (Contributor) commented Dec 13, 2023

Hi, Falcon with SDPA is now supported by default in Transformers (huggingface/transformers#26572), and the usage of BetterTransformer is deprecated for this architecture.

See https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention
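For reference, a minimal usage sketch (not part of the original comment), assuming a transformers release with native SDPA support for Falcon (4.36 or later), where attn_implementation="sdpa" can be passed directly:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "tiiuae/falcon-rw-1b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# No BetterTransformer.transform needed: SDPA is used by default when available,
# and can be requested explicitly via attn_implementation.
model = AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="sdpa")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))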

fxmarty closed this as completed Dec 13, 2023