Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

try setting attn impl to sdpa... #202

Open
gokulcoder7 opened this issue Nov 9, 2024 · 0 comments
Open

try setting attn impl to sdpa... #202

gokulcoder7 opened this issue Nov 9, 2024 · 0 comments

Comments

@gokulcoder7
Copy link

model = AutoModel.from_pretrained("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",compression='4bit')
Fetching 8 files: 100%|██████████████████████████████████████████████████████████████████████████| 8/8 [00:00<?, ?it/s]
found_layers:{'model.embed_tokens.': True, 'model.layers.0.': True, 'model.layers.1.': True, 'model.layers.2.': True, 'model.layers.3.': True, 'model.layers.4.': True, 'model.layers.5.': True, 'model.layers.6.': True, 'model.layers.7.': True, 'model.layers.8.': True, 'model.layers.9.': True, 'model.layers.10.': True, 'model.layers.11.': True, 'model.layers.12.': True, 'model.layers.13.': True, 'model.layers.14.': True, 'model.layers.15.': True, 'model.layers.16.': True, 'model.layers.17.': True, 'model.layers.18.': True, 'model.layers.19.': True, 'model.layers.20.': True, 'model.layers.21.': True, 'model.layers.22.': True, 'model.layers.23.': True, 'model.layers.24.': True, 'model.layers.25.': True, 'model.layers.26.': True, 'model.layers.27.': True, 'model.layers.28.': True, 'model.layers.29.': True, 'model.layers.30.': True, 'model.layers.31.': True, 'model.layers.32.': True, 'model.layers.33.': True, 'model.layers.34.': True, 'model.layers.35.': True, 'model.layers.36.': True, 'model.layers.37.': True, 'model.layers.38.': True, 'model.layers.39.': True, 'model.layers.40.': True, 'model.layers.41.': True, 'model.layers.42.': True, 'model.layers.43.': True, 'model.layers.44.': True, 'model.layers.45.': True, 'model.layers.46.': True, 'model.layers.47.': True, 'model.layers.48.': True, 'model.layers.49.': True, 'model.layers.50.': True, 'model.layers.51.': True, 'model.layers.52.': True, 'model.layers.53.': True, 'model.layers.54.': True, 'model.layers.55.': True, 'model.layers.56.': True, 'model.layers.57.': True, 'model.layers.58.': True, 'model.layers.59.': True, 'model.layers.60.': True, 'model.layers.61.': True, 'model.layers.62.': True, 'model.layers.63.': True, 'model.layers.64.': True, 'model.layers.65.': True, 'model.layers.66.': True, 'model.layers.67.': True, 'model.layers.68.': True, 'model.layers.69.': True, 'model.layers.70.': True, 'model.layers.71.': True, 'model.layers.72.': True, 'model.layers.73.': True, 'model.layers.74.': True, 'model.layers.75.': True, 'model.layers.76.': True, 'model.layers.77.': True, 'model.layers.78.': True, 'model.layers.79.': True, 'model.norm.': True, 'lm_head.': True}
saved layers already found in G:\nvidiallama-3_1-nemotron-70b-instruct\models--nvidia--Llama-3.1-Nemotron-70B-Instruct-HF\snapshots\fac73d3507320ec1258620423469b4b38f88df6e\splitted_model.4bit
The class optimum.bettertransformers.transformation.BetterTransformer is deprecated and will be removed in a future release.
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
not support prefetching for compression for now. loading with no prepetching mode.

how to set attn impl to sdpa...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant