
Need fp8 support to use H100/H200 and halve GPU costs #3151

Open
samueldashadrach opened this issue Jan 2, 2025 · 1 comment

Comments

@samueldashadrach

H100/H200 GPUs have dedicated fp8 tensor cores that deliver twice the FLOPs of fp16, so running inference in fp8 can roughly halve GPU cost. Developers will use whichever library has the lower cost, assuming the cost reduction comes without any additional effort on the developer's part.

Please consider prioritising fp8 support; otherwise your library risks becoming outdated.
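
For reference, here is a minimal sketch of what fp8 inference looks like on an H100 using NVIDIA's Transformer Engine. This is an illustration of one possible backend, not this library's API; the `transformer_engine` package and its `te.Linear` / `fp8_autocast` names are assumptions about how fp8 could be wired in:

```python
# Sketch of fp8 inference on an H100 via NVIDIA's Transformer Engine.
# All names here (te.Linear, fp8_autocast, DelayedScaling) belong to
# transformer_engine, not to this library -- illustration only.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# fp8's narrow dynamic range requires per-tensor scaling; DelayedScaling
# tracks amax history to choose scale factors automatically.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.E4M3)

# A drop-in fp8-capable linear layer (parameters live on the GPU by default).
linear = te.Linear(768, 768, bias=True)
x = torch.randn(512, 768, device="cuda")

# Inside this context, the matmul executes on the fp8 tensor cores.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    with torch.no_grad():  # inference only
        y = linear(x)
```

The scaling recipe is the main engineering cost here: because fp8 cannot represent large and small values at once, every tensor needs a tracked scale factor, which is presumably why library support takes real work rather than a dtype flag.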

@samueldashadrach
Author

Update: to be more specific, I'm running embedding inference with the best models on the MTEB benchmark.
