
Add containers/tei/{cpu,gpu}/1.6.0 #132

Open · wants to merge 4 commits into main
Conversation

alvarobartt (Member)

Description

This PR adds a new container for the just-released TEI v1.6.0 (see the release notes at https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.6.0).

The main feature of TEI v1.6.0 compared to TEI v1.5.0 is support for multiple CPU backends rather than ONNX alone, so embedding models can now be served on CPU even when a model on the Hub does not ship an ONNX-converted version of its weights. Other additions include the General Text Embeddings (GTE) heads, an MPNet implementation, fixes around the health checks, and more.
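The multi-backend behavior described above could be sketched as follows. This is a hypothetical illustration, not TEI's actual implementation: the function name `pick_cpu_backend` and the selection order (prefer ONNX weights when present, otherwise fall back to safetensors served via Candle) are assumptions for the sake of the example.

```python
# Hypothetical sketch (not TEI's actual code): choose a CPU backend
# based on which weight files a Hub model repository ships.
def pick_cpu_backend(repo_files: list[str]) -> str:
    """Prefer ONNX weights when present; otherwise fall back to
    safetensors weights served via the Candle backend."""
    if any(f.endswith(".onnx") for f in repo_files):
        return "onnx"
    if any(f.endswith(".safetensors") for f in repo_files):
        return "candle"
    raise ValueError("no supported weight files found in repository")

# A repo with only safetensors weights can now still be served on CPU.
print(pick_cpu_backend(["model.safetensors", "config.json"]))  # candle
print(pick_cpu_backend(["model.onnx", "config.json"]))         # onnx
```

The point of the sketch is simply that backend selection can be driven by the files present in the repo, which is why models without ONNX exports are no longer excluded from CPU serving.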

Note

This PR also includes the changes from the https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.5.1 release.

To inspect the changes required to make the TEI container work in GCP, see the diff at:

@philschmid (Member) left a comment


LGTM!

@philschmid (Member)

How does this CPU multi-backend work? Does it check whether there are *.onnx weights and, if so, use them, and otherwise fall back to the normal PyTorch weights + Candle?
