Add Embeddings base class, SentenceTransformerEmbeddings class, EmbeddingGeneration and FaissNearestNeighbour steps #830

Merged
gabrielmbmb merged 14 commits into develop from embeddings
Jul 26, 2024

Conversation

gabrielmbmb
Member

@gabrielmbmb gabrielmbmb commented Jul 25, 2024

Description

This PR adds a new Embeddings base class whose purpose is to define an interface for classes that generate sentence embeddings. It also adds the first implementation of this interface, the SentenceTransformerEmbeddings class, which uses the sentence-transformers package.
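
To make the idea of an embeddings interface concrete, here is a minimal sketch of what such a base class and one implementation could look like. It is not the code added by this PR: the `encode` method name and signature are assumptions, and `HashingEmbeddings` is a hypothetical toy stand-in (hashed bag-of-words) used only so the example runs without downloading a model.

```python
from abc import ABC, abstractmethod
import hashlib
import math
from typing import List


class Embeddings(ABC):
    """Hypothetical interface: turn a batch of sentences into vectors."""

    @abstractmethod
    def encode(self, inputs: List[str]) -> List[List[float]]:
        """Return one embedding (a list of floats) per input sentence."""


class HashingEmbeddings(Embeddings):
    """Toy implementation (assumption, not a real model):
    hashed bag-of-words counts, L2-normalised."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def encode(self, inputs: List[str]) -> List[List[float]]:
        vectors = []
        for text in inputs:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # Map each token to a stable bucket index.
                digest = hashlib.md5(token.encode("utf-8")).digest()
                vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
            # Normalise to unit length; leave all-zero vectors untouched.
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors


embeddings = HashingEmbeddings(dim=16)
vectors = embeddings.encode(["hello world", "hello world", ""])
```

Any class that follows the same interface (for example, one wrapping a sentence-transformers model) could be swapped in wherever an `Embeddings` instance is expected, which is the point of having a shared base class.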

In addition, it adds the EmbeddingGeneration step, which uses an Embeddings model to generate sentence embeddings, and the FaissNearestNeighbour step, which uses faiss to build an index and search it for the top k nearest neighbours of each input row, together with their scores.
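
The nearest-neighbour search described above can be illustrated with a brute-force sketch in plain Python. This is not the FaissNearestNeighbour implementation and does not use faiss; it only shows the kind of result such a step produces for each row (neighbour indices plus scores). The function name `top_k_neighbours`, the inner-product score, and the choice to exclude each row from its own neighbour list are all assumptions made for illustration.

```python
from typing import List, Tuple


def top_k_neighbours(
    vectors: List[List[float]], k: int
) -> List[Tuple[List[int], List[float]]]:
    """For every row, return (indices, scores) of its k most similar rows.

    Similarity is the inner product. A row is never listed as its own
    neighbour (an assumption of this sketch).
    """
    results = []
    for i, query in enumerate(vectors):
        scored = [
            (sum(a * b for a, b in zip(query, other)), j)
            for j, other in enumerate(vectors)
            if j != i
        ]
        # Highest score first; ties broken by the lower row index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        top = scored[:k]
        results.append(([j for _, j in top], [s for s, _ in top]))
    return results


rows = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
neighbours = top_k_neighbours(rows, k=2)
for row_index, (indices, scores) in enumerate(neighbours):
    print(row_index, indices, scores)
```

A brute-force scan like this costs a full pass over the data for every row, which is why a dedicated index library such as faiss is the practical choice for large datasets.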


Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-830/


codspeed-hq bot commented Jul 25, 2024

CodSpeed Performance Report

Merging #830 will not alter performance

Comparing embeddings (8527538) with develop (6ac0267)

Summary

✅ 1 untouched benchmarks

@gabrielmbmb gabrielmbmb changed the title Add Embeddings base class and SentenceTransformers class Add Embeddings base class, SentenceTransformerEmbeddings class and EmbeddingGeneration step Jul 25, 2024
@gabrielmbmb gabrielmbmb added the enhancement New feature or request label Jul 25, 2024
@gabrielmbmb gabrielmbmb added this to the 1.3.0 milestone Jul 25, 2024
@gabrielmbmb gabrielmbmb self-assigned this Jul 25, 2024
@gabrielmbmb gabrielmbmb changed the title Add Embeddings base class, SentenceTransformerEmbeddings class and EmbeddingGeneration step Add Embeddings base class, SentenceTransformerEmbeddings class, EmbeddingGeneration and FaissNearestNeighbour steps Jul 26, 2024
@gabrielmbmb gabrielmbmb marked this pull request as ready for review July 26, 2024 17:02
@gabrielmbmb gabrielmbmb merged commit 04b86f5 into develop Jul 26, 2024
7 checks passed
@gabrielmbmb gabrielmbmb deleted the embeddings branch July 26, 2024 17:03
Labels
enhancement New feature or request
Projects
Status: Done

1 participant