👍🎉 First off, thanks for taking the time to contribute! 🎉👍
The following is a set of guidelines for contributing to FastEmbed. These are mostly guidelines, not rules. Use your best judgment, and feel free to propose changes to this document in a pull request.
I don't want to read this whole thing, I just have a question!!!
Note: Please don't file an issue to ask a question. You'll get faster results by using the resources below:
Bugs are tracked as GitHub issues.
Explain the problem and include additional details to help maintainers reproduce the problem:
- Use a clear and descriptive title for the issue to identify the problem.
- Describe the exact steps that reproduce the problem in as much detail as possible. For example, start by explaining how you are using FastEmbed (e.g. with Langchain, Qdrant Client, or Llama Index) and exactly which command you used. When listing steps, don't just say what you did; explain how you did it.
- Provide specific examples to demonstrate the steps. Include links to files or GitHub projects, or copy/pasteable snippets, which you use in those examples. If you're providing snippets in the issue, use Markdown code blocks.
- Describe the behavior you observed after following the steps and point out what exactly is the problem with that behavior.
- Explain which behavior you expected to see instead and why.
- If the problem is related to performance or memory, include a call stack profile capture and your observations.
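One possible way to capture the profile mentioned in the last item is with the standard library's `cProfile` and `tracemalloc` modules. The sketch below is only an illustration: `workload` and `profile_call` are hypothetical names, and the body of `workload` is a stand-in that you would replace with the FastEmbed call that is slow or memory-hungry.

```python
import cProfile
import io
import pstats
import tracemalloc


def workload():
    # Hypothetical stand-in: replace the body with the FastEmbed call
    # that shows the performance or memory problem.
    return sum(i * i for i in range(100_000))


def profile_call(func):
    """Run func once and return (result, cpu_report, peak_bytes)."""
    tracemalloc.start()
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func()
    finally:
        profiler.disable()
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()

    # Render the 15 most expensive entries, sorted by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(15)
    return result, buffer.getvalue(), peak_bytes


if __name__ == "__main__":
    _, cpu_report, peak_bytes = profile_call(workload)
    print(cpu_report)
    print(f"Peak traced memory: {peak_bytes / 1024:.1f} KiB")
```

Two caveats apply. `tracemalloc` only sees memory allocated through Python, so allocations made inside native libraries will not show up in the peak figure. Tracing memory also slows execution, so the timings are more useful for comparing functions with each other than as absolute numbers.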
Include details about your configuration and environment:
- Which version of FastEmbed are you using? You can get the exact version by running `python -c "import fastembed; print(fastembed.__version__)"`.
- What's the name and version of the OS you're using?
- Which packages do you have installed? You can get that list by running `pip freeze`.
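If it is more convenient, the details requested above can be gathered in one go. This is a minimal sketch using only the standard library. The function name `collect_environment` and the dictionary keys are hypothetical, and the Python version is an extra field not asked for above.

```python
import platform
import sys
from importlib import metadata


def collect_environment(package="fastembed"):
    """Gather package version, OS, and installed packages into a dict."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"

    # Roughly what `pip freeze` lists, for the current interpreter.
    installed = sorted(
        (
            f"{dist.metadata['Name']}=={dist.version}"
            for dist in metadata.distributions()
            if dist.metadata["Name"]  # skip broken installs that lack a name
        ),
        key=str.lower,
    )

    return {
        "package_version": package_version,
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "packages": installed,
    }


if __name__ == "__main__":
    env = collect_environment()
    print(f"fastembed: {env['package_version']}")
    print(f"OS: {env['os']}")
    print(f"Python: {env['python']}")
    print("\n".join(env["packages"]))
```

The printed output can be pasted into the issue inside a Markdown code block.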
Unsure where to begin contributing to FastEmbed? You can start by looking through these good-first-issue issues:
- Good First Issue - issues which should only require a few lines of code, and a test or two. These are a great way to get started with FastEmbed. This includes adding new models which are already tested and ready on Huggingface Hub.
The best way to learn about the mechanics of FastEmbed is to start working on it.
Your first code contribution can be small bug fixes:
- This PR adds a small bug fix for a single input: qdrant#148
- This PR adds a check for the right file location and extension, specific to an OS: qdrant#128
Even documentation improvements and tests are most welcome:
- This PR fixes a README link: qdrant#143
- Open Requests for New Models are here.
- There are quite a few pull requests that were merged for this purpose and you can use them as a reference. Here is an example: qdrant#129
- Make sure to add tests for the new model
- The CANONICAL_VECTOR values must come from a reference implementation, usually Huggingface Transformers or Sentence Transformers
- Here is a reference Colab Notebook for how we will evaluate whether your VECTOR values in the test are correct or not.
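To illustrate the kind of check a canonical-vector test performs, here is a minimal sketch in plain Python. It is not the project's actual test code: the helper name `matches_canonical`, the tolerance of `1e-3`, and all numeric values are assumptions made for illustration. In a real test, the canonical values come from the reference implementation and the embedding comes from the new model.

```python
import math


def matches_canonical(embedding, canonical, atol=1e-3):
    """Check that the leading components of `embedding` match `canonical`.

    Only the first len(canonical) components are compared, so the canonical
    vector can be a short prefix of the full embedding.
    """
    if len(embedding) < len(canonical):
        return False
    # zip stops at the shorter sequence, i.e. at len(canonical).
    return all(
        math.isclose(got, expected, rel_tol=0.0, abs_tol=atol)
        for got, expected in zip(embedding, canonical)
    )


if __name__ == "__main__":
    # Placeholder numbers only; they do not come from any real model.
    canonical_prefix = [0.1004, 0.1996]
    model_output = [0.1, 0.2, 0.3, 0.4]
    print(matches_canonical(model_output, canonical_prefix))
```

Comparing only a short prefix keeps the test file small while still catching a model that was exported or preprocessed incorrectly.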
We use ruff for code linting. It should be installed with poetry since it's a dev dependency.
We use pre-commit hooks to ensure that the code is linted before it's committed. You can install pre-commit hooks by running `pre-commit install` in the root directory of the project.