This repository contains a template for creating custom components for your deepset Cloud pipelines. Components are Python code snippets that perform specific tasks within your pipeline. This template guides you through all the elements your custom component must include. In addition, this repository contains two sample components that are ready to use:
- CharacterSplitter
  - implemented in /src/dc_custom_component/example_components/preprocessors/character_splitter.py
  - to be used in indexing pipelines
- KeywordBooster
  - implemented in /src/dc_custom_component/example_components/rankers/keyword_booster.py
  - to be used in query pipelines
For more information about custom components, please refer to our Custom Components documentation. For a step-by-step guide on creating custom components, see Create a Custom Component.
- Python v3.10 or v3.11
- hatch package manager

We use hatch to manage our Python packages. Install it with pip:
pip install hatch
Once installed, create a virtual environment by running:
hatch shell
This installs all the packages needed to create a custom component. You can reference this virtual environment in your IDE.
For more information on hatch, please refer to the official Hatch documentation.
| File | Description |
|---|---|
| /src/dc_custom_component/components | Directory for implementing custom components. You can logically group custom components in sub-directories. For example, see how the example components are grouped by type. |
| /src/dc_custom_component/__about__.py | Your custom components' version. deepset Cloud always uses the latest version. Bump the version every time you update your component before uploading it to deepset Cloud. |
| /pyproject.toml | Information about the project. If needed, add your components' dependencies in this file in the dependencies section. |
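Bumping the version before each upload can be scripted. The following is a minimal sketch, assuming that __about__.py holds a line of the form `__version__ = "X.Y.Z"` (an assumption; check the file in your fork). The helper name `bump_patch` is hypothetical and not part of this template.

```python
import re

# Assumed layout of __about__.py: a single line such as  __version__ = "1.0.0"
_VERSION_RE = re.compile(r'(__version__\s*=\s*["\'])(\d+)\.(\d+)\.(\d+)(["\'])')


def bump_patch(about_source: str) -> str:
    """Return the file content with the patch number of __version__ increased by one."""

    def _bump(match: re.Match) -> str:
        major, minor = match.group(2), match.group(3)
        patch = int(match.group(4)) + 1
        return f"{match.group(1)}{major}.{minor}.{patch}{match.group(5)}"

    new_source, count = _VERSION_RE.subn(_bump, about_source, count=1)
    if count == 0:
        # Fail loudly rather than upload a package with an unchanged version.
        raise ValueError("no __version__ = \"X.Y.Z\" assignment found")
    return new_source


if __name__ == "__main__":
    print(bump_patch('__version__ = "1.2.3"\n'), end="")
```

To apply it, read /src/dc_custom_component/__about__.py, pass its content through the function, and write the result back before uploading.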
Note that the location of your custom component implementation defines the type name of your component to be used in pipeline YAML. For example, our example components have the following type names:
- dc_custom_component.example_components.preprocessors.character_splitter.CharacterSplitter
- dc_custom_component.example_components.rankers.keyword_booster.KeywordBooster
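The mapping from file location to type name follows the usual Python import path: drop the leading /src, drop the .py suffix, join the remaining parts with dots, and append the class name. A small sketch of that rule, where `component_type_name` is a hypothetical helper written for illustration and not part of this template:

```python
from pathlib import PurePosixPath


def component_type_name(source_file: str, class_name: str) -> str:
    """Build the dotted type name for a class defined in a file under /src."""
    path = PurePosixPath(source_file.lstrip("/")).with_suffix("")
    parts = list(path.parts)
    # The "src" directory is the package root, so it is not part of the import path.
    if parts and parts[0] == "src":
        parts = parts[1:]
    return ".".join(parts + [class_name])


if __name__ == "__main__":
    print(
        component_type_name(
            "/src/dc_custom_component/example_components/rankers/keyword_booster.py",
            "KeywordBooster",
        )
    )
```

Moving a component to another sub-directory therefore changes its type name, and any pipeline YAML that references it needs the new name.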
We defined a suite of formatting tools. To format your code, run:
hatch run code-quality:all
It's crucial to thoroughly test your custom component before uploading it to deepset Cloud. Consider adding unit tests and integration tests to ensure your component functions correctly within a pipeline.
- pytest is ready to be used with hatch
- implement your tests under /test
- run the tests with: hatch run tests
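As an illustration of what a unit test file under /test might look like, here is a minimal pytest-style sketch. The function `split_text` is a hypothetical stand-in for the logic inside your component; it is not the implementation of the example components, and the file name is also an assumption.

```python
# Hypothetical file: /test/test_split_text.py
from typing import List


def split_text(text: str, chunk_size: int) -> List[str]:
    """Hypothetical stand-in for component logic: cut text into fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def test_splits_into_fixed_size_chunks():
    # The last chunk may be shorter than chunk_size.
    assert split_text("abcdefg", 3) == ["abc", "def", "g"]


def test_empty_text_yields_no_chunks():
    assert split_text("", 3) == []


def test_rejects_non_positive_chunk_size():
    # Written without pytest helpers so the file also runs as plain Python.
    try:
        split_text("abc", 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

pytest collects functions whose names start with test_, so no extra registration is needed. In a real test you would import the component from your package instead of defining the logic in the test file.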
- Fork this repository.
- Navigate to the /src/dc_custom_component/components/ folder.
- Add your custom components following the examples under /src/dc_custom_component/example_components/.
- Delete the example components under /src/dc_custom_component/example_components/.
- Update the components' version in /src/dc_custom_component/__about__.py.
- Format your code using the hatch run code-quality:all command. (hatch commands work from the project root directory only.)
- Set your deepset Cloud API key.
  - On Linux and macOS: export API_KEY=<TOKEN>
  - On Windows: SET API_KEY=<TOKEN>
- Upload your project by running the following command from inside this project:
  - On Linux and macOS: hatch run dc:build-and-push
  - On Windows: hatch run dc:build-and-push-windows

  This creates a zip file called custom_component.zip in the dist directory and uploads it to deepset Cloud.
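Since the upload depends on the API_KEY environment variable, it can help to check that the variable is set before running the upload command. The following is a minimal sketch of such a check; `read_api_key` is a hypothetical helper and not part of this template or of the hatch scripts.

```python
import os
from typing import Mapping, Optional


def read_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get("API_KEY", "").strip()
    # Also reject the literal placeholder copied from the instructions.
    if not key or key == "<TOKEN>":
        raise RuntimeError("API_KEY is not set; export it before uploading")
    return key


if __name__ == "__main__":
    try:
        read_api_key()
        print("API_KEY is set")
    except RuntimeError as error:
        print(error)
```

The optional env parameter exists only so that the check can be tested without touching the real environment.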
For detailed instructions, refer to our documentation on Creating a Custom Component.