We are always looking for ways to improve the templates and the workflow. You can file bugs in the repo.
We are also looking for contributions of new modules; example modules in particular can be very useful for others. Please open a PR with your suggested changes or propose new functionality by creating an issue.
The official `cdf_*` modules are owned by the respective teams in Cognite. Any changes to these will be reviewed by the teams to ensure that nothing breaks. If you open a PR on these modules, the PR will be reviewed by the team owning the module. `cdf_infield_location` is an example of a team-owned module.
Adding a new module consists of the following steps:
- Determine where to put it (core, common, modules, examples, or experimental).
- Create a new directory for the module with sub-directories per configuration type the module needs. See the YAML reference documentation.
- Add a `default.config.yaml` file to the module root directory if you have variables in the templates (see the sketch after this list).
- Add a `README.md` file to the module root directory with a description of the module and its variables.
- Update `default.packages.yaml` in the cognite_toolkit root with the new module if it is part of a package.
- If this is an official module, add a description of the module in the module and package documentation.
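For illustration, here is a minimal sketch of how a new example module could be laid out. The module name, parent directory, sub-directory names, and variables are all hypothetical and should be adapted to the configuration types your module actually needs:

```bash
# Hypothetical layout for a new example module; all names are illustrative.
# Adjust the parent directory (core/common/modules/examples/experimental) and
# the sub-directories to the configuration types your module actually needs.
MODULE=cognite_toolkit/examples/my_example_module

mkdir -p "$MODULE"/raw "$MODULE"/data_models "$MODULE"/transformations

# Variables used by the templates go in default.config.yaml in the module root.
cat > "$MODULE"/default.config.yaml <<'EOF'
# Hypothetical defaults; customers are expected to override these in their own config.
raw_db: my_example_db
source_name: my_source
EOF

# Describe the module and its variables for other users.
cat > "$MODULE"/README.md <<'EOF'
# my_example_module

Describe what the module sets up and document each variable in default.config.yaml.
EOF
```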
If you are not a Cognite employee and would like to contribute a module, please open an issue, so we can get in touch with you.
Each module should be as standalone as possible, but they can be dependent on other modules. If you need to deploy a data model as a foundational element for both transformations and applications to work, you may add a module with the data model. However, a better module would be one that includes all the elements needed to get data from the source system, through RAW (if necessary), into a source data model, and then transformed by one or more transformations into a domain data model. The solution data models can then be a separate module that relies on the ingestion module.
Please take care to think about the best grouping of modules to make them easy to deploy and maintain. We aim to standardize as much as possible, so we do not optimize for customer-specific changes and naming conventions except where we explicitly design to support them.
NOTE! Customer-specific projects should be able to use these templates directly, and also adopt new changes from this repository as they are released. Configurations that contain defaults that are meant to be changed by the customer, e.g. mapping of properties from source systems to CDF, should be contained in separate modules.
All the configurations should be kept in camelCase YAML and in a format that is compatible with the CDF API.
The configuration files are loaded directly into the Python SDK's supporting data classes for use towards the CDF API. Client-side schema validation should be done in the Python SDK and not in `cdf-tk`, so that you can immediately add a YAML configuration property without updating anything other than the version of the Python SDK.

NOTE! As of now, any non-recognised properties will just be ignored by the Python SDK. If you don't get the desired configuration deployed, check your spelling.
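As a small, hedged illustration of what "camelCase and API-compatible" means in practice, the snippet below writes a minimal time series configuration. The directory, file name, and values are hypothetical; the property names follow the CDF time series API:

```bash
# Hypothetical example: property names are camelCase, matching the CDF API,
# because the file is parsed straight into the Python SDK's data classes.
mkdir -p my_module/timeseries
cat > my_module/timeseries/example.TimeSeries.yaml <<'EOF'
externalId: my_example_timeseries   # camelCase, as in the CDF API
name: My example time series
isString: false
isStep: false
EOF
# Beware: a misspelled property (e.g. "exernalId") is silently ignored by the SDK.
```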
The scripts currently support many resource types, such as RAW, data models, time series, groups, and transformations. They also have some support for loading data that may be used as example data in CDF projects. However, as a general rule, templates should contain the governed configurations necessary to set up ingestion, data pipelines, and contextualisations, but not the actual data itself. Of course, where populating data (e.g. a data model) is part of the configuration, that is fine. The scripts are continuously under development to simplify management of configurations, and we push functionality into the Python SDK when that makes sense.
The `cdf_`-prefixed modules should be tested as part of product development. Our internal test framework for scenario-based testing can be found in the Cognite-private big-smoke repository.
The `cdf-tk deploy` command will clean configurations if you specify `--drop`, so you can try to apply the configuration multiple times without having to clean up manually. If you want to delete everything that is governed by your templates, including data ingested into data models, the `cdf-tk clean` command can be used to clean up configurations using the `scripts/delete.py` functions.
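A minimal sketch of that loop, using only the commands described above (run from a directory where your project configuration is set up):

```bash
# Re-deployable iteration: --drop cleans existing configurations first,
# so the command can be run repeatedly without manual cleanup.
cdf-tk deploy --drop

# Remove everything governed by the templates, including data ingested
# into data models (uses the scripts/delete.py functions).
cdf-tk clean
```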
See tests for more information on how to run tests.
In order to develop `cdf-tk`, you need to set up a development environment. You need a working Python installation and a virtual environment. We recommend using `poetry` to set up the environment, as this is also the package tool that the toolkit repo uses to create the installable Python package.

When developing, you should use `cdf-tk-dev.py` to run the toolkit. This script will set the environment and paths correctly (to avoid running the installed cdf-tk package) and also set the `SENTRY_ENABLED` environment variable to `false` to avoid sending errors to Sentry.
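A minimal sketch of that setup, assuming `poetry` is installed and `cdf-tk-dev.py` lives at the repository root (the clone URL is an assumption; use the actual repository URL):

```bash
# Clone the repository and create the virtual environment with poetry,
# the same package tool used to build the installable python package.
git clone https://github.com/cognitedata/toolkit.git   # assumed URL
cd toolkit
poetry install

# Run the development entry point instead of the installed cdf-tk package.
# It fixes up the paths and sets SENTRY_ENABLED=false for you.
poetry run python cdf-tk-dev.py --help
```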
In `.vscode/launch.json` you will find a number of example debugging configurations that you can use.
If you use VSCode or another IDE supporting devcontainers, the easiest way to set up the environment is to run in the Dev Container configured in `.devcontainer`. It creates a virtual Python environment in `.venv/` that is automatically picked up by VSCode, and also by poetry if you want to run outside the devcontainer.
When you develop `cdf-tk`, you should avoid sending errors to Sentry. You can control Sentry by setting the environment variable `SENTRY_ENABLED=false`. This is set automatically when you use `cdf-tk-dev.py`.
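If you run the toolkit some other way (for example via the installed package), a sketch of disabling Sentry manually:

```bash
# Disable Sentry reporting for this shell session when not using cdf-tk-dev.py
export SENTRY_ENABLED=false
```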
The templates are bundled with the `cdf-tk` tool, so they are released together. To release a new version of the `cdf-tk` tool and the templates, you need to do the following (a consolidated command sketch follows these steps):
- Create a new preparation branch from `main` where you can make the final changes and do version bumping, e.g. `prepare_for_0_1_0b3`. Use `aX` for alpha, `bX` for beta, and `rcX` for release candidate:
  - Update the `CHANGELOG.cdf-tk.md` file with a header, e.g. `## [0.1.0b3] - 2024-01-12`, and review the change comments since the previous release. Ensure that the changes are correctly reflected in the comments and that they can be easily understood. Also verify that any breaking changes are clearly marked as such (`**BREAKING**`).
  - Do the same update to the `CHANGELOG.templates.md` file.
  - Update the files with the new version number; this is done with the `cdf bump --patch` (or `--minor`, `--major`, `--alpha`, `--beta`) command. The version is set in:
    - `cognite_toolkit/_version.py`
    - `pyproject.toml`
    - `_system.yaml` (multiple)
  - You can use the `python bump --minor --alpha` command to bump the version in all files.
  - Run `poetry lock` to update the `poetry.lock` file.
  - Run `pytest tests` locally to ensure that tests pass.
  - Run `python module_upgrade/run_check.py` to ensure that the `cdf-tk modules upgrade` command works as expected against previous versions. See Module Upgrade for more information. If a check fails due to a missing package:
    - `source .venv/.../bin/activate`
    - `pip install` the missing dependency
    - `deactivate`
    - Run the script again.
- Get approval to squash merge the branch into `main`:
  - Verify that all GitHub actions pass.
- Create a release branch `release-x.y.z` from `main`:
  - Create a new tag on the branch with the version number, e.g. `v0.1.0b3`.
  - Open a PR with the existing `release` branch as base, comparing to your new `release-x.y.z` branch.
  - Get approval and merge (do not squash).
  - Verify that the GitHub action `release` passes and pushes to PyPI.
- Create a new release on github.com with the tag and release notes:
  - Find the tag you created and create the new release.
  - Copy the release notes from the `CHANGELOG.cdf-tk.md` file and add a `# cdf-tk` header.
  - Then, further below, copy the release notes from the `CHANGELOG.templates.md` file and add a `# Templates` header.
  - Remember to mark as a pre-release if this is not a final release.
- Evaluate necessary announcements:
  - Create a new post on the Cognite Hub group.
  - As part of product releases, evaluate what to include.
  - Cognite internal announcements.
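As referenced above, here is a rough consolidation of the local commands used during release preparation. Branch names, version numbers, and the bump flag are placeholders; the authoritative flow is the list of steps above:

```bash
# Preparation branch from main (example name for a 0.1.0b3 release)
git checkout main && git pull
git checkout -b prepare_for_0_1_0b3

# Bump the version in cognite_toolkit/_version.py, pyproject.toml and the
# _system.yaml files; pick --patch/--minor/--major/--alpha/--beta as needed.
cdf bump --beta

poetry lock                          # refresh poetry.lock
pytest tests                         # run the test suite locally
python module_upgrade/run_check.py   # check 'cdf-tk modules upgrade' against previous versions

# After the preparation branch is squash-merged into main:
git checkout -b release-0.1.0b3 main
git tag v0.1.0b3
git push origin release-0.1.0b3 --tags   # push so the PR against the release branch can be opened
```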