This document describes the current state of Dagger and the features we will prioritize in the future.
The roadmap of the project is meant to be flexible, shaped by our interactions with the community and the time our contributors can dedicate to the project. Thus, if you have a suggestion or believe a certain ticket should be prioritized, please let us know through GitHub Discussions or via email: [email protected].
Use cases
- Parameterize DAGs and use those parameters from Tasks.
- Use the output of a node as the input of another.
- Compose multiple DAGs together ([#5]).
- Pass extra runtime options to any node ([#6]).
- Define DAGs using an imperative, Pythonic DSL ([#7]).
- Use hardcoded/literal values as parameters to a task ([#9]).
- Support map-reduce operations via partitioned outputs and nodes ([#12]).
- Support conditional executions of tasks.
- Support exit hooks (e.g. `on_success`, `on_failure`).
- Support node caching/memoization.
- Support parallel execution of nodes in the local runtime.
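To make the first two use cases concrete (parameterized DAGs, and feeding one node's output into another), here is a minimal sketch in plain Python. It does not use Dagger's API: `run_dag`, the `("param", ...)` / `("node", ...)` tuples, and the node names are all hypothetical, chosen only to illustrate the idea. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Tuple

# Hypothetical structure, not Dagger's API: each node is (function, inputs),
# where inputs maps an argument name to ("param", name) or ("node", name).
Node = Tuple[Callable[..., Any], Dict[str, Tuple[str, str]]]


def run_dag(nodes: Dict[str, Node], params: Dict[str, Any]) -> Dict[str, Any]:
    """Run every node in dependency order and return each node's output."""
    # A node depends on every node whose output it reads.
    graph = {
        name: {src for kind, src in inputs.values() if kind == "node"}
        for name, (_, inputs) in nodes.items()
    }
    outputs: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        func, inputs = nodes[name]
        # Resolve each argument from a DAG parameter or an upstream output.
        kwargs = {
            arg: params[src] if kind == "param" else outputs[src]
            for arg, (kind, src) in inputs.items()
        }
        outputs[name] = func(**kwargs)
    return outputs


nodes = {
    "double": (lambda x: x * 2, {"x": ("param", "number")}),
    "increment": (lambda y: y + 1, {"y": ("node", "double")}),
}
result = run_dag(nodes, {"number": 5})
print(result)
```

Declaring inputs as data, rather than calling functions directly, is what lets a runtime other than the local one (for example a workflow engine) decide where and when each node runs.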
Supported Runtimes
- Local execution
- CLI-driven execution
- Argo Workflows (official website)
- Kubeflow Pipelines (official website)
- Airflow (official website)
If you have developed a custom runtime outside of this repository and you believe it may be useful for the community, please open a PR linking to it here.
Built-in Serializers
- `AsJSON`
- `AsPickle`
- `AsMessagePack` (format)
- `AsAvro` ([format](https://avro.apache.org/docs/current/))
- `AsParquet` (for Pandas DataFrames)
- `AsCSV` (for Pandas DataFrames)
If you have developed a custom serializer outside of this repository and you believe it may be useful for the community, please open a PR linking to it here (or consider adding it to the main library).
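As a rough illustration of what a custom serializer might look like, the sketch below writes values as gzip-compressed JSON. The interface shown (`serialize`, `deserialize` and an `extension` attribute) is an assumption made for this example, and `AsGzippedJSON` is a hypothetical name; check the library's serializer documentation for the actual protocol before relying on it.

```python
import gzip
import io
import json
from typing import Any, BinaryIO


class AsGzippedJSON:
    """Hypothetical serializer: JSON compressed with gzip."""

    # Assumed attribute: file extension for the serialized output.
    extension = "json.gz"

    def serialize(self, value: Any, writer: BinaryIO) -> None:
        """Encode the value as UTF-8 JSON, compress it, and write it out."""
        writer.write(gzip.compress(json.dumps(value).encode("utf-8")))

    def deserialize(self, reader: BinaryIO) -> Any:
        """Read all bytes, decompress them, and parse the JSON payload."""
        return json.loads(gzip.decompress(reader.read()).decode("utf-8"))


# Round-trip a value through an in-memory buffer.
buffer = io.BytesIO()
serializer = AsGzippedJSON()
serializer.serialize({"rows": [1, 2, 3]}, buffer)
buffer.seek(0)
restored = serializer.deserialize(buffer)
```

Working against binary streams rather than file paths keeps a serializer independent of where a runtime chooses to store node outputs.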
Documentation
- README and project overview.
- Contribution Guidelines.
- Web-based Documentation Portal.
- Curated examples covering beginner and advanced use cases.
- Quick start guide.
- API reference.
- Core concepts:
  - DAGs and Tasks (Nodes)
  - Node Outputs
  - Node Inputs
  - Output and Node partitioning; Map-Reduce operations
  - Serializers
  - Runtimes
  - Imperative DSL