[RFC] Improved APIs for defining dbt asset dependencies #14964
-
Hi, nice feature! One question: as I see a manifest path input, is this possible with
-
So does this mean that the Python (downstream) process no longer needs to be aware of prefixes to normal or source assets? Rather given the upstream manifest reference and the @assets
-
I think it would be helpful to have an example with a downstream op. (For example, if you have a dbt model that calculates daily revenue as part of the nightly dbt run, and later in the day you want to send an email with yesterday's revenue without re-running dbt.) Right now I'm doing:

    @dbt_assets(
        manifest=manifest,
        io_manager_key="my_data_warehouse_io_manager",
    )
    def all_dbt_assets(context: OpExecutionContext, dbt: DbtCli):
        ...

    @job
    def my_job():
        my_op(all_dbt_assets.to_source_asset(key=manifest.get_asset_key_for_model(model_name="my_dbt_model")))

but I'm not sure if this is the right idiom. It also doesn't show the dbt asset in the Dagit UI graph of the job, but I don't know if it's supposed to (would be neat though).
-
This discussion details new experimental APIs meant to improve the ergonomics of defining Python dependencies upstream and downstream of dbt assets. This is a continuation of the changes described in this prior discussion: #14477.
Please comment with any questions or feedback! Feedback here will help to inform future API changes.
Python assets upstream of dbt assets
Before 0.19.12
When a dbt asset has an upstream asset, that upstream asset is declared as a source in a dbt YAML file. In Dagster, you can add a definition for this source by explicitly specifying the asset key. When the source name or table name changes in dbt, this key has to be updated in the asset definition as well.
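To make the maintenance cost concrete, here is a minimal sketch in plain Python. It does not use Dagster: the dictionaries are hypothetical stand-ins for the sources section of a dbt manifest, the names (jaffle_shop, raw_orders) are invented, and the [source_name, table_name] key layout is an assumption made for illustration.

```python
# Hypothetical stand-in for the "sources" section of a dbt manifest.
# The names ("jaffle_shop", "raw_orders") are illustrative only.
sources_before = {
    "source.my_project.jaffle_shop.raw_orders": {
        "source_name": "jaffle_shop",
        "name": "raw_orders",
    },
}

# The asset key is written out by hand in the Python asset definition.
HARDCODED_KEY = ["jaffle_shop", "raw_orders"]


def declared_source_keys(sources):
    """Return every [source_name, table_name] pair declared in dbt."""
    return [[s["source_name"], s["name"]] for s in sources.values()]


# Before the rename, the hand-written key matches a declared source.
matches_before = HARDCODED_KEY in declared_source_keys(sources_before)

# Someone renames the table in the dbt YAML file...
sources_after = {
    "source.my_project.jaffle_shop.orders": {
        "source_name": "jaffle_shop",
        "name": "orders",
    },
}

# ...and the hand-written key no longer matches anything dbt declares.
matches_after = HARDCODED_KEY in declared_source_keys(sources_after)

print(matches_before, matches_after)
```

Nothing in the Python definition changes when the rename happens, which is why the mismatch has to be found and fixed by hand.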
After 0.19.12
A new DbtManifest.get_asset_key_for_source method fetches the asset key of a given source. This allows users to update source names or table names in dbt without updating the asset definition.
If the source contains multiple tables that are created from a single step, users can call the DbtManifest.get_asset_keys_by_output_name_for_source method to define a @multi_asset. This method returns a mapping of asset key by output name, allowing users to specify additional fields on the AssetOut.
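The two lookups described above can be sketched in plain Python. This is not the dagster-dbt implementation: asset_key_for_source and asset_keys_by_output_name_for_source are hypothetical helpers, the manifest is a hand-written stand-in, and the [source_name, table_name] key layout, the output naming scheme, and the error on multi-table sources are all assumptions made for illustration.

```python
# Hand-written stand-in for a parsed dbt manifest (illustrative names only).
manifest = {
    "sources": {
        "source.my_project.jaffle_shop.raw_orders": {
            "source_name": "jaffle_shop",
            "name": "raw_orders",
        },
        "source.my_project.stripe.payments": {
            "source_name": "stripe",
            "name": "payments",
        },
        "source.my_project.stripe.refunds": {
            "source_name": "stripe",
            "name": "refunds",
        },
    }
}


def _tables_for_source(manifest, source_name):
    """Collect the manifest entries that belong to one dbt source."""
    return [
        entry
        for entry in manifest["sources"].values()
        if entry["source_name"] == source_name
    ]


def asset_key_for_source(manifest, source_name):
    """Hypothetical helper: key of a source that has exactly one table."""
    tables = _tables_for_source(manifest, source_name)
    if len(tables) != 1:
        raise ValueError(
            f"expected exactly one table for source {source_name!r}, "
            f"found {len(tables)}"
        )
    return [source_name, tables[0]["name"]]


def asset_keys_by_output_name_for_source(manifest, source_name):
    """Hypothetical helper: map an output name to a key for each table."""
    return {
        f"{source_name}_{entry['name']}": [source_name, entry["name"]]
        for entry in _tables_for_source(manifest, source_name)
    }


# Single-table source: one key, looked up by source name only.
single_key = asset_key_for_source(manifest, "jaffle_shop")

# Multi-table source: one key per table, addressed by output name.
multi_keys = asset_keys_by_output_name_for_source(manifest, "stripe")

print(single_key)
print(multi_keys)
```

In the real API, the single key would presumably be passed to the asset definition, and each entry of the mapping would back one AssetOut of a @multi_asset. Because the Python side only names the source, renaming a table in dbt changes the looked-up key without touching the definition.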
Python assets downstream of dbt assets
Before 0.19.12
Downstream assets are defined by explicitly setting the upstream asset key to match the dbt model's key. This requires changing the key whenever the upstream model changes (i.e. its schema or model name is changed).
After 0.19.12
Users can specify an upstream asset key in the @asset decorator by calling DbtManifest.get_asset_key_for_model.
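As a rough illustration of what a lookup by model name buys, here is a plain-Python sketch. It is not the dagster-dbt implementation: asset_key_for_model is a hypothetical helper, the manifest is a hand-written stand-in, and the key layout (schema prefix only when a schema is configured) is an assumption.

```python
# Hand-written stand-in for the "nodes" section of a dbt manifest.
manifest = {
    "nodes": {
        "model.my_project.daily_revenue": {
            "resource_type": "model",
            "name": "daily_revenue",
            "config": {"schema": "analytics"},
        },
        "model.my_project.customers": {
            "resource_type": "model",
            "name": "customers",
            "config": {},
        },
    }
}


def asset_key_for_model(manifest, model_name):
    """Hypothetical helper: derive a model's key from manifest metadata."""
    for node in manifest["nodes"].values():
        if node["resource_type"] == "model" and node["name"] == model_name:
            schema = node["config"].get("schema")
            # Assumed layout: prefix with the schema only when one is configured.
            return [schema, model_name] if schema else [model_name]
    raise KeyError(f"no dbt model named {model_name!r} in the manifest")


# The downstream definition asks for the key by model name only.
upstream_key = asset_key_for_model(manifest, "daily_revenue")
print(upstream_key)

# Changing the model's schema in dbt changes the key, not the Python call site.
manifest["nodes"]["model.my_project.daily_revenue"]["config"]["schema"] = "finance"
moved_key = asset_key_for_model(manifest, "daily_revenue")
print(moved_key)
```

The call site stays the same across the schema change; only the regenerated manifest differs. A model rename would still require updating the model name passed to the lookup.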