Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3183] [Feature] custom incremental strategy from dbt package #8769

Open
3 tasks done
sudo-pradip opened this issue Oct 4, 2023 · 7 comments
Open
3 tasks done
Labels
enhancement New feature or request Refinement Maintainer input needed

Comments

@sudo-pradip
Copy link

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

  • We can create custom incremental strategy by writing macro with get_incremental_<name>_sql, which will be invoke run time using model_context variable and get_incremental_strategy_macro
  • If we want import custom incremental strategy from dbt package then it won't work
  • This feature will allow to use custom incremental strategy from dbt package

Describe alternatives you've considered

  • Alternatively we can create a intermediate macro at project to call custom incremental strategy macro
{% macro get_incremental_merge_null_safe_sql(arg_dict) %}
    {{ dbt_materializations.get_incremental_merge_null_safe_sql(arg_dict) }}
{% endmacro %}

Who will this benefit?

  • More flexibility to create and reuse custom incremental strategy, easy install and use

Are you interested in contributing this feature?

Yes

Anything else?

  • I tried to add macro at context variable, but didn't work
{% macro some_macro() %}

    {% set jm = dbt_materializations.get_incremental_merge_null_safe_sql %}
    {% set temp = context.update({'get_incremental_merge_null_safe_sql': jm}) %}

{% endmacro %}
@sudo-pradip sudo-pradip added enhancement New feature or request triage labels Oct 4, 2023
@github-actions github-actions bot changed the title [Feature] custom incremental strategy from dbt package [CT-3183] [Feature] custom incremental strategy from dbt package Oct 4, 2023
@dbeatty10
Copy link
Contributor

Thanks for opening this @sudo-pradip !

This is essentially one of the "outstanding questions" listed here:

  • How can users or an organization share their custom incremental strategies or overrides? Can they share in a dbt package on the dbt Package Hub for example?

Let's suppose you add a custom macro named get_incremental_merge_null_safe_sql to a dbt package named dbt_materializations. Suppose further than you install that package within a project named my_project.

Did you already try overriding global macros within dbt_project.yml like this?

dispatch:
  - macro_namespace: dbt
    search_order: ['my_project', 'dbt_materializations', 'dbt']

I haven't tried this myself, so curious if you already tried it.

@sudo-pradip
Copy link
Author

Hello @dbeatty10 ,
Thank you for your prompt response

  • overriding global macros is interesting topic, but it's not working
  • for calling a specific incremental strategy macro, get_incremental_strategy_macro uses model_context, which may not respect the order define under dispatch, feature for specifying search order is only made for adapter.dispatch method ??
  • Apart from that i was going thr adapater.dispatch method, in doc i found example about dbt_utils.surrogate_key and concat method
    • I override without specifying search order using {% macro default__concat(...) %}, it worked fine
    • then i tried to specify the search order as ['dbt_utils', 'my_project'], but it still uses my project's macro
    • is it correct way of using dispatch config ? i double check and by just passing ['my_project'] only for dbt_utils macro_namespace, then it fails dbt job which prove that i used in correct way

@dbeatty10
Copy link
Contributor

Finding the right macro

  • for calling a specific incremental strategy macro, get_incremental_strategy_macro uses model_context, which may not respect the order define under dispatch, feature for specifying search order is only made for adapter.dispatch method ??

You're probably onto something here!

If that area of the code doesn't respect dispatch, then the alternative you mentioned might be our only option currently:

{% macro get_incremental_merge_null_safe_sql(arg_dict) %}
    {{ dbt_materializations.get_incremental_merge_null_safe_sql(arg_dict) }}
{% endmacro %}

Do you have any ideas / opinions about re-implementation of get_incremental_strategy_macro in a different way?

Usage of dispatch

In general, you'd need a separate dispatch configs per package. So for dbt_utils and dbt_expectations you'd need two different dispatch specifications:

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['my_project', 'dbt_utils', 'dbt']
  - macro_namespace: dbt_expectations
    search_order: ['my_project', 'dbt_expectations', 'dbt']

Wanna give that dispatch config a try with dbt_utils and see if it works for you?

@sudo-pradip
Copy link
Author

sudo-pradip commented Oct 9, 2023

Current

Expectation

My assumptions

  • model_context which used in get_incremental_strategy_macro is same context variable available in end users Jinja, so model and macro can use context as i previously tried to tweak that context variable to add my dbt package macro at top level
  • I see context has keys for dbt packages, in my case dbt_materlization where i can see all my macros, so dbt package macros are available at get_incremental_strategy_macro
  • In dbt-core i tried to find why we didn't use adapter.dispatch at get_incremental_strategy_macro, sound so simple right, i didn't found any method which uses that, turn out adapter.dispatch is only available at Jinja level, but i tried to find out dispatch method code it was at

Possible solutions

  1. Since model_context has all macros including from dbt packages (but not at top level), so only thing remains is to extract it, we can write same code as adapter.dispatch where first find out search_package (respect the dispatch config), a for loop, model_contex[package_name] is not None else attempts.append, end
  2. I'm not sure about reuse adapter.dispatch at get_incremental_strategy_macro, but even if we can we have to handle prefix(default__, snowflake__) which we may don't want to change, since many adapter (snowflake) already override get_incremental_strategy_macro macro
  3. we can use execute_macro

Solution 1 we can think, but we have to duplicate our logic at two places (i think it's all right ?), whats your input on this ?

@dbeatty10 dbeatty10 self-assigned this Oct 10, 2023
@dbeatty10
Copy link
Contributor

Thanks for doing this research @sudo-pradip 🧠

I'll either need to grab an extra set of expert eyes from an engineer on the dbt-core team do some deeper research myself. Labeling this as "Refinement" accordingly.

All of us are time-constrained this week and next due to the upcoming release of dbt-core of 1.7.0 and the Coalesce conference next week, so it could be a few weeks (or more) until we can dive deeper into this.

In the meantime, your best bet is the alternative you've already considered:

  • Alternatively we can create a intermediate macro at project to call custom incremental strategy macro
{% macro get_incremental_merge_null_safe_sql(arg_dict) %}
    {{ dbt_materializations.get_incremental_merge_null_safe_sql(arg_dict) }}
{% endmacro %}

@dbeatty10 dbeatty10 added Refinement Maintainer input needed and removed triage labels Oct 10, 2023
@dbeatty10 dbeatty10 removed their assignment Oct 10, 2023
@sudo-pradip
Copy link
Author

sudo-pradip commented Dec 4, 2023

any updates ?

@dbeatty10
Copy link
Contributor

@sudo-pradip we're not able to prioritize further research at this time.

So we've opened dbt-labs/docs.getdbt.com#4716 to document the installation steps that work currently.

mirnawong1 added a commit to dbt-labs/docs.getdbt.com that referenced this issue Jan 8, 2024
[Preview](https://docs-getdbt-com-git-dbeatty-custom-incremental-d92d96-dbt-labs.vercel.app/docs/build/incremental-models#custom-strategies)

## What are you changing in this pull request and why?

This addresses the "**For end users**" portion of
#1761.

The feature request in dbt-labs/dbt-core#5245
describes the value proposition as well as the previous and new
behavior:

#### Functional Requirement
- Advanced users that wish to specify a custom incremental strategy must
be able to do so.

#### Previous behavior
- Advanced dbt users who wished to specify a custom incremental strategy
must override the same boilerplate Jinja macro by copy pasting it into
their dbt project.

#### New behavior
- Advanced dbt users who wish to specify a custom incremental strategy
will only need to create a macro that conforms to the naming convention
`get_incremental_NAME_sql` that produces the correct SQL for the target
warehouse.

## Also

To address the questions raised in
dbt-labs/dbt-core#8769, we also want to
document how to utilize custom incremental macros that come from a
package.

For example, to use the `merge_null_safe` custom incremental strategy
from the `example` package, first [install the
package](/build/packages#how-do-i-add-a-package-to-my-project), then add
this macro to your project:

```sql
{% macro get_incremental_merge_null_safe_sql(arg_dict) %}
    {% do return(example.get_incremental_merge_null_safe_sql(arg_dict)) %}
{% endmacro %}
```

## 🎩 

<img width="503" alt="image"
src="https://github.com/dbt-labs/docs.getdbt.com/assets/44704949/51c3266e-e3fb-49bd-9428-7c43920a5412">

## Checklist
- [x] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
so my content adheres to these guidelines.
- [x] For [docs
versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning),
review how to [version a whole
page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version)
and [version a block of
content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Refinement Maintainer input needed
Projects
None yet
Development

No branches or pull requests

2 participants