Beta documentation for Snowflake Iceberg tables #6162
Conversation
- You cannot create transient or temporary Iceberg tables on Snowflake.
- Supplying an input to `base_location_subpath` will always be appended to your schema name. Currently, you cannot override this behavior, which ensures that dbt can differentiate Iceberg model builds based on the environment.
- Snowflake has limitations for Dynamic Tables. Check out the [Snowflake docs](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake#usage-notes) for more information. By default, we recommend leaving the `base_location_subpath` field blank, as each target has its own default path.
Suggested change:
~~- Snowflake has limitations for Dynamic Tables. Check out the [Snowflake docs](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake#usage-notes) for more information. By default, we recommend leaving the `base_location_subpath` field blank, as each target has its own default path.~~
- By default, we recommend leaving the `base_location_subpath` field blank, as each target has its own default path.
I'm debating keeping this in:
Snowflake has limitations for Dynamic Tables. Check out the Snowflake docs for more information.
On one hand - this is useful to see but this just seems a little random. Maybe reword it to call out that Snowflake has limitations on Iceberg Table Format Dynamic Tables?
I don't understand why this bullet caveating usage of Dynamic Iceberg tables points to the generic Create Iceberg Table usage notes? What's the point being made here?
Also, what does this mean?
> By default, we recommend leaving the `base_location_subpath` field blank, as each target has its own default path.
- Yeah, let's just remove the bullet about the Dynamic Table limitations. Users can google it themselves.
- And in terms of `base_location_subpath`: that's a field that is optional and I think most users will leave it blank. @dataders I'm not quite sure what your confusion is - could you expand?
There needs to be an entire section devoted to `base_location` and how it works. Specifically that:
- Snowflake requires a `base_location`
- dbt-snowflake does not allow you to define `base_location`
- Default behavior is that Snowflake will be given a `base_location` string that follows this convention: `_dbt/{SCHEMA_NAME}/{MODEL_NAME}` (see source) - however, users are able to configure a `base_location_subpath` that by default is empty, but, when provided, will be concatenated to the end of the above-described pattern for `base_location` string generation.
Arguably more important than the description of adapter behavior is the rationale behind this and hinting at emerging best practice. Here's something I just wrote and it likely needs refinement 👁️ @amychen1776

> With Snowflake-managed Iceberg format tables, you assume ownership of the data storage of these tables. Accordingly, some attention should be paid to where in the `EXTERNAL VOLUME` the Iceberg tables are written. This is the purpose of the required `base_location` parameter. The Snowflake Iceberg catalog will keep track of your Iceberg tables regardless of where the data lives within the `EXTERNAL VOLUME` and what `base_location` you provide. So in theory you can pass anything to the `base_location` parameter, including an empty string (`''`), for all of your Snowflake-managed Iceberg tables. However, doing so is only of use if you never plan to:
> - jump into and navigate the underlying object store (S3 / Azure blob)
> - read the Iceberg tables via an object-store integration
> - grant schema-specific access to tables via object store
> - use a crawler pointed at the tables within the external volume to build a new catalog with another tool
>
> Accordingly, dbt-snowflake does not support arbitrary definition of `base_location` for Iceberg tables. Instead, dbt by default writes your tables within a `_dbt/{SCHEMA_NAME}` prefix to ensure easier object-store observability and auditability.
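The naming convention described in this comment can be sketched in Python. This is a hypothetical illustration of the behavior as described in the thread (function name and the exact placement of the subpath are assumptions, not dbt-snowflake's actual implementation):

```python
def build_base_location(schema_name: str, model_name: str, subpath: str = "") -> str:
    """Sketch of the convention described above: _dbt/{SCHEMA_NAME}/{MODEL_NAME},
    with an optional user-supplied base_location_subpath concatenated to the end."""
    parts = ["_dbt", schema_name, model_name]
    if subpath:  # empty by default; only appended when the user provides one
        parts.append(subpath)
    return "/".join(parts)

print(build_base_location("ANALYTICS", "orders"))        # _dbt/ANALYTICS/orders
print(build_base_location("ANALYTICS", "orders", "v2"))  # _dbt/ANALYTICS/orders/v2
```

The point of the fixed prefix is that every dbt-built Iceberg table lands under a predictable path in the external volume, so the object store stays navigable and auditable.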
There are some limitations to the implementation you need to be aware of:

- You cannot create transient or temporary Iceberg tables on Snowflake.
why does this matter to end users? perhaps this is a better phrasing?

Suggested change:
~~- You cannot create transient or temporary Iceberg tables on Snowflake.~~
- Using Iceberg tables with dbt, the end result is that your query is materialized in Iceberg. However, as has always been the case in dbt, often intermediary objects such as temporary and transient tables are made at materialization time. It is not possible to configure these temporary objects to also be Iceberg-formatted.
@amychen1776 Anything to add here?
Let's tweak this sentence a little bit:

> However, as has always been the case in dbt, often intermediary objects such as temporary and transient tables are made at materialization time.

To:

> However, oftentimes dbt creates intermediary objects as temporary and transient tables for certain materializations like incremental. It is not possible to configure these temporary objects to also be Iceberg-formatted. You may see non-Iceberg tables created in the logs to support specific materializations but they will be dropped after usage.
Co-authored-by: Mirna Wong <[email protected]>
Co-authored-by: Anders <[email protected]>
Editorial changes
smol things but I think we're there!
Co-authored-by: Anders <[email protected]>
Approval from @amychen1776 to move forward with merge
## What are you changing in this pull request and why?

See #6162 (comment)

## Checklist

- [ ] I have reviewed the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
- [ ] The topic I'm writing about is for specific dbt version(s) and I have versioned it according to the [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and/or [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content) guidelines.
- [ ] I have added checklist item(s) to this list for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."

Co-authored-by: Amy Chen <[email protected]>