
Beta documentation for Snowflake Iceberg tables #6162

Merged
merged 43 commits into current from iceberg-snowflake on Oct 2, 2024

Conversation

matthewshaver (Contributor)

What are you changing in this pull request and why?

This PR accomplishes a few things:

  • Adds the Iceberg table format beta to the Snowflake configuration reference (a minimal config sketch follows this list)
  • Moves all of the table definitions to the top of the doc so they're closer together.
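
For reference, here is a minimal sketch of the kind of model config the new page covers. This is illustrative only: the model SQL, the `external_volume` name, and the subpath value are placeholders, and the docs page itself is the source of truth for the exact set of supported configs.

```sql
-- models/example_iceberg_model.sql (hypothetical model and names)
-- Illustrates the beta Iceberg table format configs this PR documents;
-- the external volume and subpath values are placeholders, not recommendations.
{{
  config(
    materialized='table',
    table_format='iceberg',
    external_volume='my_external_volume',
    base_location_subpath='monthly'
  )
}}

select * from {{ ref('stg_orders') }}
```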

Checklist

  • I have reviewed the Content style guide so my content adheres to these guidelines.
  • The topic I'm writing about is for specific dbt version(s) and I have versioned it according to the version a whole page and/or version a block of content guidelines.
  • I have added checklist item(s) to this list for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."

@matthewshaver matthewshaver requested review from dataders and a team as code owners September 27, 2024 17:26

vercel bot commented Sep 27, 2024

The latest updates on your projects.

| Name | Status | Updated (UTC) |
| --- | --- | --- |
| docs-getdbt-com | ✅ Ready | Oct 2, 2024 2:26pm |

@github-actions github-actions bot added the labels content (Improvements or additions to content), size: large (This change will take more than a week to address and might require more than one person), and Docs team (Authored by the Docs team @dbt Labs) Sep 27, 2024

- You cannot create transient or temporary Iceberg tables on Snowflake.
- Supplying an input to `base_location_subpath` will always be appended to your schema name. Currently, you cannot override this behavior, which ensures that dbt can differentiate Iceberg model builds based on the environment.
- Snowflake has limitations for Dynamic Tables. Check out the [Snowflake docs](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake#usage-notes) for more information. By default, we recommend leaving the `base_location_subpath` field blank, as each target has its own default path.
Collaborator

Suggested change
- Snowflake has limitations for Dynamic Tables. Check out the [Snowflake docs](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake#usage-notes) for more information. By default, we recommend leaving the `base_location_subpath` field blank, as each target has its own default path.
- By default, we recommend leaving the `base_location_subpath` field blank, as each target has its own default path.

Collaborator

I'm debating keeping this in:

Snowflake has limitations for Dynamic Tables. Check out the Snowflake docs for more information.

On one hand - this is useful to see but this just seems a little random. Maybe reword it to call out that Snowflake has limitations on Iceberg Table Format Dynamic Tables?

Contributor

I don't know why this bullet caveating usage of Dynamic Iceberg tables points to the generic Create Iceberg Table usage notes. What's the point being made here?

also, what does this mean?

By default, we recommend leaving the base_location_subpath field blank, as each target has it’s own default path.

Collaborator

  1. Yeah let's just remove the bullet about the Dynamic table limitations. Users can google it themselves
  2. And in terms of `base_location_subpath`: that's a field that is optional and I think most users will leave it blank. @dataders I'm not quite sure what your confusion is - could you expand?

@dataders dataders left a comment (Contributor)

There needs to be an entire section devoted to base_location and how it works.

Specifically that

  • Snowflake requires a base_location
  • dbt-snowflake does not allow you to define base_location
  • Default behavior is that Snowflake will be given a base_location string that follows this convention: _dbt/{SCHEMA_NAME}/{MODEL_NAME} (see source)
  • however, users are able to configure a `base_location_subpath` that by default is empty, but, when provided, will be concatenated to the end of the above-described pattern for `base_location` string generation (see the sketch at the end of this comment).

Arguably more important than the description of adapter behavior is the rationale behind this and a hint at emerging best practice. Here's something I just wrote that likely needs refinement 👁️ @amychen1776

With Snowflake-managed Iceberg format tables, you assume ownership of the data storage of these tables. Accordingly, some attention should be paid to where within the `EXTERNAL VOLUME` the Iceberg tables are written. This is the purpose of the required `base_location` parameter. The Snowflake Iceberg catalog will keep track of your Iceberg tables regardless of where the data lives within the `EXTERNAL VOLUME` and what `base_location` you provide. So in theory you can pass anything to the `base_location` parameter, including an empty string (`''`), for all of your Snowflake-managed Iceberg tables.

However, doing so is only of use if you never plan to:

  • jump into and navigate the underlying object store (S3 / Azure blob)
  • read the Iceberg tables via an object-store integration
  • grant schema-specific access to tables via object store
  • use a crawler pointed at the tables within the external volume to build a new catalog with another tool

Accordingly, dbt-snowflake does not support arbitrary definition of `base_location` for Iceberg tables. Instead, dbt by default writes your tables within a `_dbt/{SCHEMA_NAME}` prefix to ensure easier object-store observability and auditability.
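
A minimal sketch of that convention, written as a hypothetical Jinja macro (not dbt-snowflake's actual source), assuming the `_dbt/{SCHEMA_NAME}/{MODEL_NAME}` default plus an optional `base_location_subpath` appended at the end:

```sql
{#- Hypothetical macro for illustration only; it mirrors the convention
    described above rather than dbt-snowflake's real implementation. -#}
{%- macro example_base_location(schema_name, model_name, subpath=none) -%}
    {#- Default pattern: _dbt/{SCHEMA_NAME}/{MODEL_NAME} -#}
    {%- set path = '_dbt/' ~ schema_name ~ '/' ~ model_name -%}
    {#- If a base_location_subpath is configured, append it -#}
    {%- if subpath -%}
        {%- set path = path ~ '/' ~ subpath -%}
    {%- endif -%}
    {{ path }}
{%- endmacro -%}
```

For example, `example_base_location('analytics', 'orders', 'team_a')` would render `_dbt/analytics/orders/team_a`, while leaving the subpath unset would render `_dbt/analytics/orders`.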


There are some limitations to the implementation you need to be aware of:

- You cannot create transient or temporary Iceberg tables on Snowflake.
Contributor

why does this matter to end users? perhaps this is a better phrasing?

Suggested change
- You cannot create transient or temporary Iceberg tables on Snowflake.
- Using Iceberg tables with dbt, the end result is that your query is materialized in Iceberg. However, as has always been the case in dbt, often intermediary objects such as temporary and transient tables are made at materialization time. It is not possible to configure these temporary objects to also be Iceberg-formatted.

Contributor Author

@amychen1776 Anything to add here?

@amychen1776 amychen1776 commented Sep 30, 2024 (Collaborator)

Let's tweak this sentence a little bit:

However, as has always been the case in dbt, often intermediary objects such as temporary and transient tables are made at materialization time.

To:

However, oftentimes dbt creates intermediary objects as temporary and transient tables for certain materializations like incremental. It is not possible to configure these temporary objects to also be Iceberg-formatted. You may see non-Iceberg tables created in the logs to support specific materializations, but they will be dropped after usage.



@dataders dataders left a comment (Contributor)

smol things but I think we're there!

@matthewshaver matthewshaver merged commit 7eaf7d2 into current Oct 2, 2024
6 checks passed
@matthewshaver matthewshaver deleted the iceberg-snowflake branch October 2, 2024 14:28
@matthewshaver (Contributor Author)

Approval from @amychen1776 to move forward with merge

matthewshaver added a commit that referenced this pull request Oct 2, 2024
## What are you changing in this pull request and why?

See
#6162 (comment)

## Checklist
- [ ] I have reviewed the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
so my content adheres to these guidelines.
- [ ] The topic I'm writing about is for specific dbt version(s) and I
have versioned it according to the [version a whole
page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version)
and/or [version a block of
content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content)
guidelines.
- [ ] I have added checklist item(s) to this list for anything
that needs to happen before this PR is merged, such as "needs technical
review" or "change base branch."
<!--
PRE-RELEASE VERSION OF dbt (if so, uncomment):
- [ ] Add a note to the prerelease version [Migration
Guide](https://github.com/dbt-labs/docs.getdbt.com/tree/current/website/docs/docs/dbt-versions/core-upgrade)
-->
<!-- 
ADDING OR REMOVING PAGES (if so, uncomment):
- [ ] Add/remove page in `website/sidebars.js`
- [ ] Provide a unique filename for new pages
- [ ] Add an entry for deleted pages in `website/vercel.json`
- [ ] Run link testing locally with `npm run build` to update the links
that point to deleted pages
-->

---------

Co-authored-by: Amy Chen <[email protected]>