[ADAP-910] Support Metadata Freshness #785
Comments
Are external tables being tracked as part of this implementation? I think for external tables the So for example if an external table has stale data, an The
@sp-tkerlavage Thanks for your insight on this. Our current intention with Snowflake is to implement the metadata option using
Hi @sp-tkerlavage, thanks for raising this! This won't be in scope for 1.7, but I've opened a new feature request to track this and added it to our Applied State epic. Would you be able to add an example query you might use to determine source freshness, so we can make sure we solve your use case?
In Snowflake, an external stage can have a directory, which essentially stores a list of the files in the external storage (S3 bucket, etc.) along with other useful information like relative_path, url, size, MD5, and last_modified. This directory table can be automatically refreshed when files are added via SQS. The directory is not a standalone object; it's just a layer on top of a stage. It can be queried like this:
However, you would have to parse the relative path to extract the name of the source table. So if, for example, you have a bucket that is partitioned like this:
You could then just group by the source_database and source_table to get the max last_modified for each and do the freshness comparison. Obviously, different S3 directory structures would require different parsing. I actually have an incremental model for this directory parsing logic in my own project, so that may be a cleaner approach. However, I'm not sure whether ref'ing a model in a freshness test is even possible. And even if it is, the model would have to be run first. But again, I'm not entirely clear on the parsing order, so it may not even be possible to ref a model in a freshness test in the first place.
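For illustration, the parse-and-group step described above can be sketched outside of SQL. This is only a rough sketch under stated assumptions: the `<source_database>/<source_table>/<file>` path layout, the sample rows, and the function names `latest_modified_by_table` and `stale_tables` are all hypothetical and are not taken from the commenter's project or from dbt-snowflake. Only the `relative_path` and `last_modified` fields come from the description above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rows as they might come back from a stage directory listing:
# (relative_path, last_modified). The layout
# <source_database>/<source_table>/<file> is an assumption for this sketch.
ROWS = [
    ("sales_db/orders/a.parquet", datetime(2023, 9, 1, 6, 0, tzinfo=timezone.utc)),
    ("sales_db/orders/b.parquet", datetime(2023, 9, 2, 6, 0, tzinfo=timezone.utc)),
    ("sales_db/customers/a.parquet", datetime(2023, 8, 20, 6, 0, tzinfo=timezone.utc)),
    ("README.txt", datetime(2023, 9, 2, 6, 0, tzinfo=timezone.utc)),
]


def latest_modified_by_table(rows):
    """Return {(source_database, source_table): max last_modified}."""
    latest = {}
    for relative_path, last_modified in rows:
        parts = relative_path.split("/")
        if len(parts) < 3:
            # Skip files that do not sit under <database>/<table>/.
            continue
        key = (parts[0], parts[1])
        if key not in latest or last_modified > latest[key]:
            latest[key] = last_modified
    return latest


def stale_tables(rows, now, max_age):
    """Return the sorted keys whose newest file is older than max_age."""
    latest = latest_modified_by_table(rows)
    return sorted(key for key, ts in latest.items() if now - ts > max_age)


if __name__ == "__main__":
    now = datetime(2023, 9, 2, 12, 0, tzinfo=timezone.utc)
    for database, table in stale_tables(ROWS, now, timedelta(hours=24)):
        print(f"{database}.{table} is stale")
```

A different bucket layout would only change the path-splitting step; the grouping and the freshness comparison stay the same.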
Minor concern:
Ah, good to know. @Fleid, in the context of dbt-managed objects, how large of an issue do you see this as? Is this something worth addressing with some level of urgency?
This bug has been opened as a separate issue: #899. This issue will remain closed and conversation on it may not necessarily be seen due to status. Please move any new conversation over to the new issue.
Describe the feature
Support metadata-based freshness by implementing the new macro and feature flag described in #8704.
Who will this benefit?
Everyone who wants faster freshness results from Snowflake!