-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ADAP-633] [Feature] Support Hybrid tables #668
Comments
I agree this would be awesome to support. Do you have any materialization as to why node type it should be? Like you said, provide that Dynamic Tables support the standard DDL, the engineering effort to support them is low and this could in theory be supported in my questions:
If someone is interested to help out with this, I'd suggest a good starting point is to make a custom |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Commenting to remove the stale tag! I think it would still be useful. |
Bumping this as hybrid tables are now in public preview, and this is something we want. This will involve a little more elbow grease for the DDL statement than with transient vs permanent tables. |
Bumping this again as hybrid tables are a perfect use case for our DBT snapshotting processes. We also want the ability to enforce primary key and foreign key constraints, and this should in theory be very low lift to add in DBT. |
Usecase: Materialization:
thank you and looking forward to this |
Do y'all have someone to take on this development? I wrote a custom materialization to make use of hybrid tables, I can refactor the code I have and I would be more than happy to take on the development of this. |
+1 bump |
@dakota-kelley-phData is the materializatization code publically available? |
@jtmcn, I don't know if I do. Let me do some general clean up on it and get it in a repo where others can use it. I'll tackle that this weekend and get it shared here for anyone to use 😄 |
Bumping! Would love this feature |
@dakota-kelley-phData I'll likely be writing our own implementation of this soon. Any code you can share would be helpful. Even if it's not cleaned-up yet. |
Bumb bumb bump! Would love to have this feature. Also happy to contribute if some developer can guide me a bit. :) |
As Hybrid Tables can be created with CTAS it should be quite straightforward for someone to create a custom materialization for it. This new materialization could be based of (e.g. copied from and then modified) the |
yes, I'd use this feature. |
I have come up with something that seems to be working pretty decently obviously there would still need to be a lot of testing and I ran into some oddities with dbt trying to combine my schemas together so I wrote a marco to tell dbt to stop being smart. This is a very much rough draft, but this does seem to work pretty decently so far if you have questions or adjustment I might be able to make this repo public. Either way I hope this helps someone down the line custom_macro.sql{% macro generate_schema_name(custom_schema_name, node) -%}
{{ custom_schema_name | trim }}
{%- endmacro %} schema.ymlversion: 2
models:
- name: pre_aggregated_events
description: "Pre-aggregated impression events for livestreams and ad campaigns using hybrid table"
config:
materialized: hybrid_table
schema: SILVER # Optional: Specify a custom schema
database: HYBRID_DB # Optional: Specify a custom database
force_ctas: false # Optional: Set to true when you want to force CTAS
column_definitions:
CUSTOMER_ID: VARCHAR(16777216) NOT NULL
AD_CAMPAIGN_ID: VARCHAR(16777216) NOT NULL
COUNT_IMPRESSIONS: NUMBER(38,0)
TOTAL_WATCH_TIME: FLOAT
INSERT_RAW_TIMESTAMP: TIMESTAMP_NTZ(9)
INSERT_HYBRID_TIMESTAMP: TIMESTAMP_NTZ(9)
UPDATED_AT: TIMESTAMP_NTZ(9)
IS_UPDATED: BOOLEAN
primary_key: LIVESTREAM_ID, AD_CAMPAIGN_ID
indexes:
- name: IDX_CUSTOMER
columns: [CUSTOMER_ID]
- name: IDX_AD_CAMPAIGN
columns: [AD_CAMPAIGN_ID] SQL CALL{{ config(materialized='hybrid_table') }}
{% set target_relation = this %}
{% set existing_relation = load_relation(this) %}
{% if existing_relation is none %}
-- Table doesn't exist, use CTAS
WITH activity_data AS (
SELECT
CUSTOMER_ID,
AD_CAMPAIGN_ID,
COUNT(*) AS COUNT_IMPRESSIONS,
SUM(WATCH_TIME_SECS) AS TOTAL_WATCH_TIME,
MAX(INSERT_RAW_TIMESTAMP) AS INSERT_RAW_TIMESTAMP
FROM {{ ref('stg_activities') }}
WHERE EVENT_NAME = 'impression' AND STREAM_ACTION != 'DELETE'
GROUP BY CUSTOMER_ID, AD_CAMPAIGN_ID
)
SELECT
ad.CUSTOMER_ID,
ad.AD_CAMPAIGN_ID,
ad.COUNT_IMPRESSIONS,
ad.TOTAL_WATCH_TIME,
ad.INSERT_RAW_TIMESTAMP,
CURRENT_TIMESTAMP() AS INSERT_HYBRID_TIMESTAMP,
CURRENT_TIMESTAMP() AS UPDATED_AT,
FALSE AS IS_UPDATED
FROM activity_data ad
{% else %}
-- Table exists, use MERGE
MERGE INTO {{ target_relation }} t
USING (
SELECT
CUSTOMER_ID,
AD_CAMPAIGN_ID,
COUNT(*) AS COUNT_IMPRESSIONS,
SUM(WATCH_TIME_SECS) AS TOTAL_WATCH_TIME,
MAX(INSERT_RAW_TIMESTAMP) AS INSERT_RAW_TIMESTAMP,
CURRENT_TIMESTAMP() AS INSERT_HYBRID_TIMESTAMP
FROM {{ ref('stg_activities') }}
WHERE EVENT_NAME = 'impression' AND STREAM_ACTION != 'DELETE'
GROUP BY CUSTOMER_ID, AD_CAMPAIGN_ID
) s
ON t.CUSTOMER_ID = s.CUSTOMER_ID AND t.AD_CAMPAIGN_ID = s.AD_CAMPAIGN_ID
WHEN MATCHED THEN
UPDATE SET
t.COUNT_IMPRESSIONS = t.COUNT_IMPRESSIONS + s.COUNT_IMPRESSIONS,
t.TOTAL_WATCH_TIME = t.TOTAL_WATCH_TIME + s.TOTAL_WATCH_TIME,
t.INSERT_RAW_TIMESTAMP = GREATEST(t.INSERT_RAW_TIMESTAMP, s.INSERT_RAW_TIMESTAMP),
t.INSERT_HYBRID_TIMESTAMP = s.INSERT_HYBRID_TIMESTAMP,
t.UPDATED_AT = s.INSERT_HYBRID_TIMESTAMP,
t.IS_UPDATED = TRUE
WHEN NOT MATCHED THEN
INSERT (CUSTOMER_ID, AD_CAMPAIGN_ID, COUNT_IMPRESSIONS, TOTAL_WATCH_TIME, INSERT_RAW_TIMESTAMP, INSERT_HYBRID_TIMESTAMP, UPDATED_AT, IS_UPDATED)
VALUES (s.CUSTOMER_ID, s.AD_CAMPAIGN_ID, s.COUNT_IMPRESSIONS, s.TOTAL_WATCH_TIME, s.INSERT_RAW_TIMESTAMP, s.INSERT_HYBRID_TIMESTAMP, s.INSERT_HYBRID_TIMESTAMP, FALSE)
{% endif %}
HYBRID TABLES CUSTOM MATERIALIZATION{% materialization hybrid_table, adapter='snowflake' %}
{%- set config = model['config'] -%}
{% do log("Hybrid table materialization for model: " ~ model.name, info=True) %}
{% do log("Config: " ~ config, info=True) %}
{% do log("Target schema: " ~ target.schema, info=True) %}
{% do log("Config schema: " ~ config.get('schema'), info=True) %}
{% do log("Model schema: " ~ model.schema, info=True) %}
{%- set target_schema = config.get('schema', target.schema) -%}
{% do log("Final target schema: " ~ target_schema, info=True) %}
{% set target_relation = api.Relation.create(
database=target.database,
schema=config.get('schema', target.schema),
identifier=model.alias
) %}
{% do log("Target relation: " ~ target_relation, info=True) %}
{%- set existing_relation = adapter.get_relation(
database=target_relation.database,
schema=target_relation.schema,
identifier=target_relation.identifier) -%}
{% do log("Existing relation: " ~ existing_relation, info=True) %}
{%- set column_definitions = config.get('column_definitions', {}) -%}
{%- set primary_key = config.get('primary_key', []) -%}
{%- set primary_key = primary_key if primary_key is string else (primary_key | join(', ')) -%}
{%- set indexes = config.get('indexes', []) -%}
{%- set force_ctas = config.get('force_ctas', false) -%}
-- Run pre-hooks
{{ run_hooks(pre_hooks) }}
{% if existing_relation is none or force_ctas %}
-- Table doesn't exist or force_ctas is true, use CTAS
{% call statement('main') -%}
CREATE OR REPLACE HYBRID TABLE {{ target_relation }} (
{% for column, definition in column_definitions.items() %}
{{ column }} {{ definition }}{% if not loop.last %},{% endif %}
{% endfor %}
{% if primary_key %},
PRIMARY KEY ({{ primary_key }})
{% endif %}
{% for index in indexes %}
, INDEX {{ index.name }}({{ index.columns | join(', ') }})
{% endfor %}
) AS (
{{ sql }}
)
{%- endcall %}
{% else %}
-- Table exists, use MERGE
{% call statement('main') -%}
{{ sql }}
{%- endcall %}
{% endif %}
-- Run post-hooks
{{ run_hooks(post_hooks) }}
-- Return the relations created in this materialization
{{ return({'relations': [target_relation]}) }}
{% endmaterialization %} This application is not part of the Snowflake Service and is governed by the terms in LICENSE, unless expressly agreed to in writing. You use this application at your own risk, and Snowflake has no obligation to support your use of this application. |
It'd be great if dbt had native support for Snowflake's hybrid tables. |
Looking forward to have support for Hybrid tables! Currently, my solution is replicate the final layer into Hybrid tables. |
@azouari have you tried the custom materialization above this should allow you to use hybrid tables effectively until this feature is fully supported by DBT. |
Is this your first time submitting a feature request?
Describe the feature
We could add support for Unistore/Hybrid tables by allowing users to materialize hybrid tables from dbt.
I couldn't find an official docs page for the feature but there is a blog post talking about it.
The syntax examples shown are
CREATE HYBRID TABLE xxx
orCREATE OR REPLACE HYBRID TABLE xxx
but I don't know if it supportsCREATE HYBRID TABLE AS ()
or not.Describe alternatives you've considered
People could create their own materialization or update some of the core macros to allow creating those tables.
Who will this benefit?
Snowflake customers who would like to use the Unistore capabilities.
Are you interested in contributing this feature?
No response
Anything else?
No response
The text was updated successfully, but these errors were encountered: