Improve Table Materialization to Minimize Downtime and Handle Contract Breakage #189

Closed
ThomsenS opened this issue Jun 6, 2024 · 1 comment

ThomsenS commented Jun 6, 2024

Current behavior:

dbt-fabric materializes tables by first renaming the destination table with a __dbt_backup prefix. It then creates the new table using either a CTAS statement or, if a contract is enforced, a CREATE TABLE followed by an INSERT INTO statement.

Observed disadvantages:

  1. Contract breakage: When a change breaks the contract, the build fails and the backup table is not renamed back to its original name.
  2. Downtime: The table is unavailable while the model is being rebuilt. This is not a problem for small models, but for large models it means significant downtime during which any reports or applications using the model will fail.
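
Both disadvantages can be reproduced in miniature. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Fabric, so the SQL dialect differs from what dbt-fabric actually emits. The helper names and the exact backup table name are hypothetical and chosen only for illustration.

```python
import sqlite3


def rename_first_build(conn, table, select_sql):
    """Rename-first rebuild: move the live table aside, then build in place."""
    backup = f"__dbt_backup_{table}"  # hypothetical backup name
    conn.execute(f"DROP TABLE IF EXISTS {backup}")
    conn.execute(f"ALTER TABLE {table} RENAME TO {backup}")
    # From here until CREATE TABLE finishes, `table` does not exist.
    conn.execute(f"CREATE TABLE {table} AS {select_sql}")
    conn.execute(f"DROP TABLE {backup}")


def table_names(conn):
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE orders AS SELECT 1 AS id")

try:
    # The SELECT is deliberately invalid to simulate a failing build.
    rename_first_build(conn, "orders", "SELECT missing_column FROM nowhere")
except sqlite3.OperationalError as exc:
    print("build failed:", exc)

remaining = table_names(conn)
print(remaining)
```

After the failed build, `orders` is gone and only the backup table remains, which is the situation described in item 1. The window between the rename and the end of the `CREATE TABLE` corresponds to the downtime in item 2.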

Proposed change:
Change the materialization macro to build the model in a temporary table first.
Once the build is complete, swap the destination table with the temporary table.
This approach reduces downtime to a fraction of a second and avoids situations where the build fails and the destination table remains renamed.
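
The build-then-swap idea can be sketched independently of dbt. The example below again uses Python's built-in `sqlite3` module as a stand-in for Fabric. The function name and the `_temp` / `_temp_old` suffixes are assumptions that mirror the naming in this issue, not an existing dbt-fabric API.

```python
import sqlite3


def build_then_swap(conn, table, select_sql):
    """Build into a staging table, then swap it in with quick renames."""
    staging = f"{table}_temp"
    old = f"{table}_temp_old"
    conn.execute(f"DROP TABLE IF EXISTS {staging}")
    # Slow step: the live table stays untouched and queryable throughout.
    conn.execute(f"CREATE TABLE {staging} AS {select_sql}")
    # Fast step: move the live table aside (if any) and rename staging in.
    conn.execute(f"DROP TABLE IF EXISTS {old}")
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if exists:
        conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
    conn.execute(f"ALTER TABLE {staging} RENAME TO {table}")
    conn.execute(f"DROP TABLE IF EXISTS {old}")


swap_conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
swap_conn.execute("CREATE TABLE orders AS SELECT 1 AS id")

# A failing build leaves the live table exactly as it was.
try:
    build_then_swap(swap_conn, "orders", "SELECT missing_column FROM nowhere")
except sqlite3.OperationalError:
    pass
after_failure = swap_conn.execute("SELECT id FROM orders").fetchall()

# A successful build replaces the contents and cleans up after itself.
build_then_swap(swap_conn, "orders", "SELECT 2 AS id")
after_success = swap_conn.execute("SELECT id FROM orders").fetchall()
tables = [
    row[0]
    for row in swap_conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(after_failure, after_success, tables)
```

The expensive `CREATE TABLE ... AS` runs against the staging name, so a failure there never touches the destination. Only the renames happen while the destination is briefly unavailable.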

Example Implementation:
My knowledge of dbt is limited, so I am sure the example below is far from best practice.
I have included it for illustration purposes only.

```
{% macro fabric__create_table_as(temporary, relation, sql) -%}

{# Stage the model SQL as a view so both branches can select from it. #}
{% set tmp_relation = relation.incorporate(path={"identifier": relation.identifier ~ '_temp_view'}, type='view') -%}
{% do adapter.drop_relation(tmp_relation) %}
{{ get_create_view_as_sql(tmp_relation, sql) }}

{% set contract_config = config.get('contract') %}

{% if contract_config.enforced %}

    {# Contract enforced: create the _temp table with explicit columns, then insert. #}
    DROP TABLE IF EXISTS [{{relation.database}}].[{{relation.schema}}].[{{relation.identifier}}_temp];
    CREATE TABLE [{{relation.database}}].[{{relation.schema}}].[{{relation.identifier}}_temp]
    {{ fabric__build_columns_constraints(relation) }}
    {{ get_assert_columns_equivalent(sql) }}

    {% set listColumns %}
        {% for column in model['columns'] %}
            {{ "["~column~"]" }}{{ ", " if not loop.last }}
        {% endfor %}
    {% endset %}

    INSERT INTO [{{relation.database}}].[{{relation.schema}}].[{{relation.identifier}}_temp]
    ({{listColumns}}) SELECT {{listColumns}} FROM [{{tmp_relation.database}}].[{{tmp_relation.schema}}].[{{tmp_relation.identifier}}];

    {# Replace the destination table with the freshly built _temp table. #}
    DROP TABLE IF EXISTS [{{relation.database}}].[{{relation.schema}}].[{{relation.identifier}}];

    EXEC sp_rename '[{{relation.database}}].[{{relation.schema}}].[{{relation.identifier}}_temp]', '{{relation.identifier}}';

{%- else %}

  {# No contract: build the _temp table with CTAS, then swap via renames. #}
  DROP TABLE IF EXISTS [{{relation.database}}].[{{relation.schema}}].[{{relation.identifier}}_temp];
  EXEC('CREATE TABLE [{{relation.database}}].[{{relation.schema}}].[{{relation.identifier}}_temp] AS (SELECT * FROM [{{tmp_relation.database}}].[{{tmp_relation.schema}}].[{{tmp_relation.identifier}}]);');

  {# Move any existing table or view out of the way. #}
  IF EXISTS ( SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE IN ('BASE TABLE', 'VIEW') AND TABLE_SCHEMA = '{{relation.schema}}' AND TABLE_NAME = '{{relation.identifier}}' )
  BEGIN
  DROP TABLE IF EXISTS [{{relation.schema}}].{{relation.identifier}}_temp_old;
  EXEC sp_rename '{{relation.schema}}.{{relation.identifier}}', '{{relation.identifier}}_temp_old';
  END

  EXEC sp_rename '{{relation.schema}}.{{relation.identifier}}_temp', '{{relation.identifier}}';

  {# Drop the old relation, whether it was a table or a view. #}
  IF EXISTS ( SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_NAME = '{{relation.identifier}}_temp_old' AND TABLE_SCHEMA = '{{relation.schema}}' )
  BEGIN
  DROP TABLE [{{relation.schema}}].{{relation.identifier}}_temp_old;
  END

  IF EXISTS ( SELECT 1 FROM INFORMATION_SCHEMA.VIEWS WHERE TABLE_NAME = '{{relation.identifier}}_temp_old' AND TABLE_SCHEMA = '{{relation.schema}}' )
  BEGIN
  DROP VIEW [{{relation.schema}}].{{relation.identifier}}_temp_old;
  END
{% endif %}

{# Remove the staging view. #}
{% do adapter.drop_relation(tmp_relation) %}

{% endmacro %}
```

prdpsvs (Collaborator) commented Jun 27, 2024

Addressed in PR #192

prdpsvs closed this as completed Jun 27, 2024