diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 002b24959..0b251dae9 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,16 +1,16 @@ ## We'd love to hear from you -This dbtvault package is very much a work in progress – we’ll up the version number to 1.0 when we’re satisfied it -works out in the wild. +dbtvault is very much a work in progress – we’re constantly adding quality of life improvements and will be adding +new table types regularly. We know that it deserves new features, that the code base can be tidied up and the SQL better tuned. -Rest assured we’re working on it for future releases – our roadmap contains information on what’s coming. -If you spot anything you’d like to bring to our attention, have a request for new features, -have spotted an improvement we could make, or want to tell us about a typo, then please don’t hesitate to let us know -by submitting an issue using the below guidelines +Rest assured we’re working on it for future releases – [our roadmap contains information on what’s coming](roadmap.md). + +If you spot anything you’d like to bring to our attention, have a request for new features, have spotted an improvement we could make, +or want to tell us about a typo or bug, then please don’t hesitate to let us know via [Github](https://github.com/Datavault-UK/dbtvault/issues). -We’d rather know you are making active use of this package than hearing nothing from all of you out there! +We’d rather hear that you are making active use of this package than hear nothing from all of you out there! Happy Data Vaulting! @@ -20,6 +20,7 @@ Happy Data Vaulting! We've tested the package rigorously, but if you think you've found a bug please provide the following at a minimum (or use the issue templates) so we can fix it as quickly as possible: +- The version of dbt being used - The version of dbtvault being used. - Steps to reproduce the issue - Any error messages or dbt log files which can give more detail of the problem diff --git a/README.md b/README.md index e01f8d300..3f28b699d 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ latest [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=latest)](https://dbtvault.readthedocs.io/en/latest/?badge=latest) -stable [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.3.3-pre)](https://dbtvault.readthedocs.io/en/v0.3.3-pre/?badge=v0.3.3-pre) +stable [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.4)](https://dbtvault.readthedocs.io/en/v0.4/?badge=v0.4) [past docs versions](https://dbtvault.readthedocs.io/en/latest/changelog/) @@ -43,6 +43,7 @@ Get started quickly with our worked example: ## Installation +Ensure you are using dbt 0.14 (0.15 support will be added soon!) Add the following to your ```packages.yml``` @@ -50,12 +51,12 @@ Add the following to your ```packages.yml``` packages: - git: "https://github.com/Datavault-UK/dbtvault" - revision: v0.3.3-pre # Latest stable version + revision: v0.4 # Latest stable version ``` And run ```dbt deps``` -[Read more on package installation](https://docs.getdbt.com/docs/package-management) +[Read more on package installation](https://docs.getdbt.com/v0.14.0/docs/package-management) ## Usage @@ -84,4 +85,4 @@ before anyone else!
[View our contribution guidelines](CONTRIBUTING.md) ## License -[Apache 2.0](LICENSE.md) +[Apache 2.0](LICENSE.md) \ No newline at end of file diff --git a/dbt_project.yml b/dbt_project.yml index 98037123b..8e00e9e5d 100644 --- a/dbt_project.yml +++ b/dbt_project.yml @@ -1,5 +1,5 @@ name: 'dbtvault' -version: '0.3.3' +version: '0.4' profile: 'dbtvault' @@ -13,3 +13,7 @@ target-path: "target" clean-targets: - "target" - "dbt_modules" + +models: + vars: + hash: MD5 \ No newline at end of file diff --git a/docs/bestpractices.md b/docs/bestpractices.md index 02f5a4b02..b898e91a0 100644 --- a/docs/bestpractices.md +++ b/docs/bestpractices.md @@ -43,8 +43,9 @@ then there is a chance of a clash: where two different values generate the same For this reason, it **should not be** used for cryptographic purposes either. -In future releases of dbtvault, we will allow you to change the algorithm that is used (e.g. to SHA-256) to reduce the -chance of a clash (at the expense of more processing and a larger column), or switch off hashing entirely. +!!! success + + You may now choose between MD5 and SHA-256 in dbtvault, [read below](bestpractices.md#choosing-a-hashing-algorithm-in-dbtvault). ### Why do we hash? @@ -80,4 +81,45 @@ staging tables the sorting functionality for primary keys. - For **links**, columns must be sorted by the primary key of the hub and arranged alphabetically by the hub name. -The order must also be the same as each hub. \ No newline at end of file +The order must also be the same as each hub. + +### Choosing a hashing algorithm in dbtvault + +With the release of dbtvault 0.4, you may now choose between ```MD5``` and ```SHA-256``` hashing. ```SHA-256``` was added +to dbtvault as an option for users who wish to reduce the hashing collision rates in larger data sets. + +!!! note + + If a hashing algorithm configuration is missing or invalid, dbtvault will use ```MD5``` by default. + +Configuring the hashing algorithm which will be used by dbtvault is simple: add a variable to your +```dbt_project.yml``` as follows: + +```dbt_project.yml``` +```yaml + +name: 'my_project' +version: '1' + +profile: 'my_project' + +source-paths: ["models"] +analysis-paths: ["analysis"] +test-paths: ["tests"] +data-paths: ["data"] +macro-paths: ["macros"] + +target-path: "target" +clean-targets: + - "target" + - "dbt_modules" + +models: + vars: + hash: SHA # or MD5 +``` + +It is possible to configure a hashing algorithm on a model-by-model basis using the hierarchical structure of the ```yaml``` file. +However, as per best practice, we recommend keeping the hashing algorithm consistent across all tables. + +Read the [dbt documentation](https://docs.getdbt.com/v0.14.0/docs/var) for further information on variable scoping. \ No newline at end of file diff --git a/docs/changelog.md b/docs/changelog.md index e23aad4d4..7ad437f88 100644 --- a/docs/changelog.md +++ b/docs/changelog.md @@ -4,6 +4,33 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [v0.4] - 2019-11-27 +[![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.4)](https://dbtvault.readthedocs.io/en/v0.4/?badge=v0.4) + +### Added + +- Table Macros: + - [Transactional Links](macros.md#t_link_template) + +### Improved + +- Hashing: + - You may now choose between ```MD5``` and ```SHA-256``` hashing with a simple yaml configuration + [Learn how!](bestpractices.md#choosing-a-hashing-algorithm-in-dbtvault) + +### Worked example + +- Transactional Links + - Added a transactional link model using a simulated transaction feed. + +### Documentation + +- Updated macros, best practices, roadmap, and other pages to account for new features +- Updated worked example documentation +- Replaced all dbt documentation links with links to the 0.14 documentation, as dbtvault +currently uses dbt 0.14 (we will be updating to 0.15 soon!) +- Minor corrections + ## [v0.3.3-pre] - 2019-10-31 [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.3.3-pre)](https://dbtvault.readthedocs.io/en/v0.3.3-pre/?badge=v0.3.3-pre) @@ -131,7 +158,6 @@ the new and improved features. ### Added - - Table Macros: - [Hub](macros.md#hub_template) - [Link](macros.md#link_template) diff --git a/docs/contributing.md b/docs/contributing.md index 3c4667faf..54c04e74b 100644 --- a/docs/contributing.md +++ b/docs/contributing.md @@ -8,7 +8,7 @@ We know that it deserves new features, that the code base can be tidied up and t Rest assured we’re working on it for future releases – [our roadmap contains information on what’s coming](roadmap.md). If you spot anything you’d like to bring to our attention, have a request for new features, have spotted an improvement we could make, -or want to tell us about a typo, then please don’t hesitate to let us know via [Github](https://github.com/Datavault-UK/dbtvault/issues). +or want to tell us about a typo or bug, then please don’t hesitate to let us know via [Github](https://github.com/Datavault-UK/dbtvault/issues). We’d rather know you are making active use of this package than hearing nothing from all of you out there! @@ -20,6 +20,7 @@ Happy Data Vaulting! :smile: We've tested the package rigorously, but if you think you've found a bug please provide the following at a minimum (or use the issue templates) so we can fix it as quickly as possible: +- The version of dbt being used - The version of dbtvault being used. - Steps to reproduce the issue - Any error messages or dbt log files which can give more detail of the problem diff --git a/docs/hubs.md b/docs/hubs.md index ba67b59a9..653f8dffc 100644 --- a/docs/hubs.md +++ b/docs/hubs.md @@ -26,7 +26,7 @@ The following header is what we use, but feel free to customise it to your needs Hubs are always incremental, as we load and add new records to the existing data set. -[Read more about incremental models](https://docs.getdbt.com/docs/configuring-incremental-models) +[Read more about incremental models](https://docs.getdbt.com/v0.14.0/docs/configuring-incremental-models) !!! note "Dont worry!" The [hub_template](macros.md#hub_template) deals with the Data Vault @@ -39,10 +39,10 @@ Let's look at the metadata we need to provide to the [hub_template](macros.md#hu #### Source table The first piece of metadata we need is the source table. This step is easy, as in this example we created the -new staging layer ourselves. All we need to do is provide a reference to the model we created, and dbt will do the rest for us. +staging layer ourselves.
All we need to do is provide a reference to the model we created, and dbt will do the rest for us. dbt ensures dependencies are honoured when defining the source using a reference in this way. -[Read more about the ref function](https://docs.getdbt.com/docs/ref) +[Read more about the ref function](https://docs.getdbt.com/v0.14.0/docs/ref) ```hub_customer.sql``` diff --git a/docs/links.md b/docs/links.md index 7827b13cc..58a06b196 100644 --- a/docs/links.md +++ b/docs/links.md @@ -38,7 +38,7 @@ The first piece of metadata we need is the source table. This step is easy, as w staging layer ourselves. All we need to do is provide a reference to the model we created, and dbt will do the rest for us. dbt ensures dependencies are honoured when defining the source using a reference in this way. -[Read more about the ref function](https://docs.getdbt.com/docs/ref) +[Read more about the ref function](https://docs.getdbt.com/v0.14.0/docs/ref) ```link_customer_nation.sql``` diff --git a/docs/loading.md b/docs/loading.md index b343752e4..c297d3561 100644 --- a/docs/loading.md +++ b/docs/loading.md @@ -29,11 +29,11 @@ This will run all models with the hub tag. Links are another fundamental component in a Data Vault. Links model an association or link, between two business keys. They commonly hold business transactions or structural -information. +information. A link specifically contains the structural information. Our links will contain: -1. A primary key. For links, we take the natural keys (prior to hashing) represented by the foreign key columns below +1. A primary key. For links, we take the natural keys (prior to hashing) represented by the foreign key columns and create a hash on a concatenation of them. 2. Foreign keys holding the primary key for each hub referenced in the link (2 or more depending on the number of hubs referenced) @@ -89,6 +89,32 @@ To compile and load the provided satellite models, run the following command: This will run all models with the satellite tag. +## Transactional Links + +Transactional Links are used to model transactions between entities in a Data Vault. + +Links model an association or link between two business keys. They commonly hold business transactions or structural +information. A transactional link specifically contains the business transactions. + +Our transactional links will contain: + +1. A primary key. For transactional links, we use the transaction number. If this is not already present in the dataset, +then we create one by concatenating the foreign keys and hashing them. +2. Foreign keys holding the primary key for each hub referenced in the transactional link (2 or more depending on the number of hubs +referenced) +3. A payload. This will be data about the transaction itself, e.g. the amount, type, date or non-hashed transaction number. +4. An ```EFFECTIVE_FROM``` date. This will usually be the date of the transaction. +5. The load date or load date timestamp. +6. The source for the record. + +### Loading transactional links + +To compile and load the provided t_link models, run the following command: + +```dbt run --models tag:t_link``` + +This will run all models with the t_link tag. + ## Loading the full system Each of the commands above load a particular type of table, however, we may want to do a full system load. diff --git a/docs/macros.md b/docs/macros.md index 026513f39..3000b13a5 100644 --- a/docs/macros.md +++ b/docs/macros.md @@ -7,10 +7,6 @@ for your Data Vault.
### Metadata notes #### Using a source reference for the target metadata -!!! note - As of release 0.3, you may now use a source reference as a target metadata value, to streamline metadata entry. - Read below! - In the usage examples for the table template macros in this section, you will see ```source``` provided as the values for some of the target metadata variables. ```source``` has been declared as a variable at the top of the models, and holds a reference to the source table we are loading from. This is shorthand for retaining the name and data types @@ -399,6 +395,88 @@ WHERE src.HASHDIFF IS NULL ___ +### t_link_template + +Generates SQL to build a transactional link table using the provided metadata. + +```mysql +dbtvault.t_link_template(src_pk, src_fk, src_payload, src_eff, src_ldts, src_source, + tgt_pk, tgt_fk, tgt_payload, tgt_eff, tgt_ldts, tgt_source, + source) +``` + +#### Parameters + +| Parameter | Description | Type | Required? | +| ------------- | --------------------------------------------------- | -------------- | ------------------------------------------------------------------ | +| src_pk | Source primary key column | String | check_circle | +| src_fk | Source foreign key column(s) | List | check_circle | +| src_payload | Source payload column(s) | List | check_circle | +| src_eff | Source effective from column | String | check_circle | +| src_ldts | Source loaddate timestamp column | String | check_circle | +| src_source | Name of the column containing the source ID | String | check_circle | +| tgt_pk | Target primary key column | List/Reference | check_circle | +| tgt_fk | Target foreign key column(s) | List/Reference | check_circle | +| tgt_payload | Target payload column(s) | List/Reference | check_circle | +| tgt_eff | Target effective from column | List/Reference | check_circle | +| tgt_ldts | Target loaddate timestamp column | List/Reference | check_circle | +| tgt_source | Name of the column which will contain the source ID | List/Reference | check_circle | +| source | Staging model reference or table name | List/Reference | check_circle | + +#### Usage + + +```sql + +-- t_link_transactions.sql: + +{{- config(...)
-}} + +{%- set source = [ref('stg_transactions_hashed')] -%} + +{%- set src_pk = 'TRANSACTION_PK' -%} +{%- set src_fk = ['CUSTOMER_FK', 'ORDER_FK'] -%} +{%- set src_payload = ['TRANSACTION_NUMBER', 'TRANSACTION_DATE', 'TYPE', 'AMOUNT'] -%} +{%- set src_eff = 'EFFECTIVE_FROM' -%} +{%- set src_ldts = 'LOADDATE' -%} +{%- set src_source = 'SOURCE' -%} + +{%- set tgt_pk = source -%} +{%- set tgt_fk = source -%} +{%- set tgt_payload = source -%} +{%- set tgt_eff = source -%} +{%- set tgt_ldts = source -%} +{%- set tgt_source = source -%} + +{{ dbtvault.t_link_template(src_pk, src_fk, src_payload, src_eff, src_ldts, src_source, + tgt_pk, tgt_fk, tgt_payload, tgt_eff, tgt_ldts, tgt_source, + source) }} +``` + +#### Output + +```mysql +SELECT DISTINCT + CAST(stg.TRANSACTION_PK AS BINARY) AS TRANSACTION_PK, + CAST(stg.CUSTOMER_FK AS BINARY) AS CUSTOMER_FK, + CAST(stg.ORDER_FK AS BINARY) AS ORDER_FK, + CAST(stg.TRANSACTION_NUMBER AS NUMBER(38,0)) AS TRANSACTION_NUMBER, + CAST(stg.TRANSACTION_DATE AS DATE) AS TRANSACTION_DATE, + CAST(stg.TYPE AS VARCHAR) AS TYPE, + CAST(stg.AMOUNT AS NUMBER(12,2)) AS AMOUNT, + CAST(stg.EFFECTIVE_FROM AS DATE) AS EFFECTIVE_FROM, + CAST(stg.LOADDATE AS DATE) AS LOADDATE, + CAST(stg.SOURCE AS VARCHAR) AS SOURCE +FROM ( + SELECT stg.TRANSACTION_PK, stg.CUSTOMER_FK, stg.ORDER_FK, stg.TRANSACTION_NUMBER, stg.TRANSACTION_DATE, stg.TYPE, stg.AMOUNT, stg.EFFECTIVE_FROM, stg.LOADDATE, stg.SOURCE + FROM MYDATABASE.MYSCHEMA.stg_transactions_hashed AS stg +) AS stg +LEFT JOIN MYDATABASE.MYSCHEMA.t_link_transactions AS tgt +ON stg.TRANSACTION_PK = tgt.TRANSACTION_PK +WHERE tgt.TRANSACTION_PK IS NULL +``` +___ + ## Staging Macros ######(macros/staging) @@ -410,20 +488,28 @@ ___ !!! warning This macro ***should not be*** used for cryptographic purposes. - The intended use is for creating checksum-like fields only, so that a record change can be detected. + The intended use is for creating checksum-like values only, so that we may compare records accurately. [Read More](https://www.md5online.org/blog/why-md5-is-not-safe/) !!! seealso "See Also" - [hash](#hash) - [Hashing best practises and why we hash](bestpractices.md#hashing) + - With the release of dbtvault 0.4, you may now choose between ```MD5``` and ```SHA-256``` hashing. + [Learn how](bestpractices.md#choosing-a-hashing-algorithm-in-dbtvault) This macro will generate SQL hashing sequences for one or more columns as below: -```sql + +```sql tab='MD5' CAST(MD5_BINARY(UPPER(TRIM(CAST(column1 AS VARCHAR)))) AS BINARY(16)) AS alias1, CAST(MD5_BINARY(UPPER(TRIM(CAST(column2 AS VARCHAR)))) AS BINARY(16)) AS alias2 ``` +```sql tab='SHA' +CAST(SHA2_BINARY(UPPER(TRIM(CAST(column1 AS VARCHAR)))) AS BINARY(32)) AS alias1, +CAST(SHA2_BINARY(UPPER(TRIM(CAST(column2 AS VARCHAR)))) AS BINARY(32)) AS alias2 +``` + #### Parameters | Parameter | Description | Type | Required? 
| @@ -444,14 +530,24 @@ CAST(MD5_BINARY(UPPER(TRIM(CAST(column2 AS VARCHAR)))) AS BINARY(16)) AS alias2 #### Output -```mysql +```mysql tab='MD5' CAST(MD5_BINARY(UPPER(TRIM(CAST(CUSTOMERKEY AS VARCHAR)))) AS BINARY(16)) AS CUSTOMER_PK, CAST(MD5_BINARY(CONCAT( - IFNULL(UPPER(TRIM(CAST(CUSTOMERKEY AS VARCHAR))), '^^'), '||', - IFNULL(UPPER(TRIM(CAST(DOB AS VARCHAR))), '^^'), '||', - IFNULL(UPPER(TRIM(CAST(NAME AS VARCHAR))), '^^'), '||', - IFNULL(UPPER(TRIM(CAST(PHONE AS VARCHAR))), '^^') )) AS BINARY(16)) AS HASHDIFF + IFNULL(UPPER(TRIM(CAST(CUSTOMERKEY AS VARCHAR))), '^^'), '||', + IFNULL(UPPER(TRIM(CAST(DOB AS VARCHAR))), '^^'), '||', + IFNULL(UPPER(TRIM(CAST(NAME AS VARCHAR))), '^^'), '||', + IFNULL(UPPER(TRIM(CAST(PHONE AS VARCHAR))), '^^') )) AS BINARY(16)) AS HASHDIFF +``` + +```mysql tab='SHA' +CAST(SHA2_BINARY(UPPER(TRIM(CAST(CUSTOMERKEY AS VARCHAR)))) AS BINARY(32)) AS CUSTOMER_PK, + +CAST(SHA2_BINARY(CONCAT( + IFNULL(UPPER(TRIM(CAST(CUSTOMERKEY AS VARCHAR))), '^^'), '||', + IFNULL(UPPER(TRIM(CAST(DOB AS VARCHAR))), '^^'), '||', + IFNULL(UPPER(TRIM(CAST(NAME AS VARCHAR))), '^^'), '||', + IFNULL(UPPER(TRIM(CAST(PHONE AS VARCHAR))), '^^') )) AS BINARY(32)) AS HASHDIFF ``` !!! success "Column sorting" @@ -542,7 +638,7 @@ FROM MYDATABASE.MYSCHEMA.MYTABLE ``` !!! info - Sources need to be set up in dbt to ensure this works. [Read More](https://docs.getdbt.com/docs/using-sources) + Sources need to be set up in dbt to ensure this works. [Read More](https://docs.getdbt.com/v0.14.0/docs/using-sources) #### Parameters @@ -626,19 +722,26 @@ ___ !!! warning This macro ***should not be*** used for cryptographic purposes. - The intended use is for creating checksum-like fields only, so that a record change can be detected. + The intended use is for creating checksum-like values only, so that we may compare records accurately. [Read More](https://www.md5online.org/blog/why-md5-is-not-safe/) !!! seealso "See Also" - [multi-hash](#multi_hash) - [Hashing best practises and why we hash](bestpractices.md#hashing) + - With the release of dbtvault 0.4, you may now choose between ```MD5``` and ```SHA-256``` hashing. + [Learn how](bestpractices.md#choosing-a-hashing-algorithm-in-dbtvault) A macro for generating hashing SQL for columns: -```sql + +```sql tab='MD5' CAST(MD5_BINARY(UPPER(TRIM(CAST(column AS VARCHAR)))) AS BINARY(16)) AS alias ``` +```sql tab='SHA' +CAST(SHA2_BINARY(UPPER(TRIM(CAST(column AS VARCHAR)))) AS BINARY(32)) AS alias +``` + - Can provide multiple columns as a list to create a concatenated hash - Columns are sorted alphabetically (by alias) if you set the ```sort``` flag to true. - Generally, you should alpha sort hashdiffs using the ```sort``` flag. 
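For illustration, a sketch of how this macro might be called in a staging model, using the ```hash(columns, alias, sort=false)``` signature from the macro source; the column names here are placeholders, not prescribed names:

```sql
-- A single column: generates a primary key hash
{{ dbtvault.hash('CUSTOMERKEY', 'CUSTOMER_PK') }},

-- A list of columns with sort=true: generates an alpha-sorted hashdiff
{{ dbtvault.hash(['CUSTOMERKEY', 'DOB', 'NAME', 'PHONE'], 'HASHDIFF', true) }}
```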
@@ -667,7 +770,7 @@ CAST(MD5_BINARY(UPPER(TRIM(CAST(column AS VARCHAR)))) AS BINARY(16)) AS alias #### Output -```mysql +```mysql tab='MD5' CAST(MD5_BINARY(UPPER(TRIM(CAST(CUSTOMERKEY AS VARCHAR)))) AS BINARY(16)) AS CUSTOMER_PK, CAST(MD5_BINARY(CONCAT(IFNULL(UPPER(TRIM(CAST(CUSTOMERKEY AS VARCHAR))), '^^'), '||', IFNULL(UPPER(TRIM(CAST(DOB AS VARCHAR))), '^^'), '||', @@ -676,6 +779,15 @@ CAST(MD5_BINARY(CONCAT(IFNULL(UPPER(TRIM(CAST(CUSTOMERKEY AS VARCHAR))), '^^'), AS BINARY(16)) AS HASHDIFF ``` +```mysql tab='SHA' +CAST(SHA2_BINARY(UPPER(TRIM(CAST(CUSTOMERKEY AS VARCHAR)))) AS BINARY(32)) AS CUSTOMER_PK, +CAST(SHA2_BINARY(CONCAT(IFNULL(UPPER(TRIM(CAST(CUSTOMERKEY AS VARCHAR))), '^^'), '||', + IFNULL(UPPER(TRIM(CAST(DOB AS VARCHAR))), '^^'), '||', + IFNULL(UPPER(TRIM(CAST(NAME AS VARCHAR))), '^^'), '||', + IFNULL(UPPER(TRIM(CAST(PHONE AS VARCHAR))), '^^') )) + AS BINARY(32)) AS HASHDIFF +``` + ___ ### prefix diff --git a/docs/roadmap.md b/docs/roadmap.md index 03827725c..25d14a095 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -13,7 +13,7 @@ We will be releasing changes incrementally, so you can reap the benefits as soon These features are currently planned for the near-future. -- Transactional Links (Also known as non-historised links) +- Effectivity satellites ## Future releases @@ -22,9 +22,9 @@ In future releases, we hope to include the following: ### Tables - Multi-active satellites -- Effectivity satellites - Status tracking satellites - Point-in-Time tables (also know as PITs) - Bridge tables - Reference Tables +- Mart loading helpers - And more! \ No newline at end of file diff --git a/docs/satellites.md b/docs/satellites.md index 0cd252e8d..69c651f2e 100644 --- a/docs/satellites.md +++ b/docs/satellites.md @@ -42,7 +42,7 @@ The following header is what we use, but feel free to customise it to your needs Satellites are always incremental, as we load and add new records to the existing data set. -[Read more about incremental models](https://docs.getdbt.com/docs/configuring-incremental-models) +[Read more about incremental models](https://docs.getdbt.com/v0.14.0/docs/configuring-incremental-models) ### Adding the metadata @@ -51,10 +51,10 @@ Let's look at the metadata we need to provide to the [sat_template](macros.md#sa #### Source table The first piece of metadata we need is the source table. This step is easy, as in this example we created the -new staging layer ourselves. All we need to do is provide a reference to the model we created, and dbt will do the rest for us. +staging layer ourselves. All we need to do is provide a reference to the model we created, and dbt will do the rest for us. dbt ensures dependencies are honoured when defining the source using a reference in this way. -[Read more about the ref function](https://docs.getdbt.com/docs/ref) +[Read more about the ref function](https://docs.getdbt.com/v0.14.0/docs/ref) ```sat_customer_details.sql``` ```sql hl_lines="3" @@ -201,6 +201,4 @@ And our table will look like this: ### Next steps -We have now created a staging layer and a hub, link and satellite. We'll be bringing new -table structures in future releases. We'll also be releasing material which demonstrates these examples in a live -environment soon! \ No newline at end of file +We have now created a staging layer and a hub, link and satellite. Next we will look at transactional links.
\ No newline at end of file diff --git a/docs/setup.md b/docs/setup.md index e7b787b62..e104073cd 100644 --- a/docs/setup.md +++ b/docs/setup.md @@ -36,7 +36,7 @@ In your dbt profiles, you must create a connection with this name and provide the account details so that dbt can connect to your Snowflake databases. dbt provides their own documentation on how to configure profiles, so we suggest reading that -[here](https://docs.getdbt.com/docs/configure-your-profile). +[here](https://docs.getdbt.com/v0.14.0/docs/configure-your-profile). A sample profile configuration is provided below which will get you started: diff --git a/docs/sourceprofile.md b/docs/sourceprofile.md index cef483d15..d4fb1df52 100644 --- a/docs/sourceprofile.md +++ b/docs/sourceprofile.md @@ -53,6 +53,25 @@ Next we looked at the relationship between customers and orders. We wanted to ch We did this by doing a left outer join on the ```ORDERS``` table, with the ```CUSTOMER``` table and discovered that several customers exist without orders. +#### Transactions + +To create transactional links in the demonstration project, we needed to simulate transactions, as there are no suitable +or explicit transaction records present in the dataset. There are implied transactions, however, as customers place orders. +To simulate concrete transactions, we created a raw staging layer as a view, called +```raw_transactions```, and used the following fields: + +- Customer key +- Order key +- Order date +- Total price, aliased as Amount, meaning the order is paid off in full. +- Type, a generated column, randomly selected from ```CR``` or ```DR```, indicating a credit or debit to the customer. +- Transaction Date. A calculated column which takes the order date and adds 20 days, meaning the customer paid 20 days +after their order was made. +- Transaction number. A calculated column created by concatenating the Order key, Customer key and order date and padding the +result with 0s to ensure the number is 24 digits long. + +The ```ORDERS``` and ```CUSTOMER``` tables are then joined (left outer) to simulate transactions on customer orders. + ### Conclusions To create a source feed simulation with the static data (shown by the logical pattern in the date fields), we can use @@ -69,7 +88,10 @@ to the ```LINEITEM``` table. The relationship between customers and orders tells us that customers without an order will not be loaded into the Data Vault, as we are using the ```ORDERDATE``` for day-feed simulation. -Now that we have profiled the data, we cna make more informed decisions when mapping the source system to the Data Vault +This also means that we can simulate transactions by using the implication that a customer makes a payment on an order +some time after the order has been made. + +Now that we have profiled the data, we can make more informed decisions when mapping the source system to the Data Vault architecture. diff --git a/docs/stagingdemo.md b/docs/stagingdemo.md index f58170f2c..cc1ea8ae2 100644 --- a/docs/stagingdemo.md +++ b/docs/stagingdemo.md @@ -18,6 +18,13 @@ The ```raw_inventory``` model feeds the static inventory from TPC-H. As this dat we do not need to do any additional date processing or use the ```date``` var as we did for the raw orders data. The inventory consists of the ```PARTSUPP```, ```SUPPLIER```, ```PART``` and ```LINEITEM``` tables. +### raw_transactions + +The ```raw_transactions``` model simulates transactions so that we can create transactional links. It does this by
It does this by +making a number of calculations on orders made by customers and creating transaction records. + +[Read more](sourceprofile.md#transactions) + ## Building the raw staging layer To build this layer with dbtvault, run the below command: @@ -30,14 +37,17 @@ two raw staging layer models, so this will compile and run both models. The dbt output should give something like this: ```shell -16:11:33 | Concurrency: 4 threads (target='dev') -16:11:33 | -16:11:33 | 1 of 2 START incremental model DEMO_RAW.raw_inventory................ [RUN] -16:11:33 | 2 of 2 START incremental model DEMO_RAW.raw_orders................... [RUN] -16:12:05 | 2 of 2 OK created incremental model DEMO_RAW.raw_orders.............. [SUCCESS 24627 in 32.46s] -16:12:43 | 1 of 2 OK created incremental model DEMO_RAW.raw_inventory........... [SUCCESS 8000000 in 69.54s] -16:12:43 | -16:12:43 | Finished running 2 incremental models in 81.39s. +14:18:17 | Concurrency: 4 threads (target='dev') +14:18:17 | +14:18:17 | 1 of 3 START view model DEMO_RAW.raw_inventory....................... [RUN] +14:18:17 | 2 of 3 START view model DEMO_RAW.raw_orders.......................... [RUN] +14:18:17 | 3 of 3 START view model DEMO_RAW.raw_transactions.................... [RUN] +14:18:19 | 3 of 3 OK created view model DEMO_RAW.raw_transactions............... [SUCCESS 1 in 1.49s] +14:18:19 | 1 of 3 OK created view model DEMO_RAW.raw_inventory.................. [SUCCESS 1 in 1.71s] +14:18:20 | 2 of 3 OK created view model DEMO_RAW.raw_orders..................... [SUCCESS 1 in 2.06s] +14:18:20 | +14:18:20 | Finished running 3 view models in 8.10s. + ``` ## The hashed staging layer @@ -126,7 +136,13 @@ The ```v_stg_orders``` and ```v_stg_inventory``` models use the raw layer's ```r models as sources, respectively. Both are created as views on the raw staging layer, as they are intended as transformations on the data which already exists. -Eeach view adds a number of primary keys, hashdiffs and additional constants for use in the raw vault. +Each view adds a number of primary keys, hashdiffs and additional constants for use in the raw vault. + +### v_stg_transactions + +The ```v_stg_transactions``` model uses the raw layer's ```raw_transactions``` model as its source. +For the load date, we add a day to the ```TRANSACTION_DATE``` to simulate the fact we are loading the data in the date +after the transaction was made. ## Building the hashed staging layer @@ -140,12 +156,14 @@ two hashed staging layer models, so this will compile and run both models. The dbt output should give something like this: ```shell -16:23:13 | Concurrency: 4 threads (target='dev') -16:23:13 | -16:23:13 | 1 of 2 START view model DEMO_STG.v_stg_inventory..................... [RUN] -16:23:14 | 2 of 2 START view model DEMO_STG.v_stg_orders........................ [RUN] -16:23:19 | 1 of 2 OK created view model DEMO_STG.v_stg_inventory................ [SUCCESS 1 in 5.10s] -16:23:20 | 2 of 2 OK created view model DEMO_STG.v_stg_orders................... [SUCCESS 1 in 5.10s] -16:23:20 | -16:23:20 | Finished running 2 view models in 13.27s. +14:19:17 | Concurrency: 4 threads (target='dev') +14:19:17 | +14:19:17 | 1 of 3 START view model DEMO_STG.v_stg_inventory..................... [RUN] +14:19:17 | 2 of 3 START view model DEMO_STG.v_stg_orders........................ [RUN] +14:19:17 | 3 of 3 START view model DEMO_STG.v_stg_transactions.................. [RUN] +14:19:19 | 3 of 3 OK created view model DEMO_STG.v_stg_transactions............. 
[SUCCESS 1 in 1.99s] +14:19:20 | 2 of 3 OK created view model DEMO_STG.v_stg_orders................... [SUCCESS 1 in 2.52s] +14:19:20 | 1 of 3 OK created view model DEMO_STG.v_stg_inventory................ [SUCCESS 1 in 2.59s] +14:19:20 | +14:19:20 | Finished running 3 view models in 7.98s. ``` \ No newline at end of file diff --git a/docs/t_links.md b/docs/t_links.md new file mode 100644 index 000000000..338f80036 --- /dev/null +++ b/docs/t_links.md @@ -0,0 +1,182 @@ +# Transactional Links + +Also known as non-historized or no-history links, transactional links record the transaction or 'event' components of +their referenced hub tables. They allow us to model the more granular relationships between entities. Some prime examples +are purchases, flights or emails; there is a record in the table for every event or transaction between the entities +instead of just one record per relation. + +Our transactional links will contain: + +1. A primary key. For t-links, we take the natural keys (prior to hashing) represented by the foreign key columns below and create a hash on a concatenation of them. +2. Foreign keys holding the primary key for each hub referenced in the link (2 or more depending on the number of hubs referenced) +3. A payload. The payload consists of concrete data for an entity, i.e. a transaction record. This could be +a transaction number, an amount paid, transaction type or more. The payload will contain all of the +concrete data for a transaction. +4. An effectivity date. Usually called ```EFFECTIVE_FROM```, this column is the business effective date of a +transactional link record. It records that a record is valid from a specific point in time. In the case of a transaction, this +is usually the date on which the transaction occurred. + +5. The load date or load date timestamp. +6. The source for the record. + +!!! note + ```LOADDATE``` is the time the record is loaded into the database. ```EFFECTIVE_FROM``` is different and may hold a + different value, especially if there is a batch processing delay between when a business event happens and the + record arriving in the database for load. Having both dates allows us to ask the questions 'what did we know when' + and 'what happened when' using the ```LOADDATE``` and ```EFFECTIVE_FROM``` dates respectively. + +### Creating the model header + +Create a new dbt model as before. We'll call this one ```t_link_transactions```. + +The following header is what we use, but feel free to customise it to your needs: + +```t_link_transactions.sql``` +```sql +{{- config(materialized='incremental', schema='MYSCHEMA', enabled=true, tags='t_link') -}} +``` + +Transactional links are always incremental, as we load and add new records to the existing data set. + +[Read more about incremental models](https://docs.getdbt.com/v0.14.0/docs/configuring-incremental-models) + +### Adding the metadata + +Let's look at the metadata we need to provide to the [t_link_template](macros.md#t_link_template) macro. + +#### Source table + +The first piece of metadata we need is the source table. For transactional links this can sometimes be a little +trickier than for other table types. We need particular columns to model the transaction or event which has occurred in the +relationship between the hubs we are referencing, and therefore may need to create a staging layer specifically for the +purposes of feeding the transactional link. + +For this step, ensure you have the following columns present in the source table: + +1. A hashed transaction number as the primary key +2. 
Hashed foreign keys, one for each of the referenced hubs. +3. A payload. This will be data about the transaction itself, e.g. the amount, type, date or non-hashed transaction number. +4. An ```EFFECTIVE_FROM``` date. This will usually be the date of the transaction. +5. A load date timestamp +6. A source + +Assuming you have a raw source table with these required columns, we can create a hashed staging table +using a dbt model (let's call it ```stg_transactions_hashed.sql```) and use it for the source table +reference. dbt ensures dependencies are honoured when defining the source using a reference in this way. + +[Read more about the ref function](https://docs.getdbt.com/v0.14.0/docs/ref) + +```t_link_transactions.sql``` +```sql hl_lines="3" +{{- config(materialized='incremental', schema='MYSCHEMA', enabled=true, tags='t_link') -}} + +{%- set source = [ref('stg_transactions_hashed')] -%} +``` + +!!! note + Make sure you surround the ref call with square brackets, as shown in the snippet + above. + + +#### Source columns + +Next, we define the columns which we would like to bring from the source. +We can use the columns we identified in the ```Source table``` section above. + +```t_link_transactions.sql``` +```sql hl_lines="5 6 7 8 9 10" +{{- config(materialized='incremental', schema='MYSCHEMA', enabled=true, tags='t_link') -}} + +{%- set source = [ref('stg_transactions_hashed')] -%} + +{%- set src_pk = 'TRANSACTION_PK' -%} +{%- set src_fk = ['CUSTOMER_FK', 'ORDER_FK'] -%} +{%- set src_payload = ['TRANSACTION_NUMBER', 'TRANSACTION_DATE', 'TYPE', 'AMOUNT'] -%} +{%- set src_eff = 'EFFECTIVE_FROM' -%} +{%- set src_ldts = 'LOADDATE' -%} +{%- set src_source = 'SOURCE' -%} +``` + +#### Target columns + +Now we can define the target column mapping. The [t_link_template](macros.md#t_link_template) does a lot of work for us if we +provide the metadata it requires. + +```t_link_transactions.sql``` +```sql hl_lines="12 13 14 15 16 17" +{{- config(materialized='incremental', schema='MYSCHEMA', enabled=true, tags='t_link') -}} + +{%- set source = [ref('stg_transactions_hashed')] -%} + +{%- set src_pk = 'TRANSACTION_PK' -%} +{%- set src_fk = ['CUSTOMER_FK', 'ORDER_FK'] -%} +{%- set src_payload = ['TRANSACTION_NUMBER', 'TRANSACTION_DATE', 'TYPE', 'AMOUNT'] -%} +{%- set src_eff = 'EFFECTIVE_FROM' -%} +{%- set src_ldts = 'LOADDATE' -%} +{%- set src_source = 'SOURCE' -%} + +{%- set tgt_pk = source -%} +{%- set tgt_fk = source -%} +{%- set tgt_payload = source -%} +{%- set tgt_eff = source -%} +{%- set tgt_ldts = source -%} +{%- set tgt_source = source -%} +``` + +With these 6 additional lines, we have now informed the macro that we do not want to modify +our source data; we are simply using the ```source``` reference as shorthand for keeping the columns the same as +the source. In other tables in this walkthrough, notably [satellites](satellites.md#target-columns), we carried out +some manual mapping, but this isn't always necessary if we have all the columns we need in the staging layers.
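As an aside, before we invoke the template: the ```stg_transactions_hashed``` model referenced in the ```Source table``` step could be built with the [hash](macros.md#hash) macro. The sketch below is illustrative only; the raw column names and the ```raw_transactions``` reference are assumptions, not part of the worked example:

```sql
-- stg_transactions_hashed.sql (illustrative sketch only; column names are assumptions)
{{- config(materialized='view', schema='MYSCHEMA', enabled=true) -}}

SELECT
       {{ dbtvault.hash('TRANSACTION_NUMBER', 'TRANSACTION_PK') }},
       {{ dbtvault.hash('CUSTOMERKEY', 'CUSTOMER_FK') }},
       {{ dbtvault.hash('ORDERKEY', 'ORDER_FK') }},
       TRANSACTION_NUMBER,
       TRANSACTION_DATE,
       TYPE,
       AMOUNT,
       TRANSACTION_DATE AS EFFECTIVE_FROM,
       LOADDATE,
       SOURCE
FROM {{ ref('raw_transactions') }}
```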
### Invoking the template + +Now we bring it all together and call the [t_link_template](macros.md#t_link_template) macro: + +```t_link_transactions.sql``` +```sql hl_lines="19 20 21" +{{- config(materialized='incremental', schema='MYSCHEMA', enabled=true, tags='t_link') -}} + +{%- set source = [ref('stg_transactions_hashed')] -%} + +{%- set src_pk = 'TRANSACTION_PK' -%} +{%- set src_fk = ['CUSTOMER_FK', 'ORDER_FK'] -%} +{%- set src_payload = ['TRANSACTION_NUMBER', 'TRANSACTION_DATE', 'TYPE', 'AMOUNT'] -%} +{%- set src_eff = 'EFFECTIVE_FROM' -%} +{%- set src_ldts = 'LOADDATE' -%} +{%- set src_source = 'SOURCE' -%} + +{%- set tgt_pk = source -%} +{%- set tgt_fk = source -%} +{%- set tgt_payload = source -%} +{%- set tgt_eff = source -%} +{%- set tgt_ldts = source -%} +{%- set tgt_source = source -%} + +{{ dbtvault.t_link_template(src_pk, src_fk, src_payload, src_eff, src_ldts, src_source, + tgt_pk, tgt_fk, tgt_payload, tgt_eff, tgt_ldts, tgt_source, + source) }} +``` + +### Running dbt + +With our model complete, we can run dbt to create our ```t_link_transactions``` transactional link. + +```dbt run --models +t_link_transactions``` + +And our table will look like this: + +| TRANSACTION_PK | CUSTOMER_FK | ORDER_FK | TRANSACTION_NUMBER | TYPE | AMOUNT | EFFECTIVE_FROM | LOADDATE | SOURCE | | --------------- | ----------- | --------- | ------------------ | ---- | ------- | -------------- | ----------- | ------ | | BDEE76... | CA02D6... | CF97F1... | 123456789101 | CR | 100.00 | 1993-01-28 | 1993-01-29 | 2 | | . | . | . | . | . | . | . | . | . | | . | . | . | . | . | . | . | . | . | | E0E7A8... | F67DF4... | 2C95D4... | 123456789104 | CR | 678.23 | 1993-01-28 | 1993-01-29 | 2 | + +### Next steps + +We have now created a staging layer and a hub, link, satellite and transactional link. We'll be bringing new +table structures in future releases. + +Take a look at our [worked example](workedexample.md) for a demonstration of a realistic environment with pre-written +models for you to experiment with and learn from. \ No newline at end of file diff --git a/docs/walkthrough.md b/docs/walkthrough.md index d62d41b10..b7ae2bff6 100644 --- a/docs/walkthrough.md +++ b/docs/walkthrough.md @@ -21,7 +21,8 @@ We will: 2. A Snowflake account, trial or otherwise. [Sign up for a free 30-day trial here](https://trial.snowflake.com/ab/) -3. You must have downloaded and installed dbt, and [set up a project](https://docs.getdbt.com/docs/dbt-projects). +3. You must have downloaded and installed dbt 0.14 (0.15 support will be added soon!), +and [set up a project](https://docs.getdbt.com/v0.14.0/docs/dbt-projects). 4. Sources should be set up in dbt [(see below)](#setting-up-sources). @@ -38,7 +39,7 @@ We will be using the ```source``` feature of dbt extensively throughout the docu data much easier, cleaner and more modular. We have provided an example below which shows a configuration similar to that used for the examples in our documentation, -however this feature is documented extensively in [the documentation for dbt](https://docs.getdbt.com/docs/using-sources). +however this feature is documented extensively in [the documentation for dbt](https://docs.getdbt.com/v0.14.0/docs/using-sources). After reading the above documentation, we recommend that you place the ```schema.yml``` file you create for your sources, in the root of your ```models``` folder, however you can place it where needed for your specific project and models.
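A minimal sketch of such a ```schema.yml``` is shown below; the source, schema and table names are placeholders rather than the names used elsewhere in this documentation:

```yaml
version: 2

sources:
  - name: MYSOURCE        # a logical name for the source system
    schema: MYSCHEMA      # the schema the raw tables live in
    tables:
      - name: CUSTOMER
      - name: ORDERS
```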
@@ -70,4 +71,4 @@ packages: And run ```dbt deps``` -[Read more on package installation (from dbt)](https://docs.getdbt.com/docs/package-management) \ No newline at end of file +[Read more on package installation (from dbt)](https://docs.getdbt.com/v0.14.0/docs/package-management) \ No newline at end of file diff --git a/docs/workedexample.md b/docs/workedexample.md index 1ee127306..292d07ef6 100644 --- a/docs/workedexample.md +++ b/docs/workedexample.md @@ -15,7 +15,7 @@ We will: - examine and profile the TPCH dataset to explore how we can map it to the Data Vault architecture. - create a raw staging layer. - process the raw staging layer. -- create a Data Vault with hubs, links and satellites using dbtvault and pre-written models. +- create a Data Vault with hubs, links, satellites and transactional links using dbtvault and pre-written models. ## Pre-requisites @@ -31,8 +31,9 @@ be the only necessary requirements you will need to get started with the example !!! warning We suggest a trial account so that you have full privileges and assurance that the demo is isolated from any - production warehouses. Whilst there shouldn't be any risk that the demo affects any unrelated data outside of the - scope of this project, you may use a corporate account or existing personal account at your own risk, + production warehouses. Whilst there is no risk that the demo affects any unrelated data outside of the + scope of this project, you will incur compute costs. + You may use a corporate account or existing personal account at your own risk. !!! note We have provided a complete ```requirements.txt``` to install with ```pip install -r requirements.txt``` ## Performance note -Please be aware that table structures are simulated from the TPCH-H dataset. The TPC-H dataset is a static view of data. +Please be aware that table structures are simulated from the TPC-H dataset. The TPC-H dataset is a static view of data. Only a subset of the data contains dates which allows us to simulate daily feeds. The ```v_stg_orders``` orders view is filtered by date, unfortunately the ```v_stg_inventory``` view cannot be filtered by date, so it ends up being a feed of the entire contents of the view each cycle. -This means that inventory related hubs links and satellites are populated once during the initial load cycle with +This means that inventory related hubs, links and satellites are populated once during the initial load cycle with everything and later cycles insert 0 new records in their left outer joins. As the dataset increases in size, e.g if you run with a larger TPC-H dataset (100, 1000 etc.) then be aware you are processing the entire inventory dataset each cycle, which results in unrepresentative load cycle times. -Unfortunately it's the nature of the dataset, it will not be that way for other datasets. We will look at additonal -datasets in the future! - -If you are feeling adventurous you may disable the inventory feed (```raw_inventory``` and child models) to see a more -accurate representation of performance. \ No newline at end of file +We have minimised the impact of this by joining the raw inventory table to the raw orders table so that only +inventory items which are included in orders are fed into raw staging. The outcome is the same, but it significantly +optimises the loading process and thereby reduces load time.
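To make the optimisation concrete, here is a sketch of the kind of join involved, using standard TPC-H table and column names; the actual ```raw_inventory``` model's SQL differs:

```sql
-- Only inventory rows whose line items appear on an order are kept
SELECT DISTINCT ps.*
FROM PARTSUPP AS ps
JOIN LINEITEM AS l
  ON l.L_PARTKEY = ps.PS_PARTKEY
 AND l.L_SUPPKEY = ps.PS_SUPPKEY
JOIN ORDERS AS o
  ON l.L_ORDERKEY = o.O_ORDERKEY
```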
\ No newline at end of file diff --git a/macros/supporting/hash.sql b/macros/supporting/hash.sql index e78d0f353..a9130334d 100644 --- a/macros/supporting/hash.sql +++ b/macros/supporting/hash.sql @@ -14,22 +14,36 @@ -#} {%- macro hash(columns, alias, sort=false) -%} +{%- set hash = var('hash', 'MD5') -%} + +{#- Select hashing algorithm -#} +{%- if hash == 'MD5' -%} + {%- set hash_alg = 'MD5_BINARY' -%} + {%- set hash_size = 16 -%} +{%- elif hash == 'SHA' -%} + {%- set hash_alg = 'SHA2_BINARY' -%} + {%- set hash_size = 32 -%} +{%- else -%} + {#- Invalid config: fall back to MD5, which produces a 16-byte hash -#} + {%- set hash_alg = 'MD5_BINARY' -%} + {%- set hash_size = 16 -%} +{%- endif -%} + {#- Alpha sort columns before hashing -#} {%- if sort and columns is iterable and columns is not string -%} {%- set columns = columns|sort -%} {%- endif -%} {%- if columns is string %} - CAST(MD5_BINARY(UPPER(TRIM(CAST({{columns}} AS VARCHAR)))) AS BINARY(16)) AS {{alias}} + CAST({{- hash_alg -}}(UPPER(TRIM(CAST({{columns}} AS VARCHAR)))) AS BINARY({{- hash_size -}})) AS {{alias}} {%- else %} - CAST(MD5_BINARY(CONCAT( + CAST({{- hash_alg -}}(CONCAT( {%- for column in columns[:-1] %} - IFNULL(UPPER(TRIM(CAST({{column}} AS VARCHAR))), '^^'), '||', + IFNULL(UPPER(TRIM(CAST({{- column }} AS VARCHAR))), '^^'), '||', {%- if loop.last %} - IFNULL(UPPER(TRIM(CAST({{columns[-1]}} AS VARCHAR))), '^^') )) AS BINARY(16)) AS {{alias}} + IFNULL(UPPER(TRIM(CAST({{columns[-1]}} AS VARCHAR))), '^^') )) AS BINARY({{- hash_size -}})) AS {{alias}} {%- endif -%} {%- endfor -%} {%- endif -%} diff --git a/macros/tables/t_link_template.sql b/macros/tables/t_link_template.sql new file mode 100644 index 000000000..b8c1d49e3 --- /dev/null +++ b/macros/tables/t_link_template.sql @@ -0,0 +1,45 @@ +{#- Copyright 2019 Business Thinking LTD. trading as Datavault + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. -#} +{%- macro t_link_template(src_pk, src_fk, src_payload, src_eff, src_ldts, src_source, + tgt_pk, tgt_fk, tgt_payload, tgt_eff, tgt_ldts, tgt_source, + source) -%} + +{%- set tgt_cols = dbtvault.create_tgt_cols(src_pk=src_pk, src_fk=src_fk, src_payload=src_payload, + src_eff=src_eff, src_ldts=src_ldts, src_source=src_source, + tgt_pk=tgt_pk, tgt_fk=tgt_fk, tgt_payload=tgt_payload, + tgt_eff=tgt_eff, tgt_ldts=tgt_ldts, tgt_source=tgt_source, + source=source) -%} + +{%- set tgt_pk = tgt_cols['tgt_pk'] -%} +{%- set tgt_fk = tgt_cols['tgt_fk'] -%} +{%- set tgt_payload = tgt_cols['tgt_payload'] -%} +{%- set tgt_eff = tgt_cols['tgt_eff'] -%} +{%- set tgt_ldts = tgt_cols['tgt_ldts'] -%} +{%- set tgt_source = tgt_cols['tgt_source'] -%} + +{%- set is_union = dbtvault.is_union(source) -%} -- Generated by dbtvault. Copyright 2019 Business Thinking LTD.
trading as Datavault +SELECT DISTINCT {{ dbtvault.cast([tgt_pk, tgt_fk, tgt_payload, tgt_eff, tgt_ldts, tgt_source], 'stg') }} +FROM ( + SELECT {{ dbtvault.prefix([src_pk, src_fk, src_payload, src_eff, + src_ldts, src_source], 'stg') }} + FROM {{ source[0] }} AS stg +) AS stg +{% if is_incremental() -%} +LEFT JOIN {{ this }} AS tgt +ON {{ dbtvault.prefix([tgt_pk|first], 'stg') }} = {{ dbtvault.prefix([tgt_pk|last], 'tgt') }} +WHERE {{ dbtvault.prefix([tgt_pk|last], 'tgt') }} IS NULL +{%- endif -%} +{%- endmacro -%} \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index c414690df..41a0f97da 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -24,6 +24,7 @@ nav: - Hubs: 'hubs.md' - Links: 'links.md' - Satellites: 'satellites.md' + - T-Links: 't_links.md' - Worked example: - Getting Started: 'workedexample.md' - Project setup: 'setup.md'