Merge branch 'master' into releases
Alex Higgs committed Nov 27, 2019
2 parents 8bdd40f + 8fe1011 commit 6417e85
Showing 21 changed files with 583 additions and 82 deletions.
15 changes: 8 additions & 7 deletions CONTRIBUTING.md
@@ -1,16 +1,16 @@
## We'd love to hear from you

This dbtvault package is very much a work in progress – we’ll up the version number to 1.0 when we’re satisfied it
works out in the wild.
dbtvault is very much a work in progress – we’re constantly adding quality of life improvements and will be adding
new table types regularly.

We know that it deserves new features, that the code base can be tidied up and the SQL better tuned.
Rest assured we’re working on it for future releases – our roadmap contains information on what’s coming.

If you spot anything you’d like to bring to our attention, have a request for new features,
have spotted an improvement we could make, or want to tell us about a typo, then please don’t hesitate to let us know
by submitting an issue using the below guidelines
Rest assured we’re working on it for future releases – [our roadmap contains information on what’s coming](roadmap.md).

If you spot anything you’d like to bring to our attention, have a request for new features, have spotted an improvement we could make,
or want to tell us about a typo or bug, then please don’t hesitate to let us know via [Github](https://github.com/Datavault-UK/dbtvault/issues).

We’d rather know you are making active use of this package than hearing nothing from all of you out there!
We’d rather know you are making active use of this package than hearing nothing from all of you out there!

Happy Data Vaulting!

@@ -20,6 +20,7 @@ Happy Data Vaulting!
We've tested the package rigorously, but if you think you've found a bug please provide the following
at a minimum (or use the issue templates) so we can fix it as quickly as possible:

- The version of dbt being used
- The version of dbtvault being used.
- Steps to reproduce the issue
- Any error messages or dbt log files which can give more detail of the problem
17 changes: 13 additions & 4 deletions README.md
@@ -1,10 +1,18 @@
<p align="left">
<img src="https://user-images.githubusercontent.com/25080503/69713956-6249de80-10fd-11ea-8120-413db42d50ac.png">
<p> There will be a live demonstration of dbtvault at the next UK Data Vault User Group on Tuesday, December 3, 2019 @ 6pm in LONDON.

<a href="https://www.meetup.com/UK-Data-Vault-User-Group/events/266604902/">Sign up for FREE now! </a>
</p>
</p>

<p align="center">
<img src="https://user-images.githubusercontent.com/25080503/65772647-89525700-e132-11e9-80ff-12ad30a25466.png">
</p>

latest [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=latest)](https://dbtvault.readthedocs.io/en/latest/?badge=latest)

stable [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.3.3-pre)](https://dbtvault.readthedocs.io/en/v0.3.3-pre/?badge=v0.3.3-pre)
stable [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.4)](https://dbtvault.readthedocs.io/en/v0.4/?badge=v0.4)

[past docs versions](https://dbtvault.readthedocs.io/en/latest/changelog/)

@@ -35,20 +43,21 @@ Get started quickly with our worked example:

## Installation

Ensure you are using dbt 0.14 (0.15 support will be added soon!)
Add the following to your ```packages.yml```


```yaml
packages:

  - git: "https://github.com/Datavault-UK/dbtvault"
    revision: v0.3.3-pre # Latest stable version
    revision: v0.4 # Latest stable version
```
And run
```dbt deps```

[Read more on package installation](https://docs.getdbt.com/docs/package-management)
[Read more on package installation](https://docs.getdbt.com/v0.14.0/docs/package-management)

## Usage

@@ -77,4 +86,4 @@ before anyone else!
[View our contribution guidelines](CONTRIBUTING.md)

## License
[Apache 2.0](LICENSE.md)
[Apache 2.0](LICENSE.md)
6 changes: 5 additions & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: 'dbtvault'
version: '0.3.3'
version: '0.4'

profile: 'dbtvault'

@@ -13,3 +13,7 @@ target-path: "target"
clean-targets:
  - "target"
  - "dbt_modules"

models:
  vars:
    hash: MD5
48 changes: 45 additions & 3 deletions docs/bestpractices.md
@@ -43,8 +43,9 @@ then there is a chance of a clash: where two different values generate the same

For this reason, it **should not be** used for cryptographic purposes either.

In future releases of dbtvault, we will allow you to change the algorithm that is used (e.g. to SHA-256) to reduce the
chance of a clash (at the expense of more processing and a larger column), or switch off hashing entirely.
!!! success

    You may now choose between MD5 and SHA-256 in dbtvault: [read below](bestpractices.md#choosing-a-hashing-algorithm-in-dbtvault).

### Why do we hash?

@@ -80,4 +81,45 @@ staging tables
the sorting functionality for primary keys.

- For **links**, columns must be sorted by the primary key of the hub and arranged alphabetically by the hub name.
The order must also be the same as each hub.
The order must also be the same as each hub.
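
To see why a consistent column order matters, note that a hash of concatenated values changes when the same values are joined in a different order. The following is a minimal Python sketch, not dbtvault's own SQL: the helper `concat_hash`, the `||` delimiter, the sample values and the use of MD5 are all assumptions made for illustration.

```python
import hashlib


def concat_hash(values, delimiter="||"):
    """Hash the values joined in the order given (illustrative helper)."""
    joined = delimiter.join(values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


customer_first = concat_hash(["CUSTOMER_1001", "NATION_GB"])
nation_first = concat_hash(["NATION_GB", "CUSTOMER_1001"])

# Same inputs, different order: the two keys do not match.
print(customer_first == nation_first)  # False
```

If a link and its source disagree on column order, equivalent records would therefore receive different keys.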

### Choosing a hashing algorithm in dbtvault

With the release of dbtvault 0.4, you may now choose between ```MD5``` and ```SHA-256``` hashing. ```SHA-256``` was added
to dbtvault as an option for users who wish to reduce the hashing collision rates in larger data sets.

!!! note

    If a hashing algorithm configuration is missing or invalid, dbtvault will use ```MD5``` by default.
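
The trade-off between the two algorithms, and the fallback to ```MD5```, can be illustrated outside dbt. This is a minimal Python sketch under stated assumptions, not dbtvault's implementation (which hashes in SQL): the function `hash_key` and the `ALGORITHMS` mapping are hypothetical names chosen to mirror the `hash` variable described here.

```python
import hashlib

# Hypothetical mapping from the `hash` setting to a hashlib constructor.
ALGORITHMS = {"MD5": hashlib.md5, "SHA": hashlib.sha256}


def hash_key(value, algorithm=None):
    """Return an upper-case hex digest, falling back to MD5 when the
    setting is missing or not recognised."""
    hasher = ALGORITHMS.get(algorithm, hashlib.md5)
    return hasher(value.encode("utf-8")).hexdigest().upper()


md5_key = hash_key("1001", "MD5")
sha_key = hash_key("1001", "SHA")
fallback_key = hash_key("1001", "not-an-algorithm")

print(len(md5_key))  # 32 hex characters (128 bits)
print(len(sha_key))  # 64 hex characters (256 bits)
print(fallback_key == md5_key)  # True: an invalid setting falls back to MD5
```

The SHA-256 digest is twice as long, which is the larger-column cost of the lower collision rate.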

Configuring the hashing algorithm which will be used by dbtvault is simple: add a variable to your
```dbt_project.yml``` as follows:

```dbt_project.yml```
```yaml

name: 'my_project'
version: '1'

profile: 'my_project'

source-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_modules"

models:
  vars:
    hash: SHA # or MD5
```
It is possible to configure a hashing algorithm on a model-by-model basis using the hierarchical structure of the ```yaml``` file.
However, as a matter of best practice, we recommend keeping the hashing algorithm consistent across all tables.

Read the [dbt documentation](https://docs.getdbt.com/v0.14.0/docs/var) for further information on variable scoping.
28 changes: 27 additions & 1 deletion docs/changelog.md
@@ -4,6 +4,33 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [v0.4] - 2019-11-27
[![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.4)](https://dbtvault.readthedocs.io/en/v0.4-pre/?badge=v0.4)

### Added

- Table Macros:
    - [Transactional Links](macros.md#t_link_template)

### Improved

- Hashing:
    - You may now choose between ```MD5``` and ```SHA-256``` hashing with a simple yaml configuration.
    [Learn how!](bestpractices.md#choosing-a-hashing-algorithm-in-dbtvault)

### Worked example

- Transactional Links
    - Added a transactional link model using a simulated transaction feed.

### Documentation

- Updated macros, best practices, roadmap, and other pages to account for new features
- Updated worked example documentation
- Replaced all dbt documentation links with links to the 0.14 documentation as dbtvault
is using dbt 0.14 currently (we will be updating to 0.15 soon!)
- Minor corrections

## [v0.3.3-pre] - 2019-10-31
[![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.3.3-pre)](https://dbtvault.readthedocs.io/en/v0.3.3-pre/?badge=v0.3.3-pre)

@@ -131,7 +158,6 @@ the new and improved features.

### Added


- Table Macros:
    - [Hub](macros.md#hub_template)
    - [Link](macros.md#link_template)
3 changes: 2 additions & 1 deletion docs/contributing.md
@@ -8,7 +8,7 @@ We know that it deserves new features, that the code base can be tidied up and t
Rest assured we’re working on it for future releases – [our roadmap contains information on what’s coming](roadmap.md).

If you spot anything you’d like to bring to our attention, have a request for new features, have spotted an improvement we could make,
or want to tell us about a typo, then please don’t hesitate to let us know via [Github](https://github.com/Datavault-UK/dbtvault/issues).
or want to tell us about a typo or bug, then please don’t hesitate to let us know via [Github](https://github.com/Datavault-UK/dbtvault/issues).

We’d rather know you are making active use of this package than hearing nothing from all of you out there!

@@ -20,6 +20,7 @@ Happy Data Vaulting! :smile:
We've tested the package rigorously, but if you think you've found a bug please provide the following
at a minimum (or use the issue templates) so we can fix it as quickly as possible:

- The version of dbt being used
- The version of dbtvault being used.
- Steps to reproduce the issue
- Any error messages or dbt log files which can give more detail of the problem
6 changes: 3 additions & 3 deletions docs/hubs.md
@@ -26,7 +26,7 @@ The following header is what we use, but feel free to customise it to your needs

Hubs are always incremental, as we load and add new records to the existing data set.

[Read more about incremental models](https://docs.getdbt.com/docs/configuring-incremental-models)
[Read more about incremental models](https://docs.getdbt.com/v0.14.0/docs/configuring-incremental-models)

!!! note "Don't worry!"
    The [hub_template](macros.md#hub_template) deals with the Data Vault
@@ -39,10 +39,10 @@ Let's look at the metadata we need to provide to the [hub_template](macros.md#hu
#### Source table

The first piece of metadata we need is the source table. This step is easy, as in this example we created the
new staging layer ourselves. All we need to do is provide a reference to the model we created, and dbt will do the rest for us.
staging layer ourselves. All we need to do is provide a reference to the model we created, and dbt will do the rest for us.
dbt ensures dependencies are honoured when defining the source using a reference in this way.

[Read more about the ref function](https://docs.getdbt.com/docs/ref)
[Read more about the ref function](https://docs.getdbt.com/v0.14.0/docs/ref)

```hub_customer.sql```

2 changes: 1 addition & 1 deletion docs/links.md
@@ -38,7 +38,7 @@ The first piece of metadata we need is the source table. This step is easy, as w
staging layer ourselves. All we need to do is provide a reference to the model we created, and dbt will do the rest for us.
dbt ensures dependencies are honoured when defining the source using a reference in this way.

[Read more about the ref function](https://docs.getdbt.com/docs/ref)
[Read more about the ref function](https://docs.getdbt.com/v0.14.0/docs/ref)

```link_customer_nation.sql```

30 changes: 28 additions & 2 deletions docs/loading.md
@@ -29,11 +29,11 @@ This will run all models with the hub tag.
Links are another fundamental component in a Data Vault.

Links model an association, or link, between two business keys. They commonly hold business transactions or structural
information.
information. A link specifically contains the structural information.

Our links will contain:

1. A primary key. For links, we take the natural keys (prior to hashing) represented by the foreign key columns below
1. A primary key. For links, we take the natural keys (prior to hashing) represented by the foreign key columns
and create a hash on a concatenation of them.
2. Foreign keys holding the primary key for each hub referenced in the link (2 or more depending on the number of hubs
referenced)
@@ -89,6 +89,32 @@ To compile and load the provided satellite models, run the following command:

This will run all models with the satellite tag.

## Transactional Links

Transactional Links are used to model transactions between entities in a Data Vault.

Links model an association, or link, between two business keys. They commonly hold business transactions or structural
information. A transactional link specifically contains the business transactions.

Our transactional links will contain:

1. A primary key. For transactional links, we use the transaction number. If this is not already present in the dataset
then we create this by concatenating the foreign keys and hashing them.
2. Foreign keys holding the primary key for each hub referenced in the transactional link (2 or more depending on the number of hubs
referenced)
3. A payload. This will be data about the transaction itself e.g. the amount, type, date or non-hashed transaction number.
4. An ```EFFECTIVE_FROM``` date. This will usually be the date of the transaction.
5. The load date or load date timestamp.
6. The source for the record.
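
The primary-key rule in item 1 can be sketched as follows. This is an illustrative Python sketch rather than dbtvault's SQL, and it rests on several assumptions: the function name `t_link_primary_key`, the `||` delimiter and the use of MD5 are hypothetical. It also assumes the transaction number is itself hashed, which item 1 does not state outright but which the "non-hashed transaction number" in the payload suggests.

```python
import hashlib


def t_link_primary_key(foreign_keys, transaction_number=None):
    """Hash the transaction number when the feed provides one; otherwise
    hash the concatenated foreign keys instead."""
    if transaction_number is not None:
        source = str(transaction_number)
    else:
        source = "||".join(foreign_keys)  # delimiter is an assumption
    return hashlib.md5(source.encode("utf-8")).hexdigest().upper()


with_number = t_link_primary_key(["CUSTOMER_A", "CUSTOMER_B"], transaction_number=12345)
without_number = t_link_primary_key(["CUSTOMER_A", "CUSTOMER_B"])

print(with_number != without_number)  # True: the keys come from different inputs
```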

### Loading transactional links

To compile and load the provided t_link models, run the following command:

```dbt run --models tag:t_link```

This will run all models with the t_link tag.

## Loading the full system

Each of the commands above loads a particular type of table; however, we may want to do a full system load.