Merge branch 'master' into releases
Alex Higgs committed Oct 24, 2019
2 parents 2d4d566 + bf99075 commit 2316276
Showing 25 changed files with 862 additions and 567 deletions.
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -32,5 +32,5 @@ please feel free to submit ideas and thoughts!
Create a post with as much detail as possible; we'll be happy to reply and work with you.

## Pull requests
If you've developed something which we can add via a pull request, we'd prefer that you submit an issue first
so that we can discuss the changes.
If you've developed something which we can add via a pull request, we're more than happy to consider it, but we'd
like to discuss the changes first.
8 changes: 4 additions & 4 deletions README.md
@@ -1,12 +1,12 @@
**CURRENTLY IN PRE-RELEASE, WE ARE CONTINUALLY ADDING FEATURES AND IMPROVING DOCUMENTATION**

<p align="center">
<img src="https://user-images.githubusercontent.com/25080503/65772647-89525700-e132-11e9-80ff-12ad30a25466.png">
</p>

latest [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=latest)](https://dbtvault.readthedocs.io/en/latest/?badge=latest)

stable [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.2.4-pre)](https://dbtvault.readthedocs.io/en/v0.2.4-pre/?badge=v0.2.4-pre)
stable [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.3-pre)](https://dbtvault.readthedocs.io/en/v0.3-pre/?badge=v0.3-pre)

[past docs versions](https://dbtvault.readthedocs.io/en/latest/changelog/)

# dbtvault by [Datavault](https://www.data-vault.co.uk)

@@ -34,7 +34,7 @@ Add the following to your ```packages.yml```
packages:

  - git: "https://github.com/Datavault-UK/dbtvault"
    revision: v0.2.4-pre # Latest stable version
    revision: v0.3-pre # Latest stable version
```
And run
2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: 'dbtvault'
version: '0.2.1'
version: '0.3'

profile: 'dbtvault'

Binary file modified docs/assets/images/staging.png
46 changes: 46 additions & 0 deletions docs/bestpractices.md
@@ -0,0 +1,46 @@
We advise that you follow these best practices when using dbtvault.

## Staging

Currently, we only support one load date per load, as per the [prerequisites](gettingstarted.md#prerequisites).

Until a future release removes this limitation, if the raw staging layer has a mix of load dates, we suggest that you
create a view on it and filter by the load date column to ensure only a single load date value is present.

After you have done this, follow the steps below:

- Add a reference to the view in your [sources](gettingstarted.md#setting-up-sources).
- Provide the source reference to the view as the source parameter in the [from](macros.md#from)
macro when building your [staging](staging.md) model.

For the next load, you can then re-create the view with a different load date and run dbt again. Alternatively,
manage a 'water-level' table which tracks the last load date for each source and is incremented each load cycle,
then join to this table to soft-select the next load date.
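The single-load-date view and the water-level approach can be sketched as follows, using Python's built-in `sqlite3` module as a stand-in for your warehouse. This is a minimal illustration under assumptions: the table, view and column names (`raw_orders`, `water_level`, `v_stg_orders`, `load_date`, `last_load_date`) are hypothetical, and the SQL is not generated by dbtvault.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory raw staging layer with mixed load dates (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE raw_orders (order_id INTEGER, load_date TEXT);
        INSERT INTO raw_orders VALUES
            (1, '2019-10-01'), (2, '2019-10-01'),
            (3, '2019-10-02'), (4, '2019-10-03');

        -- Hypothetical 'water-level' table: last load date processed per source.
        CREATE TABLE water_level (source_name TEXT PRIMARY KEY, last_load_date TEXT);
        INSERT INTO water_level VALUES ('raw_orders', '1900-01-01');

        -- View exposing only the next unprocessed load date, soft-selected
        -- by joining to the water-level table.
        CREATE VIEW v_stg_orders AS
        SELECT r.order_id, r.load_date
        FROM raw_orders AS r
        JOIN water_level AS w ON w.source_name = 'raw_orders'
        WHERE r.load_date = (
            SELECT MIN(r2.load_date)
            FROM raw_orders AS r2
            WHERE r2.load_date > w.last_load_date
        );
    """)
    return conn


def run_load_cycle(conn: sqlite3.Connection) -> list:
    """Read one load date's worth of rows from the view, then advance the water level."""
    rows = conn.execute(
        "SELECT order_id, load_date FROM v_stg_orders ORDER BY order_id"
    ).fetchall()
    if rows:
        # Unpacking fails unless every row shares a single load date.
        (load_date,) = {ld for _, ld in rows}
        conn.execute(
            "UPDATE water_level SET last_load_date = ? WHERE source_name = 'raw_orders'",
            (load_date,),
        )
    return rows
```

Calling `run_load_cycle` repeatedly on the same connection returns the rows for 2019-10-01, then 2019-10-02, then 2019-10-03, and finally an empty list once every load date has been processed, so each dbt run would only ever see one load date.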

## Source

We suggest you use a code to identify the source. This can be anything that makes sense for your particular context,
though usually an integer or alphanumeric value works well. The code is often used to look up the full table name in a lookup table.

You may do this with dbtvault by providing the code as a constant in the [staging](staging.md) layer,
using the [add_columns](macros.md#add_columns) macro. The [staging page](staging.md) presents this exact
use-case in the code examples.

If there is already a source column in the raw staging layer, you may keep it or override it;
[add_columns](macros.md#add_columns) can do either.

## Hashing

Best practices for hashing include:

- Sorting hashdiff columns alphabetically. dbtvault does this for you, so no worries! Refer to the [multi-hash](macros.md#multi_hash) docs for how to do this.

- Ensuring all **hub** columns used to calculate a primary key hash are presented in the same order across all
staging tables.

!!! note
    Some tables may use different column names for primary key components, so we cannot sort the columns for
    you as we do with hashdiffs.

- For **links**, columns must be grouped by the primary key of each hub and arranged alphabetically by hub name.
The columns for each hub must also be in the same order as they are in that hub.
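To illustrate why column order matters, here is a minimal Python sketch using `hashlib`. It assumes a hypothetical hashing convention (trim and upper-case each value, join with `'||'`, take an MD5 digest); the function names `hash_columns`, `hashdiff` and `link_pk` are illustrative only, and dbtvault's [hash](macros.md#hash) and [multi-hash](macros.md#multi_hash) macros generate SQL whose exact behaviour may differ.

```python
import hashlib
from typing import Dict, List


def hash_columns(row: Dict[str, str], columns: List[str]) -> str:
    """MD5 over the columns in the order given (hypothetical trim/upper/'||' convention)."""
    parts = [str(row[col]).strip().upper() for col in columns]
    return hashlib.md5("||".join(parts).encode("utf-8")).hexdigest().upper()


def hashdiff(row: Dict[str, str], columns: List[str]) -> str:
    """Hashdiff: columns are sorted alphabetically first, so input order does not matter."""
    return hash_columns(row, sorted(columns))


def link_pk(row: Dict[str, str], hub_keys: Dict[str, List[str]]) -> str:
    """Link key: hubs arranged alphabetically by hub name; each hub's key
    columns keep the order used when hashing that hub."""
    columns = [col for hub in sorted(hub_keys) for col in hub_keys[hub]]
    return hash_columns(row, columns)
```

Because `hashdiff` sorts its columns, passing `["NAME", "DOB"]` or `["DOB", "NAME"]` yields the same digest, whereas `hash_columns` with those two orders yields different digests. This is why primary key columns, which cannot be sorted for you, must be presented in a consistent order.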
28 changes: 27 additions & 1 deletion docs/changelog.md
@@ -4,7 +4,26 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [v0.3-pre] - 2019-10-24
[![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.3-pre)](https://dbtvault.readthedocs.io/en/v0.3-pre/?badge=v0.3-pre)

### Improvements

- We've removed the need to specify full mappings in the ```tgt``` metadata when creating table models.
Users may now provide a table reference instead, as a shorthand way to keep the column names
and data types the same as the source; [read the docs](macros.md#using-a-source-reference-for-the-target-metadata) for more details.
The option to provide a mapping is still available.

- The check for whether a load is a union load or not is now more reliable.

### Documentation

- Updated code samples and explanations according to new functionality
- Added a best practices page
- Various clarifications added and errors fixed

## [v0.2.4-pre] - 2019-10-17
[![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.2.4-pre)](https://dbtvault.readthedocs.io/en/v0.2.4-pre/?badge=v0.2.4-pre)

### Bug Fixes

@@ -13,6 +32,7 @@ causing subsequent loads after the initial load, to fail.


## [v0.2.3-pre] - 2019-10-08
[![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.2.3-pre)](https://dbtvault.readthedocs.io/en/v0.2.3-pre/?badge=v0.2.3-pre)

### Macros

@@ -27,6 +47,7 @@ causing subsequent loads after the initial load, to fail.
- Updated [hash](macros.md#hash) and [multi-hash](macros.md#multi_hash) according to new changes.

## [v0.2.2-pre] - 2019-10-08
[![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.2.2-pre)](https://dbtvault.readthedocs.io/en/v0.2.2-pre/?badge=v0.2.2-pre)

### Documentation

@@ -36,6 +57,7 @@ causing subsequent loads after the initial load, to fail.
- Renamed ```stg_orders_hashed``` back to ```stg_customers_hashed```

## [v0.2.1-pre] - 2019-10-07
[![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.2.1-pre)](https://dbtvault.readthedocs.io/en/v0.2.1-pre/?badge=v0.2.1-pre)

### Documentation

@@ -45,6 +67,7 @@ causing subsequent loads after the initial load, to fail.
- Corrected version in dbt_project.yml

## [v0.2-pre] - 2019-10-07
[![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.2-pre)](https://dbtvault.readthedocs.io/en/v0.2-pre/?badge=v0.2-pre)

[Feedback is welcome!](https://github.com/Datavault-UK/dbtvault/issues)

@@ -76,8 +99,11 @@ the new and improved features.
per best practices.

## [v0.1-pre] - 2019-09 / 2019-10
[![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.1-pre)](https://dbtvault.readthedocs.io/en/v0.1-pre/?badge=v0.1-pre)

### Added


- Table Macros:
    - [Hub](macros.md#hub_template)
    - [Link](macros.md#link_template)
@@ -95,4 +121,4 @@ the new and improved features.

### Documentation

- Numerous changes leading up to Version 1.0 release
- Numerous changes for version 0.1 release
4 changes: 2 additions & 2 deletions docs/contributing.md
@@ -32,5 +32,5 @@ please feel free to submit ideas and thoughts!
Create a post with as much detail as possible; we'll be happy to reply and work with you.

## Pull requests
If you've developed something which we can add via a pull request, we'd prefer that you submit an issue first
so that we can discuss the changes.
If you've developed something which we can add via a pull request, we're more than happy to consider it, but we'd
like to discuss the changes first.
2 changes: 1 addition & 1 deletion docs/demonstration.md
@@ -1,6 +1,6 @@
## Coming soon

We will soon be making available a downloadable example project running dbtvault with the Snowflake TPCH dataset.
Soon, we will be making available a downloadable example project running dbtvault with Snowflake's TPCH dataset.
This will showcase dbtvault with pre-written models, giving you further understanding of how it all works.

[Sign up](https://www.data-vault.co.uk/dbtvault/) and get notified when this is available!
32 changes: 11 additions & 21 deletions docs/gettingstarted.md
@@ -24,21 +24,22 @@ Happy Data Vaulting! :smile:

5. We assume you already have a raw staging layer.

6. Our macros assume that you are only loading from one set of load dates in a single load cycle (i.e. Your staging layer
6. Our macros assume that you are only loading from one set of load dates in a single load cycle (i.e. your staging layer
contains data for one ```load_datetime``` value only). **We will be removing this restriction in future releases**.

7. You should read our [best practices](bestpractices.md) guidance.

## Setting up sources

We will be using the ```source``` feature of dbt extensively throughout the documentation to make access to source
data much easier, cleaner and more modular. The main advantage of this is that sources will be included in
dbt dependency graphs
data much easier, cleaner and more modular. The main advantage of this is that sources are then included in
dbt dependency graphs.

We have provided an example below which shows a configuration similar to that used for the examples in our documentation,
however this feature is documented extensively in dbts own documentation,
so please [read here](https://docs.getdbt.com/docs/using-sources).
however this feature is documented extensively in [the documentation for dbt itself](https://docs.getdbt.com/docs/using-sources).

After reading the above documentation, we recommend you place the ```schema.yml``` file you create for your sources,
in the root of your ```models``` folder, however you can place it where needed for your specific project.
After reading the above documentation, we recommend that you place the ```schema.yml``` file you create for your sources
in the root of your ```models``` folder; however, you can place it wherever suits your specific project and models.

```schema.yml```

@@ -50,8 +51,8 @@ sources:
    database: MYDATABASE
    schema: MYSCHEMA
    tables:
      - name: stg_customer
        identifier: table_1
      - name: stg_customer # alias
        identifier: stg_customer_hashed # table name
      - name: ...
```
@@ -69,15 +70,4 @@ packages:
And run
```dbt deps```

[Read more on package installation (from dbt)](https://docs.getdbt.com/docs/package-management)


## Final note before we start

The documentation is written in the context of a simple example, showing a step by step progression towards
loading a Data Vault 2.0 Data Warehouse. We have documented everything you need to know, but as all use cases will vary,
you will need to adapt this to your own needs and requirements.

If you need any more detail or require specific guidance, do not hesitate to
[submit an issue](https://github.com/Datavault-UK/dbtvault/issues).
We may be able to improve the package based on your feedback, and this will benefit the whole community!
[Read more on package installation (from dbt)](https://docs.getdbt.com/docs/package-management)