dbt Core and DuckDB Quickstart #5783
base: current
Changes from 250 commits
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,292 @@ | ||
--- | ||
title: Quickstart for dbt Core using DuckDB | ||
id: duckdb | ||
description: "Learn to use dbt Core using DuckDB." | ||
hoverSnippet: "Learn to use dbt Core using DuckDB." | ||
platform: 'dbt-core' | ||
icon: 'duckdb-seeklogo' | ||
level: 'Beginner' | ||
hide_table_of_contents: true | ||
tags: ['dbt Core','Quickstart'] | ||
--- | ||
|
||
<div style={{maxWidth: '900px'}}> | ||
|
||
## Introduction | ||
|
||
In this quickstart guide, you'll learn how to use dbt Core with DuckDB, enabling you to get set up quickly and efficiently. [DuckDB](https://duckdb.org/) is an open-source database management system designed for analytical workloads. It provides fast and easy access to large datasets, making it well suited to data analytics tasks. | ||
|
||
|
||
This guide will demonstrate how to: | ||
|
||
- [Create a virtual development environment](/docs/core/pip-install#using-virtual-environments) using a template provided by dbt Labs. | ||
- This sets up a fully functional dbt environment with an operational and executable project. The codespace automatically connects to the DuckDB database and loads a year's worth of data from our fictional Jaffle Shop café, which sells food and beverages in several US cities. | ||
- For additional information, refer to the [README](https://github.com/dbt-labs/jaffle_shop_duckdb/blob/duckdb/README.md) for the Jaffle Shop template. It includes instructions on how to do this, along with animated GIFs. | ||
- Run any dbt command from the environment’s terminal. | ||
- Generate a larger dataset for the Jaffle Shop café (for example, five years of data instead of just one). | ||
|
||
You can learn more through high-quality [dbt Learn courses and workshops](https://learn.getdbt.com). | ||
|
||
|
||
### Related content | ||
|
||
|
||
- [DuckDB setup](/docs/core/connect-data-platform/duckdb-setup) | ||
- [Create a GitHub repository](/guides/manual-install?step=2) | ||
the rest of these steps link to a different quickstart, not sure if that's intentional?
Hiya @mirnawong1 Yep - this was intentional. The DuckDB link points users to the DuckDB setup page for general info on DuckDB, and the GitHub link takes users to the page on how to create a repo (if they don't have one set up already). I can remove these / change the links if you think it's best. Kind Regards |
||
- [Build your first models](/guides/manual-install?step=3) | ||
- [Test and document your project](/guides/manual-install?step=4) | ||
|
||
|
||
## Prerequisites | ||
|
||
- When using DuckDB with dbt Core, you'll need to use the dbt command-line interface (CLI). Currently, DuckDB is not supported in dbt Cloud. | ||
- It's important that you know some basics of the terminal. In particular, you should understand `cd`, `ls`, and `pwd` to navigate through the directory structure of your computer easily. | ||
- You have a [GitHub account](https://github.com/join). | ||
|
||
## Set up DuckDB for dbt Core | ||
|
||
This section will provide a step-by-step guide for setting up DuckDB for use in local (Mac and Windows) environments and web browsers. | ||
|
||
In the repository, there's a [`requirements.txt`](https://github.com/dbt-labs/jaffle_shop_duckdb/blob/duckdb/requirements.txt) file which is used to install dbt Core, DuckDB, and all other necessary dependencies. You can check this file to see what will be installed on your machine. | ||
|
||
The `requirements.txt` file is placed at the top level of your dbt project directory, alongside other key files like `dbt_project.yml`: | ||
|
||
|
||
```shell | ||
|
||
/my_dbt_project/ | ||
├── dbt_project.yml | ||
├── models/ | ||
│ ├── my_model.sql | ||
├── tests/ | ||
│ ├── my_test.sql | ||
└── requirements.txt | ||
|
||
``` | ||
|
||
For more information on the setup of DuckDB, you can refer to [DuckDB setup](/docs/core/connect-data-platform/duckdb-setup). | ||
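
Before installing anything, it can help to confirm that you are in the project root. The following is a minimal sketch, not part of dbt or the Jaffle Shop template: the function name `check_project_root` is hypothetical, and it only checks for the two files named above.

```shell
# Hypothetical helper: succeed only if the given directory (default: the
# current one) contains both dbt_project.yml and requirements.txt.
check_project_root() {
  dir="${1:-.}"
  for f in dbt_project.yml requirements.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $dir looks like a dbt project root"
}
```

For example, after cloning the repository you could run `check_project_root jaffle_shop_duckdb`.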
|
||
<Tabs> | ||
|
||
<TabItem value="local" label="Local"> | ||
|
||
|
||
1. First, [clone](https://git-scm.com/docs/git-clone) the Jaffle Shop git repository by running the following command in your terminal: | ||
|
||
|
||
|
||
|
||
```bash | ||
|
||
git clone https://github.com/dbt-labs/jaffle_shop_duckdb.git | ||
|
||
``` | ||
|
||
|
||
|
||
2. Change into the `jaffle_shop_duckdb` directory from the command line: | ||
|
||
```shell | ||
|
||
cd jaffle_shop_duckdb | ||
|
||
``` | ||
|
||
|
||
3. Install dbt Core and DuckDB in a virtual environment. | ||
|
||
<Expandable alt_header="Example for Mac" > | ||
|
||
```shell | ||
|
||
python3 -m venv venv | ||
source venv/bin/activate | ||
python3 -m pip install --upgrade pip | ||
python3 -m pip install -r requirements.txt | ||
|
||
source venv/bin/activate | ||
|
||
``` | ||
</Expandable> | ||
|
||
<Expandable alt_header="Example for Windows" > | ||
|
||
```shell | ||
|
||
python -m venv venv | ||
venv\Scripts\activate.bat | ||
python -m pip install --upgrade pip | ||
python -m pip install -r requirements.txt | ||
venv\Scripts\activate.bat | ||
|
||
``` | ||
|
||
</Expandable> | ||
|
||
<Expandable alt_header="Example for Windows PowerShell" > | ||
|
||
```shell | ||
|
||
python -m venv venv | ||
venv\Scripts\Activate.ps1 | ||
python -m pip install --upgrade pip | ||
python -m pip install -r requirements.txt | ||
venv\Scripts\Activate.ps1 | ||
|
||
``` | ||
</Expandable> | ||
|
||
|
||
4. Ensure your profile is set up correctly from the command line by running the following [dbt commands](/reference/dbt-commands). | ||
|
||
|
||
- [dbt compile](/reference/commands/compile) — generates executable SQL from your project source files | ||
- [dbt run](https://docs.getdbt.com/reference/commands/run) — compiles and runs your project | ||
- [dbt test](https://docs.getdbt.com/reference/commands/test) — compiles and tests your project | ||
- [dbt build](https://docs.getdbt.com/reference/commands/build) — compiles, runs, and tests your project | ||
- [dbt docs generate](/reference/commands/cmd-docs#dbt-docs-generate) — generates your project's documentation. | ||
- [dbt docs serve](/reference/commands/cmd-docs#dbt-docs-serve) — starts a webserver on port 8080 to serve your documentation locally and opens the documentation site in your default browser. | ||
For complete details, refer to the [dbt command reference](/reference/dbt-commands). | ||
Here's what a successful output will look like: | ||
|
||
|
||
```jinja | ||
|
||
(venv) ➜ jaffle_shop_duckdb git:(duckdb) dbt build | ||
15:10:12 Running with dbt=1.8.1 | ||
15:10:13 Registered adapter: duckdb=1.8.1 | ||
15:10:13 Found 5 models, 3 seeds, 20 data tests, 416 macros | ||
15:10:13 | ||
15:10:14 Concurrency: 24 threads (target='dev') | ||
15:10:14 | ||
15:10:14 1 of 28 START seed file main.raw_customers ..................................... [RUN] | ||
15:10:14 2 of 28 START seed file main.raw_orders ........................................ [RUN] | ||
15:10:14 3 of 28 START seed file main.raw_payments ...................................... [RUN] | ||
.... | ||
|
||
15:10:15 27 of 28 PASS relationships_orders_customer_id__customer_id__ref_customers_ .... [PASS in 0.32s] | ||
15:10:15 | ||
15:10:15 Finished running 3 seeds, 3 view models, 20 data tests, 2 table models in 0 hours 0 minutes and 1.52 seconds (1.52s). | ||
15:10:15 | ||
15:10:15 Completed successfully | ||
15:10:15 | ||
15:10:15 Done. PASS=28 WARN=0 ERROR=0 SKIP=0 TOTAL=28 | ||
|
||
|
||
|
||
``` | ||
To query data, you can run the following commands from the command line: | ||
|
||
|
||
- [`dbt show`](/reference/commands/show) — run a query against the data warehouse and preview the results in the terminal. | ||
- [`dbt source`](/reference/commands/source) — provides subcommands such as [`dbt source freshness`](/reference/commands/source#dbt-source-freshness) that are useful when working with source data. | ||
- `dbt source freshness` — checks how up to date a specific source table is. | ||
|
||
|
||
:::note | ||
|
||
The steps will fail if you run this project against your own data warehouse (outside of this DuckDB demo) without first reconfiguring the project files for that warehouse. This is especially important if you are using a community-contributed adapter. | ||
|
||
|
||
::: | ||
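
The commands above can be chained so that a preview only runs after a successful build. This is a sketch under stated assumptions: the `DBT_BIN` variable and the `build_and_preview` function are hypothetical names introduced here, and the default model name `customers` is inferred from the sample output above.

```shell
# DBT_BIN is an assumption of this sketch: it lets you point at a specific
# dbt executable and defaults to the `dbt` found on your PATH.
DBT_BIN="${DBT_BIN:-dbt}"

# Build the project, then preview up to five rows of one model.
# Stops at the first failing command.
build_and_preview() {
  model="${1:-customers}"   # assumed model name; pass your own as $1
  "$DBT_BIN" build || return 1
  "$DBT_BIN" show --select "$model" --limit 5
}
```

With the virtual environment active, `build_and_preview orders` would build the project and then preview the `orders` model, assuming a model with that name exists in your project.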
|
||
|
||
### Troubleshoot | ||
|
||
<Expandable alt_header="Could not set lock on file error" > | ||
|
||
```Jinja | ||
|
||
IO Error: Could not set lock on file "jaffle_shop.duckdb": Resource temporarily unavailable | ||
|
||
``` | ||
|
||
This is a known issue in DuckDB. Try disconnecting from any sessions that are locking the database. If you are using DBeaver, this means shutting down DBeaver (disconnecting doesn't always work). | ||
|
||
As a last resort, deleting the database file will get you back in action (_but_ you will lose all your data). | ||
|
||
</Expandable> | ||
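
If you are not sure which session is holding the lock described above, listing the processes that have the database file open can help. This is a hedged sketch: the function name `who_locks_db` is hypothetical, and it assumes the `lsof` utility is installed, which is common on macOS and Linux but not on Windows.

```shell
# Hypothetical helper: list processes that currently have the database file
# open. Assumes `lsof` is available on this machine.
who_locks_db() {
  db="${1:-jaffle_shop.duckdb}"
  if [ ! -e "$db" ]; then
    echo "no such file: $db" >&2
    return 2
  fi
  if ! command -v lsof >/dev/null 2>&1; then
    echo "lsof not found; close other DuckDB clients manually" >&2
    return 3
  fi
  # lsof exits non-zero when no process has the file open.
  lsof "$db" || echo "no process has $db open"
}
```

Run `who_locks_db` from the project directory, then close any listed client before retrying your dbt command.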
|
||
|
||
</TabItem> | ||
|
||
<TabItem value="web" label="Web browser"> | ||
|
||
1. Go to the `jaffle_shop_duckdb` [repository](https://github.com/dbt-labs/jaffle_shop_duckdb) after you log in to your GitHub account. | ||
|
||
1. Click **Use this template** at the top of the page and choose **Create new repository**. | ||
1. Click **Create repository from template** when you’re done setting the options for your new repository. | ||
|
||
1. Click **Code** (at the top of the new repository’s page). Under the **Codespaces** tab, choose **Create codespace on main**. Depending on how you've configured your computer's settings, this opens the codespace development environment either in a new browser tab with VS Code running in it or in a new VS Code window. | ||
1. Wait for the codespace to finish building. The build is done when the `postCreateCommand` command completes, which can take several minutes: | ||
|
||
<Lightbox src="/img/codespace-quickstart/postCreateCommand.png" title="Wait for postCreateCommand to complete" /> | ||
|
||
When this command completes, you can start using the codespace development environment. The terminal the command ran in will close and you will get a prompt in a brand new terminal. | ||
|
||
1. At the terminal's prompt, you can execute any dbt command you want. For example: | ||
|
||
```shell | ||
/workspaces/test (main) $ dbt build | ||
``` | ||
|
||
You can also use the [duckcli](https://duckdb.org/docs/api/cli/overview.html) to write SQL against the warehouse from the command line or build reports in the [Evidence](https://evidence.dev/) project provided in the `reports` directory. | ||
|
||
For complete information, refer to the [dbt command reference](https://docs.getdbt.com/reference/dbt-commands). Common commands are: | ||
|
||
- [dbt compile](/reference/commands/compile) — generates executable SQL from your project source files | ||
- [dbt run](https://docs.getdbt.com/reference/commands/run) — compiles and runs your project | ||
- [dbt test](https://docs.getdbt.com/reference/commands/test) — compiles and tests your project | ||
- [dbt build](https://docs.getdbt.com/reference/commands/build) — compiles, runs, and tests your project | ||
|
||
|
||
</TabItem> | ||
|
||
|
||
</Tabs> | ||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
## Generate a larger data set | ||
|
||
If you'd like to work with a larger selection of Jaffle Shop data, you can generate an arbitrary number of years of fictitious data from within your codespace. | ||
|
||
1. Install the Python package called [jafgen](https://pypi.org/project/jafgen/). At the terminal's prompt, run: | ||
|
||
```shell | ||
python -m pip install jafgen | ||
``` | ||
|
||
1. When installation is done, run: | ||
```shell | ||
jafgen [number of years to generate] # e.g. jafgen 6 | ||
``` | ||
|
||
Replace `[number of years to generate]` with the number of years you want to simulate. This command builds the CSV files and stores them in the `jaffle-data` folder. The files are automatically sourced based on the `sources.yml` file and the [dbt-duckdb](/docs/core/connect-data-platform/duckdb-setup) adapter. | ||
|
||
As you increase the number of years, it takes exponentially more time to generate the data because the Jaffle Shop stores grow in size and number. For a good balance of data size and time to build, dbt Labs suggests a maximum of 6 years. | ||
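
Because generation time grows quickly with the number of years, a small guard around the command can prevent an accidental very long run. The sketch below is illustrative only: `generate_jaffle_data`, `MAX_YEARS`, and `JAFGEN_BIN` are hypothetical names, and it calls `jafgen` in the same form as shown above.

```shell
# MAX_YEARS mirrors the upper bound suggested in the text above; adjust it
# if you are willing to wait longer.
MAX_YEARS=6

# Hypothetical wrapper: validate the year count, then call jafgen with it.
# JAFGEN_BIN is an assumption that lets you override the executable.
generate_jaffle_data() {
  years="$1"
  case "$years" in
    ''|*[!0-9]*) echo "years must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$years" -lt 1 ] || [ "$years" -gt "$MAX_YEARS" ]; then
    echo "years must be between 1 and $MAX_YEARS" >&2
    return 1
  fi
  "${JAFGEN_BIN:-jafgen}" "$years"
}
```

For example, `generate_jaffle_data 3` passes the check and runs the generator, while `generate_jaffle_data 20` is rejected before anything is generated.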
## Next steps | ||
|
||
Now that you've got dbt Core, DuckDB, and the Jaffle Shop data up and running, you can explore dbt's capabilities. Refer to these materials to get a better understanding of dbt projects and commands: | ||
|
||
- The [About projects](/docs/build/projects) page guides you through the structure of a dbt project and its components. | ||
- [dbt command reference](/reference/dbt-commands) explains the various commands available and what they do. | ||
- [dbt Labs courses](https://courses.getdbt.com/collections) offer a variety of beginner, intermediate, and advanced learning modules designed to help you become a dbt expert. | ||
- Once you see the potential of dbt and what it can do for your organization, sign up for a free trial of [dbt Cloud](https://www.getdbt.com/signup). It's the fastest and easiest way to deploy dbt today! | ||
- Check out the other [quickstart guides](/quickstarts) to begin integrating into your existing data warehouse. | ||
|
||
|
||
Additionally, with your new understanding of the basics of using dbt Core with DuckDB, consider optimizing your setup by documenting your project, committing your changes, and scheduling a job. | ||
|
||
### Document your project | ||
|
||
|
||
|
||
|
||
### Commit your changes | ||
|
||
Commit your changes to ensure the repository is up to date with the latest code. | ||
|
||
In the GitHub repository you created for your project, run the following commands in the terminal. | ||
|
||
```shell | ||
git add . | ||
git commit -m "Your commit message" | ||
git push | ||
``` | ||
|
||
|
||
### Schedule a job | ||
|
||
|
||
1. Ensure dbt Core is installed and configured to connect to your DuckDB instance. | ||
2. Create a dbt project and define your [`models`](/docs/build/models), [`seeds`](/reference/seed-properties), and [`tests`](/reference/commands/test). | ||
3. Use a scheduler such as [Prefect](/docs/deploy/deployment-tools#prefect) to schedule your dbt runs. You can create a DAG (Directed Acyclic Graph) that triggers dbt commands at specified intervals. | ||
4. Write a script that runs your dbt commands, such as [`dbt run`](/reference/commands/run) and `dbt test`. | ||
5. Use your chosen scheduler to run the script at your desired frequency. | ||
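
The script described in steps 4 and 5 could look roughly like the following. This is a minimal sketch, not a recommended deployment: `run_dbt_job` and `DBT_BIN` are hypothetical names, the paths in the comment are placeholders, and cron is used only as a stand-in scheduler (it is not one of the tools named in this guide).

```shell
# Hypothetical job function: run the models, then the tests, inside the
# given project directory. DBT_BIN is an assumption that defaults to `dbt`.
run_dbt_job() {
  project_dir="${1:-.}"
  # Use a subshell so the caller's working directory is left unchanged.
  (
    cd "$project_dir" || exit 1
    "${DBT_BIN:-dbt}" run || exit 1
    "${DBT_BIN:-dbt}" test
  )
}

# Example crontab entry (placeholder paths), running a script that calls
# run_dbt_job every day at 06:00:
#   0 6 * * * /path/to/run_dbt_job.sh >> /path/to/dbt_job.log 2>&1
```

The function returns a non-zero status if either command fails, so a scheduler can detect and report a failed run.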
|
||
curious if we want to add add'l steps to this quickstart guide, similar to the core install? e.g.:
|
||
</div> |
just a heads up the duckdb icon isn't appearing here:
is the icon name correct? or is it `duckdb-seeklogo` like in the guides?
Hiya @mirnawong1
Thanks for checking this out. I saved the file under: duckdb-seeklogo.svg in the light and dark sections of vscode.
Kind Regards
Natalie