+
+
+1. Create a new project in dbt Cloud. From **Account settings** (using the gear menu in the top right corner), click **+ New Project**.
Click **Run**, then check for results from the queries. For example:
@@ -86,28 +214,53 @@ In order to let dbt connect to your warehouse, you'll need to generate a keyfile
## Connect dbt Cloud to Snowflake
1. Create a new project in [dbt Cloud](/docs/cloud/about-cloud/access-regions-ip-addresses). From **Account settings** (using the gear menu in the top right corner), click **+ New Project**.
+
2. Enter a project name and click **Continue**.
-3. For the warehouse, click **BigQuery** then **Next** to set up your connection.
-4. Click **Upload a Service Account JSON File** in settings.
-5. Select the JSON file you downloaded in [Generate BigQuery credentials](#generate-bigquery-credentials) and dbt Cloud will fill in all the necessary fields.
-6. Click **Test Connection**. This verifies that dbt Cloud can access your BigQuery account.
-7. Click **Next** if the test succeeded. If it failed, you might need to go back and regenerate your BigQuery credentials.
+3. For the warehouse, click **Snowflake** then **Next** to set up your connection.
+
+
+
+4. Enter your **Settings** for Snowflake with:
+ * **Account** — Find your account by using the Snowflake trial account URL and removing `snowflakecomputing.com`. The order of your account information will vary by Snowflake version. For example, Snowflake's Classic console URL might look like: `oq65696.west-us-2.azure.snowflakecomputing.com`. The AppUI or Snowsight URL might look more like: `snowflakecomputing.com/west-us-2.azure/oq65696`. In both examples, your account will be: `oq65696.west-us-2.azure`. For more information, see [Account Identifiers](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html) in the Snowflake docs.
+
+
+
+ * **Role** — Leave blank for now. You can update this to a default Snowflake role later.
+ * **Database** — `analytics`. This tells dbt to create new models in the analytics database.
+ * **Warehouse** — `transforming`. This tells dbt to use the transforming warehouse that was created earlier.
+
+
+5. Enter your **Development Credentials** for Snowflake with:
+ * **Username** — The username you created for Snowflake. The username is not your email address and is usually your first and last name together in one word.
+ * **Password** — The password you set when creating your Snowflake account.
+ * **Schema** — You’ll notice that the schema name has been auto-created for you. By convention, this is `dbt_<first-initial><last-name>`. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Cloud IDE.
+ * **Target name** — Leave as the default.
+ * **Threads** — Leave as 4. This is the number of simultaneous connections that dbt Cloud will make to build models concurrently. (A sketch of how these settings map to a dbt Core `profiles.yml` follows these steps.)
+
+
+
+6. Click **Test Connection**. This verifies that dbt Cloud can access your Snowflake account.
+7. If the connection test succeeds, click **Next**. If it fails, you may need to check your Snowflake settings and credentials.
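+
+If you're curious how the settings and credentials you just entered fit together outside the dbt Cloud UI, the sketch below shows a roughly equivalent dbt Core `profiles.yml`. The profile name and credential values are placeholders; dbt Cloud manages this connection for you, so you don't need to create this file for the quickstart.
+
+```yml
+jaffle_shop:
+  target: dev
+  outputs:
+    dev:
+      type: snowflake
+      account: oq65696.west-us-2.azure  # your account identifier
+      user: your_username               # not your email address
+      password: your_password
+      role: your_role                   # optional; the steps above leave this blank
+      database: analytics               # where dbt creates models
+      warehouse: transforming           # compute used for dbt runs
+      schema: dbt_yourname              # your development schema
+      threads: 4                        # number of concurrent model builds
+```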
+
+
+
## Set up a dbt Cloud managed repository
-
+If you used Partner Connect, you can skip to [initializing your dbt project](#initialize-your-dbt-project-and-start-developing), because Partner Connect provides you with a managed repository. Otherwise, you will need to create your repository connection.
+
## Initialize your dbt project and start developing
Now that you have a repository configured, you can initialize your project and start development in dbt Cloud:
1. Click **Start developing in the IDE**. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse.
-2. Above the file tree to the left, click **Initialize dbt project**. This builds out your folder structure with example models.
-3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit` and click **Commit**. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code.
+2. Above the file tree to the left, click **Initialize your project**. This builds out your folder structure with example models.
+3. Make your initial commit by clicking **Commit and sync**. Use the commit message `initial commit`. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code.
4. You can now directly query data from your warehouse and execute `dbt run`. You can try this out now:
- - Click **+ Create new file**, add this query to the new file, and click **Save as** to save the new file:
+ - Click **+ Create new file**, add this query to the new file, and click **Save as** to save the new file:
```sql
- select * from `dbt-tutorial.jaffle_shop.customers`
+ select * from raw.jaffle_shop.customers
```
- In the command line bar at the bottom, enter `dbt run` and press **Enter**. You should see a `dbt run succeeded` message.
@@ -124,7 +277,6 @@ Name the new branch `add-customers-model`.
2. Name the file `customers.sql`, then click **Create**.
3. Copy the following query into the file and click **Save**.
-
```sql
with customers as (
@@ -133,7 +285,7 @@ with customers as (
first_name,
last_name
- from `dbt-tutorial`.jaffle_shop.customers
+ from raw.jaffle_shop.customers
),
@@ -145,7 +297,7 @@ orders as (
order_date,
status
- from `dbt-tutorial`.jaffle_shop.orders
+ from raw.jaffle_shop.orders
),
@@ -187,14 +339,6 @@ select * from final
Later, you can connect your business intelligence (BI) tools to these views and tables so they only read cleaned-up data rather than raw data.
-#### FAQs
-
-
-
-
-
-
-
## Change the way your model is materialized
@@ -218,7 +362,7 @@ Later, you can connect your business intelligence (BI) tools to these views and
first_name,
last_name
- from `dbt-tutorial`.jaffle_shop.customers
+ from raw.jaffle_shop.customers
```
@@ -232,7 +376,7 @@ Later, you can connect your business intelligence (BI) tools to these views and
order_date,
status
- from `dbt-tutorial`.jaffle_shop.orders
+ from raw.jaffle_shop.orders
```
@@ -295,16 +439,80 @@ Later, you can connect your business intelligence (BI) tools to these views and
This time, when you performed a `dbt run`, separate views/tables were created for `stg_customers`, `stg_orders`, and `customers`. dbt inferred the order in which to run these models: because `customers` depends on `stg_customers` and `stg_orders`, dbt builds `customers` last. You do not need to explicitly define these dependencies.
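
These dependencies come from the `ref` function: the `customers` model you built earlier selects from the staging models with `{{ ref() }}`, which is how dbt infers the build order. Abbreviated, that looks like this:

```sql
with customers as (
    select * from {{ ref('stg_customers') }}
),

orders as (
    select * from {{ ref('stg_orders') }}
)

-- joins and aggregations omitted
select * from customers
```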
-
#### FAQs {#faq-2}
-
+## Build models on top of sources
+
+Sources make it possible to name and describe the data loaded into your warehouse by your extract and load tools. By declaring these tables as sources in dbt, you can:
+- select from source tables in your models using the `{{ source() }}` function, helping define the lineage of your data
+- test your assumptions about your source data
+- calculate the freshness of your source data
+
+1. Create a new YML file `models/sources.yml`.
+2. Declare the sources by copying the following into the file and clicking **Save**.
+
+
+
+ ```yml
+ version: 2
+
+ sources:
+ - name: jaffle_shop
+ description: This is a replica of the Postgres database used by our app
+ database: raw
+ schema: jaffle_shop
+ tables:
+ - name: customers
+ description: One record per customer.
+ - name: orders
+ description: One record per order. Includes cancelled and deleted orders.
+ ```
+
+
+
+3. Edit the `models/stg_customers.sql` file to select from the `customers` table in the `jaffle_shop` source.
+
+
+
+ ```sql
+ select
+ id as customer_id,
+ first_name,
+ last_name
+
+ from {{ source('jaffle_shop', 'customers') }}
+ ```
+
+
+
+4. Edit the `models/stg_orders.sql` file to select from the `orders` table in the `jaffle_shop` source.
+
+
+
+ ```sql
+ select
+ id as order_id,
+ user_id as customer_id,
+ order_date,
+ status
+
+ from {{ source('jaffle_shop', 'orders') }}
+ ```
+
+
+
+5. Execute `dbt run`.
+
+ The results of your `dbt run` will be exactly the same as in the previous step. Your `stg_customers` and `stg_orders`
+ models will still query the same raw data in Snowflake. By using `source`, you can
+ test and document your raw data and also understand the lineage of your sources.
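+
+If you want to try source freshness as well, here's a sketch. It assumes your raw tables include a column that records when each row was loaded (a hypothetical `_etl_loaded_at` column here), which the sample data may not have. You could extend `models/sources.yml` like this and then run `dbt source freshness`:
+
+```yml
+version: 2
+
+sources:
+  - name: jaffle_shop
+    database: raw
+    schema: jaffle_shop
+    loaded_at_field: _etl_loaded_at   # hypothetical load-timestamp column
+    freshness:
+      warn_after: {count: 12, period: hour}
+      error_after: {count: 24, period: hour}
+    tables:
+      - name: customers
+      - name: orders
+```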
+
+
-
diff --git a/website/docs/reference/resource-configs/fabric-configs.md b/website/docs/reference/resource-configs/fabric-configs.md
index 8ab0a63a644..094f1cc8e1d 100644
--- a/website/docs/reference/resource-configs/fabric-configs.md
+++ b/website/docs/reference/resource-configs/fabric-configs.md
@@ -3,103 +3,910 @@ title: "Microsoft Fabric DWH configurations"
id: "fabric-configs"
---
-## Materializations
+
-Ephemeral materialization is not supported due to T-SQL not supporting nested CTEs. It may work in some cases when you're working with very simple ephemeral models.
+## Use `project` and `dataset` in configurations
-### Tables
+- `schema` is interchangeable with the BigQuery concept of a `dataset`
+- `database` is interchangeable with the BigQuery concept of a `project`
-Tables are default materialization.
+For our reference documentation, you can declare `project` in place of `database`.
+This will allow you to read and write from multiple BigQuery projects. The same applies to `dataset` in place of `schema`.
+
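+As a sketch (the project and dataset names here are hypothetical), a model can write to a different BigQuery project and dataset using the generic `database` and `schema` configs, which BigQuery treats as project and dataset:
+
+```sql
+{{ config(
+    materialized = 'table',
+    database = 'other-gcp-project',  -- the BigQuery project to build into
+    schema = 'marketing'             -- the BigQuery dataset to build into
+) }}
+
+select 1 as id
+```
+
+Keep in mind that, by default, dbt combines a custom `schema` with your target schema through the `generate_schema_name` macro, so the final dataset name may differ unless you override that macro.
+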
+## Using table partitioning and clustering
+
+### Partition clause
+
+BigQuery supports the use of a [partition by](https://cloud.google.com/bigquery/docs/data-definition-language#specifying_table_partitioning_options) clause to easily partition a table by a column or expression. This option can help decrease latency and cost when querying large tables. Note that partition pruning [only works](https://cloud.google.com/bigquery/docs/querying-partitioned-tables#pruning_limiting_partitions) when partitions are filtered using literal values (so selecting partitions using a subquery won't improve performance).
+
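+For instance, if a table is partitioned on `created_at` (the table names below are hypothetical), the first filter below can be pruned while the second cannot:
+
+```sql
+-- Pruned: the partitioning column is filtered with a literal value
+select * from analytics.events
+where created_at >= timestamp('2021-01-01');
+
+-- Not pruned: the partition is selected with a subquery
+select * from analytics.events
+where created_at >= (select max(loaded_at) from analytics.load_log);
+```
+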
+The `partition_by` config can be supplied as a dictionary with the following format:
+
+```python
+{
+ "field": "",
+ "data_type": "",
+ "granularity": ""
+
+ # Only required if data_type is "int64"
+ "range": {
+ "start": ,
+ "end": ,
+ "interval":
+ }
+}
+```
+
+#### Partitioning by a date or timestamp
+
+When using a `datetime` or `timestamp` column to partition data, you can create partitions with a granularity of hour, day, month, or year. A `date` column supports granularity of day, month and year. Daily partitioning is the default for all column types.
+
+If the `data_type` is specified as a `date` and the granularity is day, dbt will supply the field as-is
+when configuring table partitioning.
+
-
+
-
+```sql
+{{ config(
+ materialized='table',
+ partition_by={
+ "field": "created_at",
+ "data_type": "timestamp",
+ "granularity": "day"
+ }
+)}}
+
+select
+ user_id,
+ event_name,
+ created_at
+
+from {{ ref('events') }}
+```
+
+
+
+
+
+
+
+
+```sql
+create table `projectname`.`analytics`.`bigquery_table`
+partition by timestamp_trunc(created_at, day)
+as (
+
+ select
+ user_id,
+ event_name,
+ created_at
+
+ from `analytics`.`events`
+
+)
+```
+
+
+
+
+
+
+#### Partitioning by an "ingestion" date or timestamp
+
+BigQuery supports an [older mechanism of partitioning](https://cloud.google.com/bigquery/docs/partitioned-tables#ingestion_time) based on the time when each row was ingested. While we recommend using the newer and more ergonomic approach to partitioning whenever possible, for very large datasets, there can be some performance improvements to using this older, more mechanistic approach. [Read more about the `insert_overwrite` incremental strategy below](#copying-ingestion-time-partitions).
+
+dbt will always instruct BigQuery to partition your table by the values of the column specified in `partition_by.field`. By configuring your model with `partition_by.time_ingestion_partitioning` set to `True`, dbt will use that column as the input to a `_PARTITIONTIME` pseudocolumn. Unlike with newer column-based partitioning, you must ensure that the values of your partitioning column match exactly the time-based granularity of your partitions.
+
+
+
+
+
+
+```sql
+{{ config(
+ materialized="incremental",
+ partition_by={
+ "field": "created_date",
+ "data_type": "timestamp",
+ "granularity": "day",
+ "time_ingestion_partitioning": true
+ }
+) }}
+
+select
+ user_id,
+ event_name,
+ created_at,
+ -- values of this column must match the data type + granularity defined above
+ timestamp_trunc(created_at, day) as created_date
+
+from {{ ref('events') }}
+```
+
+
+
+
+
+
+
+
+```sql
+create table `projectname`.`analytics`.`bigquery_table` (`user_id` INT64, `event_name` STRING, `created_at` TIMESTAMP)
+partition by timestamp_trunc(_PARTITIONTIME, day);
+
+insert into `projectname`.`analytics`.`bigquery_table` (_partitiontime, `user_id`, `event_name`, `created_at`)
+select created_date as _partitiontime, * EXCEPT(created_date) from (
+ select
+ user_id,
+ event_name,
+ created_at,
+ -- values of this column must match granularity defined above
+ timestamp_trunc(created_at, day) as created_date
+
+ from `projectname`.`analytics`.`events`
+);
+```
+
+
+
+
+
+
+#### Partitioning with integer buckets
+
+If the `data_type` is specified as `int64`, then a `range` key must also
+be provided in the `partition_by` dict. dbt will use the values provided in
+the `range` dict to generate the partitioning clause for the table.
+
+
+
+
+
+
+```sql
+{{ config(
+ materialized='table',
+ partition_by={
+ "field": "user_id",
+ "data_type": "int64",
+ "range": {
+ "start": 0,
+ "end": 100,
+ "interval": 10
+ }
+ }
+)}}
+
+select
+ user_id,
+ event_name,
+ created_at
+
+from {{ ref('events') }}
+```
+
+
+
+
+
+
+
+
+```sql
+create table analytics.bigquery_table
+partition by range_bucket(
+ user_id,
+ generate_array(0, 100, 10)
+)
+as (
+
+ select
+ user_id,
+ event_name,
+ created_at
+
+ from analytics.events
+
+)
+```
+
+
+
+
+
+
+#### Additional partition configs
+
+If your model has `partition_by` configured, you may optionally specify two additional configurations:
+
+- `require_partition_filter` (boolean): If set to `true`, anyone querying this model _must_ specify a partition filter, otherwise their query will fail. This is recommended for very large tables with obvious partitioning schemes, such as event streams grouped by day. Note that this will affect other dbt models or tests that try to select from this model, too.
+
+- `partition_expiration_days` (integer): If set for date- or timestamp-type partitions, the partition will expire that many days after the date it represents. For example, a partition representing `2021-01-01`, set to expire after 7 days, will no longer be queryable as of `2021-01-08`; its storage costs will be zeroed out, and its contents will eventually be deleted. Note that [table expiration](#controlling-table-expiration) will take precedence if specified.
+
+
+
+```sql
+{{ config(
+ materialized = 'table',
+ partition_by = {
+ "field": "created_at",
+ "data_type": "timestamp",
+ "granularity": "day"
+ },
+ require_partition_filter = true,
+ partition_expiration_days = 7
+)}}
+
+```
+
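+With `require_partition_filter` enabled as above, any query against the resulting table, including other dbt models and tests that select from it, must filter on `created_at`. A hypothetical downstream model might look like this:
+
+```sql
+select *
+from {{ ref('my_partitioned_model') }}  -- hypothetical model using the config above
+where created_at >= timestamp('2021-01-01')
+```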
+
+
+### Clustering clause
+
+BigQuery tables can be [clustered](https://cloud.google.com/bigquery/docs/clustered-tables) to colocate related data.
+
+Clustering on a single column:
+
+
```sql
{{
- config(
- materialized='table'
- )
+ config(
+ materialized = "table",
+ cluster_by = "order_id",
+ )
}}
-select *
-from ...
+select * from ...
```
-
+Clustering on multiple columns:
+
+