diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index c9b25d3b71c..309872dd818 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -12,7 +12,8 @@ Uncomment if you're publishing docs for a prerelease version of dbt (delete if n - [ ] Add versioning components, as described in [Versioning Docs](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-entire-pages) - [ ] Add a note to the prerelease version [Migration Guide](https://github.com/dbt-labs/docs.getdbt.com/tree/current/website/docs/docs/dbt-versions/core-upgrade) --> -- [ ] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) and [About versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) so my content adheres to these guidelines. +- [ ] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines. +- [ ] For [docs versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning), review how to [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content). - [ ] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch." Adding new pages (delete if not applicable): @@ -22,4 +23,4 @@ Adding new pages (delete if not applicable): Removing or renaming existing pages (delete if not applicable): - [ ] Remove page from `website/sidebars.js` - [ ] Add an entry `website/static/_redirects` -- [ ] [Ran link testing](https://github.com/dbt-labs/docs.getdbt.com#running-the-cypress-tests-locally) to update the links that point to the deleted page +- [ ] Run link testing locally with `npm run build` to update the links that point to the deleted page diff --git a/.github/workflows/asana-connection.yml b/.github/workflows/asana-connection.yml new file mode 100644 index 00000000000..aced477bdac --- /dev/null +++ b/.github/workflows/asana-connection.yml @@ -0,0 +1,17 @@ +name: Show PR Status in Asana +on: + pull_request: + types: [opened, reopened] + +jobs: + create-asana-attachment-job: + runs-on: ubuntu-latest + name: Create pull request attachments on Asana tasks + steps: + - name: Create pull request attachments + uses: Asana/create-app-attachment-github-action@latest + id: postAttachment + with: + asana-secret: ${{ secrets.ASANA_SECRET }} + - name: Log output status + run: echo "Status is ${{ steps.postAttachment.outputs.status }}" diff --git a/website/docs/dbt-cli/cli-overview.md b/website/docs/dbt-cli/cli-overview.md deleted file mode 100644 index 3e44bab801b..00000000000 --- a/website/docs/dbt-cli/cli-overview.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -title: "CLI overview" -description: "Run your dbt project from the command line." ---- - -dbt Core ships with a command-line interface (CLI) for running your dbt project. dbt Core and its CLI are free to use and available as an [open source project](https://github.com/dbt-labs/dbt-core). 
- -When using the command line, you can run commands and do other work from the current or _working directory_ on your computer. Before running the dbt project from the command line, make sure the working directory is your dbt project directory. For more details, see "[Creating a dbt project](/docs/build/projects)." - - - - -Once you verify your dbt project is your working directory, you can execute dbt commands. A full list of dbt commands can be found in the [reference section](/reference/dbt-commands). - - - -:::tip Pro tip: Using the --help flag - -Most command-line tools, including dbt, have a `--help` flag that you can use to show available commands and arguments. For example, you can use the `--help` flag with dbt in two ways: -• `dbt --help`: Lists the commands available for dbt -• `dbt run --help`: Lists the flags available for the `run` command - -::: - diff --git a/website/docs/docs/about/overview.md b/website/docs/docs/about/overview.md deleted file mode 100644 index e34866fa3fe..00000000000 --- a/website/docs/docs/about/overview.md +++ /dev/null @@ -1,53 +0,0 @@ ---- -title: "What is dbt? " -id: "overview" ---- - -dbt is a productivity tool that helps analysts get more done and produce higher quality results. - -Analysts commonly spend 50-80% of their time modeling raw data—cleaning, reshaping, and applying fundamental business logic to it. dbt empowers analysts to do this work better and faster. - -dbt's primary interface is its CLI. Using dbt is a combination of editing code in a text editor and running that code using dbt from the command line using `dbt [command] [options]`. - -# How does dbt work? - -dbt has two core workflows: building data models and testing data models. (We call any transformed of raw data a data model.) - -To create a data model, an analyst simply writes a SQL `SELECT` statement. dbt then takes that statement and builds it in the database, materializing it as either a view or a . This model can then be queried by other models or by other analytics tools. - -To test a data model, an analyst asserts something to be true about the underlying data. For example, an analyst can assert that a certain field should never be null, should always hold unique values, or should always map to a field in another table. Analysts can also write assertions that express much more customized logic, such as “debits and credits should always be equal within a given journal entry”. dbt then tests all assertions against the database and returns success or failure responses. - -# Does dbt really help me get more done? - -One dbt user has this to say: *“At this point when I have a new question, I can answer it 10-100x faster than I could before.”* Here’s how: - -- dbt allows analysts to avoid writing boilerplate and : managing transactions, dropping tables, and managing schema changes. All business logic is expressed in SQL `SELECT` statements, and dbt takes care of . -- dbt creates leverage. Instead of starting at the raw data with every analysis, analysts instead build up reusable data models that can be referenced in subsequent work. -- dbt includes optimizations for data model materialization, allowing analysts to dramatically reduce the time their queries take to run. - -There are many other optimizations in the dbt to help you work quickly: macros, hooks, and package management are all accelerators. - -# Does dbt really help me produce more reliable analysis? - -It does. 
Here’s how: - -- Writing SQL frequently involves a lot of copy-paste, which leads to errors when logic changes. With dbt, analysts don’t need to copy-paste. Instead, they build reusable data models that then get pulled into subsequent models and analysis. Change a model once and everything that relies on it reflects that change. -- dbt allows subject matter experts to publish the canonical version of a particular data model, encapsulating all complex business logic. All analysis on top of this model will incorporate the same business logic without needing to understand it. -- dbt plays nicely with source control. Using dbt, analysts can use mature source control processes like branching, pull requests, and code reviews. -- dbt makes it easy and fast to write functional tests on the underlying data. Many analytic errors are caused by edge cases in the data: testing helps analysts find and handle those edge cases. - -# Why SQL? - -While there are a large number of great languages for manipulating data, we’ve chosen SQL as the primary [data transformation](https://www.getdbt.com/analytics-engineering/transformation/) language at the heart of dbt. There are three reasons for this: - -1. SQL is a very widely-known language for working with data. Using SQL gives the largest-possible group of users access. -2. Modern analytic databases are extremely performant and have sophisticated optimizers. Writing data transformations in SQL allows users to describe transformations on their data but leave the execution plan to the underlying database technology. In practice, this provides excellent results with far less work on the part of the author. -3. SQL `SELECT` statements enjoy a built-in structure for describing dependencies: `FROM X` and `JOIN Y`. This results in less setup and maintenance overhead in ensuring that transforms execute in the correct order, compared to other languages and tools. - -# What databases does dbt currently support? - -See [Supported Data Platforms](/docs/supported-data-platforms) to view the full list of supported databases, warehouses, and query engines. - -# How do I get started? - -dbt is open source and completely free to download and use. See our [Getting Started guide](/docs/introduction) for more. diff --git a/website/docs/docs/build/measures.md b/website/docs/docs/build/measures.md index e06b5046976..74d37b70e94 100644 --- a/website/docs/docs/build/measures.md +++ b/website/docs/docs/build/measures.md @@ -6,19 +6,13 @@ sidebar_label: "Measures" tags: [Metrics, Semantic Layer] --- -Measures are aggregations performed on columns in your model. They can be used as final metrics or serve as building blocks for more complex metrics. Measures have several inputs, which are described in the following table along with their field types. - -| Parameter | Description | Type | -| --------- | ----------- | ---- | -| [`name`](#name) | Provide a name for the measure, which must be unique and can't be repeated across all semantic models in your dbt project. | Required | -| [`description`](#description) | Describes the calculated measure. | Optional | -| [`agg`](#aggregation) | dbt supports aggregations such as `sum`, `min`, `max`, and more. Refer to [Aggregation](/docs/build/measures#aggregation) for the full list of supported aggregation types. | Required | -| [`expr`](#expr) | You can either reference an existing column in the table or use a SQL expression to create or derive a new one. 
| Optional | -| [`non_additive_dimension`](#non-additive-dimensions) | Non-additive dimensions can be specified for measures that cannot be aggregated over certain dimensions, such as bank account balances, to avoid producing incorrect results. | Optional | -| `agg_params` | specific aggregation properties such as a percentile. | Optional | -| `agg_time_dimension` | The time field. Defaults to the default agg time dimension for the semantic model. | Optional | -| `label` | How the metric appears in project docs and downstream integrations. | Required | +Measures are aggregations performed on columns in your model. They can be used as final metrics or serve as building blocks for more complex metrics. +Measures have several inputs, which are described in the following table along with their field types. + +import MeasuresParameters from '/snippets/_sl-measures-parameters.md'; + + ## Measure spec diff --git a/website/docs/docs/build/semantic-models.md b/website/docs/docs/build/semantic-models.md index 99ccef237f9..09f808d7a17 100644 --- a/website/docs/docs/build/semantic-models.md +++ b/website/docs/docs/build/semantic-models.md @@ -40,17 +40,17 @@ The complete spec for semantic models is below: ```yaml semantic_models: - - name: the_name_of_the_semantic_model ## Required - description: same as always ## Optional - model: ref('some_model') ## Required - defaults: ## Required - agg_time_dimension: dimension_name ## Required if the model contains dimensions - entities: ## Required - - see more information in entities - measures: ## Optional - - see more information in measures section - dimensions: ## Required - - see more information in dimensions section + - name: the_name_of_the_semantic_model ## Required + description: same as always ## Optional + model: ref('some_model') ## Required + default: ## Required + agg_time_dimension: dimension_name ## Required if the model contains dimensions + entities: ## Required + - see more information in entities + measures: ## Optional + - see more information in measures section + dimensions: ## Required + - see more information in dimensions section primary_entity: >- if the semantic model has no primary entity, then this property is required. #Optional if a primary entity exists, otherwise Required ``` @@ -230,16 +230,14 @@ For semantic models with a measure, you must have a [primary time group](/docs/b ### Measures -[Measures](/docs/build/measures) are aggregations applied to columns in your data model. They can be used as the foundational building blocks for more complex metrics, or be the final metric itself. Measures have various parameters which are listed in a table along with their descriptions and types. +[Measures](/docs/build/measures) are aggregations applied to columns in your data model. They can be used as the foundational building blocks for more complex metrics, or be the final metric itself. + +Measures have various parameters which are listed in a table along with their descriptions and types. + +import MeasuresParameters from '/snippets/_sl-measures-parameters.md'; + + -| Parameter | Description | Field type | -| --- | --- | --- | -| `name`| Provide a name for the measure, which must be unique and can't be repeated across all semantic models in your dbt project. | Required | -| `description` | Describes the calculated measure. | Optional | -| `agg` | dbt supports the following aggregations: `sum`, `max`, `min`, `count_distinct`, and `sum_boolean`. 
| Required | -| `expr` | You can either reference an existing column in the table or use a SQL expression to create or derive a new one. | Optional | -| `non_additive_dimension` | Non-additive dimensions can be specified for measures that cannot be aggregated over certain dimensions, such as bank account balances, to avoid producing incorrect results. | Optional | -| `create_metric` | You can create a metric directly from a measure with `create_metric: True` and specify its display name with create_metric_display_name. Default is false. | Optional | import SetUpPages from '/snippets/_metrics-dependencies.md'; diff --git a/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md b/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md index 486aa787936..06b9dd62f1a 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md @@ -13,10 +13,12 @@ The following fields are required when creating a Postgres, Redshift, or AlloyDB | Port | Usually 5432 (Postgres) or 5439 (Redshift) | `5439` | | Database | The logical database to connect to and run queries against. | `analytics` | -**Note**: When you set up a Redshift or Postgres connection in dbt Cloud, SSL-related parameters aren't available as inputs. +**Note**: When you set up a Redshift or Postgres connection in dbt Cloud, SSL-related parameters aren't available as inputs. +For dbt Cloud users, please log in using the default Database username and password. Note this is because [`IAM` authentication](https://docs.aws.amazon.com/redshift/latest/mgmt/generating-user-credentials.html) is not compatible with dbt Cloud. + ### Connecting via an SSH Tunnel To connect to a Postgres, Redshift, or AlloyDB instance via an SSH tunnel, select the **Use SSH Tunnel** option when creating your connection. When configuring the tunnel, you must supply the hostname, username, and port for the [bastion server](#about-the-bastion-server-in-aws). 
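As a quick sanity check before saving the connection, you can confirm the bastion host accepts SSH on the port you plan to enter in dbt Cloud. This is an illustrative check only; the user, host, and port below are placeholders, not values from this PR:

```shell
# Illustrative connectivity check for the bastion server used by the SSH tunnel.
# Replace the user, host, and port with the values you will enter in dbt Cloud.
ssh -p 22 dbtcloud@bastion.example.com 'echo "bastion reachable"'
```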
diff --git a/website/docs/docs/collaborate/govern/model-contracts.md b/website/docs/docs/collaborate/govern/model-contracts.md index 442a20df1b6..342d86c1a77 100644 --- a/website/docs/docs/collaborate/govern/model-contracts.md +++ b/website/docs/docs/collaborate/govern/model-contracts.md @@ -125,8 +125,8 @@ Select the adapter-specific tab for more information on [constraint](/reference/ | Constraint type | Support | Platform enforcement | |:-----------------|:-------------|:---------------------| | not_null | ✅ Supported | ✅ Enforced | -| primary_key | ✅ Supported | ✅ Enforced | -| foreign_key | ✅ Supported | ✅ Enforced | +| primary_key | ✅ Supported | ❌ Not enforced | +| foreign_key | ✅ Supported | ❌ Not enforced | | unique | ❌ Not supported | ❌ Not enforced | | check | ❌ Not supported | ❌ Not enforced | diff --git a/website/docs/docs/core/connect-data-platform/redshift-setup.md b/website/docs/docs/core/connect-data-platform/redshift-setup.md index 175d5f6a715..006f026ea94 100644 --- a/website/docs/docs/core/connect-data-platform/redshift-setup.md +++ b/website/docs/docs/core/connect-data-platform/redshift-setup.md @@ -70,8 +70,9 @@ pip is the easiest way to install the adapter: The authentication methods that dbt Core supports are: - `database` — Password-based authentication (default, will be used if `method` is not provided) -- `IAM` — IAM +- `IAM` — IAM +For dbt Cloud users, log in using the default **Database username** and **password**. This is necessary because dbt Cloud does not support `IAM` authentication. Click on one of these authentication methods for further details on how to configure your connection profile. Each tab also includes an example `profiles.yml` configuration file for you to review. diff --git a/website/docs/docs/core/connect-data-platform/teradata-setup.md b/website/docs/docs/core/connect-data-platform/teradata-setup.md index 1ba8e506b88..85767edee72 100644 --- a/website/docs/docs/core/connect-data-platform/teradata-setup.md +++ b/website/docs/docs/core/connect-data-platform/teradata-setup.md @@ -61,6 +61,7 @@ pip is the easiest way to install the adapter: | dbt-teradata | dbt-core | dbt-teradata-util | dbt-util | |--------------|------------|-------------------|----------------| | 1.2.x | 1.2.x | 0.1.0 | 0.9.x or below | +| 1.6.7 | 1.6.7 | 1.1.1 | 1.1.1 |
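The new row pairs dbt-teradata 1.6.7 with dbt-core 1.6.7. As an illustrative install command that pins both packages to a matching pair from the compatibility matrix above (assuming a standard pip environment):

```shell
# Pin dbt-core and dbt-teradata to compatible versions from the matrix above.
python -m pip install "dbt-core==1.6.7" "dbt-teradata==1.6.7"
```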

Configuring {frontMatter.meta.pypi_package}

diff --git a/website/docs/docs/dbt-cloud-apis/project-state.md b/website/docs/docs/dbt-cloud-apis/project-state.md index a5ee71ebb1b..62136b35463 100644 --- a/website/docs/docs/dbt-cloud-apis/project-state.md +++ b/website/docs/docs/dbt-cloud-apis/project-state.md @@ -66,7 +66,7 @@ Most Discovery API use cases will favor the _applied state_ since it pertains to | Seed | Yes | Yes | Yes | Downstream | Applied & definition | | Snapshot | Yes | Yes | Yes | Upstream & downstream | Applied & definition | | Test | Yes | Yes | No | Upstream | Applied & definition | -| Exposure | No | No | No | Upstream | Applied & definition | +| Exposure | No | No | No | Upstream | Definition | | Metric | No | No | No | Upstream & downstream | Definition | | Semantic model | No | No | No | Upstream & downstream | Definition | | Group | No | No | No | Downstream | Definition | diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md index f73007c9a02..b7d13d0d453 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md +++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md @@ -48,7 +48,7 @@ Authentication uses a dbt Cloud [service account tokens](/docs/dbt-cloud-apis/se {"Authorization": "Bearer "} ``` -Each GQL request also requires a dbt Cloud `environmentId`. The API uses both the service token in the header and environmentId for authentication. +Each GQL request also requires a dbt Cloud `environmentId`. The API uses both the service token in the header and `environmentId` for authentication. ### Metadata calls @@ -150,6 +150,60 @@ metricsForDimensions( ): [Metric!]! ``` +**Metric Types** + +```graphql +Metric { + name: String! + description: String + type: MetricType! + typeParams: MetricTypeParams! + filter: WhereFilter + dimensions: [Dimension!]! + queryableGranularities: [TimeGranularity!]! +} +``` + +``` +MetricType = [SIMPLE, RATIO, CUMULATIVE, DERIVED] +``` + +**Metric Type parameters** + +```graphql +MetricTypeParams { + measure: MetricInputMeasure + inputMeasures: [MetricInputMeasure!]! + numerator: MetricInput + denominator: MetricInput + expr: String + window: MetricTimeWindow + grainToDate: TimeGranularity + metrics: [MetricInput!] +} +``` + + +**Dimension Types** + +```graphql +Dimension { + name: String! + description: String + type: DimensionType! + typeParams: DimensionTypeParams + isPartition: Boolean! + expr: String + queryableGranularities: [TimeGranularity!]! +} +``` + +``` +DimensionType = [CATEGORICAL, TIME] +``` + +### Querying + **Create Dimension Values query** ```graphql @@ -205,59 +259,128 @@ query( ): QueryResult! ``` -**Metric Types** +The GraphQL API uses a polling process for querying since queries can be long-running in some cases. It works by first creating a query with a mutation, `createQuery, which returns a query ID. This ID is then used to continuously check (poll) for the results and status of your query. The typical flow would look as follows: +1. Kick off a query ```graphql -Metric { - name: String! - description: String - type: MetricType! - typeParams: MetricTypeParams! - filter: WhereFilter - dimensions: [Dimension!]! - queryableGranularities: [TimeGranularity!]! +mutation { + createQuery( + environmentId: 123456 + metrics: [{name: "order_total"}] + groupBy: [{name: "metric_time"}] + ) { + queryId # => Returns 'QueryID_12345678' + } } ``` - -``` -MetricType = [SIMPLE, RATIO, CUMULATIVE, DERIVED] +2. 
Poll for results +```graphql +{ + query(environmentId: 123456, queryId: "QueryID_12345678") { + sql + status + error + totalPages + jsonResult + arrowResult + } +} ``` +3. Keep querying 2. at an appropriate interval until status is `FAILED` or `SUCCESSFUL` + +### Output format and pagination + +**Output format** + +By default, the output is in Arrow format. You can switch to JSON format using the following parameter. However, due to performance limitations, we recommend using the JSON parameter for testing and validation. The JSON received is a base64 encoded string. To access it, you can decode it using a base64 decoder. The JSON is created from pandas, which means you can change it back to a dataframe using `pandas.read_json(json, orient="table")`. Or you can work with the data directly using `json["data"]`, and find the table schema using `json["schema"]["fields"]`. Alternatively, you can pass `encoded:false` to the jsonResult field to get a raw JSON string directly. -**Metric Type parameters** ```graphql -MetricTypeParams { - measure: MetricInputMeasure - inputMeasures: [MetricInputMeasure!]! - numerator: MetricInput - denominator: MetricInput - expr: String - window: MetricTimeWindow - grainToDate: TimeGranularity - metrics: [MetricInput!] +{ + query(environmentId: BigInt!, queryId: Int!, pageNum: Int! = 1) { + sql + status + error + totalPages + arrowResult + jsonResult(orient: PandasJsonOrient! = TABLE, encoded: Boolean! = true) + } } ``` +The results default to the table but you can change it to any [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html) supported value. -**Dimension Types** +**Pagination** -```graphql -Dimension { - name: String! - description: String - type: DimensionType! - typeParams: DimensionTypeParams - isPartition: Boolean! - expr: String - queryableGranularities: [TimeGranularity!]! +By default, we return 1024 rows per page. If your result set exceeds this, you need to increase the page number using the `pageNum` option. + +### Run a Python query + +The `arrowResult` in the GraphQL query response is a byte dump, which isn't visually useful. You can convert this byte data into an Arrow table using any Arrow-supported language. 
Refer to the following Python example explaining how to query and decode the arrow result: + + +```python +import base64 +import pyarrow as pa +import time + +headers = {"Authorization":"Bearer "} +query_result_request = """ +{ + query(environmentId: 70, queryId: "12345678") { + sql + status + error + arrowResult + } } -``` +""" -``` -DimensionType = [CATEGORICAL, TIME] +while True: + gql_response = requests.post( + "https://semantic-layer.cloud.getdbt.com/api/graphql", + json={"query": query_result_request}, + headers=headers, + ) + if gql_response.json()["data"]["status"] in ["FAILED", "SUCCESSFUL"]: + break + # Set an appropriate interval between polling requests + time.sleep(1) + +""" +gql_response.json() => +{ + "data": { + "query": { + "sql": "SELECT\n ordered_at AS metric_time__day\n , SUM(order_total) AS order_total\nFROM semantic_layer.orders orders_src_1\nGROUP BY\n ordered_at", + "status": "SUCCESSFUL", + "error": null, + "arrowResult": "arrow-byte-data" + } + } +} +""" + +def to_arrow_table(byte_string: str) -> pa.Table: + """Get a raw base64 string and convert to an Arrow Table.""" + with pa.ipc.open_stream(base64.b64decode(res)) as reader: + return pa.Table.from_batches(reader, reader.schema) + + +arrow_table = to_arrow_table(gql_response.json()["data"]["query"]["arrowResult"]) + +# Perform whatever functionality is available, like convert to a pandas table. +print(arrow_table.to_pandas()) +""" +order_total ordered_at + 3 2023-08-07 + 112 2023-08-08 + 12 2023-08-09 + 5123 2023-08-10 +""" ``` -### Create Query examples +### Additional Create Query examples The following section provides query examples for the GraphQL API, such as how to query metrics, dimensions, where filters, and more. @@ -359,7 +482,7 @@ mutation { } ``` -**Query with Explain** +**Query with just compiling SQL** This takes the same inputs as the `createQuery` mutation. @@ -374,89 +497,3 @@ mutation { } } ``` - -### Output format and pagination - -**Output format** - -By default, the output is in Arrow format. You can switch to JSON format using the following parameter. However, due to performance limitations, we recommend using the JSON parameter for testing and validation. The JSON received is a base64 encoded string. To access it, you can decode it using a base64 decoder. The JSON is created from pandas, which means you can change it back to a dataframe using `pandas.read_json(json, orient="table")`. Or you can work with the data directly using `json["data"]`, and find the table schema using `json["schema"]["fields"]`. Alternatively, you can pass `encoded:false` to the jsonResult field to get a raw JSON string directly. - - -```graphql -{ - query(environmentId: BigInt!, queryId: Int!, pageNum: Int! = 1) { - sql - status - error - totalPages - arrowResult - jsonResult(orient: PandasJsonOrient! = TABLE, encoded: Boolean! = true) - } -} -``` - -The results default to the table but you can change it to any [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html) supported value. - -**Pagination** - -By default, we return 1024 rows per page. If your result set exceeds this, you need to increase the page number using the `pageNum` option. - -### Run a Python query - -The `arrowResult` in the GraphQL query response is a byte dump, which isn't visually useful. You can convert this byte data into an Arrow table using any Arrow-supported language. 
Refer to the following Python example explaining how to query and decode the arrow result: - - -```python -import base64 -import pyarrow as pa - -headers = {"Authorization":"Bearer "} -query_result_request = """ -{ - query(environmentId: 70, queryId: "12345678") { - sql - status - error - arrowResult - } -} -""" - -gql_response = requests.post( - "https://semantic-layer.cloud.getdbt.com/api/graphql", - json={"query": query_result_request}, - headers=headers, -) - -""" -gql_response.json() => -{ - "data": { - "query": { - "sql": "SELECT\n ordered_at AS metric_time__day\n , SUM(order_total) AS order_total\nFROM semantic_layer.orders orders_src_1\nGROUP BY\n ordered_at", - "status": "SUCCESSFUL", - "error": null, - "arrowResult": "arrow-byte-data" - } - } -} -""" - -def to_arrow_table(byte_string: str) -> pa.Table: - """Get a raw base64 string and convert to an Arrow Table.""" - with pa.ipc.open_stream(base64.b64decode(res)) as reader: - return pa.Table.from_batches(reader, reader.schema) - - -arrow_table = to_arrow_table(gql_response.json()["data"]["query"]["arrowResult"]) - -# Perform whatever functionality is available, like convert to a pandas table. -print(arrow_table.to_pandas()) -""" -order_total ordered_at - 3 2023-08-07 - 112 2023-08-08 - 12 2023-08-09 - 5123 2023-08-10 -""" -``` diff --git a/website/docs/guides/dremio-lakehouse.md b/website/docs/guides/dremio-lakehouse.md new file mode 100644 index 00000000000..1c59c04d175 --- /dev/null +++ b/website/docs/guides/dremio-lakehouse.md @@ -0,0 +1,196 @@ +--- +title: Build a data lakehouse with dbt Core and Dremio Cloud +id: build-dremio-lakehouse +description: Learn how to build a data lakehouse with dbt Core and Dremio Cloud. +displayText: Build a data lakehouse with dbt Core and Dremio Cloud +hoverSnippet: Learn how to build a data lakehouse with dbt Core and Dremio Cloud +# time_to_complete: '30 minutes' commenting out until we test +platform: 'dbt-core' +icon: 'guides' +hide_table_of_contents: true +tags: ['Dremio', 'dbt Core'] +level: 'Intermediate' +recently_updated: true +--- +## Introduction + +This guide will demonstrate how to build a data lakehouse with dbt Core 1.5 or new and Dremio Cloud. You can simplify and optimize your data infrastructure with dbt's robust transformation framework and Dremio’s open and easy data lakehouse. The integrated solution empowers companies to establish a strong data and analytics foundation, fostering self-service analytics and enhancing business insights while simplifying operations by eliminating the necessity to write complex Extract, Transform, and Load (ETL) pipelines. + +### Prerequisites + +* You must have a [Dremio Cloud](https://docs.dremio.com/cloud/) account. +* You must have Python 3 installed. +* You must have dbt Core v1.5 or newer [installed](/docs/core/installation). +* You must have the Dremio adapter 1.5.0 or newer [installed and configured](/docs/core/connect-data-platform/dremio-setup) for Dremio Cloud. +* You must have basic working knowledge of Git and the command line interface (CLI). + +## Validate your environment + +Validate your environment by running the following commands in your CLI and verifying the results: + +```shell + +$ python3 --version +Python 3.11.4 # Must be Python 3 + +``` + +```shell + +$ dbt --version +Core: + - installed: 1.5.0 # Must be 1.5 or newer + - latest: 1.6.3 - Update available! + + Your version of dbt-core is out of date! 
+ You can find instructions for upgrading here: + https://docs.getdbt.com/docs/installation + +Plugins: + - dremio: 1.5.0 - Up to date! # Must be 1.5 or newer + +``` + +## Getting started + +1. Clone the Dremio dbt Core sample project from the [GitHub repo](https://github.com/dremio-brock/DremioDBTSample/tree/master/dremioSamples). + +2. In your integrated development environment (IDE), open the relation.py file in the Dremio adapter directory: + `$HOME/Library/Python/3.9/lib/python/site-packages/dbt/adapters/dremio/relation.py` + +3. Find and update lines 51 and 52 to match the following syntax: + +```python + +PATTERN = re.compile(r"""((?:[^."']|"[^"]*"|'[^']*')+)""") +return ".".join(PATTERN.split(identifier)[1::2]) + +``` + +The complete selection should look like this: + +```python +def quoted_by_component(self, identifier, componentName): + if componentName == ComponentName.Schema: + PATTERN = re.compile(r"""((?:[^."']|"[^"]*"|'[^']*')+)""") + return ".".join(PATTERN.split(identifier)[1::2]) + else: + return self.quoted(identifier) + +``` + +You need to update this pattern because the plugin doesn’t support schema names in Dremio containing dots and spaces. + +## Build your pipeline + +1. Create a `profiles.yml` file in the `$HOME/.dbt/profiles.yml` path and add the following configs: + +```yaml + +dremioSamples: + outputs: + cloud_dev: + dremio_space: dev + dremio_space_folder: no_schema + object_storage_path: dev + object_storage_source: $scratch + pat: + cloud_host: api.dremio.cloud + cloud_project_id: + threads: 1 + type: dremio + use_ssl: true + user: + target: dev + + ``` + + 2. Execute the transformation pipeline: + + ```shell + + $ dbt run -t cloud_dev + + ``` + + If the above configurations have been implemented, the output will look something like this: + +```shell + +17:24:16 Running with dbt=1.5.0 +17:24:17 Found 5 models, 0 tests, 0 snapshots, 0 analyses, 348 macros, 0 operations, 0 seed files, 2 sources, 0 exposures, 0 metrics, 0 groups +17:24:17 +17:24:29 Concurrency: 1 threads (target='cloud_dev') +17:24:29 +17:24:29 1 of 5 START sql view model Preparation.trips .................................. [RUN] +17:24:31 1 of 5 OK created sql view model Preparation. trips ............................. [OK in 2.61s] +17:24:31 2 of 5 START sql view model Preparation.weather ................................ [RUN] +17:24:34 2 of 5 OK created sql view model Preparation.weather ........................... [OK in 2.15s] +17:24:34 3 of 5 START sql view model Business.Transportation.nyc_trips .................. [RUN] +17:24:36 3 of 5 OK created sql view model Business.Transportation.nyc_trips ............. [OK in 2.18s] +17:24:36 4 of 5 START sql view model Business.Weather.nyc_weather ....................... [RUN] +17:24:38 4 of 5 OK created sql view model Business.Weather.nyc_weather .................. [OK in 2.09s] +17:24:38 5 of 5 START sql view model Application.nyc_trips_with_weather ................. [RUN] +17:24:41 5 of 5 OK created sql view model Application.nyc_trips_with_weather ............ [OK in 2.74s] +17:24:41 +17:24:41 Finished running 5 view models in 0 hours 0 minutes and 24.03 seconds (24.03s). +17:24:41 +17:24:41 Completed successfully +17:24:41 +17:24:41 Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5 + +``` + +Now that you have a running environment and a completed job, you can view the data in Dremio and expand your code. 
This is a snapshot of the project structure in an IDE: + + + +## About the schema.yml + +The `schema.yml` file defines Dremio sources and models to be used and what data models are in scope. In this guides sample project, there are two data sources: + +1. The `NYC-weather.csv` stored in the **Samples** database and +2. The `sample_data` from the **Samples database**. + +The models correspond to both weather and trip data respectively and will be joined for analysis. + +The sources can be found by navigating to the **Object Storage** section of the Dremio Cloud UI. + + + +## About the models + +**Preparation** — `preparation_trips.sql` and `preparation_weather.sql` are building views on top of the trips and weather data. + +**Business** — `business_transportation_nyc_trips.sql` applies some level of transformation on `preparation_trips.sql` view. `Business_weather_nyc.sql` has no transformation on the `preparation_weather.sql` view. + +**Application** — `application_nyc_trips_with_weather.sql` joins the output from the Business model. This is what your business users will consume. + +## The Job output + +When you run the dbt job, it will create a **dev** space folder that has all the data assets created. This is what you will see in Dremio Cloud UI. Spaces in Dremio is a way to organize data assets which map to business units or data products. + + + +Open the **Application folder** and you will see the output of the simple transformation we did using dbt. + + + +## Query the data + +Now that you have run the job and completed the transformation, it's time to query your data. Click on the `nyc_trips_with_weather` view. That will take you to the SQL Runner page. Click **Show SQL Pane** on the upper right corner of the page. + +Run the following query: + +```sql + +SELECT vendor_id, + AVG(tip_amount) +FROM dev.application."nyc_treips_with_weather" +GROUP BY vendor_id + +``` + + + +This completes the integration setup and data is ready for business consumption. \ No newline at end of file diff --git a/website/docs/guides/redshift-qs.md b/website/docs/guides/redshift-qs.md index 9296e6c6568..890be27e50a 100644 --- a/website/docs/guides/redshift-qs.md +++ b/website/docs/guides/redshift-qs.md @@ -57,7 +57,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen -7. You might be asked to Configure account. For the purpose of this sandbox environment, we recommend selecting “Configure account”. +7. You might be asked to Configure account. For this sandbox environment, we recommend selecting “Configure account”. 8. Select your cluster from the list. In the **Connect to** popup, fill out the credentials from the output of the stack: - **Authentication** — Use the default which is **Database user name and password** (NOTE: IAM authentication is not supported in dbt Cloud). @@ -82,8 +82,7 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat 2. Now we are going to use the S3 bucket that you created with CloudFormation and upload the files. Go to the search bar at the top and type in `S3` and click on S3. There will be sample data in the bucket already, feel free to ignore it or use it for other modeling exploration. The bucket will be prefixed with `dbt-data-lake`. - - + 3. Click on the `name of the bucket` S3 bucket. If you have multiple S3 buckets, this will be the bucket that was listed under “Workshopbucket” on the Outputs page. 
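If you prefer the AWS CLI to the console for the upload step above, an equivalent command sequence might look like the sketch below. The bucket and file names are placeholders; use the `dbt-data-lake` bucket and the sample files from your own stack:

```shell
# Illustrative only: copy the sample CSVs into the S3 bucket created by CloudFormation.
aws s3 cp jaffle_shop_customers.csv s3://dbt-data-lake-example/
aws s3 cp jaffle_shop_orders.csv s3://dbt-data-lake-example/
aws s3 cp stripe_payments.csv s3://dbt-data-lake-example/
```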
diff --git a/website/docs/reference/configs-and-properties.md b/website/docs/reference/configs-and-properties.md index 8a557c762ed..c6458babeaa 100644 --- a/website/docs/reference/configs-and-properties.md +++ b/website/docs/reference/configs-and-properties.md @@ -157,9 +157,9 @@ You can find an exhaustive list of each supported property and config, broken do * Model [properties](/reference/model-properties) and [configs](/reference/model-configs) * Source [properties](/reference/source-properties) and [configs](source-configs) * Seed [properties](/reference/seed-properties) and [configs](/reference/seed-configs) -* [Snapshot Properties](snapshot-properties) +* Snapshot [properties](snapshot-properties) * Analysis [properties](analysis-properties) -* [Macro Properties](/reference/macro-properties) +* Macro [properties](/reference/macro-properties) * Exposure [properties](/reference/exposure-properties) ## FAQs diff --git a/website/docs/reference/dbt_project.yml.md b/website/docs/reference/dbt_project.yml.md index caf501c27ab..34af0f696c7 100644 --- a/website/docs/reference/dbt_project.yml.md +++ b/website/docs/reference/dbt_project.yml.md @@ -22,7 +22,85 @@ dbt uses YAML in a few different places. If you're new to YAML, it would be wort ::: - + + + + +```yml +[name](/reference/project-configs/name): string + +[config-version](/reference/project-configs/config-version): 2 +[version](/reference/project-configs/version): version + +[profile](/reference/project-configs/profile): profilename + +[model-paths](/reference/project-configs/model-paths): [directorypath] +[seed-paths](/reference/project-configs/seed-paths): [directorypath] +[test-paths](/reference/project-configs/test-paths): [directorypath] +[analysis-paths](/reference/project-configs/analysis-paths): [directorypath] +[macro-paths](/reference/project-configs/macro-paths): [directorypath] +[snapshot-paths](/reference/project-configs/snapshot-paths): [directorypath] +[docs-paths](/reference/project-configs/docs-paths): [directorypath] +[asset-paths](/reference/project-configs/asset-paths): [directorypath] + +[target-path](/reference/project-configs/target-path): directorypath +[log-path](/reference/project-configs/log-path): directorypath +[packages-install-path](/reference/project-configs/packages-install-path): directorypath + +[clean-targets](/reference/project-configs/clean-targets): [directorypath] + +[query-comment](/reference/project-configs/query-comment): string + +[require-dbt-version](/reference/project-configs/require-dbt-version): version-range | [version-range] + +[dbt-cloud](/docs/cloud/cloud-cli-installation): + [project-id](/docs/cloud/configure-cloud-cli#configure-the-dbt-cloud-cli): project_id # Required + [defer-env-id](/docs/cloud/about-cloud-develop-defer#defer-in-dbt-cloud-cli): environment_id # Optional + +[quoting](/reference/project-configs/quoting): + database: true | false + schema: true | false + identifier: true | false + +metrics: + + +models: + [](/reference/model-configs) + +seeds: + [](/reference/seed-configs) + +semantic-models: + + +snapshots: + [](/reference/snapshot-configs) + +sources: + [](source-configs) + +tests: + [](/reference/test-configs) + +vars: + [](/docs/build/project-variables) + +[on-run-start](/reference/project-configs/on-run-start-on-run-end): sql-statement | [sql-statement] +[on-run-end](/reference/project-configs/on-run-start-on-run-end): sql-statement | [sql-statement] + +[dispatch](/reference/project-configs/dispatch-config): + - macro_namespace: packagename + search_order: 
[packagename] + +[restrict-access](/docs/collaborate/govern/model-access): true | false + +``` + + + + + diff --git a/website/docs/reference/resource-properties/config.md b/website/docs/reference/resource-properties/config.md index e6021def852..55d2f64d9ff 100644 --- a/website/docs/reference/resource-properties/config.md +++ b/website/docs/reference/resource-properties/config.md @@ -16,6 +16,7 @@ datatype: "{dictionary}" { label: 'Sources', value: 'sources', }, { label: 'Metrics', value: 'metrics', }, { label: 'Exposures', value: 'exposures', }, + { label: 'Semantic models', value: 'semantic models', }, ] }> @@ -182,6 +183,36 @@ exposures: + + + + +Support for the `config` property on `semantic_models` was added in dbt Core v1.7 + + + + + + + +```yml +version: 2 + +semantic_models: + - name: + config: + enabled: true | false + group: + meta: {dictionary} +``` + + + + + + + +## Definition The `config` property allows you to configure resources at the same time you're defining properties in YAML files. diff --git a/website/docs/terms/data-catalog.md b/website/docs/terms/data-catalog.md index feb529e82e6..64c6ea6448e 100644 --- a/website/docs/terms/data-catalog.md +++ b/website/docs/terms/data-catalog.md @@ -79,7 +79,7 @@ Data teams may choose to use third-party tools with data cataloging capabilities ## Conclusion -Data catalogs are a valuable asset to any data team and business as a whole. They allow people within an organization to find the data that they need when they need it and understand its quality or sensitivity. This makes communication across teams more seamless, preventing problems that impact the business in the long run. Weigh your options in terms of whether to go with open source and enterprise, trusting that the decision you land on will be best for your organization. +Data catalogs are a valuable asset to any data team and business as a whole. They allow people within an organization to find the data that they need when they need it and understand its quality or sensitivity. This makes communication across teams more seamless, preventing problems that impact the business in the long run. Weigh your options in terms of whether to go with open source or enterprise, trusting that the decision you land on will be best for your organization. 
## Additional reading diff --git a/website/docs/terms/primary-key.md b/website/docs/terms/primary-key.md index 5921d3ca655..4acd1e8c46d 100644 --- a/website/docs/terms/primary-key.md +++ b/website/docs/terms/primary-key.md @@ -73,7 +73,7 @@ The table below gives an overview of primary key support and enforcement in some Google BigQuery - ❌ + ✅ ❌ diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js index ee593e568f4..c753b854e53 100644 --- a/website/docusaurus.config.js +++ b/website/docusaurus.config.js @@ -1,6 +1,7 @@ const path = require("path"); const math = require("remark-math"); const katex = require("rehype-katex"); + const { versions, versionedPages, versionedCategories } = require("./dbt-versions"); require("dotenv").config(); @@ -193,7 +194,7 @@ var siteSettings = { @@ -258,6 +259,8 @@ var siteSettings = { src: "https://cdn.jsdelivr.net/npm/featherlight@1.7.14/release/featherlight.min.js", defer: true, }, + "https://cdn.jsdelivr.net/npm/clipboard@2.0.11/dist/clipboard.min.js", + "/js/headerLinkCopy.js", "/js/gtm.js", "/js/onetrust.js", "https://kit.fontawesome.com/7110474d41.js", diff --git a/website/snippets/_sl-measures-parameters.md b/website/snippets/_sl-measures-parameters.md new file mode 100644 index 00000000000..4bd32311fda --- /dev/null +++ b/website/snippets/_sl-measures-parameters.md @@ -0,0 +1,12 @@ +| Parameter | Description | +| --- | --- | --- | +| [`name`](/docs/build/measures#name) | Provide a name for the measure, which must be unique and can't be repeated across all semantic models in your dbt project. | Required | +| [`description`](/docs/build/measures#description) | Describes the calculated measure. | Optional | +| [`agg`](/docs/build/measures#description) | dbt supports the following aggregations: `sum`, `max`, `min`, `count_distinct`, and `sum_boolean`. | Required | +| [`expr`](/docs/build/measures#expr) | Either reference an existing column in the table or use a SQL expression to create or derive a new one. | Optional | +| [`non_additive_dimension`](/docs/build/measures#non-additive-dimensions) | Non-additive dimensions can be specified for measures that cannot be aggregated over certain dimensions, such as bank account balances, to avoid producing incorrect results. | Optional | +| `agg_params` | Specific aggregation properties such as a percentile. | Optional | +| `agg_time_dimension` | The time field. Defaults to the default agg time dimension for the semantic model. | Optional | 1.6 and higher | +| `label`* | How the metric appears in project docs and downstream integrations. | Required | +| `create_metric`* | You can create a metric directly from a measure with `create_metric: True` and specify its display name with `create_metric_display_name`. | Optional | +*Available on dbt version 1.7 or higher. 
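To make these parameters concrete, a minimal measure spec might look like the following sketch. The measure, column, and dimension names are illustrative only, not part of this PR:

```yaml
measures:
  - name: order_total          # must be unique across all semantic models in the project
    description: The total order amount, summed.
    agg: sum                   # one of the supported aggregation types
    expr: amount               # reference an existing column or use a SQL expression
    agg_time_dimension: ordered_at
    create_metric: true        # available on dbt v1.7 or higher
    label: "Order total"
```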
diff --git a/website/src/css/custom.css b/website/src/css/custom.css index fc51ef8a8ef..1feb5510cc5 100644 --- a/website/src/css/custom.css +++ b/website/src/css/custom.css @@ -2034,6 +2034,45 @@ html[data-theme="dark"] .theme-doc-sidebar-container>div>button.button:hover { color: #818589; /* You can adjust the color as needed */ } +h3.anchor a.hash-link:before, +h2.anchor a.hash-link:before { + content: ""; + background-image: url('/img/copy.png'); + background-size: 18px 18px; + height: 18px; + width: 18px; + display: inline-block; +} + +h3.anchor.clicked a.hash-link:before, +h2.anchor.clicked a.hash-link:before { + background-image: url('/img/check.png'); + background-size: 18px 13px; + height: 13px; +} + +.copy-popup { + position: fixed; + top: 10px; + left: 50%; + transform: translateX(-50%); + background-color: #047377; + color: rgb(236, 236, 236); + padding: 10px; + border-radius: 5px; + z-index: 9999; +} + +.close-button { + cursor: pointer; + margin: 0 10px; + font-size: 20px; +} + +.close-button:hover { + color: #fff; /* Change color on hover if desired */ +} + @media (max-width: 996px) { .quickstart-container { flex-direction: column; diff --git a/website/static/img/check.png b/website/static/img/check.png new file mode 100644 index 00000000000..91192b50cb1 Binary files /dev/null and b/website/static/img/check.png differ diff --git a/website/static/img/copy.png b/website/static/img/copy.png new file mode 100644 index 00000000000..6b0f5f086c7 Binary files /dev/null and b/website/static/img/copy.png differ diff --git a/website/static/img/guides/dremio/dremio-cloned-repo.png b/website/static/img/guides/dremio/dremio-cloned-repo.png new file mode 100644 index 00000000000..ad60a770d49 Binary files /dev/null and b/website/static/img/guides/dremio/dremio-cloned-repo.png differ diff --git a/website/static/img/guides/dremio/dremio-dev-application.png b/website/static/img/guides/dremio/dremio-dev-application.png new file mode 100644 index 00000000000..66076beb856 Binary files /dev/null and b/website/static/img/guides/dremio/dremio-dev-application.png differ diff --git a/website/static/img/guides/dremio/dremio-dev-space.png b/website/static/img/guides/dremio/dremio-dev-space.png new file mode 100644 index 00000000000..423c3d7c107 Binary files /dev/null and b/website/static/img/guides/dremio/dremio-dev-space.png differ diff --git a/website/static/img/guides/dremio/dremio-nyc-weather.png b/website/static/img/guides/dremio/dremio-nyc-weather.png new file mode 100644 index 00000000000..83b9989eb29 Binary files /dev/null and b/website/static/img/guides/dremio/dremio-nyc-weather.png differ diff --git a/website/static/img/guides/dremio/dremio-sample-data.png b/website/static/img/guides/dremio/dremio-sample-data.png new file mode 100644 index 00000000000..348360c11f6 Binary files /dev/null and b/website/static/img/guides/dremio/dremio-sample-data.png differ diff --git a/website/static/img/guides/dremio/dremio-test-results.png b/website/static/img/guides/dremio/dremio-test-results.png new file mode 100644 index 00000000000..42c17b3b875 Binary files /dev/null and b/website/static/img/guides/dremio/dremio-test-results.png differ diff --git a/website/static/js/headerLinkCopy.js b/website/static/js/headerLinkCopy.js new file mode 100644 index 00000000000..a41f4f4e7ce --- /dev/null +++ b/website/static/js/headerLinkCopy.js @@ -0,0 +1,58 @@ +/* eslint-disable */ + + // Get all the headers with anchor links. 
+ // The 'click' event worked over 'popstate' because click captures page triggers, as well as back/forward button triggers + // Adding the 'load' event to also capture the initial page load + window.addEventListener("click", copyHeader); + window.addEventListener("load", copyHeader); + + // separating function from eventlistener to understand they are two separate things + function copyHeader () { + const headers = document.querySelectorAll("h2.anchor, h3.anchor"); + +headers.forEach((header) => { + header.style.cursor = "pointer"; + const clipboard = new ClipboardJS(header, { + text: function(trigger) { + const anchorLink = trigger.getAttribute("id"); + return window.location.href.split('#')[0] + '#' + anchorLink; + } + }); + + clipboard.on('success', function(e) { + // Provide user feedback (e.g., alert or tooltip) here + const popup = document.createElement('div'); + popup.classList.add('copy-popup'); + popup.innerText = 'Link copied!'; + document.body.appendChild(popup); + + // Set up timeout to remove the popup after 3 seconds + setTimeout(() => { + document.body.removeChild(popup); + }, 3000); + + // Add close button ('x') + const closeButton = document.createElement('span'); + closeButton.classList.add('close-button'); + closeButton.innerHTML = ' ×'; // '×' symbol for 'x' + closeButton.addEventListener('click', () => { + if (document.body.contains(popup)) { + document.body.removeChild(popup); + } + }); + popup.appendChild(closeButton); + + // Add and remove the 'clicked' class for styling purposes + e.trigger.classList.add("clicked"); + setTimeout(() => { + if (document.body.contains(popup)) { + document.body.removeChild(popup); + } + }, 3000); + }); + + clipboard.on('error', function(e) { + console.error("Unable to copy to clipboard: " + e.text); + }); +}); +}; diff --git a/website/vercel.json b/website/vercel.json index 269ae29116d..149aaaeb09a 100644 --- a/website/vercel.json +++ b/website/vercel.json @@ -2,6 +2,16 @@ "cleanUrls": true, "trailingSlash": false, "redirects": [ + { + "source": "/docs/about/overview", + "destination": "/docs/introduction", + "permanent": true + }, + { + "source": "/dbt-cli/cli-overview", + "destination": "/docs/core/about-dbt-core", + "permanent": true + }, { "source": "/guides/advanced/creating-new-materializations", "destination": "/guides/create-new-materializations", @@ -1604,7 +1614,7 @@ }, { "source": "/docs/dbt-cloud/cloud-overview", - "destination": "/docs/get-started/getting-started/set-up-dbt-cloud", + "destination": "/docs/cloud/about-cloud/dbt-cloud-features", "permanent": true }, { @@ -2639,7 +2649,7 @@ }, { "source": "/docs/cloud-overview", - "destination": "/docs/dbt-cloud/cloud-overview", + "destination": "/docs/cloud/about-cloud/dbt-cloud-features", "permanent": true }, {
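Once the new redirects are deployed, a quick header check against one of the old paths should confirm that Vercel returns a permanent redirect to the new destination. This is an illustrative check, not part of the PR:

```shell
# Expect a permanent redirect (301/308) with a Location header of /docs/introduction.
curl -sI https://docs.getdbt.com/docs/about/overview | grep -iE '^(HTTP|location)'
```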