diff --git a/README.md b/README.md
index 6203bbaba9a..83aa4f7e5bc 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,14 @@ You can use components documented in the [docusaurus library](https://v2.docusau
# Writing content
-When writing content, you should refer to the [style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) and [content types](/contributing/content-types.md) to help you understand our writing standards and how we break down information in the product documentation.
+The dbt Labs docs are written in Markdown and sometimes HTML. When writing content, refer to the [style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) and [content types](/contributing/content-types.md) to help you understand our writing standards and how we break down information in the product documentation.
+
+## SME and editorial reviews
+
+All submitted PRs receive an editorial review from the dbt Labs Docs team.
+
+Content submitted by our users and the open-source community is also reviewed by dbt Labs subject matter experts (SMEs) to help ensure technical accuracy.
+
## Versioning and single-sourcing content
diff --git a/contributing/content-types.md b/contributing/content-types.md
index f1094e29b2c..4654ada9255 100644
--- a/contributing/content-types.md
+++ b/contributing/content-types.md
@@ -102,7 +102,7 @@ Procedural content should include troubleshooting tips as frequently as possible
## Guide
-Guides (formerly called long-form procedural articles) are highly-approachable articles that group information in context to help readers complete a complex task or set of related tasks. Guides eliminate duplication and ensure the customer finds contextual content in the right place. Guides may be a set of tasks within the reader’s larger workflow, such as including use cases.
+Guides are highly approachable articles that group information in context to help readers complete a complex task or set of related tasks. Guides eliminate duplication and ensure people find contextual content in the right place. Guides may cover a set of tasks within the reader’s larger workflow, including relevant use cases.
Guides combine the content types within a single article to illustrate an entire workflow within a single page, rather than splitting the workflow out into separate pieces. Guides containing multiple procedures help us scale as more options are added to the product. Users may need to complete different procedures within the guide at different times, or refer back to the guide for conceptual content or to complete a followup task.
Example usage: If there is a large number of the same type of setting, use a guide that gathers all of the tasks in context.
diff --git a/website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md b/website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md
index 743461d180e..9edcb84fd4f 100644
--- a/website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md
+++ b/website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md
@@ -22,7 +22,7 @@ In my experience, these are false dichotomies, that sound great as hot takes but
-In my days as a data consultant and now as a member of the dbt Labs Solutions Architecture team, I’ve frequently seen Airflow, dbt Core & dbt Cloud ([via the API](https://docs.getdbt.com/dbt-cloud/api-v2)) blended as needed, based on the needs of a specific data pipeline, or a team’s structure and skillset.
+In my days as a data consultant and now as a member of the dbt Labs Solutions Architecture team, I’ve frequently seen Airflow, dbt Core & dbt Cloud ([via the official provider](https://registry.astronomer.io/providers/dbt-cloud?type=Operators&utm_campaign=Monthly+Product+Updates&utm_medium=email&_hsmi=208603877&utm_content=208603877&utm_source=hs_email)) blended as needed, based on the needs of a specific data pipeline, or a team’s structure and skillset.
More fundamentally, I think it’s important to call out that Airflow + dbt are **spiritually aligned** in purpose. They both exist to facilitate clear communication across data teams, in service of producing trustworthy data.
@@ -123,8 +123,6 @@ When a dbt run fails within an Airflow pipeline, an engineer monitoring the over
dbt provides common programmatic interfaces (the [dbt Cloud Admin + Metadata APIs](/docs/dbt-cloud/dbt-cloud-api/cloud-apis), and [.json-based artifacts](/reference/artifacts/dbt-artifacts) in the case of dbt Core) that provide the context needed for the engineer to self-serve—either by rerunning from a point of failure or reaching out to the owner.
-![dbt run log](/img/blog/airflow-dbt-run-log.png "dbt run log")
-
## Why I ❤️ dbt Cloud + Airflow
dbt Core is a fantastic framework for developing data transformation + testing logic. It is less fantastic as a shared interface for data analysts + engineers to collaborate **_on production runs of transformation jobs_**.
@@ -191,25 +189,7 @@ This means that whether you’re actively developing or you simply want to rerun
### dbt Cloud + Airflow
-With dbt Cloud and its aforementioned [APIs](https://docs.getdbt.com/docs/dbt-cloud/dbt-cloud-api/cloud-apis), any dbt user can configure dbt runs from the UI.
-
-In Airflow, engineers can then call the API, and everyone can move on with their lives. This allows the API to be a programmatic interface between analysts and data engineers, vs relying on the human interface.
-
-If you look at what this practically looks like in code (my [airflow-toolkit repo is here](https://github.com/sungchun12/airflow-toolkit/blob/demo-sung/dags/examples/dbt_cloud_example.py)), just a few settings need to be configured after you create the initial python API call: [here](https://github.com/sungchun12/airflow-toolkit/blob/95d40ac76122de337e1b1cdc8eed35ba1c3051ed/dags/dbt_cloud_utils.py)
-
-```
-
-dbt_cloud_job_runner_config = dbt_cloud_job_runner(
-
- account_id=4238, project_id=12220, job_id=12389, cause=dag_file_name
-
-)
-
-```
-
-If the operator fails, it’s an Airflow problem. If the dbt run returns a model or test failure, it’s a dbt problem and the analyst can be notified to hop into the dbt Cloud UI to debug.
-
-#### Using the new dbt Cloud Provider
+#### Using the dbt Cloud Provider
With the new dbt Cloud Provider, you can use Airflow to orchestrate and monitor your dbt Cloud jobs without any of the overhead of dbt Core. Out of the box, the dbt Cloud provider comes with:
@@ -221,26 +201,7 @@ TL;DR - This combines the end-to-end visibility of everything (from ingestion th
#### Setting up Airflow and dbt Cloud
-To set up Airflow and dbt Cloud, you can:
-
-
-1. Set up a dbt Cloud job, as in the example below.
-
-![job settings](/img/blog/2021-11-29-dbt-airflow-spiritual-alignment/job-settings.png)
-
-2. Set up an Airflow Connection ID
-
-![airflow dbt run select](/img/blog/2021-11-29-dbt-airflow-spiritual-alignment/airflow-connection-ID.png)
-
-3. ~~Set up your Airflow DAG similar to this example.~~
-
-4. You can use Airflow to call the dbt Cloud API via the new `DbtCloudRunJobOperator` to run the job and monitor it in real time through the dbt Cloud interface.
-
-![dbt Cloud API graph](/img/blog/2021-11-29-dbt-airflow-spiritual-alignment/dbt-Cloud-API-graph.png)
-
-![Monitor Job Runs](/img/blog/2021-11-29-dbt-airflow-spiritual-alignment/Monitor-Job-Runs.png)
-
-![run number](/img/blog/2021-11-29-dbt-airflow-spiritual-alignment/run-number.png)
+To set up Airflow and dbt Cloud, follow the [step-by-step instructions](https://docs.getdbt.com/guides/orchestration/airflow-and-dbt-cloud/2-setting-up-airflow-and-dbt-cloud).
If your task errors or fails in any of the above use cases, you can view the logs within dbt Cloud (think: data engineers can trust analytics engineers to resolve errors).
diff --git a/website/blog/2022-05-19-redshift-configurations-dbt-model-optimizations.md b/website/blog/2022-05-19-redshift-configurations-dbt-model-optimizations.md
index 239fa7148c6..c01194360f1 100644
--- a/website/blog/2022-05-19-redshift-configurations-dbt-model-optimizations.md
+++ b/website/blog/2022-05-19-redshift-configurations-dbt-model-optimizations.md
@@ -234,13 +234,13 @@ I won’t get into our modeling methodology at dbt Labs in this article, but the
### Staggered joins
-![Staggered-Joins.png](/img/blog/2022-05-19-redshift-configurations-dbt-model-optimizations/Staggered-Joins.png)
+![Staggered-Joins.png](/img/blog/2022-05-19-redshift-configurations-dbt-model-optimizations/Staggered-Joins.jpg)
In this method, you piece out your joins based on the main table they’re joining to. For example, if you had five tables that were all joined using `person_id`, then you would stage your data (doing your clean up too, of course), distribute those by using `dist='person_id'`, and then marry them up in some table downstream. Now with that new table, you can choose the next distribution key you’ll need for the next process that will happen. In our example above, the next step is joining to the `anonymous_visitor_profiles` table which is distributed by `mask_id`, so the results of our join should also distribute by `mask_id`.
### Resolve to a single key
-![Resolve-to-single-key](/img/blog/2022-05-19-redshift-configurations-dbt-model-optimizations/Resolve-to-single-key.png)
+![Resolve-to-single-key](/img/blog/2022-05-19-redshift-configurations-dbt-model-optimizations/Resolve-to-single-key.jpg)
This method takes some time to think about, and it may not make sense to do it depending on what you need. This is definitely balance between coherence, usability, and performance.
diff --git a/website/blog/2022-10-24-demystifying-event-streams.md b/website/blog/2022-10-24-demystifying-event-streams.md
new file mode 100644
index 00000000000..39829c3bca0
--- /dev/null
+++ b/website/blog/2022-10-24-demystifying-event-streams.md
@@ -0,0 +1,277 @@
+---
+title: "Demystifying event streams: Transforming events into tables with dbt"
+description: "Pulling data directly out of application databases is commonplace in the MDS, but also risky. Apps change quickly, and application teams might update database schemas in unexpected ways, leading to pipeline failures, data quality issues, data delivery slow-downs. There is a better way. In this blog post, Charlie Summers (Merit) describes how their organization transforms application event streams into analytics-ready tables, more resilient to event scheme changes."
+slug: demystifying-event-streams
+
+authors: [charlie_summers]
+
+tags: [analytics craft]
+hide_table_of_contents: false
+
+date: 2022-11-04
+is_featured: true
+---
+
+Let’s discuss how to convert events from an event-driven microservice architecture into relational tables in a warehouse like Snowflake. Here are a few things we’ll address:
+
+- Why you may want to use an architecture like this
+- How to structure your event messages
+- How to use dbt macros to make it easy to ingest new event streams
+
+
+
+## Event Streams at Merit
+
+At Merit, we’re building the leading verified identity platform. One key focus of our platform is data quality. Quality problems lead to first responders being unable to check into disaster sites or parents being unable to access ESA funds. In this blog post, we’ll dive into how we tackled one source of quality issues: directly relying on upstream database schemas.
+
+Under the hood, the Merit platform consists of a series of microservices. Each of these microservices has its own database. We use Snowflake as our data warehouse where we build dashboards both for internal use and for customers.
+
+![](/img/blog/2022-10-24-demystifying-event-streams/merit-platform.png)
+
+In the past we relied upon an ETL tool (Stitch) to pull data out of microservice databases and into Snowflake. This data would become the main dbt sources used by our report models in BI.
+
+![](/img/blog/2022-10-24-demystifying-event-streams/merit-platform-stitch.png)
+
+This approach worked well, but as engineering velocity increased, we came up with a new policy that required we rethink this approach: **no service should directly access another microservice’s database**. This rule empowers microservices to change their database schemas however they like without worrying about breaking other systems.
+
+Modern tools like Fivetran and Stitch can flexibly handle schema changes - for example, if a new column is created they can propagate that creation to Snowflake. However, BI tools and dbt models aren’t typically written this way. For example, if a column your BI tool filters on has a name change in the upstream database, that filter will become useless and customers will complain.
+
+The approach we used before required over-communicating about schema changes. Engineers would need to talk to Data before any change or risk a data outage. Tools that provide column-level lineage make it easier to detect how schema changes affect dashboards, but a migration is still required whenever a schema change touches a column that a dashboard relies on.
+
+This old approach frequently resulted in either busted dashboards or delayed schema changes. These issues were the exact reason engineering implemented the new policy.
+
+The core challenge is contractual: in our old approach the contract between engineering and data was the database schema. But the database schema was intended to be a tool to help the microservice efficiently store and query data, not a contract.
+
+So our solution was to start using an intentional contract: **Events**.
+
+What are Events? Events are facts about what happened within your service. For example, somebody logged in or a new user was created. At Merit (and at many companies), we use an Event-Driven Architecture. That means that microservices primarily communicate information through events, often leveraging messaging platforms like Kafka.
+
+![](/img/blog/2022-10-24-demystifying-event-streams/merit-platform-kafka.png)
+
+Microservices consume messages from others that they’re interested in. We choose to use **thick messages** that store as much information as possible about each event - this means that consuming microservices can store and refer to event data instead of requesting fresh data from microservices. For distributed systems nerds: this improves Availability at the cost of Consistency.
+
+Event schemas can still change, just like database schemas, but the expectation is that they are already a contract between this microservice and other systems. And the sole intention of events is to be this contract - unlike database schemas which are also used by microservices internally to store and query data. So, when an event schema changes, there already is a meeting between that team and all teams that consume the event - now Data is just another team at the meeting.
+
+## Events as Contracts
+
+Each event output by a microservice is inserted into a single Kafka topic with a well-defined schema. This schema is managed as part of the [Kafka Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html). The Schema Registry doesn’t strictly enforce that events comply with the topic’s schema, but any microservice that produces an event that does not comply with the schema will cause downstream failures - a high-priority bug. These bad events are replayed with the correct schema when the microservice is fixed.
+
+We use [Avro](https://avro.apache.org/) to encode all of our event schemas. We also tried out [Protobuf](https://developers.google.com/protocol-buffers), but found that the Avro tooling was a bit better for Kafka.
+
+Event schema design (what should the data contract be?) is a deep topic that we can only touch on briefly here. At a high level, we must design for change. A schema will almost always be tweaked and tuned over time as your product changes.
+
+As an example, consider a LicenseCreated event. The internal License data model might have several boolean fields in its schema such as IsValid, IsCurrent, IsRestricted, etc. We would recommend instead modeling a License with a single Status field that has a VARCHAR representing the status of the License. New values are easier to add to a VARCHAR than adding or removing boolean fields.
+
+One very useful feature of the Kafka Schema Registry is it can restrict changes that aren’t compatible with old schema versions. For example, if a data type is changed from an INT to a VARCHAR it will throw an error as the new schema is added. This can be an extra line of defense as schemas change. [Read more about this awesome feature here](https://docs.confluent.io/platform/current/schema-registry/avro.html).
+
+## OMG Contract
+
+So we started consuming events from Kafka into Snowflake using [Kafka’s Snowflake Connector](https://docs.snowflake.com/en/user-guide/kafka-connector.html).
+
+![](/img/blog/2022-10-24-demystifying-event-streams/merit-platform-kafka-load.png)
+
+The Snowflake Connector creates a new table for every Kafka topic and adds a new row for every event. Each row has a record_metadata column and a record_content column, both of which are variant types in Snowflake.
+
+![](/img/blog/2022-10-24-demystifying-event-streams/kafka-topic-table.png)
+
+Since we use **thick messages** we actually can consider ourselves done. The messages have as much information as the underlying database, so we could make queries against tables like the above.
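+
+For example, a query straight against the raw blob might look like this sketch (the topic and field names are hypothetical):
+
+```sql
+-- pull individual fields out of the variant blob with Snowflake's colon notation
+select
+    record_content:email::string as email,
+    record_content:name::string as customer_name,
+    record_metadata:CreateTime::number as created_at
+from streams.customer_created;
+```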
+
+However, working with these blobs is much less convenient than a relational table for the following reasons:
+
+1. There may be multiple topics related to the same domain model (ex: Users or Customers). So there may be a CustomerCreated topic, a CustomerDeleted topic, a CustomerUpdated topic, and so on. We need to know to join between these tables to determine what the latest Customer data is.
+1. We must know whether an event implies a create, an update, or a delete.
+1. We must be aware of the ordering of events - the latest update will include the most up-to-date state unless there’s a delete. This can lead to some gnarly time logic that must be considered across all models.
+ 1. One challenge is partial updates - we disallow those currently so that we never need to recreate the state of a domain model across multiple json blobs.
+ 1. Distributed systems folks will identify another problem: relying on timing. Due to clock skew, we can’t guarantee that event A’s timestamp being earlier than another B’s means that A occurred before B. If both messages are sent on the same Kafka topic then Kafka can ensure ordering (if configured properly), but we don’t want to limit all events to using the same topic. So we choose to ignore this problem since we have relatively low traffic and low machine volume compared to the Googles and Facebooks of the world. We can also verify the likelihood of clock skew affecting our data by looking for events with the same identifying ID happening within the same second - it doesn’t happen often for us.
+
+Instead of repeatedly working with the above challenges, we decided to create a relational layer on top of the raw event streams. This takes the form of [dbt macros](https://docs.getdbt.com/docs/building-a-dbt-project/jinja-macros) that handle all of the above problems.
+
+In order to make the dbt macros easier to write, we requested that engineering add some metadata to all of their events. This formalized the contract between engineering and data - any domain model that doesn’t comply with the contract can’t be used in reports unless the engineering team itself builds a custom pipeline. We named this the Obvious Model Generation (OMG) Contract since providing the metadata leads to obvious domain model generation. And we liked the acronym.
+
+The OMG contract states that every Kafka message related to a domain model:
+1. Must have its topic name added to a dbt variable associated with that domain model in our dbt_project.yml
+1. Must have a single uniquely identifying field for each object. We provide a default (`id`) and a way to override it in our `dbt_project.yml`. We currently disallow composite ids, but they wouldn’t be too hard to support in the future.
+1. Must have a field `changeType` set to one of the following values: INSERT, UPDATE, DELETE.
+1. If an INSERT or UPDATE, it must specify a field `data` that encodes the state of the domain model object after the change.
+1. If a DELETE, it must specify a field `deletedID` that is set to the identifying field for the deleted domain model object.
+
+We can now run obvious model generation on all event streams that comply with the OMG contract.
+
+![](/img/blog/2022-10-24-demystifying-event-streams/omg-contract.png)
+
+## Generic table pipelines via dbt macros
+
+After solidifying the OMG contract, we built the macros to execute obvious model generation. We wanted to make these as generic as possible while also following good engineering practices. We ended up building three macros that together process event streams into tables. All three macros take in `streams_var` - a list of all the event stream tables related to this domain model. We pull `streams_var` in from `dbt_project.yml`. We also take in `streams_schema`, which defaults to ‘streams’ but allows overriding for our internal testing.
+
+The first macro is called `stream_model_extract_columns_macro`, which iterates through every row in the event stream tables to identify all of the columns that will be part of the domain model table.
+
+```sql
+{%- macro stream_model_extract_columns_macro(streams_var, streams_schema='streams') -%}
+
+SELECT DISTINCT
+ CONCAT('DATA:', KEY, ' ', 'AS', ' ', UPPER(e.KEY)) AS COLUMN_NAME
+FROM
+(
+{% for stream in streams_var %}
+ SELECT
+ '{{ stream }}' as streamName,
+ RECORD_CONTENT:data AS data
+ FROM {{ source(streams_schema, stream ) }}
+ {%- if not loop.last %} UNION ALL{% endif -%}
+{% endfor %}
+), LATERAL FLATTEN( INPUT => data ) AS e
+
+{%- endmacro -%}
+```
+
+The second macro is called `stream_model_latest_snapshot_macro`. It includes the logic to identify the latest state of every domain model object in the table, applying deletes when it finds them.
+
+```sql
+{%- macro stream_model_latest_snapshot_macro(streams_var, streams_schema='streams') -%}
+{%- set identityFields = var("overriddenIdentityFields") -%}
+
+WITH changeStream AS (
+{% for stream in streams_var %}
+ SELECT
+ '{{ stream }}' as streamName,
+ -- Need to alias ID column here to custom column if its overwritten in the variable
+ RECORD_CONTENT:data.{{ identityFields.get(stream,'id') }} AS idCol,
+ RECORD_METADATA:CreateTime AS createTime,
+ RECORD_CONTENT:changeType::STRING AS changeType,
+ RECORD_CONTENT:data AS data,
+ GET(RECORD_CONTENT,'deletedID') AS deletedID
+ FROM {{ source(streams_schema, stream ) }}
+ {%- if not loop.last %} UNION ALL{% endif -%}
+{% endfor %}
+),
+
+orderedStream AS (
+ SELECT
+ cs.*
+        -- flag objects whose id appears in any DELETE event (deletedID is only populated on DELETE rows)
+        , cs.idCol IN (SELECT deletedID FROM changeStream WHERE changeType = 'DELETE') AS isDeleted
+ , ROW_NUMBER() OVER (PARTITION BY cs.idCol ORDER BY cs.createTime DESC, cs.changeType DESC) AS LatestRow
+ FROM changeStream AS cs
+ WHERE changeType IN ('INSERT', 'UPDATE')
+),
+selectedStream AS (
+ SELECT
+ *
+ FROM orderedStream
+ WHERE LatestRow = 1
+)
+
+{%- endmacro -%}
+```
+
+The final macro is called `stream_model_macro`, and it coordinates the usage of the first two. In particular, it uses [run_query()](https://docs.getdbt.com/reference/dbt-jinja-functions/run_query) to run the first macro, then uses the results to execute the final query which leverages the second macro.
+
+```sql
+{%- macro stream_model_macro(streams_var, streams_schema='streams') -%}
+
+{%- set column_name_query -%}
+{{ stream_model_extract_columns_macro(streams_var, streams_schema) }}
+{%- endset -%}
+
+{%- set results = run_query(column_name_query) -%}
+
+{% if execute %}
+{# Return the first column #}
+{%- set column_names = results.columns[0].values() -%}
+{% else %}
+{%- set column_names = [] -%}
+{% endif %}
+
+{{ stream_model_latest_snapshot_macro(streams_var, streams_schema) }}
+,
+dynamicStream AS (
+ SELECT
+ {# rendering_a_new_line_in_sql_block_code #}
+ {%- for columns in column_names -%}
+ {{ ", " if not loop.first }}{{columns}}
+ {%- if not loop.last -%}
+ {# rendering_a_new_line_in_sql_block_code #}
+ {% endif %}
+ {%- endfor %}
+ FROM selectedStream AS e
+)
+SELECT * FROM dynamicStream
+
+{%- endmacro -%}
+```
+
+Now all we need to do is call the final macro in a dbt model and provide the list of stream tables specified as a variable in `dbt_project.yml`. The entire model file, `src_container.sql`, is just this one line:
+
+```sql
+{{ stream_model_macro(var('container')) }}
+```
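+
+For reference, here’s a rough sketch of how those variables might look in `dbt_project.yml` (the topic names and the identity-field override below are illustrative):
+
+```yaml
+vars:
+  # event stream tables (Kafka topics) that feed the container domain model
+  container:
+    - container_created
+    - container_updated
+    - container_deleted
+  # streams whose identifying field is not the default `id`
+  overriddenIdentityFields:
+    container_created: containerID
+```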
+
+In `src_container.yml` we explicitly list and test the columns we expect to be associated with this model. This is the first time we introduce the actual column names anywhere in our dbt code.
+
+```yaml
+---
+version: 2
+
+models:
+ - name: src_container
+ description: pass the OMG model variable to generate the data
+ columns:
+ - name: templateName
+ description: STRING Specifies the templateName
+ tests:
+ - not_null
+ - name: complete
+ description: STRING Specifies the complete
+ - name: aggregateID
+ description: STRING Specifies the aggregateID
+ - name: recipientID
+ description: STRING Specifies the recipientID
+ - name: templateID
+ description: STRING Specifies the templateID
+ - name: templateType
+ description: STRING Specifies the templateType
+ - name: state
+ description: STRING Specifies the state
+ - name: id
+ description: STRING Specifies the id
+ - name: orgID
+```
+
+Here’s another example for a `users` domain model, showing column descriptions and tests:
+
+```yaml
+---
+version: 2
+
+models:
+ - name: users
+ description: Lovely humans that use our app
+ columns:
+ - name: id
+ description: INT The id of this user
+ tests:
+ - not_null
+ - unique
+ - name: email
+ description: STRING User's contact email
+ tests:
+ - not_null
+ - name: state
+ description: STRING The current state of the user
+ tests:
+ - accepted_values:
+ values:
+ - "active"
+ - "invited"
+ - not_null
+```
+
+## Future ideas
+
+We learned a lot from both working with event streams and building these macros.
+
+One consideration that we haven’t discussed yet is [materialization](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/materializations) strategy. Since event stream tables are append-only, this is a natural fit for incremental models. At Merit, we haven’t worked much with incremental models, so we’re opting to start with views. As we roll this out to production models we’ll be doing a ton of performance testing to figure out the perfect materialization strategy for us.
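+
+If we do adopt incremental models, a minimal sketch over one raw stream table might look like this (the source name is illustrative, and this isn’t something we run today):
+
+```sql
+{{ config(materialized='incremental') }}
+
+select
+    record_metadata:CreateTime::number as create_time,
+    record_content
+from {{ source('streams', 'container_created') }}
+
+{% if is_incremental() %}
+  -- only scan events that arrived after the latest one already processed
+  where record_metadata:CreateTime::number > (select max(create_time) from {{ this }})
+{% endif %}
+```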
+
+We also plan on adding a dbt test that alerts whenever the columns of any domain model table change. This may indicate that an unexpected change has happened to an event schema, which could affect dashboards.
+
+These were certainly the most complicated dbt macros that we’ve built so far. This has inspired us to build a test framework to make sure that macros work as expected - including features like mocking run_query() calls. We’re considering open sourcing this framework - if you’re interested then let us know!
+
+## Let's talk!
+
+We’ve used dbt macros to transform event streams into tables so that we don’t need our data pipelines to rely directly on database schemas. I’ll be talking about this more at Coalesce 2022 - come check out my talk [Demystifying event streams: Transforming events into tables with dbt](https://coalesce.getdbt.com/agenda/demystifying-event-streams-transforming-events-into-tables-with-dbt). You can also reach out to me in the dbt slack (@Charlie Summers) or [LinkedIn](https://www.linkedin.com/in/charliesummers/).
diff --git a/website/blog/2022-11-21-wasilas-foundry-experience.md b/website/blog/2022-11-21-wasilas-foundry-experience.md
new file mode 100644
index 00000000000..ca28038fd4f
--- /dev/null
+++ b/website/blog/2022-11-21-wasilas-foundry-experience.md
@@ -0,0 +1,53 @@
+---
+title: "A journey through the Foundry: Becoming an analytics engineer at dbt Labs"
+description: "The Foundry Program is an apprenticeship at dbt Labs designed to turn data newbies into fully-fledged analytics engineers over the course of six months. As one of the inaugural foundry apprentices, Wasila shares in this blog post her journey into analytics engineering."
+slug: wasila-foundry-experience
+
+authors: [wasila_quader]
+
+tags: [analytics craft]
+hide_table_of_contents: false
+
+date: 2022-11-22
+is_featured: true
+---
+
+Data is [an industry of sidesteppers](https://analyticsengineers.club/data-education-is-broken/). Most folks in the field stumble into it, look around, and if they like what they see, they’ll build a career here. This is particularly true in the analytics engineering space. Every AE I’ve talked to had envisioned themselves doing something different before finding this work in a moment of serendipity. This raises the question, how can someone become an analytics engineer *intentionally*? This is the question [dbt Labs’ Foundry Program](https://www.getdbt.com/blog/announcing-the-foundry-program/) aims to address.
+
+
+
+## About the Foundry
+
+The Foundry Program is an apprenticeship designed to turn data newbies into fully-fledged analytics engineers over the course of six months. As one of the inaugural foundry apprentices, I’m here to share my journey into analytics engineering along with the takeaways I picked up along the way.
+
+We’re continuing to improve the program with each iteration but the curriculum for my cohort was split into two parts—three months of training followed by three months of hands-on work.
+
+## Where I started
+
+Before diving into the foundry experience, I’d like to tell you a bit about my background before dbt Labs. In my previous job, I had done some very basic work with data in Excel. Prior to dbt, I had also done a data science bootcamp. The first time I heard about analytics engineering was when I saw a post about the foundry program in Code for Philadelphia’s Slack channel. Even as someone who didn’t understand what analytics engineering was, I was struck by dbt Labs’ strong opinions about analytics and data: [there was a vision towards the future informed by lessons from the past](https://www.getdbt.com/blog/of-the-community-by-the-community-for-the-community/) (i.e. reflecting on the history of software engineering). There was a desire to optimize the way data was done and transparency about the plan to get there. Where better to get my feet wet than a company committed to doing analytics in the best way possible?
+
+## The Foundry journey
+
+### Ramping up
+
+My first couple of weeks at dbt Labs were a whirlwind of information and discovery. Week two was when I began to understand what analytics engineering really meant: the organization of data. There was a lot to love about it; there was a promise of both the technical and the creative (the code and the problem solving). As someone who loves organizing, analytics engineering was a natural fit. It came with a [KonMari zen](https://docs.getdbt.com/blog/marie-kondo-query-migration).
+
+I had originally focused my job search on data analytics, but for me, analytics engineering was a much better fit. It felt less like reaching around for a lightbulb moment and more like building a library I can take pride in.
+
+As my knowledge of the “what” and “why” behind analytics engineering was growing, I started learning the “how”: SQL, Jinja, best practices, and the ins and outs of working in a dbt project. The best part was applying my knowledge to exercises. I remember going through my refactored code from an exercise with [Dave Connors](https://docs.getdbt.com/author/dave_connors), my foundry mentor (shout out to Dave! He was a huge help during my apprenticeship). Going through my modeling and the different ways an AE could refactor the code showed me the creative problem solving that this job requires. Often there’s a clear best path in the code. But sometimes there isn’t. Playing with those trade-offs made me feel like a kid in a candy store.
+
+Along the way, I was able to utilize some great resources. My apprenticeship was on our professional services team, who excelled not only in dbt work but in conveying an understanding of dbt and the analytics engineering way of thinking to our clients. Within our team as well as the larger dbt community, there was a culture of sharing perspectives, sharing solutions, and growing as a space. We have [guides on analytics engineering](https://www.getdbt.com/analytics-engineering/start-here), [articles on the MDS ecosystem](https://continual.ai/post/the-modern-data-stack-ecosystem-spring-2022-edition), and [a number of robust writers sharing their latest takes on the field](https://roundup.getdbt.com/). These are invaluable resources for hopeful analytics engineers.
+
+Of course, I can’t talk about the Community without mentioning Coalesce. My first Coalesce came on the heels of the training section of my apprenticeship, right before I dove into real hands-on consulting work. It was amazing to see so many folks excited and engaged in analytics engineering. Talks ranged from getting-your-hands-dirty technical problems to reflections on the broader industry. [Coalesce 2021](https://www.getdbt.com/coalesce-2021/) reaffirmed for me that the real magic of this field wasn’t dbt, but the Community that had coalesced around it.
+
+### On the ground
+
+And then it was time for the real work. I was paired on projects with more senior team members. The need to prove myself gave way to some imposter syndrome. Was I ready? Had I learned enough and was I capable of applying the knowledge when it really came down to it? As is often the case when you shift from an academic application to a practical one, I found that there were challenges I hadn’t anticipated.
+
+The first project I worked on was a solutions review of a client’s project, where we review the project, suggest where it can be improved, and highlight where it shines. I was armed with dbt Labs’ best practices, but when I first opened up a DAG of over 200 models, I was overwhelmed and didn’t know where to start. That’s when I learned that context gathering (like going through the DAG and project before diving into the work) is a very important part of the job! In the long term, the contributions I made to those initial client engagements were the first step in growing my confidence.
+
+## Post-Foundry
+
+Once the foundry wrapped up, I was offered a permanent position on the professional services team! I continue to benefit from [the knowledge loop](https://github.com/dbt-labs/corp/blob/main/values.md#we-contribute-to-the-knowledge-loop), but now I’m also able to contribute to it. I’ve worked on more dbt projects. I’ve made package contributions. I’ve gone from being a starry-eyed Coalesce attendee to being a starry-eyed Coalesce attendee *and* [co-facilitating workshops at Coalesce](https://www.youtube.com/watch?v=W3CyTmVYro8). Over a year later, I can happily say that the Foundry program brought me where I wanted to be.
+
+If you’re looking for resources to help a hopeful analytics engineer (whether you are one or a manager of one), feel free to reach out to me on the community Slack (@Wasila)!
\ No newline at end of file
diff --git a/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md
new file mode 100644
index 00000000000..67f217c76a4
--- /dev/null
+++ b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md
@@ -0,0 +1,195 @@
+---
+title: "How to move data from spreadsheets into your data warehouse"
+description: "A thankless, humble, and inevitable task: getting spreadsheet data into your data warehouse. Let's look at some of the different options, and the pros and cons of each."
+slug: moving-spreadsheet-data
+
+authors: [joel_labes]
+
+tags: [analytics craft]
+hide_table_of_contents: false
+
+date: 2022-11-23
+is_featured: true
+---
+
+Once your data warehouse is built out, the vast majority of your data will have come from other SaaS tools, internal databases, or customer data platforms (CDPs). But there’s another unsung hero of the analytics engineering toolkit: the humble spreadsheet.
+
+Spreadsheets are the Swiss army knife of data processing. They can add extra context to otherwise inscrutable application identifiers, be the only source of truth for bespoke processes from other divisions of the business, or act as the translation layer between two otherwise incompatible tools.
+
+Because of spreadsheets’ importance as the glue between many business processes, there are different tools to load them into your data warehouse and each one has its own pros and cons, depending on your specific use case.
+
+
+
+In general, there are a few questions to ask yourself about your data before choosing one of these tools:
+
+- Who at your company will be loading the data?
+- Does it have a consistent format?
+- How frequently will it change?
+- How big is the dataset?
+- Do changes need to be tracked?
+- Where are the files coming from?
+
+Let’s have a look at some of the offerings to help you get your spreadsheets into your data warehouse.
+
+## dbt seeds
+
+dbt comes with a built-in CSV loader ([seeds](https://docs.getdbt.com/docs/building-a-dbt-project/seeds)) to populate your data warehouse with any files you put inside of your project’s `seeds` folder. It will automatically infer data types from your file’s contents, but you can always override it by [providing explicit instructions in your dbt_project.yml](https://docs.getdbt.com/reference/resource-configs/column_types) file.
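+
+As a sketch of what that override might look like (the project and seed names here are illustrative):
+
+```yaml
+seeds:
+  my_project:
+    country_codes:
+      +column_types:
+        country_code: varchar(2)
+        country_name: varchar(64)
+```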
+
+However, since dbt creates these tables by inserting rows one at a time, it doesn’t perform well at scale (there’s no hard limit but aim for hundreds of rows rather than thousands). [The dbt docs](https://docs.getdbt.com/docs/building-a-dbt-project/seeds#faqs) suggest using seeds for “files that contain business-specific logic, for example, a list of country codes or user IDs of employees.”
+
+A big benefit of using seeds is that your file will be checked into source control, allowing you to easily see when the file was updated and retrieve deleted data if necessary.
+
+#### Good fit for:
+
+- Small files such as mapping employee identifiers to employees
+- Infrequently modified files such as mapping country codes to country names
+- Data that would benefit from source control
+- Programmatic control of data types
+
+#### Not a good fit for:
+
+- Files greater than 1MB in size
+- Files that need regular updates
+
+## ETL tools
+
+An obvious choice if you have data to load into your warehouse would be your existing [ETL tool](https://www.getdbt.com/analytics-engineering/etl-tools-a-love-letter/) such as Fivetran or Stitch, which I'll dive into in this section. Below is a summary table highlighting the core benefits and drawbacks of certain ETL tooling options for getting spreadsheet data in your data warehouse.
+
+### Summary table
+
+| Option/connector | Data updatable after load | Configurable data types | Multiple tables per schema | Good for large datasets |
+| --- | --- | --- | --- | --- |
+| dbt seeds | ✅ | ✅ | ✅ | ❌ |
+| Fivetran Browser Upload | ✅ | ✅ | ✅ | ✅ |
+| Fivetran Google Sheets connector | ✅ | ❌ | ❌ | ✅ |
+| Fivetran Google Drive connector | ❌ | ❌ | ✅ | ✅ |
+| Stitch Google Sheets integration | ✅ | ❌ | ❌ | ✅ |
+| Airbyte Google Sheets connector | ✅ | ❌ | ❌ | ✅ |
+
+### Fivetran browser upload
+
+[Fivetran’s browser uploader](https://fivetran.com/docs/files/browser-upload) does exactly what it says on the tin: you upload a file to their web portal and it creates a table containing that data in a predefined schema in your warehouse. With a visual interface to modify data types, it’s easy for anyone to use. And with an account type with the permission to only upload files, you don’t need to worry about your stakeholders accidentally breaking anything either.
+
+A nice benefit of the uploader is support for updating data in the table over time. If a file with the same name and same columns is uploaded, any new records will be added, and existing records (per the primary key) will be updated.
+
+However, keep in mind that there is no source control on these changes or a way to revert them; you might want to consider [snapshotting changes](https://docs.getdbt.com/docs/building-a-dbt-project/snapshots) in dbt if that’s a concern.
+
+Also, Fivetran won’t delete records once they’re created, so the only way to remove records created using this process is by manually [deleting](https://docs.getdbt.com/terms/dml#delete) them from your warehouse. If you have an ad-hoc connector, consider having an automated process to drop these tables regularly, especially if you have PII management concerns.
+
+#### Good fit for:
+
+- Files that are frequently updated by someone
+- Allowing anyone in the company to upload files
+- Ad-hoc data loads
+- Updating a table instead of creating a new one
+- Basic data type changes (including handling currency columns)
+- Larger files
+
+#### Not a good fit for:
+
+- Tracking changes to data
+- Complex type mappings
+
+### Fivetran Google Sheets connector
+
+The main benefit of connecting to Google Sheets instead of a static spreadsheet should be obvious—teammates can change the sheet from anywhere and new records will be loaded into your warehouse automatically. [Fivetran’s Google Sheets connector](https://fivetran.com/docs/files/google-sheets) requires some additional initial configuration, but collaborative editing can make the effort worthwhile.
+
+Instead of syncing all cells in a sheet, you create a [named range](https://fivetran.com/docs/files/google-sheets/google-sheets-setup-guide) and connect Fivetran to that range. Each Fivetran connector can only read a single range—if you have multiple tabs then you’ll need to create multiple connectors, each with its own schema and table in the target warehouse. When a sync takes place, it will [truncate](https://docs.getdbt.com/terms/ddl#truncate) and reload the table from scratch as there is no primary key to use for matching.
+
+
+
+Beware of inconsistent data types though—if someone types text into a column that was originally numeric, Fivetran will automatically convert the column to a string type which might cause issues in your downstream transformations. [The recommended workaround](https://fivetran.com/docs/files/google-sheets#typetransformationsandmapping) is to explicitly cast your types in [staging models](https://docs.getdbt.com/guides/best-practices/how-we-structure/2-staging) to ensure that any undesirable records are converted to null.
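+
+As a rough sketch of that workaround (assuming a Snowflake warehouse; the source and column names are illustrative), a staging model might cast defensively like this:
+
+```sql
+-- try_cast returns null instead of failing when a value doesn't parse as the expected type
+select
+    try_cast(order_id::varchar as integer) as order_id,
+    try_cast(amount::varchar as number(38, 2)) as amount,
+    status
+from {{ source('google_sheets', 'orders') }}
+```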
+
+#### Good fit for:
+
+- Large, long-lived documents
+- Files that are updated by many people (and somewhat often)
+
+#### Not a good fit for:
+
+- Ad-hoc loads—you need to create an entire schema for every connected spreadsheet, and preparing the sheet is a fiddly process
+- Tracking changes to data
+- Documents with many tabs
+
+### Fivetran Google Drive connector
+
+I’m a big fan of [Fivetran’s Google Drive connector](https://fivetran.com/docs/files/google-drive); in the past I’ve used it to streamline a lot of weekly reporting. It allows stakeholders to use a tool they’re already familiar with (Google Drive) instead of dealing with another set of credentials. Every file uploaded into a specific folder on Drive (or [Box, or consumer Dropbox](https://fivetran.com/docs/files/magic-folder)) turns into a table in your warehouse.
+
+
+
+Like the Google Sheets connector, the data types of the columns are determined automatically. Dates, in particular, are finicky though—if you can control your input data, try to get it into [ISO 8601 format](https://xkcd.com/1179/) to minimize the amount of cleanup you have to do on the other side.
+
+I used two macros in the dbt_utils package ([get_relations_by_pattern](https://github.com/dbt-labs/dbt-utils#get_relations_by_pattern-source) and [union_relations](https://github.com/dbt-labs/dbt-utils#union_relations-source)) to combine weekly exports from other tools into a single [model](https://docs.getdbt.com/docs/building-a-dbt-project/building-models) for easy cleanup in a staging model. Make sure you grant your transformer account permission to access all tables in the schema (including future ones) to avoid having to manually intervene after every new file is uploaded.
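+
+Here’s a hedged sketch of that pattern (the schema and table name patterns are illustrative):
+
+```sql
+-- gather every weekly export table the connector has created so far
+{% set weekly_exports = dbt_utils.get_relations_by_pattern(
+    schema_pattern='google_drive',
+    table_pattern='weekly_report%'
+) %}
+
+-- stack them into one relation for cleanup in a staging model
+select * from (
+    {{ dbt_utils.union_relations(relations=weekly_exports) }}
+)
+```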
+
+#### Good fit for:
+
+- Allowing anyone in the company to upload files
+- Weekly exports from another tool
+- Large files
+- Many files (each will be created as another table in a single schema, unlike the Google Sheets integration)
+
+#### Not a good fit for:
+
+- Data that needs to be updated after load
+- Custom type mappings (without further processing in dbt)
+
+### Stitch Google Sheets integration
+
+[The Google Sheets integration by Stitch](https://www.stitchdata.com/docs/integrations/saas/google-sheets) is a little more straightforward to set up than Fivetran’s as it imports the entire sheet without requiring you to configure named ranges. Beyond that, it works in the same way with the same benefits and the same drawbacks.
+
+#### Good fit for:
+
+- Large, long-lived documents
+- Files that are updated by many people
+
+#### Not a good fit for:
+
+- Ad-hoc loads—you need to create an entire schema for every connected spreadsheet
+- Tracking changes to data
+- Documents with many tabs
+
+### Airbyte Google Sheets connector
+
+Airbyte, an open source and cloud ETL tool, [supports a Google Sheets source connector](https://airbytehq.github.io/integrations/sources/google-sheets/) very similar to Stitch’s and Fivetran’s integration. You’ll need to authenticate your Google Account using an OAuth or a service account key and provide the link of the Google Sheet you want to pull into your data warehouse. Note that all sheet columns are loaded as strings, so you will need to explicitly cast them in a downstream model. Airbyte’s connector here also supports both full refreshes and appends.
+
+#### Good fit for:
+
+- Large, long-lived documents
+- Files that are updated by many people
+- Teams that may be on a budget
+
+#### Not a good fit for:
+
+- Non-string type data you want preserved in your raw source tables in your data warehouse
+
+## Native warehouse integrations
+
+Each of the major data warehouses also has native integrations to import spreadsheet data. While the fundamentals are the same, there are some differences amongst the various warehousing vendors.
+
+### Snowflake
+
+Snowflake’s options are robust and user-friendly, offering both a [web-based loader](https://docs.snowflake.com/en/user-guide/data-load-web-ui.html) as well as [a bulk importer](https://docs.snowflake.com/en/user-guide/data-load-bulk.html). The web loader is suitable for small to medium files (up to 50MB) and can be used for specific files, all files in a folder, or files in a folder that match a given pattern. It’s also the most provider-agnostic, with support for Amazon S3, Google Cloud Storage, Azure and the local file system.
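+
+As a rough sketch, a bulk load with the COPY command looks something like this (the stage, file, and table names are illustrative):
+
+```sql
+-- load a CSV that has already been put in a named stage
+copy into raw.uploads.country_mapping
+from @spreadsheet_stage/country_mapping.csv
+file_format = (type = 'csv' skip_header = 1);
+```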
+
+
+
+### BigQuery
+
+BigQuery only supports importing data from external sources hosted by Google such as Google Drive and Google Cloud Storage (as BigQuery and Sheets are both Google products, BigQuery is the only platform on this list that has a native integration that doesn't require 3rd-party tooling). The data it references isn’t copied into BigQuery but can be referenced in queries as though it was. If needed, you can write a copy to BigQuery or just leave it as an external source. The team at supercooldata has written [a great how-to guide on setting up Google Sheets with BigQuery](https://blog.supercooldata.com/working-with-sheets-in-bigquery/).
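+
+A hedged sketch of what that reference looks like in BigQuery DDL (the dataset name and sheet URL are illustrative):
+
+```sql
+-- the sheet stays in Drive; BigQuery reads it at query time
+create or replace external table analytics.country_mapping
+options (
+  format = 'GOOGLE_SHEETS',
+  uris = ['https://docs.google.com/spreadsheets/d/your-spreadsheet-id'],
+  skip_leading_rows = 1
+);
+```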
+
+### Redshift
+
+Unsurprisingly for an AWS product, Redshift prefers to [import CSV files from S3](https://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-data.html). As with Snowflake, this is achieved with the COPY command, and you can easily control which file(s) are imported from the source bucket. Using S3 as a source compared to a web-based loader or Google Drive means this option isn’t as user-friendly for non-technical folks, but is still a great option to sync files that are automatically generated from other tools.
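+
+A sketch of the equivalent Redshift load (the bucket, table, and IAM role are illustrative):
+
+```sql
+copy raw.country_mapping
+from 's3://my-bucket/uploads/country_mapping.csv'
+iam_role 'arn:aws:iam::123456789012:role/redshift-loader'
+format as csv
+ignoreheader 1;
+```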
+
+### Databricks
+
+Databricks also supports [pulling in data, such as spreadsheets, from external cloud sources](https://docs.databricks.com/external-data/index.html) like Amazon S3 and Google Cloud Storage. In addition, the ability to [load data via a simple UI](https://docs.databricks.com/ingestion/add-data/index.html) within Databricks is currently in public preview.
+
+## Conclusion
+
+Beyond the options we’ve already covered, there’s an entire world of other tools that can load data from your spreadsheets into your data warehouse. This is a living document, so if your preferred method isn't listed then please [open a PR](https://github.com/dbt-labs/docs.getdbt.com) and I'll check it out.
+
+The most important things to consider are your files’ origins and formats—if you need your colleagues to upload files on a regular basis then try to provide them with a more user-friendly process; but if you just need two computers to talk to each other, or it’s a one-off file that will hardly ever change, then a more technical integration is totally appropriate.
\ No newline at end of file
diff --git a/website/blog/2022-11-30-dbt-project-evaluator.md b/website/blog/2022-11-30-dbt-project-evaluator.md
new file mode 100644
index 00000000000..0ab3c5d2b31
--- /dev/null
+++ b/website/blog/2022-11-30-dbt-project-evaluator.md
@@ -0,0 +1,123 @@
+---
+title: "Introducing the dbt_project_evaluator: Automatically evaluate your dbt project for alignment with best practices "
+description: "The dbt_project_evaluator is a dbt package created by the Professional Services team at dbt Labs to help analytics engineers automatically audit their dbt projects for bad practices. Goodbye auditing nightmares, hello beautiful DAG."
+slug: align-with-dbt-project-evaluator
+
+authors: [grace_goheen]
+
+tags: [analytics craft]
+hide_table_of_contents: false
+
+date: 2022-11-30
+is_featured: true
+---
+
+## Why we built this: A brief history of the dbt Labs Professional Services team
+
+If you attended [Coalesce 2022](https://www.youtube.com/watch?v=smbRwmcM1Ok), you’ll know that the secret is out — the dbt Labs Professional Services team is not just [a group of experienced data consultants](https://www.getdbt.com/dbt-labs/services/); we’re also an intergalactic group of aliens traveling the Milky Way on a mission to enable analytics engineers to successfully adopt and manage dbt throughout the galaxy.
+
+
+
+Don’t believe me??? Here’s photographic proof.
+
+
+
+Since the inception of dbt Labs, our team has been embedded with a variety of different data teams — from an over-stretched-data-team-of-one to a data-mesh-multiverse.
+
+Throughout these engagements, we began to take note of the common issues many analytics engineers face when scaling their dbt projects:
+
+- No alerts when data models produce incorrect outputs
+- Long execution times when building or querying a model
+- Duplicated code and differing metric definitions across teams
+- Lack of knowledge of what a model or field represents
+- Wasted developer time locating and reading through messy SQL files
+
+Maybe your team is facing some of these issues right now 👀 And that’s okay! We know that building an effective, scalable dbt project takes a lot of effort and brain power. Maybe you’ve inherited a legacy dbt project with a mountain of tech debt. Maybe you’re starting from scratch. Either way it can be difficult to know the best way to set your team up for success. Don’t worry, you’re in the right place!
+
+Through solving these problems over and over, the Professional Services team began to hone our best practices for working with dbt and how analytics engineers could improve their dbt project. We added “solutions reviews” to our list of service offerings — client engagements in which we evaluate a given dbt project and provide specific recommendations to improve performance, save developer time, and prevent misuse of dbt’s features. And in an effort to share these best practices with the wider dbt community, we developed a *lot* of content. We wrote articles on the Developer Blog (see [1](https://docs.getdbt.com/blog/on-the-importance-of-naming), [2](https://discourse.getdbt.com/t/your-essential-dbt-project-checklist/1377), and [3](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview)), gave [Coalesce talks](https://www.getdbt.com/coalesce-2020/auditing-model-layers-and-modularity-with-your-dag/), and created [training courses](https://courses.getdbt.com/courses/refactoring-sql-for-modularity).
+
+Time and time again, we found that when teams are aligned with these best practices, their projects are more:
+
+- **U**sable: Data outputs are reliable with proper alerting in place
+- **F**ast: Jobs are more efficient without long-running model bottlenecks
+- **O**rganized: Developers can quickly find, read, and understand the code they need to update
+- **S**calable: No more "black holes"; duplicated code is eliminated, allowing your project to grow with ease
+
+Even with all of these great resources, evaluating a dbt project still took considerable upfront development time to discover exactly where and how to apply these best practices.
+
+**That’s when we came up with a space-altering idea: what if we could compress all of our ideas about best practices into a single, actionable tool to automate the process of discovering these misalignments, so that analytics engineers could immediately understand exactly where their projects deviated from our best practices and *be empowered to improve their projects on their own*.**
+
+Flash forward through a six month long development process…
+
+The [dbt_project_evaluator](https://github.com/dbt-labs/dbt-project-evaluator) was born: a dbt package that uses the shared language of SQL, models, and tests to identify and assert specific recommendations for a given dbt project.
+
+## How the `dbt_project_evaluator` package works
+
+When you install and run this package in your own dbt project, it will:
+
+1. Convert the [graph](https://docs.getdbt.com/reference/dbt-jinja-functions/graph) object — which is a variable that contains information about the nodes in your dbt project — into a query-able table. This enables us to write SQL queries against a tabular representation of your DAG.
+2. Capture each misalignment of an established “best practice” in a dbt model.
+3. Test these new models to alert you to the presence of misalignments in your dbt project.
+
+Currently, the dbt_project_evaluator package covers five main categories:
+
+| Category | Example Best Practices |
+| --- | --- |
+| Modeling | - Every [raw source](https://docs.getdbt.com/docs/build/sources) has a one-to-one relationship with a [staging model](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview) to centralize data cleanup. - Every model can be traced back to a declared source in the dbt project (i.e. no "root" models). - End-of-DAG fanout remains under a specified threshold. |
+| Testing | - Every model has a primary key that is appropriately tested. - The percentage of models that have at least one test applied is greater than or equal to a specified threshold. |
+| Documentation | - Every model has a [description](https://docs.getdbt.com/reference/resource-properties/description). - The percentage of models that have a description is greater than or equal to a specified threshold. |
+| Structure | - All models are named with the appropriate prefix aligned according to their model types (e.g. staging models are prefixed with `stg_`). - The sql file for each model is in the subdirectory aligned with the model type (e.g. intermediate models are in an [intermediate subdirectory](https://docs.getdbt.com/guides/best-practices/how-we-structure/3-intermediate)). - Each models subdirectory contains one .yml file that includes tests and documentation for all models within the given subdirectory. |
+| Performance | - Every model that directly feeds into an [exposure](https://docs.getdbt.com/docs/build/exposures) is materialized as a table. - No models are dependent on chains of "non-physically-materialized" models greater than a specified threshold. |
+
+For the full up-to-date list of covered rules, check out the package’s [README](https://github.com/dbt-labs/dbt-project-evaluator#rules-1), which outlines for each misalignment of a best practice:
+
+- Definition and clarifying example
+- Reason for flagging the misalignment
+- Any known exceptions to the rule
+- How to remediate the issue
+
+There might be specific situations where you need to depart from our best practices. *That’s actually okay*, as long as you’ve reviewed the misalignment and made the active choice to do something different. We built this tool with simple mechanisms to customize the package behavior, including:
+
+- Disabling a package model to exclude a best practice from the entire evaluation process
+- Overriding variables to adjust *how* a best practice is evaluated
+- Documenting specific project exceptions to a best practice in a seed file
+
+For instructions and code snippets for each customization method, check out the [README](https://github.com/dbt-labs/dbt-project-evaluator#customization-1).
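+
+As a rough illustration, these customizations live in your `dbt_project.yml`. The model path and variable name below are hypothetical placeholders; the package README lists the real ones:
+
+```yaml
+models:
+  dbt_project_evaluator:
+    marts:
+      documentation:
+        # disable a package model to skip one best-practice check entirely
+        fct_undocumented_models:
+          +enabled: false
+
+vars:
+  dbt_project_evaluator:
+    # override a variable to adjust how a best practice is evaluated
+    documentation_coverage_target: 75
+```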
+
+## Try it out!
+
+To try out the package in your own project:
+
+1. **Install the package**: Check [dbt Hub](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/) for the latest installation instructions, or read [the docs](https://docs.getdbt.com/docs/build/packages) for more information on installing packages. See the sketch after these steps.
+2. **Run and test all of the models in the package**: Execute a `dbt build --select package:dbt_project_evaluator` command.
+3. **Identify any warnings**: Each test warning indicates the presence of a type of misalignment.
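+
+For example, step 1 usually amounts to adding the package to your `packages.yml` and running `dbt deps`. A minimal sketch, with a placeholder version range (check dbt Hub for the current release):
+
+```yaml
+packages:
+  - package: dbt-labs/dbt_project_evaluator
+    # placeholder range; pin to the latest version published on dbt Hub
+    version: [">=0.3.0", "<0.4.0"]
+```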
+
+For *each warning* that pops up:
+
+1. Identify the model name.
+2. Locate the related documentation in the package [README](https://github.com/dbt-labs/dbt-project-evaluator#rules-1).
+3. Query the model to find the specific instances of the issue within your project.
+4. Either fix the issue(s) or [customize](https://github.com/dbt-labs/dbt-project-evaluator#customization-1) the package to exclude the issue(s).
+
+To automatically maintain project quality as your team expands, you can enforce alignment with dbt Labs’ best practices on all future code changes by [adding this package as a CI check](https://github.com/dbt-labs/dbt-project-evaluator#running-this-package-as-a-ci-check-1). Every time one of your team members (or you) opens a PR, the CI check will automatically ensure that new code changes don’t introduce new misalignments.
+
+You can think of this as “linting” your dbt project to keep it aligned with our best practices — in the same way you might lint your SQL code to keep it aligned with your style guide.
+
+To add this package as a CI check:
+
+1. Override the severity of your tests using an [environment variable](https://docs.getdbt.com/docs/build/environment-variables). See the sketch after these steps.
+2. Run this package as a step in your CI job.
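+
+A minimal sketch of step 1, assuming you key severity off an environment variable in `dbt_project.yml` (the variable name is a placeholder; the package README documents the exact convention):
+
+```yaml
+tests:
+  dbt_project_evaluator:
+    # error in CI, warn everywhere else, depending on the env var's value
+    +severity: "{{ env_var('DBT_PROJECT_EVALUATOR_SEVERITY', 'warn') }}"
+```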
+
+To watch a full demo of using this package in greater detail, check out [my Coalesce talk](https://youtu.be/smbRwmcM1Ok) (the demo starts at 7:35).
+
+
+If something isn’t working quite right or you have ideas for future functionality, [open an issue in the Github repository](https://github.com/dbt-labs/dbt-project-evaluator/issues) or even contribute code of your own!
+
+Together, we can ensure that dbt projects across the galaxy are set up for success as they grow to infinity and beyond.
+
+
\ No newline at end of file
diff --git a/website/blog/authors.yml b/website/blog/authors.yml
index 7bea8c29395..b8437601b5b 100644
--- a/website/blog/authors.yml
+++ b/website/blog/authors.yml
@@ -354,4 +354,19 @@ brittany_krauth:
- url: https://www.linkedin.com/in/brittanykrauth
icon: fa-linkedin
+charlie_summers:
+ name: Charlie Summers
+ job_title: Staff Software Engineer
+ description: Charlie is the Data Engineer Tech Lead at Merit. He introduced Merit to dbt and it's been a fantastic fit for a wide variety of data pipelines. He likes thinking about the future of data - integrating event streams, analyzing encrypted data, capturing fine-grained lineage, and making it easy to build simple apps on top of data warehouses/lakes.
+ organization: Merit
+ image_url: /img/blog/authors/charlie-summers.jpeg
+ links:
+ - url: https://www.linkedin.com/in/charliesummers
+ icon: fa-linkedin
+wasila_quader:
+ name: Wasila Quader
+ job_title: Associate Analytics Engineer
+ description: After a winding road through healthcare spreadsheets and data science projects, Wasila discovered analytics engineering as an apprentice of dbt Labs' Foundry Program. She now works as an analytics engineer on dbt Labs' professional services team.
+ organization: dbt Labs
+ image_url: /img/blog/authors/wasila-quader.png
diff --git a/website/dbt-versions.js b/website/dbt-versions.js
index 008f81cedcd..03f2721e42d 100644
--- a/website/dbt-versions.js
+++ b/website/dbt-versions.js
@@ -13,7 +13,7 @@ exports.versions = [
},
{
version: "1.0",
- EOLDate: "2023-12-03"
+ EOLDate: "2022-12-03"
},
{
version: "0.21",
diff --git a/website/docs/community/resources/code-of-conduct.md b/website/docs/community/resources/code-of-conduct.md
index 7af49279e83..6788f3ae39f 100644
--- a/website/docs/community/resources/code-of-conduct.md
+++ b/website/docs/community/resources/code-of-conduct.md
@@ -68,6 +68,8 @@ Ways to demonstrate this value:
- Share things you have learned on Discourse
- Host events
+Be mindful that others may not want their image or name shared on social media. When attending or hosting an in-person event, ask for permission before posting about another person.
+
### Be curious.
Always ask yourself “why?” and strive to be continually learning.
diff --git a/website/docs/docs/build/metrics.md b/website/docs/docs/build/metrics.md
index 94c52811934..681aec63dca 100644
--- a/website/docs/docs/build/metrics.md
+++ b/website/docs/docs/build/metrics.md
@@ -69,7 +69,7 @@ metrics:
expression: user_id
timestamp: signup_date
- time_grains: [day, week, month, quarter, year]
+ time_grains: [day, week, month, quarter, year, all_time]
dimensions:
- plan
@@ -123,7 +123,7 @@ metrics:
sql: user_id
timestamp: signup_date
- time_grains: [day, week, month, quarter, year]
+ time_grains: [day, week, month, quarter, year, all_time]
dimensions:
- plan
@@ -165,14 +165,14 @@ Metrics can have many declared **properties**, which define aspects of your metr
|-------------|-------------------------------------------------------------|---------------------------------|-----------|
| name | A unique identifier for the metric | new_customers | yes |
| model | The dbt model that powers this metric | dim_customers | yes (no for `derived` metrics)|
-| label | A short for name / label for the metric | New Customers | no |
+| label | A short name / label for the metric | New Customers | yes |
| description | Long form, human-readable description for the metric | The number of customers who.... | no |
| calculation_method | The method of calculation (aggregation or derived) that is applied to the expression | count_distinct | yes |
| expression | The expression to aggregate/calculate over | user_id, cast(user_id as int) | yes |
| timestamp | The time-based component of the metric | signup_date | yes |
| time_grains | One or more "grains" at which the metric can be evaluated. For more information, see the "Custom Calendar" section. | [day, week, month, quarter, year] | yes |
| dimensions | A list of dimensions to group or filter the metric by | [plan, country] | no |
-| window | A dictionary for aggregating over a window of time. Used for rolling metrics such as 14 day rolling average. Acceptable periods are: [`day`,`week`,`month`, `year`] | {count: 14, period: day} | no |
+| window | A dictionary for aggregating over a window of time. Used for rolling metrics such as 14 day rolling average. Acceptable periods are: [`day`,`week`,`month`, `year`, `all_time`] | {count: 14, period: day} | no |
| filters | A list of filters to apply before calculating the metric | See below | no |
| config | [Optional configurations](https://github.com/dbt-labs/dbt_metrics#accepted-metric-configurations) for calculating this metric | {treat_null_values_as_zero: true} | no |
| meta | Arbitrary key/value store | {team: Finance} | no |
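+
+As a hedged illustration of these properties, a metric definition using the now-required `label` and the `all_time` grain might look like the following (model, column, and dimension names are hypothetical):
+
+```yaml
+metrics:
+  - name: active_users
+    label: Active Users
+    model: ref('dim_users')
+    calculation_method: count_distinct
+    expression: user_id
+    timestamp: activated_at
+    time_grains: [day, week, month, quarter, year, all_time]
+    dimensions:
+      - plan
+```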
@@ -185,12 +185,12 @@ Metrics can have many declared **properties**, which define aspects of your metr
|-------------|-------------------------------------------------------------|---------------------------------|-----------|
| name | A unique identifier for the metric | new_customers | yes |
| model | The dbt model that powers this metric | dim_customers | yes (no for `derived` metrics)|
-| label | A short for name / label for the metric | New Customers | no |
+| label | A short name / label for the metric | New Customers | yes |
| description | Long form, human-readable description for the metric | The number of customers who.... | no |
| type | The method of calculation (aggregation or derived) that is applied to the expression | count_distinct | yes |
| sql | The expression to aggregate/calculate over | user_id, cast(user_id as int) | yes |
| timestamp | The time-based component of the metric | signup_date | yes |
-| time_grains | One or more "grains" at which the metric can be evaluated | [day, week, month, quarter, year] | yes |
+| time_grains | One or more "grains" at which the metric can be evaluated | [day, week, month, quarter, year, all_time] | yes |
| dimensions | A list of dimensions to group or filter the metric by | [plan, country] | no |
| filters | A list of filters to apply before calculating the metric | See below | no |
| meta | Arbitrary key/value store | {team: Finance} | no |
@@ -252,7 +252,7 @@ metrics:
expression: "{{metric('total_revenue')}} / {{metric('count_of_customers')}}"
timestamp: order_date
- time_grains: [day, week, month, quarter, year]
+ time_grains: [day, week, month, quarter, year, all_time]
dimensions:
- had_discount
- order_country
@@ -293,7 +293,7 @@ metrics:
sql: "{{metric('total_revenue')}} / {{metric('count_of_customers')}}"
timestamp: order_date
- time_grains: [day, week, month, quarter, year]
+ time_grains: [day, week, month, quarter, year, all_time]
dimensions:
- had_discount
- order_country
@@ -399,7 +399,7 @@ You may find some pieces of functionality, like secondary calculations, complica
| Input | Example | Description | Required |
| ----------- | ----------- | ----------- | -----------|
| metric_listmetric_name | `metric('some_metric)'`, [`metric('some_metric)'`, `metric('some_other_metric)'`] `'metric_name'` | The metric(s) to be queried by the macro. If multiple metrics required, provide in list format.The name of the metric | Required |
-| grain | `'day'`, `'week'`, `'month'`, `'quarter'`, `'year'` | The time grain that the metric will be aggregated to in the returned dataset | Required |
+| grain | `'day'`, `'week'`, `'month'`, `'quarter'`, `'year'`, `'all_time'` | The time grain that the metric will be aggregated to in the returned dataset | Required |
| dimensions | [`'plan'`, `'country'`] | The dimensions you want the metric to be aggregated by in the returned dataset | Optional |
| secondary_calculations | [`metrics.period_over_period( comparison_strategy="ratio", interval=1, alias="pop_1wk")`] | Performs the specified secondary calculation on the metric results. Examples include period over period calculations, rolling calcultions, and period to date calculations. | Optional |
| start_date | `'2022-01-01'` | Limits the date range of data used in the metric calculation by not querying data before this date | Optional |
@@ -427,7 +427,7 @@ metrics:
model: ref('fact_orders')
label: Total Discount ($)
timestamp: order_date
- time_grains: [day, week, month, quarter, year]
+ time_grains: [day, week, month, quarter, year, all_time]
calculation_method: average
expression: discount_total
dimensions:
@@ -467,7 +467,7 @@ metrics:
model: ref('fact_orders')
label: Total Discount ($)
timestamp: order_date
- time_grains: [day, week, month, quarter, year]
+ time_grains: [day, week, month, quarter, year, all_time]
type: average
sql: discount_total
dimensions:
diff --git a/website/docs/docs/build/packages.md b/website/docs/docs/build/packages.md
index ac05724e836..3a77ce310b4 100644
--- a/website/docs/docs/build/packages.md
+++ b/website/docs/docs/build/packages.md
@@ -35,7 +35,7 @@ packages:
version: 0.7.0
- git: "https://github.com/dbt-labs/dbt-utils.git"
- revision: 0.1.21
+ revision: 0.9.2
- local: /opt/dbt/redshift
```
@@ -119,7 +119,7 @@ Packages stored on a Git server can be installed using the `git` syntax, like so
```yaml
packages:
- git: "https://github.com/dbt-labs/dbt-utils.git" # git URL
- revision: 0.1.21 # tag or branch name
+ revision: 0.9.2 # tag or branch name
```
diff --git a/website/docs/docs/build/sources.md b/website/docs/docs/build/sources.md
index 08b0d793d5c..e5802ada6db 100644
--- a/website/docs/docs/build/sources.md
+++ b/website/docs/docs/build/sources.md
@@ -1,6 +1,7 @@
---
title: "Sources"
id: "sources"
+search_weight: "heavy"
---
## Related reference docs
diff --git a/website/docs/docs/building-a-dbt-project/building-models/python-models.md b/website/docs/docs/building-a-dbt-project/building-models/python-models.md
index 6a283e5dff8..4c25da2a10d 100644
--- a/website/docs/docs/building-a-dbt-project/building-models/python-models.md
+++ b/website/docs/docs/building-a-dbt-project/building-models/python-models.md
@@ -521,8 +521,8 @@ def model(dbt, session):
#### Code reuse
-Currently, Python functions defined in one dbt model cannot be imported and reused in other models. This is something we'd like dbt to support. There are two patterns we're considering:
-1. Creating and registering **"named" UDFs**. This process is different across data platforms and has some performance limitations. (Snowpark does support ["vectorized" UDFs](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch.html): Pandas-like functions that can be executed in parallel.)
+Currently, you cannot import or reuse Python functions defined in one dbt model, in other models. This is something we'd like dbt to support. There are two patterns we're considering:
+1. Creating and registering **"named" UDFs**. This process is different across data platforms and has some performance limitations. (Snowpark does support ["vectorized" UDFs](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-batch.html): pandas-like functions that you can execute in parallel.)
2. Using **private Python packages**. In addition to importing reusable functions from public PyPI packages, many data platforms support uploading custom Python assets and registering them as packages. The upload process looks different across platforms, but your code’s actual `import` looks the same.
:::note ❓ Our questions
@@ -544,16 +544,16 @@ That's about where the agreement ends. There are numerous frameworks with their
When developing a Python model, you will find yourself asking these questions:
-**Why Pandas?** It's the most common API for DataFrames. It makes it easy to explore sampled data and develop transformations locally. You can “promote” your code as-is into dbt models and run it in production for small datasets.
+**Why pandas?** It's the most common API for DataFrames. It makes it easy to explore sampled data and develop transformations locally. You can “promote” your code as-is into dbt models and run it in production for small datasets.
-**Why _not_ Pandas?** Performance. Pandas runs "single-node" transformations, which cannot benefit from the parallelism and distributed computing offered by modern data warehouses. This quickly becomes a problem as you operate on larger datasets. Some data platforms support optimizations for code written using Pandas' DataFrame API, preventing the need for major refactors. For example, ["Pandas on PySpark"](https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_ps.html) offers support for 95% of Pandas functionality, using the same API while still leveraging parallel processing.
+**Why _not_ pandas?** Performance. pandas runs "single-node" transformations, which cannot benefit from the parallelism and distributed computing offered by modern data warehouses. This quickly becomes a problem as you operate on larger datasets. Some data platforms support optimizations for code written using pandas' DataFrame API, preventing the need for major refactors. For example, ["pandas on PySpark"](https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_ps.html) offers support for 95% of pandas functionality, using the same API while still leveraging parallel processing.
:::note ❓ Our questions
-- When developing a new dbt Python model, should we recommend Pandas-style syntax for rapid iteration and then refactor?
+- When developing a new dbt Python model, should we recommend pandas-style syntax for rapid iteration and then refactor?
- Which open source libraries provide compelling abstractions across different data engines and vendor-specific APIs?
- Should dbt attempt to play a longer-term role in standardizing across them?
-💬 Discussion: ["Python models: the Pandas problem (and a possible solution)"](https://github.com/dbt-labs/dbt-core/discussions/5738)
+💬 Discussion: ["Python models: the pandas problem (and a possible solution)"](https://github.com/dbt-labs/dbt-core/discussions/5738)
:::
### Limitations
@@ -574,7 +574,7 @@ In their initial launch, Python models are supported on three of the most popula
-**Additional setup:** Snowpark Python is in Public Preview - Open and enabled by default for all accounts. You will need to [acknowledge and accept Snowflake Third Party Terms](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#getting-started) to use Anaconda packages.
+**Additional setup:** You will need to [acknowledge and accept Snowflake Third Party Terms](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#getting-started) to use Anaconda packages.
**Installing packages:** Snowpark supports several popular packages via Anaconda. The complete list is at https://repo.anaconda.com/pkgs/snowflake/. Packages are installed at the time your model is being run. Different models can have different package dependencies. If you are using third-party packages, Snowflake recommends using a dedicated virtual warehouse for best performance rather than one with many concurrent users.
diff --git a/website/docs/docs/collaborate/environments.md b/website/docs/docs/collaborate/environments.md
index d072a188609..c611056c9e1 100644
--- a/website/docs/docs/collaborate/environments.md
+++ b/website/docs/docs/collaborate/environments.md
@@ -24,3 +24,7 @@ You can learn more about different ways to run dbt in production in [this articl
Targets offer the flexibility to decide how to implement your separate environments – whether you want to use separate schemas, databases, or entirely different clusters altogether! We recommend using _different schemas within one data warehouse_ to separate your environments. This is the easiest to set up, and is the most cost effective solution in a modern cloud-based data stack.
In practice, this means that most of the details in a target will be consistent across all targets, except for the `schema` and user credentials. If you have multiple dbt users writing code, it often makes sense for _each user_ to have their own _development_ environment. A pattern we've found useful is to set your dev target schema to be `dbt_`. User credentials should also differ across targets so that each dbt user is using their own data warehouse user.
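+
+For example, a minimal `profiles.yml` sketch of this pattern, using Postgres as a stand-in adapter and hypothetical host, user, and schema names:
+
+```yaml
+jaffle_shop:
+  target: dev
+  outputs:
+    dev:
+      type: postgres
+      host: warehouse.example.com
+      port: 5432
+      user: alice
+      password: "{{ env_var('DBT_DEV_PASSWORD') }}"
+      dbname: analytics
+      schema: dbt_alice      # each developer gets their own dev schema
+      threads: 4
+    prod:
+      type: postgres
+      host: warehouse.example.com
+      port: 5432
+      user: dbt_prod
+      password: "{{ env_var('DBT_PROD_PASSWORD') }}"
+      dbname: analytics
+      schema: analytics      # production schema in the same warehouse
+      threads: 8
+```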
+
+## Related docs
+- [About dbt Core versions](/docs/dbt-versions/core)
+- [Upgrade Core version in Cloud](/docs/dbt-versions/upgrade-core-in-cloud)
diff --git a/website/docs/docs/collaborate/git/connect-azure-devops.md b/website/docs/docs/collaborate/git/connect-azure-devops.md
index 0997c4b0228..22ecd12bbbf 100644
--- a/website/docs/docs/collaborate/git/connect-azure-devops.md
+++ b/website/docs/docs/collaborate/git/connect-azure-devops.md
@@ -8,7 +8,7 @@ id: "connect-azure-devops"
## About Azure DevOps and dbt Cloud
-You can connect your Azure DevOps account in dbt Cloud to unlock new product experiences:
+Connect your Azure DevOps cloud account in dbt Cloud to unlock new product experiences:
- Import new Azure DevOps repos with a couple clicks during dbt Cloud project setup.
- Clone repos using HTTPS rather than SSH
@@ -16,7 +16,10 @@ You can connect your Azure DevOps account in dbt Cloud to unlock new product exp
- Carry Azure DevOps user repository permissions (read / write access) through to dbt Cloud IDE's git actions.
- Trigger Continuous integration (CI) builds when pull requests are opened in Azure DevOps.
+
To connect Azure DevOps in dbt Cloud:
1. An account admin needs to [set up an Active Directory application and add it to dbt Cloud](/docs/collaborate/git/setup-azure).
2. dbt Cloud developers need to [personally authenticate with Azure DevOps](/docs/collaborate/git/authenticate-azure) from dbt Cloud.
+
+
diff --git a/website/docs/docs/collaborate/git/setup-azure.md b/website/docs/docs/collaborate/git/setup-azure.md
index 084e0291ec2..a4aa73b6aef 100644
--- a/website/docs/docs/collaborate/git/setup-azure.md
+++ b/website/docs/docs/collaborate/git/setup-azure.md
@@ -87,7 +87,7 @@ Once you connect your Azure AD app and Azure DevOps, you need to provide dbt Clo
2. Select **Integrations**.
3. Scroll to the Azure DevOps section.
4. Complete the form:
- - **Azure DevOps Organization:** Must match the name of your Azure DevOps organization exactly.
+ - **Azure DevOps Organization:** Must match the name of your Azure DevOps organization exactly. Do not include the `dev.azure.com/` prefix in this field. ✅ Use `my-devops-org` ❌ Avoid `dev.azure.com/my-devops-org`
- **Application (client) ID:** Found in the Azure AD App.
- **Client Secrets:** You need to first create a secret in the Azure AD App under **Client credentials**. Make sure to copy the **Value** field in the Azure AD App and paste it in the **Client Secret** field in dbt Cloud. You are responsible for the Azure AD app secret expiration and rotation.
- **Directory(tenant) ID:** Found in the Azure AD App.
diff --git a/website/docs/docs/collaborate/manage-access/audit-log.md b/website/docs/docs/collaborate/manage-access/audit-log.md
index e64969050a3..78d59d9a0a2 100644
--- a/website/docs/docs/collaborate/manage-access/audit-log.md
+++ b/website/docs/docs/collaborate/manage-access/audit-log.md
@@ -5,12 +5,15 @@ description: "You can troubleshoot possible issues and provide security audits b
sidebar_label: "Audit log"
---
-To review actions performed by people in your organization, dbt provides logs of audited user and system events. The dbt Cloud audit log lists events triggered in your organization within the last 90 days.
-
-Use the audit log to quickly review the actions performed by members of your organization. The audit log includes details such as who performed the action, what the action was, and when it was performed. You can use these details to troubleshoot access issues, perform security audits, or analyze specific events.
+To review actions performed by people in your organization, dbt provides logs of audited user and system events. You can use the audit log to quickly review the actions performed by members of your organization. The audit log includes details such as who performed the action, what the action was, and when it was performed. You can use these details to troubleshoot access issues, perform security audits, or analyze specific events.
You must be an **Account Admin** to access the audit log and this feature is only available on Enterprise plans.
+The dbt Cloud audit log stores all the events that occur in your organization:
+
+- For events within the last 90 days, the audit log provides a selectable date range that lists the events triggered in that window.
+- For events beyond 90 days, **Account Admins** can [export all events](#exporting-logs) by using **Export All**.
+
## Accessing the audit log
To access audit log, click the gear icon in the top right, then click **Audit Log**.
@@ -166,10 +169,11 @@ You can search the audit log to find a specific event or actor, which is limited
## Exporting logs
-You can use the audit log to export historical audit results for security, compliance, and analysis purposes. You can export data for up to the last 90 days. Click the **Export CSV** button to download a CSV file of all the events that occurred in your organization over the last 90 days.
+You can use the audit log to export all historical audit results for security, compliance, and analysis purposes:
-
+- For events within the last 90 days — dbt Cloud automatically displays a selectable 90-day date range. Select **Export Selection** to download a CSV file of the events that occurred in your organization during that period.
+- For events beyond 90 days — select **Export All**. The Account Admin will receive an email link to download a CSV file of all the events that occurred in your organization.
+
+
-
-
diff --git a/website/docs/docs/collaborate/manage-access/enterprise-permissions.md b/website/docs/docs/collaborate/manage-access/enterprise-permissions.md
index a9d1ef68e8c..7a0031d3c7a 100644
--- a/website/docs/docs/collaborate/manage-access/enterprise-permissions.md
+++ b/website/docs/docs/collaborate/manage-access/enterprise-permissions.md
@@ -38,12 +38,32 @@ Account Admins have unrestricted access to dbt Cloud accounts. Users with Accoun
- Create, delete, and modify Jobs
- Create, delete, and modify Groups
- Create, delete, and modify Group Memberships
-- Manage notification settings
+- Manage Notification Settings
- Manage account-level [artifacts](dbt-cloud/using-dbt-cloud/artifacts)
- View and modify Account Settings
- Use the IDE
- Run and cancel jobs
+### Project Creator
+- **Has permissions on:** Authorized projects, account-level settings
+- **License restrictions:** must have a developer license
+
+Project Creators have write and read-only access to dbt Cloud accounts, but do not have the permissions required to modify SSO settings and account integrations.
+
+Users with Project Creator permissions can:
+
+- View Account Settings
+- View and modify project users
+- Create, delete and modify all projects in an account
+- Create, delete, and modify Repositories
+- Create, delete, and modify Connections
+- Create, delete, and modify Environments
+- Create, delete, and modify Jobs
+- Use the IDE
+- Run and cancel jobs
+- View Groups
+- View Notification Settings
+
### Account Viewer
- **Has permissions on:** Authorized projects, account-level settings
@@ -58,7 +78,7 @@ Account Viewers have read only access to dbt Cloud accounts. Users with Account
- View Jobs
- View Groups
- View Group Memberships
-- View notification settings
+- View Notification Settings
- View account-level artifacts
### Admin
diff --git a/website/docs/docs/collaborate/manage-access/set-up-snowflake-oauth.md b/website/docs/docs/collaborate/manage-access/set-up-snowflake-oauth.md
index 677c01c93cd..270ceadf6c8 100644
--- a/website/docs/docs/collaborate/manage-access/set-up-snowflake-oauth.md
+++ b/website/docs/docs/collaborate/manage-access/set-up-snowflake-oauth.md
@@ -41,7 +41,7 @@ CREATE OR REPLACE SECURITY INTEGRATION DBT_CLOUD
| ENABLED | Required |
| OAUTH_CLIENT | Required |
| OAUTH_CLIENT_TYPE | Required |
-| OAUTH_REDIRECT_URI | Required. If dbt Cloud is deployed on-premises, use the domain name of your application instead of `cloud.getdbt.com` |
+| OAUTH_REDIRECT_URI | Required. Use the access URL that corresponds to your server [region](/docs/deploy/regions). If dbt Cloud is deployed on-premises, use the domain name of your application instead of the access URL. |
| OAUTH_ISSUE_REFRESH_TOKENS | Required |
| OAUTH_REFRESH_TOKEN_VALIDITY | Required. This configuration dictates the number of seconds that a refresh token is valid for. Use a smaller value to force users to re-authenticate with Snowflake more frequently. |
diff --git a/website/docs/docs/collaborate/manage-access/set-up-sso-azure-active-directory.md b/website/docs/docs/collaborate/manage-access/set-up-sso-azure-active-directory.md
index 5bf838b7dad..736eba16850 100644
--- a/website/docs/docs/collaborate/manage-access/set-up-sso-azure-active-directory.md
+++ b/website/docs/docs/collaborate/manage-access/set-up-sso-azure-active-directory.md
@@ -49,6 +49,8 @@ need to select the appropriate directory and then register a new application.
| Single-Tenant _(recommended)_ | `https://cloud.getdbt.com/complete/azure_single_tenant` |
| Multi-Tenant | `https://cloud.getdbt.com/complete/azure_multi_tenant` |
+*Note:* If your dbt account instance is a VPC deployment or is based [outside the US](/docs/deploy/regions), your login URL will use the domain supplied to you by your dbt Labs account team, instead of the domain `cloud.getdbt.com`.
+
5. Save the App registration to continue setting up Azure AD SSO
@@ -163,7 +165,7 @@ by navigating to the URL:
`https://cloud.getdbt.com/enterprise-login/`
:::
-*Note:* If your dbt account is a VPC deployment, your login URL will use the domain supplied to you by your dbt Labs account team, instead of the domain `cloud.getdbt.com`.
+*Note:* If your dbt account instance is a VPC deployment or is [based outside the US](/docs/deploy/regions), your login URL will use the domain supplied to you by your dbt Labs account team, instead of the domain `cloud.getdbt.com`.
## Setting up RBAC
diff --git a/website/docs/docs/collaborate/manage-access/set-up-sso-google-workspace.md b/website/docs/docs/collaborate/manage-access/set-up-sso-google-workspace.md
index 9c343953c5f..3aba99f2bd0 100644
--- a/website/docs/docs/collaborate/manage-access/set-up-sso-google-workspace.md
+++ b/website/docs/docs/collaborate/manage-access/set-up-sso-google-workspace.md
@@ -90,13 +90,6 @@ and ensure that the API is enabled.
To complete setup, follow the steps below in the dbt Cloud application.
-### Enable GSuite Native Auth (beta)
-
-- For users accessing dbt Cloud at cloud.getdbt.com, contact your account manager to
- gain access to the GSuite Native auth configuration UI
-- For users accessing dbt Cloud deployed in a VPC, enable the `native_gsuite`
- feature flag in the dbt Cloud admin backend.
-
### Supply your OAuth Client ID and Client Secret
1. Navigate to the **Enterprise > Single Sign On** page under Account
diff --git a/website/docs/docs/dbt-versions/release-notes/04-Nov-2022/ide-features-ide-deprecation.md b/website/docs/docs/dbt-versions/release-notes/04-Nov-2022/ide-features-ide-deprecation.md
new file mode 100644
index 00000000000..becad55356c
--- /dev/null
+++ b/website/docs/docs/dbt-versions/release-notes/04-Nov-2022/ide-features-ide-deprecation.md
@@ -0,0 +1,32 @@
+---
+
+title: "Extra features in new IDE, and classic IDE deprecation"
+id: "ide-features-ide-deprecation"
+description: "Enhancement and Deprecation: Extra features in new IDE, and classic IDE deprecation"
+sidebar_label: "Enhancement and deprecation: Extra features in the new IDE and classic IDE deprecation"
+tags: [Nov-29-2022, v1.1.67.0]
+
+---
+
+### Extra features in new and refreshed IDE
+
+The refreshed version of the dbt Cloud IDE now includes four new features, making it easier and faster for you to develop in the IDE.
+
+The new features are:
+
+- **Formatting** — Format your dbt SQL files to a single code style with a click of a button. This uses the tool [sqlfmt](https://github.com/tconbeer/sqlfmt).
+- **Git diff view** — Highlights the changes in a file before opening a pull request.
+- **dbt autocomplete** — There are four new types of autocomplete features to help you develop faster:
+ - Use `ref` to autocomplete your model names
+ - Use `source` to autocomplete your source name + table name
+ - Use `macro` to autocomplete your arguments
+ - Use `env var` to autocomplete env var
+- **Dark mode** — Use dark mode in the dbt Cloud IDE for low-light environments.
+
+Read more about all the [Cloud IDE features](/docs/get-started/dbt-cloud-features).
+
+### Classic IDE deprecation notice
+
+In December 2022, dbt Labs will deprecate the classic IDE. The [new and refreshed IDE](/docs/get-started/develop-in-the-cloud) will be available for _all_ dbt Cloud users. You will no longer be able to access the classic IDE and dbt Labs might introduce changes that break the classic IDE.
+
+With deprecation, dbt Labs will only support the refreshed version of the dbt Cloud IDE.
diff --git a/website/docs/docs/dbt-cloud/release-notes/04-Oct-2022/cloud-integration-azure.md b/website/docs/docs/dbt-versions/release-notes/05-Oct-2022/cloud-integration-azure.md
similarity index 100%
rename from website/docs/docs/dbt-cloud/release-notes/04-Oct-2022/cloud-integration-azure.md
rename to website/docs/docs/dbt-versions/release-notes/05-Oct-2022/cloud-integration-azure.md
diff --git a/website/docs/docs/get-started/connect-your-database.md b/website/docs/docs/get-started/connect-your-database.md
index ea2d61d3aa1..656288be68a 100644
--- a/website/docs/docs/get-started/connect-your-database.md
+++ b/website/docs/docs/get-started/connect-your-database.md
@@ -149,12 +149,40 @@ As an end user, if your organization has set up BigQuery OAuth, you can link a p
## Connecting to Databricks
+You can connect to Databricks by using one of two supported adapters: [dbt-databricks](/connect-your-database#dbt-databricks) and [dbt-spark](/connect-your-database#dbt-spark). For accounts on dbt 1.0 or later, we recommend using the dbt-databricks adapter. The dbt-databricks adapter is maintained by the Databricks team and is verified by dbt Labs. The Databricks team is committed to supporting and improving the adapter over time, so you can be sure the integrated experience will provide the best of dbt and the best of Databricks. Connecting to Databricks via dbt-spark will be deprecated in the future.
-### ODBC
+### dbt-databricks Adapter
+The dbt-databricks adapter is compatible with the following versions of dbt Core in dbt Cloud, with varying degrees of functionality:
+
+| Feature | dbt Versions |
+| ----- | ----------- |
+| dbt-databricks | Available starting with dbt 1.0 in dbt Cloud|
+| Unity Catalog | Available starting with dbt 1.1 |
+| Python models | Available starting with dbt 1.3 |
+
+The dbt-databricks adapter offers:
+- **Easier setup**
+- **Better defaults:**
+The dbt-databricks adapter is more opinionated, guiding users to an improved experience with less effort. Design choices of this adapter include defaulting to Delta format, using merge for incremental models, and running expensive queries with Photon.
+- **Support for Unity Catalog:**
+Unity Catalog allows Databricks users to centrally manage all data assets, simplifying access management and improving search and query performance. Databricks users can now get three-part data hierarchies – catalog, schema, model name – which solve a longstanding friction point in data organization and governance.
+
+
+To set up the Databricks connection, supply the following fields:
+
+| Field | Description | Examples |
+| ----- | ----------- | -------- |
+| Server Hostname | The hostname of the Databricks account to connect to | dbc-a2c61234-1234.cloud.databricks.com |
+| HTTP Path | The HTTP path of the Databricks cluster or SQL warehouse | /sql/1.0/warehouses/1a23b4596cd7e8fg |
+| Catalog | Name of Databricks Catalog (optional) | Production |
+
+
+
+### dbt-spark Adapter
dbt Cloud supports connecting to Databricks using
[a Cluster](https://docs.databricks.com/clusters/index.html) or
-[a SQL Endpoint](https://docs.databricks.com/sql/admin/sql-endpoints.html).
+[a SQL Warehouse (formerly called SQL endpoints)](https://docs.databricks.com/sql/admin/sql-endpoints.html).
Depending on how you connect to Databricks, either one of the `Cluster` or
`Endpoint` configurations must be provided, but setting _both_ values is not
allowed.
@@ -163,14 +191,14 @@ The following fields are available when creating a Databricks connection:
| Field | Description | Examples |
| ----- | ----------- | -------- |
-| Host Name | The hostname of the Databricks account to connect to | `avc-def1234ghi-9999.cloud.databricks.com` |
+| Hostname | The hostname of the Databricks account to connect to | dbc-a2c61234-1234.cloud.databricks.com |
| Port | The port to connect to Databricks for this connection | 443 |
-| Organization | Optional (default: 0) | 0123456789 |
+| Organization | Optional (default: 0) | 1123456677899012 |
| Cluster | The ID of the cluster to connect to (required if using a cluster) | 1234-567890-abc12345 |
-| Endpoint | The ID of the endpoint to connect to (required if using Databricks SQL) | 0123456789 |
+| Endpoint | The ID of the endpoint to connect to (required if using Databricks SQL) | 1a23b4596cd7e8fg |
| User | Optional | dbt_cloud_user |
-
+
## Connecting to Apache Spark
diff --git a/website/docs/docs/get-started/dbt-cloud-features.md b/website/docs/docs/get-started/dbt-cloud-features.md
index b5f02be327a..c5963e53c4f 100644
--- a/website/docs/docs/get-started/dbt-cloud-features.md
+++ b/website/docs/docs/get-started/dbt-cloud-features.md
@@ -27,20 +27,17 @@ With the Cloud IDE, you can:
The dbt Cloud IDE comes with features, including better performance and exciting enhancements, making it easier for you to develop, build, compile, run and test data models. Check out the some of the features below to learn more:
-| Feature | Info | Available in the IDE |
-|---|---|:---:|
-| **File state indicators** | Ability to see when changes or actions have been made to the file. The indicators **M, U,** and **•** appear to the right of your file or folder name and indicate the actions performed:
- Unsaved **(•)** — The IDE detects unsaved changes to your file/folder - Modification **(M)** — The IDE detects a modification of existing files/folders - Untracked **(U)** — The IDE detects changes made to new files or renamed files | ✅ |
-| **Build, test, and run code** | Build, test, and run your project with a button click or by using the Cloud IDE command bar. | ✅ |
-| **Drag and drop** | Drag and drop files located in the file explorer, and use the file breadcrumb on the top of the IDE for quick, linear navigation. Access adjacent files in the same file by right clicking on the breadcrumb file. | ✅ |
-| **Organize tabs** | You can move your tabs around to reorganize your work in the IDE. You can also right click on a tab to view and select a list of actions to take. | ✅ |
-| **Multiple selections** | You can make multiple selections for small and simultaneous edits. The below commands are a common way to add more cursors and allow you to insert cursors below or above with ease.
- Option-Command-Down arrow - Option-Command-Up arrow - Press Option and click on an area | ✅ |
-| **Formatting** | Format your files with a click of a button, powered by [sqlfmt](http://sqlfmt.com/). | _Coming soon — November 2022_ |
-| **Git diff view** | Ability to see what has been changed in a file before you make a pull request. | _Coming soon — November 2022_ |
-| **dbt autocomplete** | There are four new types of autocomplete features to help you develop faster: - Use `ref` to autocomplete your model names - Use `source` to autocomplete your source name + table name - Use `macro` to autocomplete your arguments - Use `env var` to autocomplete env var | _Coming soon — November 2022_ |
-| **Dark mode** | Use dark mode in the Cloud IDE for a great viewing experience in low-light environments. | _Coming soon — November 2022_ |
-
-**Note**: Cloud IDE beta users may have these features made available earlier.
-
+| Feature | Info |
+|---|---|
+| **File state indicators** | Ability to see when changes or actions have been made to the file. The indicators **M, U,** and **•** appear to the right of your file or folder name and indicate the actions performed:
- Unsaved **(•)** — The IDE detects unsaved changes to your file/folder - Modification **(M)** — The IDE detects a modification of existing files/folders - Untracked **(U)** — The IDE detects changes made to new files or renamed files
+| **Build, test, and run code** | Build, test, and run your project with a button click or by using the Cloud IDE command bar.
+| **Drag and drop** | Drag and drop files located in the file explorer, and use the file breadcrumb on the top of the IDE for quick, linear navigation. Access adjacent files in the same file by right clicking on the breadcrumb file.
+| **Organize tabs** | You can: - Move your tabs around to reorganize your work in the IDE - Right-click on a tab to view and select a list of actions to take - Close multiple, unsaved tabs to batch save your work
+| **Multiple selections** | You can make multiple selections for small and simultaneous edits. The below commands are a common way to add more cursors and allow you to insert cursors below or above with ease.
- Option-Command-Down arrow - Option-Command-Up arrow - Press Option and click on an area
+| **Formatting** | Format your files with a click of a button, powered by [sqlfmt](http://sqlfmt.com/).
+| **Git diff view** | Ability to see what has been changed in a file before you make a pull request.
+| **dbt autocomplete** | There are four new types of autocomplete features to help you develop faster: - Use `ref` to autocomplete your model names - Use `source` to autocomplete your source name + table name - Use `macro` to autocomplete your arguments - Use `env var` to autocomplete env var
+| **Dark mode** | Use dark mode in the Cloud IDE for a great viewing experience in low-light environments.
## Related docs
diff --git a/website/docs/docs/get-started/source-install.md b/website/docs/docs/get-started/source-install.md
index 1f6b506815e..6714e88cd10 100644
--- a/website/docs/docs/get-started/source-install.md
+++ b/website/docs/docs/get-started/source-install.md
@@ -3,9 +3,9 @@ title: "Install from source"
description: "You can install dbt Core from its GitHub code source."
---
-dbt Core and almost all of its adapter plugins are open source software. As such, the codebases are freely available to download and build from source. You might install form source if you want the latest code or want to install dbt from a specific commit. This might be helpful when you are contributing changes, or if you want to debug a past change.
+dbt Core and almost all of its adapter plugins are open source software. As such, the codebases are freely available to download and build from source. You might install from source if you want the latest code or want to install dbt from a specific commit. This might be helpful when you are contributing changes, or if you want to debug a past change.
-To download form source, you would clone the repositories from GitHub, making a local copy, and then install the local version using `pip`.
+To download from source, you would clone the repositories from GitHub, making a local copy, and then install the local version using `pip`.
Downloading and building dbt Core will enable you to contribute to the project by fixing a bug or implementing a sought-after feature. For more details, read the [contributing guidelines](https://github.com/dbt-labs/dbt-core/blob/HEAD/CONTRIBUTING.md).
diff --git a/website/docs/docs/use-dbt-semantic-layer/dbt-semantic-layer.md b/website/docs/docs/use-dbt-semantic-layer/dbt-semantic-layer.md
index f6cc375381d..bca1ff3549f 100644
--- a/website/docs/docs/use-dbt-semantic-layer/dbt-semantic-layer.md
+++ b/website/docs/docs/use-dbt-semantic-layer/dbt-semantic-layer.md
@@ -77,10 +77,10 @@ The dbt Semantic Layer product architecture includes four primary components:
| Components | Information | Developer plans | Team plans | Enterprise plans | License |
| --- | --- | :---: | :---: | :---: | --- |
-| **dbt metrics** | Allows you to define metrics in dbt Core. | ✅ | ✅ | ✅ | Open source, Core |
-| **dbt Server**| HTTP server that is able to quickly compile metric queries per environment using dbt project code. | ✅ | ✅ | ✅ | BSL |
+| **[dbt metrics](/docs/build/metrics)** | Allows you to define metrics in dbt Core. | ✅ | ✅ | ✅ | Open source, Core |
+| **[dbt Server](https://github.com/dbt-labs/dbt-server)** | A persisted HTTP server that wraps dbt Core to handle RESTful API requests for dbt operations. | ✅ | ✅ | ✅ | BSL |
| **SQL Proxy** | Reverse-proxy that accepts dbt-SQL (SQL + Jinja like query models and metrics, use macros), compiles the query into pure SQL, and executes the query against the data platform. | ✅ _* Available during Public Preview only_ | ✅ | ✅ | Proprietary, Cloud (Team & Enterprise) |
-| **Metadata API** | Accesses metric definitions primarily via integrations and is the source of truth for objects defined in dbt projects (like models, macros, sources, metrics). The Metadata API is updated at the end of every dbt Cloud run. | ❌ | ✅ | ✅ | Proprietary, Cloud (Team & Enterprise |
+| **[Metadata API](/docs/dbt-cloud-apis/metadata-api)** | Accesses metric definitions primarily via integrations and is the source of truth for objects defined in dbt projects (like models, macros, sources, metrics). The Metadata API is updated at the end of every dbt Cloud run. | ❌ | ✅ | ✅ | Proprietary, Cloud (Team & Enterprise) |
diff --git a/website/docs/docs/use-dbt-semantic-layer/quickstart-semantic-layer.md b/website/docs/docs/use-dbt-semantic-layer/quickstart-semantic-layer.md
index 956415bb95b..2ed60e32ded 100644
--- a/website/docs/docs/use-dbt-semantic-layer/quickstart-semantic-layer.md
+++ b/website/docs/docs/use-dbt-semantic-layer/quickstart-semantic-layer.md
@@ -318,3 +318,4 @@ Are you ready to define your own metrics and bring consistency to data consumers
- [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-semantic-layer) to learn about the dbt Semantic Layer
- [Understanding the components of the dbt Semantic Layer](https://docs.getdbt.com/blog/understanding-the-components-of-the-dbt-semantic-layer) blog post to see further examples
- [Integrated partner tools](https://www.getdbt.com/product/semantic-layer-integrations) for info on the different integration partners and their documentation
+- [dbt Server repo](https://github.com/dbt-labs/dbt-server), which is a persisted HTTP server that wraps dbt Core to handle RESTful API requests for dbt operations.
diff --git a/website/docs/docs/use-dbt-semantic-layer/set-dbt-semantic-layer.md b/website/docs/docs/use-dbt-semantic-layer/set-dbt-semantic-layer.md
index a54bc2589b7..90fabea0bca 100644
--- a/website/docs/docs/use-dbt-semantic-layer/set-dbt-semantic-layer.md
+++ b/website/docs/docs/use-dbt-semantic-layer/set-dbt-semantic-layer.md
@@ -51,3 +51,4 @@ Before you set up the dbt Semantic Layer, make sure you meet the following:
- [Integrated partner tools](https://www.getdbt.com/product/semantic-layer-integrations) for info on the different integration partners and their documentation
- [Product architecture](/docs/use-dbt-semantic-layer/dbt-semantic-layer#product-architecture) page for more information on plan availability
- [dbt metrics](/docs/build/metrics) for in-depth detail on attributes, properties, filters, and how to define and query metrics
+- [dbt Server repo](https://github.com/dbt-labs/dbt-server), which is a persisted HTTP server that wraps dbt Core to handle RESTful API requests for dbt operations
diff --git a/website/docs/faqs/Accounts/transfer-account.md b/website/docs/faqs/Accounts/transfer-account.md
new file mode 100644
index 00000000000..f3bba49bd7a
--- /dev/null
+++ b/website/docs/faqs/Accounts/transfer-account.md
@@ -0,0 +1,21 @@
+---
+title: How do I transfer account ownership to another user?
+description: "Instructions on how to transfer your dbt Cloud user account to another user"
+sidebar_label: 'How to transfer dbt Cloud account?'
+id: transfer-account
+
+---
+
+You can transfer your dbt Cloud [access control](/docs/collaborate/manage-access/about-access) to another user by following the steps below, depending on your dbt Cloud account plan:
+
+| Account plan| Steps |
+| ------ | ---------- |
+| **Developer** | You can transfer ownership by changing the email directly on your dbt Cloud [profile page](https://cloud.getdbt.com/#/profile/). |
+| **Team** | Existing account admins with account access can add users to, or remove users from the owner group. |
+| **Enterprise** | Account admins can add users to, or remove users from a group with Account Admin permissions. |
+| **If all account owners left the company** | If the account owner has left your organization, you will need to work with _your_ IT department to have incoming emails forwarded to the new account owner. Once your IT department has redirected the emails, you can request to reset the user password. Once you log in, you can change the email on the [Profile page](https://cloud.getdbt.com/#/profile/). |
+
+When you make any account owner and email changes:
+
+- The new email address _must_ be verified through our email verification process.
+- You can update any billing email address or [Notifications Settings](/docs/deploy/job-notifications) to reflect the new account owner changes, if applicable.
diff --git a/website/docs/faqs/Project/resource-yml-name.md b/website/docs/faqs/Project/resource-yml-name.md
index a528a6392e2..8a6ebe96134 100644
--- a/website/docs/faqs/Project/resource-yml-name.md
+++ b/website/docs/faqs/Project/resource-yml-name.md
@@ -10,4 +10,4 @@ It's up to you! Here's a few options:
- Use the same name as your directory (assuming you're using sensible names for your directories)
- If you test and document one model (or seed, snapshot, macro etc.) per file, you can give it the same name as the model (or seed, snapshot, macro etc.)
-Choose what works for your team. We have more recommendations in our guide on [structuring dbt project](https://discourse.getdbt.com/t/how-we-structure-our-dbt-projects/355).
+Choose what works for your team. We have more recommendations in our guide on [structuring dbt projects](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview).
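+
+For instance, a sketch of the one-file-per-model option (model and column names are hypothetical):
+
+```yaml
+# models/staging/stg_customers.yml: tests and docs for stg_customers only
+version: 2
+models:
+  - name: stg_customers
+    description: One record per customer from the raw customers source.
+    columns:
+      - name: customer_id
+        tests:
+          - unique
+          - not_null
+```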
diff --git a/website/docs/guides/advanced/adapter-development/6-promoting-a-new-adapter.md b/website/docs/guides/advanced/adapter-development/6-promoting-a-new-adapter.md
index eca75adbbe0..206179203fd 100644
--- a/website/docs/guides/advanced/adapter-development/6-promoting-a-new-adapter.md
+++ b/website/docs/guides/advanced/adapter-development/6-promoting-a-new-adapter.md
@@ -9,7 +9,7 @@ The most important thing here is recognizing that people are successful in the c
What does authentic engagement look like? It’s challenging to define explicit rules. One good rule of thumb is to treat people with dignity and respect.
-Contributors to the community should think of contribution *as the end itself,* not a means toward other business KPIs (leads, community members, etc.). [We believe that profits are exhaust.](https://www.getdbt.com/dbt-labs/values/#:~:text=Profits%20are%20exhaust.) Some ways to know if you’re authentically engaging:
+Contributors to the community should think of contribution *as the end itself,* not a means toward other business KPIs (leads, community members, etc.). [We are a mission-driven company.](https://www.getdbt.com/dbt-labs/values/) Some ways to know if you’re authentically engaging:
- Is an engagement’s *primary* purpose of sharing knowledge and resources or building brand engagement?
- Imagine you didn’t work at the org you do — can you imagine yourself still writing this?
diff --git a/website/docs/guides/legacy/custom-generic-tests.md b/website/docs/guides/legacy/custom-generic-tests.md
index 7ece5f08496..601e80a1254 100644
--- a/website/docs/guides/legacy/custom-generic-tests.md
+++ b/website/docs/guides/legacy/custom-generic-tests.md
@@ -184,7 +184,7 @@ models:
To change the way a built-in generic test works—whether to add additional parameters, re-write the SQL, or for any other reason—you simply add a test block named `` to your own project. dbt will favor your version over the global implementation!
-
+
```sql
{% test unique(model, column_name) %}
diff --git a/website/docs/guides/migration/tools/migrating-from-spark-to-databricks.md b/website/docs/guides/migration/tools/migrating-from-spark-to-databricks.md
new file mode 100644
index 00000000000..1a7d41600ba
--- /dev/null
+++ b/website/docs/guides/migration/tools/migrating-from-spark-to-databricks.md
@@ -0,0 +1,111 @@
+---
+title: "Migrating from dbt-spark to dbt-databricks"
+id: "migrating-from-spark-to-databricks"
+---
+
+
+## Prerequisites
+
+To migrate to dbt-databricks, your project must be compatible with `dbt 1.0` or greater, because dbt-databricks does not support versions earlier than `dbt 1.0`. [This guide](https://docs.getdbt.com/guides/migration/versions/upgrading-to-v1.0) will help you upgrade your project if necessary.
+
+## Why change to dbt-databricks?
+
+The Databricks team, in collaboration with dbt Labs, built on the foundation that dbt Labs’ dbt-spark adapter provided and added some critical improvements. The dbt-databricks adapter offers an easier setup, since it only requires three inputs for authentication, and it has more features available via the Delta file format.
+
+### Authentication Simplification
+
+Previously, users had to provide a `cluster` or `endpoint` ID, which was hard to parse out of the `http_path` provided in the Databricks UI. Now the [dbt-databricks profile](https://docs.getdbt.com/reference/warehouse-setups/databricks-setup) requires the same inputs regardless of whether you are using a cluster or a SQL endpoint. All you need to provide is:
+- the hostname of the Databricks workspace
+- the HTTP path of the Databricks SQL warehouse or cluster
+- an appropriate credential
+
+
+### Better defaults
+
+With dbt-databricks, by default, dbt models will use the Delta format and expensive queries will be accelerated with the [Photon engine](https://docs.databricks.com/runtime/photon.html). See [the caveats section of Databricks Profile documentation](https://docs.getdbt.com/reference/warehouse-profiles/databricks-profile#choosing-between-dbt-databricks-and-dbt-spark) for more information. Any declared configurations of `file_format = 'delta'` are now redundant and can be removed.
+
+Additionally, dbt-databricks's default `incremental_strategy` is now `merge`. The default `incremental_strategy` with dbt-spark is `append`.
+If you have been using the default `incremental_strategy=append` with dbt-spark, and would like to continue doing so, you'll have to set this config specifically on your incremental models. Read more [about `incremental_strategy` in dbt](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models#about-incremental_strategy).
+If you already specified `incremental_strategy=merge` on your incremental models, you do not need to change anything when moving to dbt-databricks, though you could remove the param as it is now the default.
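+
+For example, a minimal sketch (project and folder names are hypothetical) that keeps the old dbt-spark default for an existing set of incremental models after switching adapters:
+
+```yaml
+# dbt_project.yml
+models:
+  my_project:
+    events:
+      +incremental_strategy: append
+```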
+
+### Pure Python (Core only)
+
+A huge benefit to Core only users is that with the new dbt-databricks adapter, you no longer have to download an independent driver to interact with Databricks. The connection information is all embedded in a pure-Python library, `databricks-sql-connector`.
+
+
+## Migration
+### dbt Cloud
+
+#### Credentials
+If you are already successfully connected to Databricks using the dbt-spark ODBC method in dbt Cloud, then you have already supplied credentials in dbt Cloud to connect to your Databricks workspace. Each user will have added their Personal Access Token in their dbt Cloud profile for the given dbt project, which allows them to connect to Databricks in the dbt Cloud IDE. Additionally, an admin will have added an access token for each deployment environment, allowing dbt Cloud to connect to Databricks during production jobs.
+
+When an admin changes the dbt Cloud connection to use the dbt-databricks adapter instead of the dbt-spark adapter, your team will not lose their credentials. This makes migrating from dbt-spark to dbt-databricks straightforward, as it only requires deleting the connection and re-adding the cluster/endpoint information. Neither the admin nor the users of the project need to re-enter personal access tokens.
+
+#### Procedure
+
+An admin of the dbt Cloud project running on Databricks should take the following steps to migrate from using the generic Spark adapter to the Databricks-specific adapter. This should not cause any downtime for production jobs, but we recommend that you schedule the connection change when there is not heavy IDE usage for your team, to avoid disruption.
+
+1. Select **Account Settings** in the main navigation bar.
+2. On the Projects tab, scroll until you find the project you'd like to migrate to the new dbt-databricks adapter.
+3. Click the hyperlinked Connection for the project.
+4. Click **Edit** in the top right corner.
+5. Select **Databricks** for the warehouse.
+6. Select **Databricks (dbt-databricks)** for the adapter and enter:
+ 1. the `hostname`
+ 2. the `http_path`
+ 3. optionally the catalog name
+7. Click **Save**.
+
+After the above steps have been performed, all users will have to refresh their IDE before being able to start working again. It should complete in less than a minute.
+
+
+
+
+
+### dbt Core
+
+In dbt Core, migrating from dbt-spark to the dbt-databricks adapter requires that you:
+1. install the new adapter in your environment, and
+2. modify your target in `~/.dbt/profiles.yml`.
+
+Every user of your project will need to make these changes.
+
+#### Example
+
+If you're using `dbt-spark` today to connect to a Databricks SQL endpoint, the examples below show how your profile looks before and after the migration. Connecting to a cluster works effectively the same way.
+
+
+
+
+```yaml
+your_profile_name:
+ target: dev
+ outputs:
+ dev:
+ type: spark
+ method: odbc
+ driver: '/opt/simba/spark/lib/64/libsparkodbc_sb64.so'
+ schema: my_schema
+ host: dbc-l33t-nwb.cloud.databricks.com
+ endpoint: 8657cad335ae63e3
+ token: [my_secret_token]
+
+```
+
+
+
+
+
+```yaml
+your_profile_name:
+ target: dev
+ outputs:
+ dev:
+ type: databricks
+ schema: my_schema
+ host: dbc-l33t-nwb.cloud.databricks.com
+ http_path: /sql/1.0/endpoints/8657cad335ae63e3
+ token: [my_secret_token]
+```
+
+
\ No newline at end of file
diff --git a/website/docs/guides/orchestration/airflow-and-dbt-cloud/2-setting-up-airflow-and-dbt-cloud.md b/website/docs/guides/orchestration/airflow-and-dbt-cloud/2-setting-up-airflow-and-dbt-cloud.md
index 165d7ea3610..ab847c526a0 100644
--- a/website/docs/guides/orchestration/airflow-and-dbt-cloud/2-setting-up-airflow-and-dbt-cloud.md
+++ b/website/docs/guides/orchestration/airflow-and-dbt-cloud/2-setting-up-airflow-and-dbt-cloud.md
@@ -13,7 +13,7 @@ In this example, we’re using Homebrew to install Astro CLI. Follow the instruc
brew install astronomer/cloud/astrocloud
```
-
+
## 2. Install and start Docker Desktop
@@ -21,7 +21,7 @@ Docker allows us to spin up an environment with all the apps and dependencies we
Follow the instructions [here](https://docs.docker.com/desktop/) to install Docker desktop for your own operating system. Once Docker is installed, ensure you have it up and running for the next steps.
-
+
## 3. Clone the airflow-dbt-cloud repository
@@ -32,13 +32,16 @@ git clone https://github.com/sungchun12/airflow-dbt-cloud.git
cd airflow-dbt-cloud
```
-
+
## 4. Start the Docker container
-1. Run the following command to spin up the Docker container and start your local Airflow deployment:
+You can initialize an Astronomer project in an empty local directory using a Docker container, and then run your project locally using the `start` command.
+
+1. Run the following commands to initialize your project and start your local Airflow deployment:
```bash
+ astrocloud dev init
astrocloud dev start
```
@@ -64,13 +67,13 @@ cd airflow-dbt-cloud
![Airflow login screen](/img/guides/orchestration/airflow-and-dbt-cloud/airflow-login.png)
-
+
## 5. Create a dbt Cloud service token
Create a service token from within dbt Cloud using the instructions [found here](https://docs.getdbt.com/docs/dbt-cloud-apis/service-tokens). Ensure that you save a copy of the token, as you won’t be able to access this later. In this example we use `Account Admin`, but you can also use `Job Admin` instead for token permissions.
-
+
## 6. Create a dbt Cloud job
@@ -84,4 +87,4 @@ In your dbt Cloud account create a job, paying special attention to the informat
https://cloud.getdbt.com/#/accounts/{account_id}/projects/{project_id}/jobs/{job_id}/
```
-
+
diff --git a/website/docs/reference/dbt-jinja-functions/config.md b/website/docs/reference/dbt-jinja-functions/config.md
index 3bc0d1c7f3f..616d8cd6d9c 100644
--- a/website/docs/reference/dbt-jinja-functions/config.md
+++ b/website/docs/reference/dbt-jinja-functions/config.md
@@ -24,6 +24,10 @@ is responsible for handling model code that looks like this:
}}
```
+Review [Model configurations](/reference/model-configs) for examples and more information on valid arguments.
+
+
## config.get
__Args__:
diff --git a/website/docs/reference/node-selection/yaml-selectors.md b/website/docs/reference/node-selection/yaml-selectors.md
index 70d6f318c6b..eeaa7be7267 100644
--- a/website/docs/reference/node-selection/yaml-selectors.md
+++ b/website/docs/reference/node-selection/yaml-selectors.md
@@ -35,7 +35,7 @@ selectors:
## Definitions
Each `definition` is comprised of one or more arguments, which can be one of the following:
-* **CLI-style:** strings, representing CLI-style) arguments
+* **CLI-style:** strings, representing CLI-style arguments
* **Key-value:** pairs in the form `method: value`
* **Full YAML:** fully specified dictionaries with items for `method`, `value`, operator-equivalent keywords, and support for `exclude`
diff --git a/website/docs/reference/resource-configs/column_types.md b/website/docs/reference/resource-configs/column_types.md
index 21f7c7ce8e1..274166a9aba 100644
--- a/website/docs/reference/resource-configs/column_types.md
+++ b/website/docs/reference/resource-configs/column_types.md
@@ -79,3 +79,6 @@ seeds:
## Recommendation
Use this configuration only when required, i.e. when the type inference is not working as expected. Otherwise you can omit this configuration.
+
+## Troubleshooting
+The `column_types` configuration is case-sensitive, regardless of the quoting configuration. If you specify a column as `Country_Name` in your seed, you must reference it as `Country_Name`, not `country_name`.
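+
+As a minimal sketch, assuming a project named `my_project` and a seed file `country_codes.csv` with a `Country_Name` column (both placeholders), the configuration must match the column's casing exactly:
+
+```yaml
+# dbt_project.yml -- project and seed names are illustrative
+seeds:
+  my_project:
+    country_codes:
+      +column_types:
+        Country_Name: varchar(64)
+```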
diff --git a/website/docs/reference/resource-configs/materialize-configs.md b/website/docs/reference/resource-configs/materialize-configs.md
index 2f0180de865..a565156c459 100644
--- a/website/docs/reference/resource-configs/materialize-configs.md
+++ b/website/docs/reference/resource-configs/materialize-configs.md
@@ -5,21 +5,65 @@ id: "materialize-configs"
## Performance optimizations
-### Incremental models
-Materialize, at its core, is a real-time database that delivers incremental view updates without ever compromising on latency or correctness.
-Materialized views are incremental models, defined once.
+### Clusters
+
+
+
+- **v1.2.0:** Enable the configuration of [clusters](https://github.com/MaterializeInc/materialize/blob/main/misc/dbt-materialize/CHANGELOG.md#120---2022-08-31).
+
+
+
+The default [cluster](https://materialize.com/docs/overview/key-concepts/#clusters) that is used to maintain materialized views or indexes can be configured in your [profile](/reference/profiles.yml) using the `cluster` connection parameter. To override the cluster that is used for specific models (or groups of models), use the `cluster` configuration parameter.
+
+
+
+```sql
+{{ config(materialized='materializedview', cluster='not_default') }}
+
+select ...
+```
+
+
+
+
+
+```yaml
+models:
+ project_name:
+ +materialized: materializedview
+ +cluster: not_default
+```
+
+
+
+
+
+### Incremental models: materialized views
+
+Materialize, at its core, is a real-time database that delivers incremental view updates without ever compromising on latency or correctness. Use [materialized views](https://materialize.com/docs/overview/key-concepts/#materialized-views) to compute and incrementally update the results of your query.
### Indexes
+
+
+- **v1.2.0:** Enable additional configuration for [indexes](https://github.com/MaterializeInc/materialize/blob/main/misc/dbt-materialize/CHANGELOG.md#120---2022-08-31).
+
+
+
+Like in any standard relational database, you can use [indexes](https://materialize.com/docs/overview/key-concepts/#indexes) to optimize query performance in Materialize. Improvements can be significant, reducing response times down to single-digit milliseconds.
+
Materialized views (`materializedview`), views (`view`) and sources (`source`) may have a list of `indexes` defined. Each [Materialize index](https://materialize.com/docs/sql/create-index/) can have the following components:
-- `columns` (list, required): one or more columns on which the index is defined
-- `type` (string, optional): a supported index type. The only supported type is [`arrangement`](https://materialize.com/docs/overview/arrangements/).
+- `columns` (list, required): one or more columns on which the index is defined. To create an index that uses _all_ columns, use the `default` component instead.
+- `name` (string, optional): the name for the index. If unspecified, Materialize will use the materialization name and column names provided.
+- `cluster` (string, optional): the cluster to use to create the index. If unspecified, indexes will be created in the cluster used to create the materialization.
+- `default` (bool, optional): Default: `False`. If set to `True`, creates a default index that uses all columns.
-
+
```sql
{{ config(materialized='view',
+ indexes=[{'columns': ['col_a'], 'cluster': 'cluster_a'}]) }}
-    indexes=[{'columns': ['symbol']}]) }}
select ...
@@ -27,25 +71,13 @@ select ...
-If one or more indexes are configured on a resource, dbt will run `create index` statement(s) as part of that resource's , within the same transaction as its main `create` statement. For the index's name, dbt uses a hash of its properties and the current timestamp, in order to guarantee uniqueness and avoid namespace conflict with other indexes.
+
```sql
-create index if not exists
-"3695050e025a7173586579da5b27d275"
-on "my_target_database"."my_target_schema"."view_model"
-(symbol);
-```
-
-You can also configure indexes for a number of resources at once:
-
-
+{{ config(materialized='view',
+ indexes=[{'default': True}]) }}
-```yaml
-models:
- project_name:
- subdirectory:
- +indexes:
- - columns: ['symbol']
+select ...
```
diff --git a/website/docs/reference/resource-properties/tests.md b/website/docs/reference/resource-properties/tests.md
index 87bf7bfd54c..da78376f57b 100644
--- a/website/docs/reference/resource-properties/tests.md
+++ b/website/docs/reference/resource-properties/tests.md
@@ -183,6 +183,8 @@ models:
### `unique`
This test validates that there are no duplicate values present in a field.
+
+The `config` block and `where` clause in the following example are optional.
@@ -194,7 +196,9 @@ models:
columns:
- name: order_id
tests:
- - unique
+ - unique:
+ config:
+ where: "order_id > 21"
```
diff --git a/website/docs/reference/warehouse-setups/databricks-setup.md b/website/docs/reference/warehouse-setups/databricks-setup.md
index 314a7f77d6f..86439aa2484 100644
--- a/website/docs/reference/warehouse-setups/databricks-setup.md
+++ b/website/docs/reference/warehouse-setups/databricks-setup.md
@@ -91,7 +91,7 @@ even easier to use dbt with the Databricks Lakehouse.
`dbt-databricks` includes:
- No need to install additional drivers or dependencies for use on the CLI
- Use of Delta Lake for all models out of the box
-- SQL macros that are optimzed to run with [Photon](https://docs.databricks.com/runtime/photon.html)
+- SQL macros that are optimized to run with [Photon](https://docs.databricks.com/runtime/photon.html)
### Support for Unity Catalog
diff --git a/website/docs/reference/warehouse-setups/materialize-setup.md b/website/docs/reference/warehouse-setups/materialize-setup.md
index 7eb3af4aea8..684f7174a9f 100644
--- a/website/docs/reference/warehouse-setups/materialize-setup.md
+++ b/website/docs/reference/warehouse-setups/materialize-setup.md
@@ -11,7 +11,7 @@ meta:
slack_channel_name: '#db-materialize'
slack_channel_link: 'https://getdbt.slack.com/archives/C01PWAH41A5'
platform_name: 'Materialize'
- config_page: 'no-configs'
+ config_page: 'materialize-configs'
---
:::info Vendor-supported plugin
@@ -49,7 +49,7 @@ pip is the easiest way to install the adapter:
## Connecting to Materialize
-Once you have Materialize [installed and running](https://materialize.com/docs/install/), adapt your `profiles.yml` to connect to your instance using the following reference profile configuration:
+Once you have set up a [Materialize account](https://materialize.com/register/), adapt your `profiles.yml` to connect to your instance using the following reference profile configuration:
@@ -59,17 +59,27 @@ dbt-materialize:
outputs:
dev:
type: materialize
- threads: 1
host: [host]
port: [port]
- user: [user]
+ user: [user@domain.com]
pass: [password]
dbname: [database]
- schema: [name of your dbt schema]
+ cluster: [cluster] # default 'default'
+ schema: [dbt schema]
+ sslmode: require
+ keepalives_idle: 0 # default 0, indicating the system default
+ connect_timeout: 10 # default 10 seconds
+ retries: 1 # default 1 retry on error/timeout when opening connections
```
+### Configurations
+
+`cluster`: The default [cluster](https://materialize.com/docs/overview/key-concepts/#clusters) that is used to maintain materialized views or indexes. A [`default` cluster](https://materialize.com/docs/sql/show-clusters/#default-cluster) is pre-installed in every environment, but we recommend creating dedicated clusters to isolate the workloads in your dbt project (for example, `staging` and `data_mart`).
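+
+As a minimal sketch, assuming clusters named `staging` and `data_mart` already exist and a project named `my_project` (all placeholders), you could route groups of models to dedicated clusters in `dbt_project.yml`:
+
+```yaml
+# dbt_project.yml -- cluster, project, and folder names are illustrative
+models:
+  my_project:
+    staging:
+      +cluster: staging
+    marts:
+      +cluster: data_mart
+```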
+
+`keepalives_idle`: The number of seconds before sending a ping to keep the Materialize connection active. If you are encountering `SSL SYSCALL error: EOF detected`, you may want to lower the [keepalives_idle](https://docs.getdbt.com/reference/warehouse-setups/postgres-setup#keepalives_idle) value to prevent the database from closing its connection.
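+
+As a minimal sketch with an illustrative value, you could lower `keepalives_idle` in `profiles.yml` so that a keepalive ping is sent after a few minutes of inactivity:
+
+```yaml
+# profiles.yml -- the value 240 is illustrative only
+dbt-materialize:
+  target: dev
+  outputs:
+    dev:
+      type: materialize
+      # ...other connection settings as shown above...
+      keepalives_idle: 240
+```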
+
To test the connection to Materialize, run:
```
@@ -90,14 +100,17 @@ Type | Supported? | Details
`view` | YES | Creates a [view](https://materialize.com/docs/sql/create-view/#main).
`materializedview` | YES | Creates a [materialized view](https://materialize.com/docs/sql/create-materialized-view/#main).
`table` | YES | Creates a [materialized view](https://materialize.com/docs/sql/create-materialized-view/#main). (Actual table support pending [#5266](https://github.com/MaterializeInc/materialize/issues/5266))
-`index` | YES | (Deprecated) Creates an index. Use the [`indexes` config](materialize-configs#indexes) to create indexes on `materializedview`, `view` or `source` relations instead.
`sink` | YES | Creates a [sink](https://materialize.com/docs/sql/create-sink/#main).
`ephemeral` | YES | Executes queries using CTEs.
`incremental` | NO | Use the `materializedview` instead. Materialized views will always return up-to-date results without manual or configured refreshes. For more information, check out [Materialize documentation](https://materialize.com/docs/).
+### Indexes
+
+Materialized views (`materializedview`), views (`view`) and sources (`source`) may have a list of [`indexes`](resource-configs/materialize-configs/indexes) defined.
+
### Seeds
-Running [`dbt seed`](commands/seed) will create a static materialized from a CSV file. You will not be able to add to or update this view after it has been created. If you want to rerun `dbt seed`, you must first drop existing views manually with `drop view`.
+Running [`dbt seed`](commands/seed) will create a static materialized view from a CSV file. You will not be able to add to or update this view after it has been created.
### Tests
@@ -105,5 +118,4 @@ Running [`dbt test`](commands/test) with the optional `--store-failures` flag or
## Resources
-- [dbt and Materialize guide](https://materialize.com/docs/guides/dbt/)
-- [Get started](https://github.com/MaterializeInc/demos/tree/main/dbt-get-started) using dbt and Materialize together
+- [dbt and Materialize guide](https://materialize.com/docs/guides/dbt/)
\ No newline at end of file
diff --git a/website/docs/reference/warehouse-setups/postgres-setup.md b/website/docs/reference/warehouse-setups/postgres-setup.md
index bbfcdb6e572..0955f731974 100644
--- a/website/docs/reference/warehouse-setups/postgres-setup.md
+++ b/website/docs/reference/warehouse-setups/postgres-setup.md
@@ -4,7 +4,7 @@ id: "postgres-setup"
meta:
maintained_by: dbt Labs
authors: 'core dbt maintainers'
- github_repo: 'dbt-labs/dbt-postgres'
+ github_repo: 'dbt-labs/dbt-core'
pypi_package: 'dbt-postgres'
min_core_version: 'v0.4.0'
cloud_support: Supported
diff --git a/website/docs/terms/relational-database.md b/website/docs/terms/relational-database.md
new file mode 100644
index 00000000000..8f05e5f4944
--- /dev/null
+++ b/website/docs/terms/relational-database.md
@@ -0,0 +1,88 @@
+---
+id: relational-database
+title: Relational database
+description: A relational database provides a structured way to store data into tables consisting of rows and columns. Different tables in a relational database can be joined together using common columns from each table, forming relationships.
+displayText: relational database
+hoverSnippet: A relational database provides a structured way to store data into tables consisting of rows and columns. Different tables in a relational database can be joined together using common columns from each table, forming relationships.
+---
+
+
+ Relational database: A way to get order out of data chaos
+
+
+A relational database provides a structured way to store data into tables consisting of rows and columns. Different tables in a relational database can be joined together using common columns from each table, forming relationships.
+
+Analytics engineers use relational database models to process high volumes of data that, in its rawest form, is too difficult for an end user or analyst to read and comprehend. Thanks to these models, people can easily query, interpret, and derive insight from data using accessible SQL.
+
+Anyone who’s ever managed or modeled data will tell you that data points are only meaningful in relation to each other. The very philosophy behind data management and data analytics has centered on forming a narrative out of seemingly disparate elements.
+
+At the heart of this notion sits the relational database, which was first introduced by computer scientist E.F. Codd in the year 1970 — 13 years before the internet was even invented!
+
+## How relational databases work
+
+The legwork behind relational databases lies in establishing pre-defined relationships between tables, also called “entities”. For example, in the [jaffle_shop](https://github.com/dbt-labs/jaffle_shop) ecommerce store database where customers’ information is stored in a `customers` table and orders information is stored in an `orders` table, a relationship is defined such that each order is attributed to a customer.
+
+![](/img/docs/terms/relational-database/relation.png)
+
+Relationships are defined via primary keys and foreign keys.
+
+By definition, a primary key is a column (or combination of columns, used as a surrogate key) that identifies a unique record. There can be only one primary key per table, and the primary key must be unique and not null.
+
+On the other hand, a foreign key is a column (or combination of columns) in one table that references the primary key in another table. In the above example, multiple orders can belong to one customer. Assuming that `id` is defined as the primary key for the `customers` table, `user_id` in the `orders` table would be the foreign key.
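+
+In a dbt project, one common way to assert these relationships is with generic tests. The following is a minimal sketch of a hypothetical `schema.yml` for the jaffle_shop example, where `unique` and `not_null` tests stand in for the primary key on `customers.id` and a `relationships` test checks the foreign key `orders.user_id`:
+
+```yaml
+version: 2
+
+models:
+  - name: customers
+    columns:
+      - name: id
+        tests:
+          - unique   # a primary key identifies exactly one record
+          - not_null
+  - name: orders
+    columns:
+      - name: user_id   # foreign key pointing at customers.id
+        tests:
+          - relationships:
+              to: ref('customers')
+              field: id
+```
+
+Because many modern warehouses do not enforce these constraints themselves, tests like these are how analytics engineers typically verify them.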
+
+In analytics engineering, where the focus is geared towards data modeling and creating a reporting layer for a BI tool, relational databases are a great fit. Data modeling defines how the data elements are related to each other, and a well-organized database is the cornerstone of effective data querying.
+
+## Use cases for relational databases
+
+Relational databases are best for structured data that can be organized into tables made up of rows and columns. Data teams rely on relational databases to store transactional data, and also when data querying and data analysis are needed.
+
+### Transactional processing
+
+As mentioned earlier, relational databases are a great fit for transaction-oriented systems such as CRM tools, e-commerce platforms, or finance software. Companies tend to use relational databases when transactional consistency is required, as they offer a near-failsafe environment for data accuracy and completeness. When a transaction consists of several steps, the system treats the steps as a single transaction and ensures that the operation follows an all-or-nothing scenario: the steps either all succeed or all fail.
+
+### Modeling data and organizing it for analysis
+
+Relational databases support common data modeling techniques such as dimensional modeling, Data Vault, or sometimes hybrid approaches that combine different modeling techniques. Such methodologies allow teams to organize their data into useful data structures.
+
+A data model is the overarching conceptual layer that organizes data entities and their relationships. The specific physical implementation of that data model, including the definitions of data types and constraints, constitutes the database schema.
+
+Having organized data entities also helps analytics engineers and analysts build meaningful queries that derive data in a format and granularity that is otherwise not directly available in the base database.
+
+Most analytics engineers have to deal with both relational data (typically structured) and non-relational data (typically unstructured) coming in from multiple sources. The data is then transformed until it ultimately gets modeled into data entities using relational modeling approaches. More on non-relational databases in the following section, but in a nutshell: structured data can be easily stored in a relational database system, while unstructured data is composed of formats that cannot easily (or at all) be broken down into tabular data. Common examples of unstructured data include video files, PDFs, audio files, and social media posts.
+
+Another popular format is semi-structured data, which is inherently difficult to organize into rows and columns but contains semantic markup that makes it possible to extract the underlying information. Common examples include XML and JSON.
+
+Relational data warehouses provide relational databases that are specifically optimized for analytical querying rather than transaction processing. Increasingly, data warehouses are providing better support for unstructured data, or data that cannot be stored in relational tables.
+
+Even when analytics engineers do not physically enforce relationships at the database level (many modern data warehouses allow you to define relational constraints but do not actually enforce them), they still follow a relational process. This process enables them to organize the data into logical entities whenever possible, and to make sure that the data is not redundant and is easy to query.
+
+## Relational database vs. non-relational database
+
+The main difference between a relational and a non-relational database is how they store information. Relational databases are well suited to structured data and store values in tables, while non-relational databases store data in a non-tabular form, typically as unstructured data.
+
+As datasets become dramatically more complex and less structured, the format of the ingested data can sometimes be unpredictable, which makes the case for non-relational databases (also called NoSQL databases).
+
+NoSQL databases are also typically better suited for granular real-time monitoring. On the other hand, relational databases make it easier to look at transformed and aggregated data, making them a more appropriate fit for reporting and analytics.
+
+The below table summarizes the main differences between a relational and a non-relational database:
+
+| | Relational Database | Non-Relational Database |
+|---|---|---|
+| Data storage | Data is stored in tables. | Data is stored in document files, graph stores, key-value stores, or wide-column stores. |
+| Data format | Data is structured. | Data is mainly unstructured. |
+| Usage | Mainly used for recording transactions, data modeling, and data analysis. | Mainly used to ingest large volumes of real-time streaming data. |
+| Data integrity | The relationships and constraints defined help ensure higher data integrity. | Non-relational databases do not guarantee data integrity. |
+| Scalability | Scalable at a high price tag. | Highly scalable. |
+
+## Conclusion
+
+Relational databases store data in a systematic way, and support querying multiple tables together in order to generate business insights.
+
+Often starting off with unorganized and chaotic data, analytics engineers leverage relational databases to bring structure and consistency to their data.
+
+Relational databases also have a strong record of transactional consistency. While some companies are racing to embrace non-relational databases in order to handle large volumes of unstructured data, most of their workloads likely remain transactional and analytical in nature, which is why relational databases are still very common.
+
+## Further reading
+
+- [Glossary: Primary key](/terms/primary-key)
+- [Glossary: Data warehouse](/terms/data-warehouse)
diff --git a/website/functions/post-preview.js b/website/functions/post-preview.js
new file mode 100644
index 00000000000..8d88c5f95cb
--- /dev/null
+++ b/website/functions/post-preview.js
@@ -0,0 +1,7 @@
+export default function createPostPreview(description, charCount) {
+ if (description.length <= charCount) { return description };
+ const clippedDesc = description.slice(0, charCount-1);
+ // return the version of the description clipped to the last instance of a space
+ // this is so there are no cut-off words.
+ return clippedDesc.slice(0, clippedDesc.lastIndexOf(" ")) + '...';
+}
diff --git a/website/sidebars.js b/website/sidebars.js
index bc8d0b05ffe..87664aabcd8 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -799,6 +799,7 @@ const sidebarSettings = {
"guides/migration/tools/migrating-from-stored-procedures/6-migrating-from-stored-procedures-conclusion",
],
},
+ "guides/migration/tools/migrating-from-spark-to-databricks",
],
},
],
diff --git a/website/snippets/available-enterprise-tier-only.md b/website/snippets/available-enterprise-tier-only.md
index 87ee93e776e..0d75b72287e 100644
--- a/website/snippets/available-enterprise-tier-only.md
+++ b/website/snippets/available-enterprise-tier-only.md
@@ -1,5 +1,7 @@
:::caution Available for dbt Cloud Enterprise
-Connecting an Azure DevOps account is available for organizations using the dbt Cloud Enterprise tier.
+Connecting an Azure DevOps cloud account is available for organizations using the dbt Cloud Enterprise tier.
+
+Azure DevOps on-premises instances are not supported in dbt Cloud.
:::
diff --git a/website/snippets/core-versions-table.md b/website/snippets/core-versions-table.md
index d8cd6314b6a..7ecf61e5b96 100644
--- a/website/snippets/core-versions-table.md
+++ b/website/snippets/core-versions-table.md
@@ -4,6 +4,7 @@
| [**v1.0**](upgrading-to-v1.0) | Dec 3, 2021 | v1.1.0 release | Dec 3, 2022 | Dec 2022 | |
| [**v1.1**](upgrading-to-v1.1) | Apr 28, 2022 | v1.2.0 release | Apr 28, 2023 | Apr 2023 | |
| [**v1.2**](upgrading-to-v1.2) | Jul 26, 2022 | v1.3.0 release | Jul 26, 2023 | Jul 2023 | |
-| _**v1.3**_ | _Oct 2022_ | _v1.4.0 release_ | _Oct 2023_ | _Oct 2023_ | |
+| [**v1.3**](upgrading-to-v1.3) | Oct 12, 2022 | v1.4.0 release | Oct 12, 2023 | Oct 2023 | |
+| _**v1.4**_ | _Jan 2023_ | _v1.5.0 release_ | _Jan 2024_ | _Jan 2024_ | |
_Italics: Future releases, NOT definite commitments. Shown for indication only._
diff --git a/website/snippets/sl-set-up-steps.md b/website/snippets/sl-set-up-steps.md
index eeb6fff7615..3d2a531b773 100644
--- a/website/snippets/sl-set-up-steps.md
+++ b/website/snippets/sl-set-up-steps.md
@@ -23,7 +23,7 @@ Note - It is _not_ recommended that you use your dbt Cloud credentials due to e
:::
-12. Set up the [Metadata API](docs/dbt-cloud-apis/metadata-api) (Team and Enterprise accounts only) in the integrated partner tool to import the metric definitions. The [integrated parnter tool](https://www.getdbt.com/product/semantic-layer-integrations) will treat the dbt Server as another data source (like a data platform). This requires:
+12. Set up the [Metadata API](docs/dbt-cloud-apis/metadata-api) (Team and Enterprise accounts only) in the integrated partner tool to import the metric definitions. The [integrated partner tool](https://www.getdbt.com/product/semantic-layer-integrations) will treat the dbt Server as another data source (like a data platform). This requires:
- The account ID, environment ID, and job ID (visible in the job URL)
- An [API service token](/docs/dbt-cloud-apis/service-tokens) with job admin and metadata permissions
diff --git a/website/snippets/tutorial-build-models-atop-other-models.md b/website/snippets/tutorial-build-models-atop-other-models.md
index c769f1fc255..6ca4dd20ed8 100644
--- a/website/snippets/tutorial-build-models-atop-other-models.md
+++ b/website/snippets/tutorial-build-models-atop-other-models.md
@@ -195,7 +195,8 @@ Now you can experiment by separating the logic out into separate models and usin
This time, when you performed a `dbt run`, separate views/tables were created for `stg_customers`, `stg_orders` and `customers`. dbt inferred the order to run these models. Because `customers` depends on `stg_customers` and `stg_orders`, dbt builds `customers` last. You do not need to explicitly define these dependencies.
-### FAQs
+
+### FAQs {#faq-2}
diff --git a/website/src/components/blogPostCard/index.js b/website/src/components/blogPostCard/index.js
index 670c2862cf6..835ca57f3b8 100644
--- a/website/src/components/blogPostCard/index.js
+++ b/website/src/components/blogPostCard/index.js
@@ -2,6 +2,7 @@ import React from 'react';
import styles from './styles.module.css';
import useBaseUrl from '@docusaurus/useBaseUrl';
import Link from '@docusaurus/Link';
+import createPostPreview from '@site/functions/post-preview';
function BlogPostCard({ postMetaData }) {
@@ -9,14 +10,14 @@ function BlogPostCard({ postMetaData }) {
return (
+
+
);
}
diff --git a/website/src/css/custom.css b/website/src/css/custom.css
index 6bfbc18e32e..d9eddd9de3b 100644
--- a/website/src/css/custom.css
+++ b/website/src/css/custom.css
@@ -258,6 +258,12 @@ a.hash-link:hover {
color: var(--docsearch-text-color);
}
+/* Hide the hash link that Docusaurus generates after TOC links */
+ul.table-of-contents .hash-link::before {
+ content: '';
+ display: none;
+}
+
pre {
font-size: 14px !important;
}
@@ -436,6 +442,12 @@ a.navbar__item.navbar__link.btn:hover {
background: transparent;
}
+/* Prevent layout shift when sidebar gets vertical overflow */
+@media(min-width: 996px) {
+ .theme-doc-sidebar-menu.menu__list {
+ max-width: 260px;
+ }
+}
/* level2 */
li.theme-doc-sidebar-item-category.theme-doc-sidebar-item-category-level-2 .menu__list-item-collapsible {
@@ -505,13 +517,21 @@ i.theme-doc-sidebar-item-category.theme-doc-sidebar-item-category-level-2.menu__
.menu__list-item--collapsed .menu__link--sublist::after,
.menu__list-item--collapsed .menu__caret::before {
- transform: rotateZ(90deg);
+ transform: rotateZ(0deg);
+}
+
+/* Mobile ToC caret */
+.docs-doc-page .theme-doc-toc-mobile .clean-btn:after {
+ transform: rotate(90deg);
+}
+.docs-doc-page .theme-doc-toc-mobile[class*="Expanded"] .clean-btn:after {
+ transform: rotate(180deg);
}
/* < icon */
.menu__link--sublist::after,
.menu__caret::before {
- transform: rotate(-90deg);
+ transform: rotate(90deg);
margin-left: 1em;
background: var(--ifm-breadcrumb-separator) center;
background-repeat: no-repeat;
diff --git a/website/src/theme/BlogPostItem/index.js b/website/src/theme/BlogPostItem/index.js
index 9e93d67efe3..071319aae31 100644
--- a/website/src/theme/BlogPostItem/index.js
+++ b/website/src/theme/BlogPostItem/index.js
@@ -12,7 +12,7 @@
* - Add image above title for blog posts
*/
-import React from 'react';
+import React, { useEffect } from 'react';
import clsx from 'clsx';
import {MDXProvider} from '@mdx-js/react';
import Translate, {translate} from '@docusaurus/Translate';
@@ -101,6 +101,36 @@ function BlogPostItem(props) {
);
};
+ // dbt custom - send blog context to datalayer to send to snowplow
+ useEffect(() => {
+ let blogContext = {
+ event: 'blogContext',
+ blogAuthor: '',
+ blogCategory: '',
+ blogDate: formattedDate ? formattedDate : undefined
+ }
+
+    if (authors && authors.length > 0) {
+      // Comma-separated list of author names
+      blogContext.blogAuthor = authors.map((author) => author.name).join(', ')
+    }
+
+    if (tags && tags.length > 0) {
+      // Comma-separated list of tag labels
+      blogContext.blogCategory = tags.map((tag) => tag.label).join(', ')
+    }
+
+    // Only send to the dataLayer on blog post pages
+    if (isBlogPostPage) {
+      window.dataLayer = window.dataLayer || [];
+      window.dataLayer.push(blogContext)
+    }
+ }, [])
+
return (
<>
{frontMatter.canonical_url && (
diff --git a/website/src/theme/DocItem/index.js b/website/src/theme/DocItem/index.js
index 22f93421c54..5337c58b808 100644
--- a/website/src/theme/DocItem/index.js
+++ b/website/src/theme/DocItem/index.js
@@ -4,7 +4,7 @@
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/
-import React, {useState, useEffect, useContext} from 'react';
+import React, { useState, useEffect, useContext } from 'react';
import clsx from 'clsx';
import DocPaginator from '@theme/DocPaginator';
import DocVersionBanner from '@theme/DocVersionBanner';
@@ -15,16 +15,17 @@ import TOC from '@theme/TOC';
import TOCCollapsible from '@theme/TOCCollapsible';
import Heading from '@theme/Heading';
import styles from './styles.module.css';
-import {ThemeClassNames, useWindowSize} from '@docusaurus/theme-common';
+import { ThemeClassNames, useWindowSize } from '@docusaurus/theme-common';
import DocBreadcrumbs from '@theme/DocBreadcrumbs';
+import DocSearchWeight from '@site/src/components/docSearchWeight';
// dbt Custom
import VersionContext from '../../stores/VersionContext'
import getElements from '../../utils/get-html-elements';
export default function DocItem(props) {
- const {content: DocContent} = props;
- const {metadata, frontMatter, assets} = DocContent;
+ const { content: DocContent } = props;
+ const { metadata, frontMatter, assets } = DocContent;
const {
keywords,
hide_title: hideTitle,
@@ -32,7 +33,7 @@ export default function DocItem(props) {
toc_min_heading_level: tocMinHeadingLevel,
toc_max_heading_level: tocMaxHeadingLevel,
} = frontMatter;
- const {description, title} = metadata;
+ const { description, title } = metadata;
const image = assets.image ?? frontMatter.image; // We only add a title if:
// - user asks to hide it with front matter
// - the markdown content does not already contain a top-level h1 heading
@@ -49,6 +50,10 @@ export default function DocItem(props) {
// If term has cta property set, show that cta
const termCTA = frontMatter?.cta && frontMatter.cta
+ // dbt Custom
+ // If the page has a search_weight value, apply that value
+ const searchWeight = frontMatter?.search_weight && frontMatter.search_weight
+
// This hides any TOC items not in
// html markdown headings for current version.
const { version: dbtVersion } = useContext(VersionContext)
@@ -58,28 +63,51 @@ export default function DocItem(props) {
async function fetchElements() {
// get html elements
const headings = await getElements(".markdown h1, .markdown h2, .markdown h3, .markdown h4, .markdown h5, .markdown h6")
-
// if headings exist on page
// compare against toc
- if(DocContent.toc && headings && headings.length) {
- let updated = DocContent.toc.reduce((acc, item) => {
+ if (DocContent.toc && headings && headings.length) {
+ // make new TOC object
+ let updated = Array.from(headings).reduce((acc, item) => {
// If heading id and toc item id match found
// include in updated toc
- let found = Array.from(headings).find(heading =>
+ let found = DocContent.toc.find(heading =>
heading.id.includes(item.id)
)
// If toc item is not in headings
// do not include in toc
// This means heading is versioned
- if(found)
- acc.push(item)
+
+ let makeToc = (heading) => {
+ let level;
+ if (heading.nodeName === "H2") {
+ level = 2
+ } else if (heading.nodeName === "H3") {
+ level = 3
+ } else {
+ level = null
+ }
+
+ return {
+ value: heading.innerHTML,
+ id: heading.id,
+ level: level && level
+ }
+ }
+
+      if (found) {
+        acc.push(makeToc(item))
+      } else {
+        // Include the heading even when it has no match in the original MDX toc
+        acc.push(makeToc(item))
+      }
return acc
}, [])
// If updated toc different than current
// If so, show loader and update toc
- if(currentToc.length !== updated.length) {
+ if (currentToc.length !== updated.length) {
setTocReady(false)
// This timeout provides enough time to show the loader
// Otherwise the content updates immediately
@@ -136,11 +164,11 @@ export default function DocItem(props) {
{/*
- Title can be declared inside md content or declared through
- front matter and added manually. To make both cases consistent,
- the added title is added under the same div.markdown block
- See https://github.com/facebook/docusaurus/pull/4882#issuecomment-853021120
- */}
+ Title can be declared inside md content or declared through
+ front matter and added manually. To make both cases consistent,
+ the added title is added under the same div.markdown block
+ See https://github.com/facebook/docusaurus/pull/4882#issuecomment-853021120
+ */}
{shouldAddTitle && (
{title}
@@ -148,6 +176,8 @@ export default function DocItem(props) {
)}
+
+