diff --git a/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md
index 09274b41a9b..93cf91efeed 100644
--- a/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md
+++ b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md
@@ -70,9 +70,9 @@ An obvious choice if you have data to load into your warehouse would be your exi
 [Fivetran’s browser uploader](https://fivetran.com/docs/files/browser-upload) does exactly what it says on the tin: you upload a file to their web portal and it creates a table containing that data in a predefined schema in your warehouse. With a visual interface to modify data types, it’s easy for anyone to use. And with an account type with the permission to only upload files, you don’t need to worry about your stakeholders accidentally breaking anything either.
-
+
-
+
 A nice benefit of the uploader is support for updating data in the table over time. If a file with the same name and same columns is uploaded, any new records will be added, and existing records (per the ) will be updated.
@@ -100,7 +100,7 @@ The main benefit of connecting to Google Sheets instead of a static spreadsheet
 Instead of syncing all cells in a sheet, you create a [named range](https://fivetran.com/docs/files/google-sheets/google-sheets-setup-guide) and connect Fivetran to that range. Each Fivetran connector can only read a single range—if you have multiple tabs then you’ll need to create multiple connectors, each with its own schema and table in the target warehouse. When a sync takes place, it will [truncate](https://docs.getdbt.com/terms/ddl#truncate) and reload the table from scratch as there is no primary key to use for matching.
-
+
 Beware of inconsistent data types though—if someone types text into a column that was originally numeric, Fivetran will automatically convert the column to a string type which might cause issues in your downstream transformations.
 [The recommended workaround](https://fivetran.com/docs/files/google-sheets#typetransformationsandmapping) is to explicitly cast your types in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging) to ensure that any undesirable records are converted to null.
@@ -119,7 +119,7 @@ Beware of inconsistent data types though—if someone types text into a column t
 I’m a big fan of [Fivetran’s Google Drive connector](https://fivetran.com/docs/files/google-drive); in the past I’ve used it to streamline a lot of weekly reporting. It allows stakeholders to use a tool they’re already familiar with (Google Drive) instead of dealing with another set of credentials. Every file uploaded into a specific folder on Drive (or [Box, or consumer Dropbox](https://fivetran.com/docs/files/magic-folder)) turns into a table in your warehouse.
-
+
 Like the Google Sheets connector, the data types of the columns are determined automatically. Dates, in particular, are finicky though—if you can control your input data, try to get it into [ISO 8601 format](https://xkcd.com/1179/) to minimize the amount of cleanup you have to do on the other side.
@@ -174,7 +174,7 @@ Each of the major data warehouses also has native integrations to import spreads
 Snowflake’s options are robust and user-friendly, offering both a [web-based loader](https://docs.snowflake.com/en/user-guide/data-load-web-ui.html) as well as [a bulk importer](https://docs.snowflake.com/en/user-guide/data-load-bulk.html). The web loader is suitable for small to medium files (up to 50MB) and can be used for specific files, all files in a folder, or files in a folder that match a given pattern. It’s also the most provider-agnostic, with support for Amazon S3, Google Cloud Storage, Azure and the local file system.
-
+
 ### BigQuery
diff --git a/website/blog/2022-11-30-dbt-project-evaluator.md b/website/blog/2022-11-30-dbt-project-evaluator.md
index b936d4786cd..3ea7a459c35 100644
--- a/website/blog/2022-11-30-dbt-project-evaluator.md
+++ b/website/blog/2022-11-30-dbt-project-evaluator.md
@@ -20,7 +20,7 @@ If you attended [Coalesce 2022](https://www.youtube.com/watch?v=smbRwmcM1Ok), yo
 Don’t believe me??? Here’s photographic proof.
-
+
 Since the inception of dbt Labs, our team has been embedded with a variety of different data teams — from an over-stretched-data-team-of-one to a data-mesh-multiverse.
@@ -120,4 +120,4 @@ If something isn’t working quite right or you have ideas for future functional
 Together, we can ensure that dbt projects across the galaxy are set up for success as they grow to infinity and beyond.
-
+
diff --git a/website/blog/2023-01-17-grouping-data-tests.md b/website/blog/2023-01-17-grouping-data-tests.md
index 3648837302b..23fcce6d27e 100644
--- a/website/blog/2023-01-17-grouping-data-tests.md
+++ b/website/blog/2023-01-17-grouping-data-tests.md
@@ -43,11 +43,11 @@ So what do we discover when we validate our data by group?
 Testing for monotonicity, we find many poorly behaved turnstiles. Unlike the well-behaved dark blue line, other turnstiles seem to _decrement_ versus _increment_ with each rotation while still others cyclically increase and plummet to zero – perhaps due to maintenance events, replacements, or glitches in communication with the central server.
-
+
 Similarly, while no expected timestamp is missing from the data altogether, a more rigorous test of timestamps _by turnstile_ reveals between roughly 50-100 missing observations for any given period.
-
+
 _Check out this [GitHub gist](https://gist.github.com/emilyriederer/4dcc6a05ea53c82db175e15f698a1fb6) to replicate these views locally._
diff --git a/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md b/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md
index 51a62006ee8..99ce142d5ed 100644
--- a/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md
+++ b/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md
@@ -125,7 +125,7 @@ In both cases, the operation can be done on a single partition at a time so it r
 On a 192 GB partition here is how the different methods compare:
-
+
 Also, the `SELECT` statement consumed more than 10 hours of slot time while `MERGE` statement took days of slot time.
diff --git a/website/blog/2023-03-23-audit-helper.md b/website/blog/2023-03-23-audit-helper.md
index 106715c5e4f..8599ad5eb5d 100644
--- a/website/blog/2023-03-23-audit-helper.md
+++ b/website/blog/2023-03-23-audit-helper.md
@@ -19,7 +19,7 @@ It is common for analytics engineers (AE) and data analysts to have to refactor
 Not only is that approach time-consuming, but it is also prone to naive assumptions that values match based on aggregate measures (such as counts or sums). To provide a better, more accurate approach to auditing, dbt Labs has created the `audit_helper` package. `audit_helper` is a package for dbt whose main purpose is to audit data by comparing two tables (the original one versus a refactored model). It uses a simple and intuitive query structure that enables quickly comparing tables based on the column values, row amount, and even column types (for example, to make sure that a given column is numeric in both your table and the original one). Figure 1 graphically displays the workflow and where `audit_helper` is positioned in the refactoring process.
-
+
 Now that it is clear where the `audit_helper` package is positioned in the refactoring process, it is important to highlight the benefits of using audit_helper (and ultimately, of auditing refactored models). Among the benefits, we can mention:
 - **Quality assurance**: Assert that a refactored model is reaching the same output as the original model that is being refactored.
@@ -57,12 +57,12 @@ According to the `audit_helper` package documentation, this macro comes in handy
 ### How it works
 When you run the dbt audit model, it will compare all columns, row by row. To count for the match, every column in a row from one source must exactly match a row from another source, as illustrated in the example in Figure 2 below:
-
+
 As shown in the example, the model is compared line by line, and in this case, all lines in both models are equivalent and the result should be 100%. Figure 3 below depicts a row in which two of the three columns are equal and only the last column of row 1 has divergent values. In this case, despite the fact that most of row 1 is identical, that row will not be counted towards the final result. In this example, only row 2 and row 3 are valid, yielding a 66.6% match in the total of analyzed rows.
-
+
 As previously stated, for the match to be valid, all column values of a model’s row must be equal to the other model. This is why we sometimes need to exclude columns from the comparison (such as date columns, which can have a time zone difference from the original model to the refactored — we will discuss tips like these below).
@@ -103,12 +103,12 @@ Let’s understand the arguments used in the `compare_queries` macro:
 - `summarize` (optional): This argument allows you to switch between a summary or detailed (verbose) view of the compared data. This argument accepts true or false values (its default is set to be true).
 3. Replace the sources from the example with your own
-
+
 As illustrated in Figure 4, using the `ref` statements allows you to easily refer to your development model, and using the full path makes it easy to refer to the original table (which will be useful when you are refactoring a SQL Server Stored Procedure or Alteryx Workflow that is already being materialized in the data warehouse).
 4. Specify your comparison columns
-
+
 Delete the example columns and replace them with the columns of your models, exactly as they are written in each model. You should rename/alias the columns to match, as well as ensuring they are in the same order within the `select` clauses.
@@ -129,7 +129,7 @@ Let’s understand the arguments used in the `compare_queries` macro:
 ```
 The output will be the similar to the one shown in Figure 6 below:
-
+
 The output is presented in table format, with each column explained below:
@@ -155,7 +155,7 @@ While we can surely rely on that overview to validate the final refactored model
 A really useful way to check out which specific columns are driving down the match percentage between tables is the `compare_column_values` macro that allows us to audit column values. This macro requires a column to be set, so it can be used as an anchor to compare entries between the refactored dbt model column and the legacy table column. Figure 7 illustrates how the `compare_column_value`s macro works.
-
+
 The macro’s output summarizes the status of column compatibility, breaking it down into different categories: perfect match, both are null, values do not match, value is null in A only, value is null in B only, missing from A and missing from B. This level of detailing makes it simpler for the AE or data analyst to figure out what can be causing incompatibility issues between the models. While refactoring a model, it is common that some keys used to join models are inconsistent, bringing up unwanted null values on the final model as a result, and that would cause the audit row query to fail, without giving much more detail.
@@ -224,7 +224,7 @@ Also, we can see that the example code includes a table printing option enabled
 But unlike from the `compare_queries` macro, if you have kept the printing function enabled, you should expect a table to be printed in the command line when you run the model, as shown in Figure 8.
 Otherwise, it will be materialized on your data warehouse like this:
-
+
 The `compare_column_values` macro separates column auditing results in seven different labels:
 - **Perfect match**: count of rows (and relative percentage) where the column values compared between both tables are equal and not null;
diff --git a/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md b/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md
index 691a7f77571..0aac3d77d53 100644
--- a/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md
+++ b/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md
@@ -39,7 +39,7 @@ Dimensional modeling is a technique introduced by Ralph Kimball in 1996 with his
 The goal of dimensional modeling is to take raw data and transform it into Fact and Dimension tables that represent the business.
-
+
 The benefits of dimensional modeling are:
@@ -185,7 +185,7 @@ Now that you’ve set up the dbt project, database, and have taken a peek at the
 Identifying the business process is done in collaboration with the business user. The business user has context around the business objectives and business processes, and can provide you with that information.
-
+
 Upon speaking with the CEO of AdventureWorks, you learn the following information:
diff --git a/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md b/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md
index 2c6a9d87591..46cfcb58cdd 100644
--- a/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md
+++ b/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md
@@ -17,7 +17,7 @@ Alteryx is a visual data transformation platform with a user-friendly interface
 Transforming data to follow business rules can be a complex task, especially with the increasing amount of data collected by companies.
 To reduce such complexity, data transformation solutions designed as drag-and-drop tools can be seen as more intuitive, since analysts can visualize the steps taken to transform data. One example of a popular drag-and-drop transformation tool is Alteryx which allows business analysts to transform data by dragging and dropping operators in a canvas. The graphic interface of Alteryx Designer is presented in **Figure 1**.
-
+
 Nonetheless, as data workflows become more complex, Alteryx lacks the modularity, documentation, and version control capabilities that these flows require. In this sense, dbt may be a more suitable solution to building resilient and modular data pipelines due to its focus on data modeling.
@@ -62,7 +62,7 @@ This blog post reports a consulting project for a major client at Indicium Tech
 When the client hired Indicium, they had dozens of Alteryx workflows built and running daily solely for the marketing team, which was the focus of the project. For the marketing team, the Alteryx workflows had to be executed in the correct order since they were interdependent, which means one Alteryx workflow used the outcome of the previous one, and so on. The main Alteryx workflows run daily by the marketing team took about 6 hours to run. Another important aspect to consider was that if a model had not finished running when the next one downstream began to run, the data would be incomplete, requiring the workflow to be run again. The execution of all models was usually scheduled to run overnight and by early morning, so the data would be up to date the next day. But if there was an error the night before, the data would be incorrect or out of date. **Figure 3** exemplifies the scheduler.
-
+
 Data lineage was a point that added a lot of extra labor because it was difficult to identify which models were dependent on others with so many Alteryx workflows built.
 When the number of workflows increased, it required a long time to create a view of that lineage in another software. So, if a column's name changed in a model due to a change in the model's source, the marketing analysts would have to map which downstream models were impacted by such change to make the necessary adjustments. Because model lineage was mapped manually, it was a challenge to keep it up to date.
@@ -89,7 +89,7 @@ The first step is to validate all data sources and create one com
 It is essential to click on each data source (the green book icons on the leftmost side of **Figure 5**) and examine whether any transformations have been done inside that data source query. It is very common for a source icon to contain more than one data source or filter, which is why this step is important. The next step is to follow the workflow and transcribe the transformations into SQL queries in the dbt models to replicate the same data transformations as in the Alteryx workflow.
-
+
 For this step, we identified which operators were used in the data source (for example, joining data, order columns, group by, etc). Usually the Alteryx operators are pretty self-explanatory and all the information needed for understanding appears on the left side of the menu. We also checked the documentation to understand how each Alteryx operator works behind the scenes.
@@ -102,7 +102,7 @@ Auditing large models, with sometimes dozens of columns and millions of rows, ca
 In this project, we used [the `audit_helper` package](https://github.com/dbt-labs/dbt-audit-helper), because it provides more robust auditing macros and offers more automation possibilities for our use case. To that end, we needed to have both the legacy Alteryx workflow output table and the refactored dbt model materialized in the project’s data warehouse.
 Then we used the macros available in `audit_helper` to compare query results, data types, column values, row numbers and many more things that are available within the package. For an in-depth explanation and tutorial on how to use the `audit_helper` package, check out [this blog post](https://docs.getdbt.com/blog/audit-helper-for-migration). **Figure 6** graphically illustrates the validation logic behind audit_helper.
-
+
 #### Step 4: Duplicate reports and connect them to the dbt refactored models
@@ -120,7 +120,7 @@ The conversion proved to be of great value to the client due to three main aspec
 - Improved workflow visibility: dbt’s support for documentation and testing, associated with dbt Cloud, allows for great visibility of the workflow’s lineage execution, accelerating errors and data inconsistencies identification and troubleshooting. More than once, our team was able to identify the impact of one column’s logic alteration in downstream models much earlier than these Alteryx models.
 - Workflow simplification: dbt’s modularized approach of data modeling, aside from accelerating total run time of the data workflow, simplified the construction of new tables, based on the already existing modules, and improved code readability.
-
+
 As we can see, refactoring Alteryx to dbt was an important step in the direction of data availability, and allowed for much more agile processes for the client’s data team. With less time dedicated to manually executing sequential Alteryx workflows that took hours to complete, and searching for errors in each individual file, the analysts could focus on what they do best: **getting insights from the data and generating value from them**.
diff --git a/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md b/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md
index 2b00787cc07..f719bdb40cb 100644
--- a/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md
+++ b/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md
@@ -22,7 +22,7 @@ To help visualize this data, we're going to pretend we are a company that manufa
 Obviously, a real bike could have a hundred or more separate components. To keep things simple for this article, let's just consider the bike, the frame, a wheel, the wheel rim, tire, and tube. Our component hierarchy looks like:
-
+
 This hierarchy is *ragged* because different paths through the hierarchy terminate at different depths. It is *time-varying* because specific components can be added and removed.
diff --git a/website/blog/2023-05-04-generating-dynamic-docs.md b/website/blog/2023-05-04-generating-dynamic-docs.md
index b6e8d929e72..1e704178b0a 100644
--- a/website/blog/2023-05-04-generating-dynamic-docs.md
+++ b/website/blog/2023-05-04-generating-dynamic-docs.md
@@ -35,7 +35,7 @@ This results in a lot of the same columns (e.g. `account_id`) existing in differ
 In fact, I found a better way using some CLI commands, the dbt Codegen package and docs blocks. I also made the following meme in the [dbt Community Slack](https://www.getdbt.com/community/join-the-community/) channel #memes-and-off-topic-chatter to encapsulate this method:
-
+
 ## What pain is being solved?
@@ -279,7 +279,7 @@ To confirm the formatting works, run the following command to get dbt Docs up an
 ```
 $ dbt docs && dbt docs serve
 ```
-
+
 Here, you can confirm that the column descriptions using the doc blocks are working as intended.
@@ -326,7 +326,7 @@ user_id
 ```
 Now, open your code editor, and replace `(.*)` with `{% docs column__activity_based_interest__$1 %}\n\n{% enddocs %}\n`, which will result in the following in your markdown file:
-
+
 Now you can add documentation to each of your columns.
@@ -334,7 +334,7 @@ Now you can add documentation to each of your columns.
 You can programmatically identify all columns, and have them point towards the newly-created documentation. In your code editor, replace `\s{6}- name: (.*)\n description: ""` with ` - name: $1\n description: "{{ doc('column__activity_based_interest__$1') }}`:
-
+
 ⚠️ Some of your columns may already be available in existing docs blocks. In this example, the following replacements are done:
 - `{{ doc('column__activity_based_interest__user_id') }}` → `{{ doc("column_user_id") }}`
@@ -343,7 +343,7 @@ You can programmatically identify all columns, and have them point towards the n
 ## Check that everything works
 Run `dbt docs generate`. If there are syntax errors, this will be found out at this stage.
 If successful, we can run `dbt docs serve` to perform a smoke test and ensure everything looks right:
-
+
 ## Additional considerations
diff --git a/website/blog/2023-07-17-GPT-and-dbt-test.md b/website/blog/2023-07-17-GPT-and-dbt-test.md
index 12e380eb220..84f756919a5 100644
--- a/website/blog/2023-07-17-GPT-and-dbt-test.md
+++ b/website/blog/2023-07-17-GPT-and-dbt-test.md
@@ -55,7 +55,7 @@ We all know how ChatGPT can digest very complex prompts, but as this is a tool f
 Opening ChatGPT with GPT4, my first prompt is usually along these lines:
-
+
 And the output of this simple prompt is nothing short of amazing:
@@ -118,7 +118,7 @@ Back in my day (5 months ago), ChatGPT with GPT 3.5 didn’t have much context o
 A prompt for it would look something like:
-
+
 ## Specify details on generic tests in your prompts
@@ -133,7 +133,7 @@ Accepted_values and relationships are slightly trickier but the model can be adj
 One way of doing this is with a prompt like this:
-
+
 Which results in the following output:
diff --git a/website/blog/2023-08-01-announcing-materialized-views.md b/website/blog/2023-08-01-announcing-materialized-views.md
index 6534e1d0b56..eb9716e73a5 100644
--- a/website/blog/2023-08-01-announcing-materialized-views.md
+++ b/website/blog/2023-08-01-announcing-materialized-views.md
@@ -103,7 +103,7 @@ When we talk about using materialized views in development, the question to thin
 Outside of the scheduling part, development will be pretty standard. Your pipeline is likely going to look something like this:
-
+
 This is assuming you have a near real time pipeline where you are pulling from a streaming data source like a Kafka Topic via an ingestion tool of your choice like Snowpipe for Streaming into your data platform.
 After your data is in the data platform, you will:
diff --git a/website/blog/2024-01-09-defer-in-development.md b/website/blog/2024-01-09-defer-in-development.md
index 406b036cab4..96e2ed53f85 100644
--- a/website/blog/2024-01-09-defer-in-development.md
+++ b/website/blog/2024-01-09-defer-in-development.md
@@ -80,7 +80,7 @@ dbt Cloud offers a seamless deferral experience in both the dbt Cloud IDE and th
 In the dbt Cloud IDE, there’s as simple toggle switch labeled `Defer to production`. Simply enabling this toggle will defer your command to the production environment when you run any dbt command in the IDE!
-
+
 The cloud CLI has this setting *on by default* — there’s nothing else you need to do to set this up! If you prefer not to defer, you can pass the `--no-defer` flag to override this behavior. You can also set an environment other than your production environment as the deferred to environment in your `dbt-cloud` settings in your `dbt_project.yml` :
@@ -155,6 +155,6 @@ While defer is a faster and cheaper option for most folks in most situations, de
 ### Call me Willem Defer
-
+
 Defer to prod is a powerful way to improve your development velocity with dbt, and dbt Cloud makes it easier than ever to make use of this feature! You too could look this cool while you’re saving time and money developing on your dbt projects!
diff --git a/website/docs/best-practices/dbt-unity-catalog-best-practices.md b/website/docs/best-practices/dbt-unity-catalog-best-practices.md
index 5f230263cf8..a55e1d121af 100644
--- a/website/docs/best-practices/dbt-unity-catalog-best-practices.md
+++ b/website/docs/best-practices/dbt-unity-catalog-best-practices.md
@@ -21,11 +21,11 @@ If you use multiple Databricks workspaces to isolate development from production
 To do so, use dbt's [environment variable syntax](https://docs.getdbt.com/docs/dbt-cloud/using-dbt-cloud/cloud-environment-variables#special-environment-variables) for Server Hostname of your Databricks workspace URL and HTTP Path for the SQL warehouse in your connection settings. Note that Server Hostname still needs to appear to be a valid domain name to pass validation checks, so you will need to hard-code the domain suffix on the URL, eg `{{env_var('DBT_HOSTNAME')}}.cloud.databricks.com` and the path prefix for your warehouses, eg `/sql/1.0/warehouses/{{env_var('DBT_HTTP_PATH')}}`.
-
+
 When you create environments in dbt Cloud, you can assign environment variables to populate the connection information dynamically. Don’t forget to make sure the tokens you use in the credentials for those environments were generated from the associated workspace.
-
+
 ## Access Control
diff --git a/website/docs/docs/build/custom-target-names.md b/website/docs/docs/build/custom-target-names.md
index 4786641678d..ac7036de572 100644
--- a/website/docs/docs/build/custom-target-names.md
+++ b/website/docs/docs/build/custom-target-names.md
@@ -21,9 +21,9 @@ where created_at > date_trunc('month', current_date)
 To set a custom target name for a job in dbt Cloud, configure the **Target Name** field for your job in the Job Settings page.
-
+
 ## dbt Cloud IDE
 When developing in dbt Cloud, you can set a custom target name in your development credentials.
 Go to your account (from the gear menu in the top right hand corner), select the project under **Credentials**, and update the target name.
-
+
diff --git a/website/docs/docs/build/data-tests.md b/website/docs/docs/build/data-tests.md
index 7c12e5d7059..d981d7e272d 100644
--- a/website/docs/docs/build/data-tests.md
+++ b/website/docs/docs/build/data-tests.md
@@ -245,7 +245,7 @@ Normally, a data test query will calculate failures as part of its execution. If
 This workflow allows you to query and examine failing records much more quickly in development:
-
+
 Note that, if you elect to store test failures:
 * Test result tables are created in a schema suffixed or named `dbt_test__audit`, by default. It is possible to change this value by setting a `schema` config. (For more details on schema naming, see [using custom schemas](/docs/build/custom-schemas).)
diff --git a/website/docs/docs/build/environment-variables.md b/website/docs/docs/build/environment-variables.md
index 14076352ac1..3f2aebd0036 100644
--- a/website/docs/docs/build/environment-variables.md
+++ b/website/docs/docs/build/environment-variables.md
@@ -17,7 +17,7 @@ Environment variables in dbt Cloud must be prefixed with either `DBT_` or `DBT_E
 Environment variable values can be set in multiple places within dbt Cloud. As a result, dbt Cloud will interpret environment variables according to the following order of precedence (lowest to highest):
-
+
 There are four levels of environment variables:
 1. the optional default argument supplied to the `env_var` Jinja function in code
@@ -30,7 +30,7 @@ There are four levels of environment variables:
 To set environment variables at the project and environment level, click **Deploy** in the top left, then select **Environments**. Click **Environments Variables** to add and update your environment variables.
-
+
@@ -38,7 +38,7 @@ You'll notice there is a `Project Default` column. This is a great place to set
 To the right of the `Project Default` column are all your environments. Values set at the environment level take priority over the project level default value. This is where you can tell dbt Cloud to interpret an environment value differently in your Staging vs. Production environment, as example.
-
+
@@ -48,12 +48,12 @@ You may have multiple jobs that run in the same environment, and you'd like the
 When setting up or editing a job, you will see a section where you can override environment variable values defined at the environment or project level.
-
+
 Every job runs in a specific, deployment environment, and by default, a job will inherit the values set at the environment level (or the highest precedence level set) for the environment in which it runs. If you'd like to set a different value at the job level, edit the value to override it.
-
+
 **Overriding environment variables at the personal level**
@@ -61,11 +61,11 @@ Every job runs in a specific, deployment environment, and by default, a job will
 You can also set a personal value override for an environment variable when you develop in the dbt integrated developer environment (IDE). By default, dbt Cloud uses environment variable values set in the project's development environment. To see and override these values, click the gear icon in the top right. Under "Your Profile," click **Credentials** and select your project. Click **Edit** and make any changes in "Environment Variables."
-
+
 To supply an override, developers can edit and specify a different value to use. These values will be respected in the IDE both for the Results and Compiled SQL tabs.
-
+
 :::info Appropriate coverage
 If you have not set a project level default value for every environment variable, it may be possible that dbt Cloud does not know how to interpret the value of an environment variable in all contexts. In such cases, dbt will throw a compilation error: "Env var required but not provided".
@@ -77,7 +77,7 @@ If you change the value of an environment variable mid-session while using the I
 To refresh the IDE mid-development, click on either the green 'ready' signal or the red 'compilation error' message at the bottom right corner of the IDE. A new modal will pop up, and you should select the Refresh IDE button. This will load your environment variables values into your development environment.
-
+
 There are some known issues with partial parsing of a project and changing environment variables mid-session in the IDE. If you find that your dbt project is not compiling to the values you've set, try deleting the `target/partial_parse.msgpack` file in your dbt project which will force dbt to re-compile your whole project.
@@ -86,7 +86,7 @@ There are some known issues with partial parsing of a project and changing envir
 While all environment variables are encrypted at rest in dbt Cloud, dbt Cloud has additional capabilities for managing environment variables with secret or otherwise sensitive values. If you want a particular environment variable to be scrubbed from all logs and error messages, in addition to obfuscating the value in the UI, you can prefix the key with `DBT_ENV_SECRET_`. This functionality is supported from `dbt v1.0` and on.
-
+
 **Note**: An environment variable can be used to store a [git token for repo cloning](/docs/build/environment-variables#clone-private-packages). We recommend you make the git token's permissions read only and consider using a machine account or service user's PAT with limited repo access in order to practice good security hygiene.
@@ -131,7 +131,7 @@ Currently, it's not possible to dynamically set environment variables across mod
 **Note** — You can also use this method with Databricks SQL Warehouse.
- + :::info Environment variables and Snowflake OAuth limitations Env vars works fine with username/password and keypair, including scheduled jobs, because dbt Core consumes the Jinja inserted into the autogenerated `profiles.yml` and resolves it to do an `env_var` lookup. diff --git a/website/docs/docs/build/exposures.md b/website/docs/docs/build/exposures.md index a26ac10bd36..65c0792e0a0 100644 --- a/website/docs/docs/build/exposures.md +++ b/website/docs/docs/build/exposures.md @@ -118,8 +118,8 @@ dbt test -s +exposure:weekly_jaffle_report When we generate our documentation site, you'll see the exposure appear: - - + + ## Related docs diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md index b24d3129f0c..3fe194a4cb7 100644 --- a/website/docs/docs/build/python-models.md +++ b/website/docs/docs/build/python-models.md @@ -660,7 +660,7 @@ Use the `cluster` submission method with dedicated Dataproc clusters you or your - Enable Dataproc APIs for your project + region - If using the `cluster` submission method: Create or use an existing [Dataproc cluster](https://cloud.google.com/dataproc/docs/guides/create-cluster) with the [Spark BigQuery connector initialization action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/connectors#bigquery-connectors). (Google recommends copying the action into your own Cloud Storage bucket, rather than using the example version shown in the screenshot) - + The following configurations are needed to run Python models on Dataproc. You can add these to your [BigQuery profile](/docs/core/connect-data-platform/bigquery-setup#running-python-models-on-dataproc) or configure them on specific Python models: - `gcs_bucket`: Storage bucket to which dbt will upload your model's compiled PySpark code. 
@@ -706,7 +706,7 @@ Google recommends installing Python packages on Dataproc clusters via initializa You can also install packages at cluster creation time by [defining cluster properties](https://cloud.google.com/dataproc/docs/tutorials/python-configuration#image_version_20): `dataproc:pip.packages` or `dataproc:conda.packages`. - + **Docs:** - [Dataproc overview](https://cloud.google.com/dataproc/docs/concepts/overview) diff --git a/website/docs/docs/build/sources.md b/website/docs/docs/build/sources.md index e4fb10ac725..466bcedc688 100644 --- a/website/docs/docs/build/sources.md +++ b/website/docs/docs/build/sources.md @@ -84,7 +84,7 @@ left join raw.jaffle_shop.customers using (customer_id) Using the `{{ source () }}` function also creates a dependency between the model and the source table. - + ### Testing and documenting sources You can also: @@ -189,7 +189,7 @@ from raw.jaffle_shop.orders The results of this query are used to determine whether the source is fresh or not: - + ### Filter diff --git a/website/docs/docs/build/sql-models.md b/website/docs/docs/build/sql-models.md index d33e4798974..a0dd174278b 100644 --- a/website/docs/docs/build/sql-models.md +++ b/website/docs/docs/build/sql-models.md @@ -254,7 +254,7 @@ create view analytics.customers as ( dbt uses the `ref` function to: * Determine the order to run the models by creating a dependent acyclic graph (DAG). - + * Manage separate environments — dbt will replace the model specified in the `ref` function with the database name for the (or view). Importantly, this is environment-aware — if you're running dbt with a target schema named `dbt_alice`, it will select from an upstream table in the same schema. Check out the tabs above to see this in action. 
diff --git a/website/docs/docs/cloud/connect-data-platform/about-connections.md b/website/docs/docs/cloud/connect-data-platform/about-connections.md index bc4a515112d..93bbf83584f 100644 --- a/website/docs/docs/cloud/connect-data-platform/about-connections.md +++ b/website/docs/docs/cloud/connect-data-platform/about-connections.md @@ -22,7 +22,7 @@ import MSCallout from '/snippets/_microsoft-adapters-soon.md'; You can connect to your database in dbt Cloud by clicking the gear in the top right and selecting **Account Settings**. From the Account Settings page, click **+ New Project**. - + These connection instructions provide the basic fields required for configuring a data platform connection in dbt Cloud. For more detailed guides, which include demo project data, read our [Quickstart guides](https://docs.getdbt.com/guides) diff --git a/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md b/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md index eecf0a8e229..0186d821a54 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md @@ -36,4 +36,4 @@ HTTP and Thrift connection methods: | Auth | Optional, supply if using Kerberos | `KERBEROS` | | Kerberos Service Name | Optional, supply if using Kerberos | `hive` | - + diff --git a/website/docs/docs/cloud/connect-data-platform/connect-databricks.md b/website/docs/docs/cloud/connect-data-platform/connect-databricks.md index ebf6be63bd1..032246ad16a 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-databricks.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-databricks.md @@ -37,4 +37,4 @@ To set up the Databricks connection, supply the following fields: | HTTP Path | The HTTP path of the Databricks cluster or SQL warehouse | /sql/1.0/warehouses/1a23b4596cd7e8fg | | Catalog | Name of Databricks Catalog (optional) | Production | - + diff --git 
a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md index 9193a890ed3..c265529fb49 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md @@ -27,7 +27,7 @@ username (specifically, the `login_name`) and the corresponding user's Snowflake to authenticate dbt Cloud to run queries against Snowflake on behalf of a Snowflake user. **Note**: The schema field in the **Developer Credentials** section is a required field. - + ### Key Pair @@ -68,7 +68,7 @@ As of dbt version 1.5.0, you can use a `private_key` string in place of `private The OAuth auth method permits dbt Cloud to run development queries on behalf of a Snowflake user without the configuration of Snowflake password in dbt Cloud. For more information on configuring a Snowflake OAuth connection in dbt Cloud, please see [the docs on setting up Snowflake OAuth](/docs/cloud/manage-access/set-up-snowflake-oauth). - + ## Configuration diff --git a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md index 2e637b7450a..7ea6e380000 100644 --- a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md +++ b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md @@ -32,7 +32,7 @@ In addition to these fields, there are two other optional fields that can be con - + ### BigQuery OAuth **Available in:** Development environments, Enterprise plans only @@ -43,7 +43,7 @@ more information on the initial configuration of a BigQuery OAuth connection in [the docs on setting up BigQuery OAuth](/docs/cloud/manage-access/set-up-bigquery-oauth). 
As an end user, if your organization has set up BigQuery OAuth, you can link a project with your personal BigQuery account in your personal Profile in dbt Cloud, like so: - + ## Configuration diff --git a/website/docs/docs/cloud/git/authenticate-azure.md b/website/docs/docs/cloud/git/authenticate-azure.md index bbb2cff8b29..42028bf993b 100644 --- a/website/docs/docs/cloud/git/authenticate-azure.md +++ b/website/docs/docs/cloud/git/authenticate-azure.md @@ -16,11 +16,11 @@ Connect your dbt Cloud profile to Azure DevOps using OAuth: 1. Click the gear icon at the top right and select **Profile settings**. 2. Click **Linked Accounts**. 3. Next to Azure DevOps, click **Link**. - + 4. Once you're redirected to Azure DevOps, sign into your account. 5. When you see the permission request screen from Azure DevOps App, click **Accept**. - + You will be directed back to dbt Cloud, and your profile should be linked. You are now ready to develop in dbt Cloud! diff --git a/website/docs/docs/cloud/git/connect-github.md b/website/docs/docs/cloud/git/connect-github.md index 715f23912e5..ff0f2fff18f 100644 --- a/website/docs/docs/cloud/git/connect-github.md +++ b/website/docs/docs/cloud/git/connect-github.md @@ -30,13 +30,13 @@ To connect your dbt Cloud account to your GitHub account: 2. Select **Linked Accounts** from the left menu. - + 3. In the **Linked Accounts** section, set up your GitHub account connection to dbt Cloud by clicking **Link** to the right of GitHub. This redirects you to your account on GitHub where you will be asked to install and configure the dbt Cloud application. 4. Select the GitHub organization and repositories dbt Cloud should access. - + 5. 
Assign the dbt Cloud GitHub App the following permissions: - Read access to metadata @@ -52,7 +52,7 @@ To connect your dbt Cloud account to your GitHub account: ## Limiting repository access in GitHub If you are your GitHub organization owner, you can also configure the dbt Cloud GitHub application to have access to only select repositories. This configuration must be done in GitHub, but we provide an easy link in dbt Cloud to start this process. - + ## Personally authenticate with GitHub @@ -70,7 +70,7 @@ To connect a personal GitHub account: 2. Select **Linked Accounts** in the left menu. If your GitHub account is not connected, you’ll see "No connected account". 3. Select **Link** to begin the setup process. You’ll be redirected to GitHub, and asked to authorize dbt Cloud in a grant screen. - + 4. Once you approve authorization, you will be redirected to dbt Cloud, and you should now see your connected account. diff --git a/website/docs/docs/cloud/git/connect-gitlab.md b/website/docs/docs/cloud/git/connect-gitlab.md index 316e6af0135..e55552e2d86 100644 --- a/website/docs/docs/cloud/git/connect-gitlab.md +++ b/website/docs/docs/cloud/git/connect-gitlab.md @@ -22,11 +22,11 @@ To connect your GitLab account: 2. Select **Linked Accounts** in the left menu. 3. Click **Link** to the right of your GitLab account. - + When you click **Link**, you will be redirected to GitLab and prompted to sign into your account. GitLab will then ask for your explicit authorization: - + Once you've accepted, you should be redirected back to dbt Cloud, and you'll see that your account has been linked to your profile. @@ -52,7 +52,7 @@ For more detail, GitLab has a [guide for creating a Group Application](https://d In GitLab, navigate to your group settings and select **Applications**. Here you'll see a form to create a new application. 
- + In GitLab, when creating your Group Application, input the following: @@ -67,7 +67,7 @@ Replace `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cl The application form in GitLab should look as follows when completed: - + Click **Save application** in GitLab, and GitLab will then generate an **Application ID** and **Secret**. These values will be available even if you close the app screen, so this is not the only chance you have to save them. @@ -76,7 +76,7 @@ If you're a Business Critical customer using [IP restrictions](/docs/cloud/secur ### Adding the GitLab OAuth application to dbt Cloud After you've created your GitLab application, you need to provide dbt Cloud information about the app. In dbt Cloud, account admins should navigate to **Account Settings**, click on the **Integrations** tab, and expand the GitLab section. - + In dbt Cloud, input the following values: @@ -92,7 +92,7 @@ Once the form is complete in dbt Cloud, click **Save**. You will then be redirected to GitLab and prompted to sign into your account. GitLab will ask for your explicit authorization: - + Once you've accepted, you should be redirected back to dbt Cloud, and your integration is ready for developers on your team to [personally authenticate with](#personally-authenticating-with-gitlab). @@ -103,7 +103,7 @@ To connect a personal GitLab account, dbt Cloud developers should navigate to Yo If your GitLab account is not connected, you’ll see "No connected account". Select **Link** to begin the setup process. You’ll be redirected to GitLab, and asked to authorize dbt Cloud in a grant screen. - + Once you approve authorization, you will be redirected to dbt Cloud, and you should see your connected account. You're now ready to start developing in the dbt Cloud IDE or dbt Cloud CLI. 
diff --git a/website/docs/docs/cloud/git/import-a-project-by-git-url.md b/website/docs/docs/cloud/git/import-a-project-by-git-url.md index 2ccaba1ec4d..83846bb1f0b 100644 --- a/website/docs/docs/cloud/git/import-a-project-by-git-url.md +++ b/website/docs/docs/cloud/git/import-a-project-by-git-url.md @@ -37,7 +37,7 @@ If you use GitHub, you can import your repo directly using [dbt Cloud's GitHub A - After adding this key, dbt Cloud will be able to read and write files in your dbt project. - Refer to [Adding a deploy key in GitHub](https://github.blog/2015-06-16-read-only-deploy-keys/) - + ## GitLab @@ -52,7 +52,7 @@ If you use GitLab, you can import your repo directly using [dbt Cloud's GitLab A - After saving this SSH key, dbt Cloud will be able to read and write files in your GitLab repository. - Refer to [Adding a read only deploy key in GitLab](https://docs.gitlab.com/ee/ssh/#per-repository-deploy-keys) - + ## BitBucket @@ -60,7 +60,7 @@ If you use GitLab, you can import your repo directly using [dbt Cloud's GitLab A - Next, click the **Add key** button and paste in the deploy key generated by dbt Cloud for your repository. - After saving this SSH key, dbt Cloud will be able to read and write files in your BitBucket repository. - + ## AWS CodeCommit @@ -109,17 +109,17 @@ If you use Azure DevOps and you are on the dbt Cloud Enterprise plan, you can im 2. We recommend using a dedicated service user for the integration to ensure that dbt Cloud's connection to Azure DevOps is not interrupted by changes to user permissions. - + 3. Next, click the **+ New Key** button to create a new SSH key for the repository. - + 4. Select a descriptive name for the key and then paste in the deploy key generated by dbt Cloud for your repository. 5. After saving this SSH key, dbt Cloud will be able to read and write files in your Azure DevOps repository. 
- + ## Other git providers diff --git a/website/docs/docs/cloud/git/setup-azure.md b/website/docs/docs/cloud/git/setup-azure.md index b24ec577935..843371be6ea 100644 --- a/website/docs/docs/cloud/git/setup-azure.md +++ b/website/docs/docs/cloud/git/setup-azure.md @@ -34,11 +34,11 @@ Many customers ask why they need to select Multitenant instead of Single tenant, 6. Add a redirect URI by selecting **Web** and, in the field, entering `https://YOUR_ACCESS_URL/complete/azure_active_directory`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. 7. Click **Register**. - + Here's what your app should look like before registering it: - + ## Add permissions to your new app @@ -51,7 +51,7 @@ Provide your new app access to Azure DevOps: 4. Select **Azure DevOps**. 5. Select the **user_impersonation** permission. This is the only permission available for Azure DevOps. - + ## Add another redirect URI @@ -63,7 +63,7 @@ You also need to add another redirect URI to your Azure AD application. This red `https://YOUR_ACCESS_URL/complete/azure_active_directory_service_user` 4. Click **Save**. - + @@ -77,7 +77,7 @@ If you have already connected your Azure DevOps account to Active Directory, the 4. Select the directory you want to connect. 5. Click **Connect**. - + ## Add your Azure AD app to dbt Cloud @@ -91,7 +91,7 @@ Once you connect your Azure AD app and Azure DevOps, you need to provide dbt Clo - **Application (client) ID:** Found in the Azure AD App. - **Client Secrets:** You need to first create a secret in the Azure AD App under **Client credentials**. Make sure to copy the **Value** field in the Azure AD App and paste it in the **Client Secret** field in dbt Cloud. You are responsible for the Azure AD app secret expiration and rotation. - **Directory(tenant) ID:** Found in the Azure AD App. - + Your Azure AD app should now be added to your dbt Cloud Account. 
People on your team who want to develop in the dbt Cloud IDE or dbt Cloud CLI can now personally [authorize Azure DevOps from their profiles](/docs/cloud/git/authenticate-azure). @@ -345,7 +345,7 @@ To connect the service user: 2. The admin should click **Link Azure Service User** in dbt Cloud. 3. The admin will be directed to Azure DevOps and must accept the Azure AD app's permissions. 4. Finally, the admin will be redirected to dbt Cloud, and the service user will be connected. - + Once connected, dbt Cloud displays the email address of the service user so you know which user's permissions are enabling headless actions in deployment environments. To change which account is connected, disconnect the profile in dbt Cloud, sign into the alternative Azure DevOps service account, and re-link the account in dbt Cloud. diff --git a/website/docs/docs/cloud/manage-access/auth0-migration.md b/website/docs/docs/cloud/manage-access/auth0-migration.md index 610c97e8b74..a40bb006d06 100644 --- a/website/docs/docs/cloud/manage-access/auth0-migration.md +++ b/website/docs/docs/cloud/manage-access/auth0-migration.md @@ -17,11 +17,11 @@ If you have not yet configured SSO in dbt Cloud, refer instead to our setup guid The Auth0 migration feature is being rolled out incrementally to customers who have SSO features already enabled. When the migration option has been enabled on your account, you will see **SSO Updates Available** on the right side of the menu bar, near the settings icon. - + Alternatively, you can start the process from the **Settings** page in the **Single Sign-on** pane. Click the **Begin Migration** button to start. - + Once you have opted to begin the migration process, the following steps will vary depending on the configured identity provider. You can just skip to the section that's right for your environment. 
These steps only apply to customers going through the migration; new setups will use the existing [setup instructions](/docs/cloud/manage-access/sso-overview). @@ -48,15 +48,15 @@ Below are sample steps to update. You must complete all of them to ensure uninte Here is an example of an updated SAML 2.0 setup in Okta. - + 2. Save the configuration, and your SAML settings will look something like this: - + 3. Toggle the `Enable new SSO authentication` option to ensure the traffic is routed correctly. _The new SSO migration action is final and cannot be undone_ - + 4. Save the settings and test the new configuration using the SSO login URL provided on the settings page. @@ -68,17 +68,17 @@ Below are steps to update. You must complete all of them to ensure uninterrupted 1. Open the [Google Cloud console](https://console.cloud.google.com/) and select the project with your dbt Cloud single sign-on settings. From the project page **Quick Access**, select **APIs and Services** - + 2. Click **Credentials** from the left side pane and click the appropriate name from **OAuth 2.0 Client IDs** - + 3. In the **Client ID for Web application** window, find the **Authorized Redirect URIs** field and click **Add URI** and enter `https:///login/callback`. Click **Save** once you are done. - + 4. _You will need a person with Google Workspace admin privileges to complete these steps in dbt Cloud_. In dbt Cloud, navigate to the **Account Settings**, click on **Single Sign-on**, and then click **Edit** on the right side of the SSO pane. Toggle the **Enable New SSO Authentication** option and select **Save**. This will trigger an authorization window from Google that will require admin credentials. _The migration action is final and cannot be undone_. Once the authentication has gone through, test the new configuration using the SSO login URL provided on the settings page. 
@@ -88,7 +88,7 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut ::: - + ## Azure Active Directory @@ -98,15 +98,15 @@ Below are steps to update. You must complete all of them to ensure uninterrupted 1. Click **App Registrations** on the left side menu. - + 2. Select the proper **dbt Cloud** app (name may vary) from the list. From the app overview, click on the hyperlink next to **Redirect URI** - + 3. In the **Web** pane with **Redirect URIs**, click **Add URI** and enter the appropriate `https:///login/callback`. Save the settings and verify it is counted in the updated app overview. - + 4. Navigate to the dbt Cloud environment and open the **Account Settings**. Click the **Single Sign-on** option from the left side menu and click the **Edit** option from the right side of the SSO pane. The **domain** field is the domain your organization uses to login to Azure AD. Toggle the **Enable New SSO Authentication** option and **Save**. _Once this option is enabled, it cannot be undone._ @@ -116,4 +116,4 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut ::: - + diff --git a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md index 76e16039ae8..adf849c3ba1 100644 --- a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md +++ b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md @@ -130,7 +130,7 @@ to allocate for the user. If your account does not have an available license to allocate, you will need to add more licenses to your plan to complete the license change. - + ### Mapped configuration @@ -148,7 +148,7 @@ license. To assign Read-Only licenses to certain groups of users, create a new License Mapping for the Read-Only license type and include a comma separated list of IdP group names that should receive a Read-Only license at sign-in time. 
- Usage notes: diff --git a/website/docs/docs/cloud/manage-access/enterprise-permissions.md b/website/docs/docs/cloud/manage-access/enterprise-permissions.md index ac2d6258819..dcacda20deb 100644 --- a/website/docs/docs/cloud/manage-access/enterprise-permissions.md +++ b/website/docs/docs/cloud/manage-access/enterprise-permissions.md @@ -28,11 +28,11 @@ Role-Based Access Control (RBAC) is helpful for automatically assigning permissi 1. Click the gear icon to the top right and select **Account Settings**. From the **Team** section, click **Groups** - + 1. Select an existing group or create a new group to add RBAC. Name the group (this can be any name you like, but it's recommended to keep it consistent with the SSO groups). If you have configured SSO with SAML 2.0, you may have to use the GroupID instead of the name of the group. 2. Configure the SSO provider groups you want to add RBAC by clicking **Add** in the **SSO** section. These fields are case-sensitive and must match the source group formatting. 3. Configure the permissions for users within those groups by clicking **Add** in the **Access** section of the window. - + 4. When you've completed your configurations, click **Save**. Users will begin to populate the group automatically once they have signed in to dbt Cloud with their SSO credentials. diff --git a/website/docs/docs/cloud/manage-access/invite-users.md b/website/docs/docs/cloud/manage-access/invite-users.md index f79daebf45e..21be7010a30 100644 --- a/website/docs/docs/cloud/manage-access/invite-users.md +++ b/website/docs/docs/cloud/manage-access/invite-users.md @@ -20,11 +20,11 @@ You must have proper permissions to invite new users: 1. In your dbt Cloud account, select the gear menu in the upper right corner and then select **Account Settings**. 2. From the left sidebar, select **Users**. - + 3. Click on **Invite Users**. - + 4. 
In the **Email Addresses** field, enter the email addresses of the users you would like to invite separated by comma, semicolon, or a new line. 5. Select the license type for the batch of users from the **License** dropdown. @@ -40,7 +40,7 @@ dbt Cloud generates and sends emails from `support@getdbt.com` to the specified The email contains a link to create an account. When the user clicks on this they will be brought to one of two screens depending on whether SSO is configured or not. - + @@ -48,7 +48,7 @@ The email contains a link to create an account. When the user clicks on this the The default settings send the email, the user clicks the link, and is prompted to create their account: - + @@ -56,7 +56,7 @@ The default settings send the email, the user clicks the link, and is prompted t If SSO is configured for the environment, the user clicks the link, is brought to a confirmation screen, and presented with a link to authenticate against the company's identity provider: - + @@ -73,4 +73,4 @@ Once the user completes this process, their email and user information will popu * What happens if I need to resend the invitation? _From the Users page, click on the invite record, and you will be presented with the option to resend the invitation._ * What can I do if I entered an email address incorrectly? _From the Users page, click on the invite record, and you will be presented with the option to revoke it. 
Once revoked, generate a new invitation to the correct email address._ - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md index b0930af16f7..87018b14d56 100644 --- a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md @@ -28,7 +28,7 @@ To get started, you need to create a client ID and secret for [authentication](h In the BigQuery console, navigate to **APIs & Services** and select **Credentials**: - + On the **Credentials** page, you can see your existing keys, client IDs, and service accounts. @@ -46,7 +46,7 @@ Fill in the application, replacing `YOUR_ACCESS_URL` with the [appropriate Acces Then click **Create** to create the BigQuery OAuth app and see the app client ID and secret values. These values are available even if you close the app screen, so this isn't the only chance you have to save them. - + @@ -59,7 +59,7 @@ Now that you have an OAuth app set up in BigQuery, you'll need to add the client - add the client ID and secret from the BigQuery OAuth app under the **OAuth2.0 Settings** section - + ### Authenticating to BigQuery Once the BigQuery OAuth app is set up for a dbt Cloud project, each dbt Cloud user will need to authenticate with BigQuery in order to use the IDE. To do so: @@ -68,10 +68,10 @@ Once the BigQuery OAuth app is set up for a dbt Cloud project, each dbt Cloud us - Select **Credentials**. - choose your project from the list - select **Authenticate BigQuery Account** - + You will then be redirected to BigQuery and asked to approve the drive, cloud platform, and BigQuery scopes, unless the connection is less privileged. - + Select **Allow**. This redirects you back to dbt Cloud. You should now be an authenticated BigQuery user, ready to use the dbt Cloud IDE. 
diff --git a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md index 8dcbb42ffa7..679133b7844 100644 --- a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md @@ -60,7 +60,7 @@ Now that you have an OAuth app set up in Databricks, you'll need to add the clie - select **Connection** to edit the connection details - add the `OAuth Client ID` and `OAuth Client Secret` from the Databricks OAuth app under the **Optional Settings** section - + ### Authenticating to Databricks (dbt Cloud IDE developer) @@ -72,6 +72,6 @@ Once the Databricks connection via OAuth is set up for a dbt Cloud project, each - Select `OAuth` as the authentication method, and click **Save** - Finalize by clicking the **Connect Databricks Account** button - + You will then be redirected to Databricks and asked to approve the connection. This redirects you back to dbt Cloud. You should now be an authenticated Databricks user, ready to use the dbt Cloud IDE. diff --git a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md index 8e38a60dd27..5b9abb6058a 100644 --- a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md @@ -68,7 +68,7 @@ from Enter the Client ID and Client Secret into dbt Cloud to complete the creation of your Connection. - + ### Authorize Developer Credentials @@ -76,7 +76,7 @@ Once Snowflake SSO is enabled, users on the project will be able to configure th ### SSO OAuth Flow Diagram - + Once a user has authorized dbt Cloud with Snowflake via their identity provider, Snowflake will return a Refresh Token to the dbt Cloud application. 
dbt Cloud is then able to exchange this refresh token for an Access Token which can then be used to open a Snowflake connection and execute queries in the dbt Cloud IDE on behalf of users. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md index 1e45de190f5..19779baf615 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md @@ -52,7 +52,7 @@ Client Secret for use in dbt Cloud. | **Authorized domains** | `getdbt.com` (US multi-tenant) `getdbt.com` and `dbt.com`(US Cell 1) `dbt.com` (EMEA or AU) | If deploying into a VPC, use the domain for your deployment | | **Scopes** | `email, profile, openid` | The default scopes are sufficient | - + 6. Save the **Consent screen** settings to navigate back to the **Create OAuth client id** page. @@ -65,7 +65,7 @@ Client Secret for use in dbt Cloud. | **Authorized Javascript origins** | `https://YOUR_ACCESS_URL` | | **Authorized Redirect URIs** | `https://YOUR_AUTH0_URI/login/callback` | - + 8. Press "Create" to create your new credentials. A popup will appear with a **Client ID** and **Client Secret**. Write these down as you will need them later! @@ -77,7 +77,7 @@ Group Membership information from the GSuite API. To enable the Admin SDK for this project, navigate to the [Admin SDK Settings page](https://console.developers.google.com/apis/api/admin.googleapis.com/overview) and ensure that the API is enabled. - + ## Configuration in dbt Cloud @@ -99,7 +99,7 @@ Settings. Cloud by navigating to `https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. The `LOGIN-SLUG` must be unique across all dbt Cloud accounts, so pick a slug that uniquely identifies your company. - + 3. 
Click **Save & Authorize** to authorize your credentials. You should be dropped into the GSuite OAuth flow and prompted to log into dbt Cloud with your work email address. If authentication is successful, you will be @@ -109,7 +109,7 @@ Settings. you do not see a `groups` entry in the IdP attribute list, consult the following Troubleshooting steps. - + If the verification information looks appropriate, then you have completed the configuration of GSuite SSO. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md index 79c33a28450..ba925fa2c24 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md @@ -426,7 +426,7 @@ To complete setup, follow the steps below in dbt Cloud: | Identity Provider Issuer | Paste the **Identity Provider Issuer** shown in the IdP setup instructions | | X.509 Certificate | Paste the **X.509 Certificate** shown in the IdP setup instructions;
**Note:** When the certificate expires, an Idp admin will have to generate a new one to be pasted into dbt Cloud for uninterrupted application access. | | Slug | Enter your desired login slug. | - 4. Click **Save** to complete setup for the SAML 2.0 integration. diff --git a/website/docs/docs/cloud/manage-access/sso-overview.md b/website/docs/docs/cloud/manage-access/sso-overview.md index 938587d59b3..b4954955c8c 100644 --- a/website/docs/docs/cloud/manage-access/sso-overview.md +++ b/website/docs/docs/cloud/manage-access/sso-overview.md @@ -24,7 +24,7 @@ Once you configure SSO, even partially, you cannot disable or revert it. When yo The diagram below explains the basic process by which users are provisioned in dbt Cloud upon logging in with SSO. - + #### Diagram Explanation diff --git a/website/docs/docs/cloud/secure/ip-restrictions.md b/website/docs/docs/cloud/secure/ip-restrictions.md index a0206ca038d..034b3a6c144 100644 --- a/website/docs/docs/cloud/secure/ip-restrictions.md +++ b/website/docs/docs/cloud/secure/ip-restrictions.md @@ -71,6 +71,6 @@ Once you are done adding all your ranges, IP restrictions can be enabled by sele Once enabled, when someone attempts to access dbt Cloud from a restricted IP, they will encounter one of the following messages depending on whether they use email & password or SSO login. - + - + diff --git a/website/docs/docs/cloud/secure/redshift-privatelink.md b/website/docs/docs/cloud/secure/redshift-privatelink.md index da5312876fb..c42c703556b 100644 --- a/website/docs/docs/cloud/secure/redshift-privatelink.md +++ b/website/docs/docs/cloud/secure/redshift-privatelink.md @@ -23,17 +23,17 @@ While Redshift Serverless does support Redshift-managed type VPC endpoints, this 1. On the running Redshift cluster, select the **Properties** tab. - + 2. In the **Granted accounts** section, click **Grant access**. - + 3. Enter the AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. 
For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ 4. Choose **Grant access to all VPCs** —or— (optional) contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support) for the appropriate regional VPC ID to designate in the **Grant access to specific VPCs** field. - + 5. Add the required information to the following template, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): @@ -62,14 +62,14 @@ Creating an Interface VPC PrivateLink connection requires creating multiple AWS - **Standard Redshift** - Use IP addresses from the Redshift cluster’s **Network Interfaces** whenever possible. While IPs listed in the **Node IP addresses** section will work, they are also more likely to change. - + - There will likely be only one Network Interface (NI) to start, but if the cluster fails over to another availability zone (AZ), a new NI will also be created for that AZ. The NI IP from the original AZ will still work, but the new NI IP can also be added to the Target Group. If adding additional IPs, note that the NLB will also need to add the corresponding AZ. Once created, the NI(s) should stay the same (This is our observation from testing, but AWS does not officially document it). - **Redshift Serverless** - To find the IP addresses for Redshift Serverless instance locate and copy the endpoint (only the URL listed before the port) in the Workgroup configuration section of the AWS console for the instance. - + - From a command line run the command `nslookup ` using the endpoint found in the previous step and use the associated IP(s) for the Target Group. @@ -85,13 +85,13 @@ On the provisioned VPC endpoint service, click the **Allow principals** tab. Cli - Principal: `arn:aws:iam::346425330055:role/MTPL_Admin` - + ### 3. 
Obtain VPC Endpoint Service Name Once the VPC Endpoint Service is provisioned, you can find the service name in the AWS console by navigating to **VPC** → **Endpoint Services** and selecting the appropriate endpoint service. You can copy the service name field value and include it in your communication to dbt Cloud support. - + ### 4. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): ``` diff --git a/website/docs/docs/cloud/secure/snowflake-privatelink.md b/website/docs/docs/cloud/secure/snowflake-privatelink.md index bc8f30a5566..dd046259e4e 100644 --- a/website/docs/docs/cloud/secure/snowflake-privatelink.md +++ b/website/docs/docs/cloud/secure/snowflake-privatelink.md @@ -27,7 +27,7 @@ Users connecting to Snowflake using SSO over a PrivateLink connection from dbt C - AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ - You will need to have `ACCOUNTADMIN` access to the Snowflake instance to submit a Support request. - + 2. After Snowflake has granted the requested access, run the Snowflake system function [SYSTEM$GET_PRIVATELINK_CONFIG](https://docs.snowflake.com/en/sql-reference/functions/system_get_privatelink_config.html) and copy the output. diff --git a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md index 7e85cbb8b11..e104ea8640c 100644 --- a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md +++ b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md @@ -16,7 +16,7 @@ To set up a job to generate docs: 1. In the top left, click **Deploy** and select **Jobs**. 2. Create a new job or select an existing job and click **Settings**. 3. 
Under "Execution Settings," select **Generate docs on run**. - + 4. Click **Save**. Proceed to [configure project documentation](#configure-project-documentation) so your project generates the documentation when this job runs. @@ -44,7 +44,7 @@ You configure project documentation to generate documentation when the job you s 3. Navigate to **Projects** and select the project that needs documentation. 4. Click **Edit**. 5. Under **Artifacts**, select the job that should generate docs when it runs. - + 6. Click **Save**. ## Generating documentation @@ -65,4 +65,4 @@ These generated docs always show the last fully successful run, which means that The dbt Cloud IDE makes it possible to view [documentation](/docs/collaborate/documentation) for your dbt project while your code is still in development. With this workflow, you can inspect and verify what your project's generated documentation will look like before your changes are released to production. - + diff --git a/website/docs/docs/collaborate/documentation.md b/website/docs/docs/collaborate/documentation.md index b6636a84eee..1a989806851 100644 --- a/website/docs/docs/collaborate/documentation.md +++ b/website/docs/docs/collaborate/documentation.md @@ -29,7 +29,7 @@ Importantly, dbt also provides a way to add **descriptions** to models, columns, Here's an example docs site: - + ## Adding descriptions to your project To add descriptions to your project, use the `description:` key in the same files where you declare [tests](/docs/build/data-tests), like so: @@ -177,17 +177,17 @@ up to page views and sessions. ## Navigating the documentation site Using the docs interface, you can navigate to the documentation for a specific model. That might look something like this: - + Here, you can see a representation of the project structure, a markdown description for a model, and a list of all of the columns (with documentation) in the model. 
From a docs page, you can click the green button in the bottom-right corner of the webpage to expand a "mini-map" of your DAG. This pane (shown below) will display the immediate parents and children of the model that you're exploring. - + In this example, the `fct_subscription_transactions` model only has one direct parent. By clicking the "Expand" button in the top-right corner of the window, we can pivot the graph horizontally and view the full lineage for our model. This lineage is filterable using the `--select` and `--exclude` flags, which are consistent with the semantics of [model selection syntax](/reference/node-selection/syntax). Further, you can right-click to interact with the DAG, jump to documentation, or share links to your graph visualization with your coworkers. - + ## Deploying the documentation site diff --git a/website/docs/docs/collaborate/git/managed-repository.md b/website/docs/docs/collaborate/git/managed-repository.md index 6112b84d4c6..db8e9840ccd 100644 --- a/website/docs/docs/collaborate/git/managed-repository.md +++ b/website/docs/docs/collaborate/git/managed-repository.md @@ -13,7 +13,7 @@ To set up a project with a managed repository: 4. Select **Managed**. 5. Enter a name for the repository. For example, "analytics" or "dbt-models." 6. Click **Create**. - + dbt Cloud will host and manage this repository for you. If in the future you choose to host this repository elsewhere, you can export the information from dbt Cloud at any time. diff --git a/website/docs/docs/collaborate/git/merge-conflicts.md b/website/docs/docs/collaborate/git/merge-conflicts.md index 133a096da9c..c3c19b1e2a1 100644 --- a/website/docs/docs/collaborate/git/merge-conflicts.md +++ b/website/docs/docs/collaborate/git/merge-conflicts.md @@ -35,9 +35,9 @@ The dbt Cloud IDE will display: - The file name colored in red in the **Changes** section, with a warning icon. 
- If you press commit without resolving the conflict, the dbt Cloud IDE will prompt a pop up box with a list which files need to be resolved. - + - + ## Resolve merge conflicts @@ -51,7 +51,7 @@ You can seamlessly resolve merge conflicts that involve competing line changes i 6. Repeat this process for every file that has a merge conflict. - + :::info Edit conflict files - If you open the conflict file under **Changes**, the file name will display something like `model.sql (last commit)` and is fully read-only and cannot be edited.
@@ -67,6 +67,6 @@ When you've resolved all the merge conflicts, the last step would be to commit t 3. The dbt Cloud IDE will return to its normal state and you can continue developing! - + - + diff --git a/website/docs/docs/collaborate/git/pr-template.md b/website/docs/docs/collaborate/git/pr-template.md index b85aa8a0d51..ddb4948dad9 100644 --- a/website/docs/docs/collaborate/git/pr-template.md +++ b/website/docs/docs/collaborate/git/pr-template.md @@ -9,7 +9,7 @@ open a new Pull Request for the code changes. To enable this functionality, ensu that a PR Template URL is configured in the Repository details page in your Account Settings. If this setting is blank, the IDE will prompt users to merge the changes directly into their default branch. - + ### PR Template URL by git provider diff --git a/website/docs/docs/collaborate/model-performance.md b/website/docs/docs/collaborate/model-performance.md index aeb18090751..7ef675b4e1e 100644 --- a/website/docs/docs/collaborate/model-performance.md +++ b/website/docs/docs/collaborate/model-performance.md @@ -27,7 +27,7 @@ Each data point links to individual models in Explorer. You can view historical metadata for up to the past three months. Select the time horizon using the filter, which defaults to a two-week lookback. - + ## The Model performance tab @@ -38,4 +38,4 @@ You can view trends in execution times, counts, and failures by using the Model Clicking on a data point reveals a table listing all job runs for that day, with each row providing a direct link to the details of a specific run. 
- \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/dbt-cloud-apis/service-tokens.md b/website/docs/docs/dbt-cloud-apis/service-tokens.md index a5a8a6c4807..b0b5fbd6cfe 100644 --- a/website/docs/docs/dbt-cloud-apis/service-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/service-tokens.md @@ -110,7 +110,7 @@ On July 18, 2023, dbt Labs made critical infrastructure changes to service accou To rotate your token: 1. Navigate to **Account settings** and click **Service tokens** on the left side pane. 2. Verify the **Created** date for the token is _on or before_ July 18, 2023. - + 3. Click **+ New Token** on the top right side of the screen. Ensure the new token has the same permissions as the old one. 4. Copy the new token and replace the old one in your systems. Store it in a safe place, as it will not be available again once the creation screen is closed. 5. Delete the old token in dbt Cloud by clicking the **trash can icon**. _Only take this action after the new token is in place to avoid service disruptions_. diff --git a/website/docs/docs/dbt-cloud-apis/user-tokens.md b/website/docs/docs/dbt-cloud-apis/user-tokens.md index 5734f8ba35a..77e536b12a5 100644 --- a/website/docs/docs/dbt-cloud-apis/user-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/user-tokens.md @@ -14,7 +14,7 @@ permissions of the user the that they were created for. You can find your User API token in the Profile page under the `API Access` label. 
- + ## FAQs diff --git a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md index dc2cdb63748..0b588376c34 100644 --- a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md +++ b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md @@ -12,4 +12,4 @@ Previously, when dbt Labs released a new [version](/docs/dbt-versions/core#how-d To see which version you are currently using and to upgrade, select **Deploy** in the top navigation bar and select **Environments**. Choose the preferred environment and click **Settings**. Click **Edit** to make a change to the current dbt version. dbt Labs recommends always using the latest version whenever possible to take advantage of new features and functionality. - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md index 38b017baa30..1aabe517076 100644 --- a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md @@ -16,4 +16,4 @@ Highlights include: - Cleaner look and feel with iconography - Helpful tool tips - + diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md index 0bc4b76d0fc..d4d299b1d36 100644 --- a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md @@ -8,7 +8,7 @@ tags: [May-2023, Scheduler] New usability and design 
improvements to the **Run History** dashboard in dbt Cloud are now available. These updates allow people to discover the information they need more easily by reducing the number of clicks, surfacing more relevant information, keeping people in flow state, and designing the look and feel to be more intuitive to use. - + Highlights include: diff --git a/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md b/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md index 9ceda7749cd..bdc89b4abde 100644 --- a/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md +++ b/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md @@ -10,5 +10,5 @@ To help save compute time, new jobs will no longer be triggered to run by defaul For more information, refer to [Deploy jobs](/docs/deploy/deploy-jobs). - + diff --git a/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md b/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md index 41e1a5265ca..2d0488d4488 100644 --- a/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md +++ b/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md @@ -13,4 +13,4 @@ Large DAGs can take a long time (10 or more seconds, if not minutes) to render a The new button prevents large DAGs from rendering automatically. Instead, you can select **Render Lineage** to load the visualization. This should affect about 15% of the DAGs. 
- + diff --git a/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md b/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md index 90e6ac72fea..307786c6b85 100644 --- a/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md +++ b/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md @@ -10,4 +10,4 @@ We fixed an issue where a spotty internet connection could cause the “IDE sess We updated the health check logic so it now excludes client-side connectivity issues from the IDE session check. If you lose your internet connection, we no longer update the health-check state. Now, losing internet connectivity will no longer cause this unexpected message. - + diff --git a/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md b/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md index 46c1f4bbd15..9ff5986b4da 100644 --- a/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md +++ b/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md @@ -9,4 +9,4 @@ tags: [v1.1.46, March-02-2022] dbt Cloud now shows "waiting time" and "prep time" for a run, which used to be expressed in aggregate as "queue time". Waiting time captures the time dbt Cloud waits to run your job if there isn't an available run slot or if a previous run of the same job is still running. Prep time represents the time it takes dbt Cloud to ready your job to run in your cloud data warehouse. 
- + diff --git a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md index 75697d32d17..052611f66e6 100644 --- a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md +++ b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md @@ -9,7 +9,7 @@ In dbt Cloud, both jobs and environments are configured to use a specific versio Navigate to the settings page of an environment, then click **edit**. Click the **dbt Version** dropdown bar and make your selection. From this list, you can select an available version of Core to associate with this environment. - + Be sure to save your changes before navigating away. @@ -17,7 +17,7 @@ Be sure to save your changes before navigating away. Each job in dbt Cloud can be configured to inherit parameters from the environment it belongs to. - + The example job seen in the screenshot above belongs to the environment "Prod". It inherits the dbt version of its environment as shown by the **Inherited from ENVIRONMENT_NAME (DBT_VERSION)** selection. You may also manually override the dbt version of a specific job to be any of the current Core releases supported by Cloud by selecting another option from the dropdown. @@ -275,7 +275,7 @@ Once you have your project compiling and running on the latest version of dbt in - + Then add a job to the new testing environment that replicates one of the production jobs your team relies on. If that job runs smoothly, you should be all set to merge your branch into main and change your development and deployment environments in your main dbt project to run off the newest version of dbt Core. diff --git a/website/docs/docs/deploy/artifacts.md b/website/docs/docs/deploy/artifacts.md index 7ecc05355a0..9b3ae71e79c 100644 --- a/website/docs/docs/deploy/artifacts.md +++ b/website/docs/docs/deploy/artifacts.md @@ -10,11 +10,11 @@ When running dbt jobs, dbt Cloud generates and saves *artifacts*. 
You can use th While running any job can produce artifacts, you should only associate one production job with a given project to produce the project's artifacts. You can designate this connection in the **Project details** page. To access this page, click the gear icon in the upper right, select **Account Settings**, select your project, and click **Edit** in the lower right. Under **Artifacts**, select the jobs you want to produce documentation and source freshness artifacts for. - + If you don't see your job listed, you might need to edit the job and select **Run source freshness** and **Generate docs on run**. - + When you add a production job to a project, dbt Cloud updates the content and provides links to the production documentation and source freshness artifacts it generated for that project. You can see these links by clicking **Deploy** in the upper left, selecting **Jobs**, and then selecting the production job. From the job page, you can select a specific run to see how artifacts were updated for that run only. @@ -25,10 +25,10 @@ When set up, dbt Cloud updates the **Documentation** link in the header tab so i Note that both the job's commands and the docs generate step (triggered by the **Generate docs on run** checkbox) must succeed during the job invocation for the project-level documentation to be populated or updated. - + ### Source Freshness As with Documentation, configuring a job for the Source Freshness artifact setting also updates the Data Sources link under **Deploy**. The new link points to the latest Source Freshness report for the selected job. 
- + diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index 9f0bafddaef..149a6951fdc 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -117,10 +117,10 @@ If you're experiencing any issues, review some of the common questions and answe First, make sure you have the native GitHub authentication, native GitLab authentication, or native Azure DevOps authentication set up depending on which git provider you use. After you have gone through those steps, go to Account Settings, select Projects and click on the project you'd like to reconnect through native GitHub, GitLab, or Azure DevOps auth. Then click on the repository link.

Once you're in the repository page, select Edit and then Disconnect Repository at the bottom.

Confirm that you'd like to disconnect your repository. You should then see a new Configure a repository link in your old repository's place. Click through to the configuration page:

Select the GitHub, GitLab, or AzureDevOps tab and reselect your repository. That should complete the setup of the project and enable you to set up a dbt Cloud CI job. diff --git a/website/docs/docs/deploy/dashboard-status-tiles.md b/website/docs/docs/deploy/dashboard-status-tiles.md index 4da0f859546..d9e33fc32d6 100644 --- a/website/docs/docs/deploy/dashboard-status-tiles.md +++ b/website/docs/docs/deploy/dashboard-status-tiles.md @@ -9,11 +9,11 @@ In dbt Cloud, the [Discovery API](/docs/dbt-cloud-apis/discovery-api) can power ## Functionality The dashboard status tile looks like this: - + The data freshness check fails if any sources feeding into the exposure are stale. The data quality check fails if any dbt tests fail. A failure state could look like this: - + Clicking into **see details** from the Dashboard Status Tile takes you to a landing page where you can learn more about the specific sources, models, and tests feeding into this exposure. @@ -56,11 +56,11 @@ Note that Mode has also built its own [integration](https://mode.com/get-dbt/) w Looker does not allow you to directly embed HTML and instead requires creating a [custom visualization](https://docs.looker.com/admin-options/platform/visualizations). One way to do this for admins is to: - Add a [new visualization](https://fishtown.looker.com/admin/visualizations) on the visualization page for Looker admins. You can use [this URL](https://metadata.cloud.getdbt.com/static/looker-viz.js) to configure a Looker visualization powered by the iFrame. It will look like this: - + - Once you have set up your custom visualization, you can use it on any dashboard! You can configure it with the exposure name, jobID, and token relevant to that dashboard. - + ### Tableau Tableau does not require you to embed an iFrame. 
You only need to use a Web Page object on your Tableau Dashboard and a URL in the following format: @@ -79,7 +79,7 @@ https://metadata.cloud.getdbt.com/exposure-tile?name=&jobId= + ### Sigma @@ -99,4 +99,4 @@ https://metadata.au.dbt.com/exposure-tile?name=&jobId=&to ``` ::: - + diff --git a/website/docs/docs/deploy/deploy-jobs.md b/website/docs/docs/deploy/deploy-jobs.md index 90ba0c7796c..cee6e245359 100644 --- a/website/docs/docs/deploy/deploy-jobs.md +++ b/website/docs/docs/deploy/deploy-jobs.md @@ -80,7 +80,7 @@ dbt Cloud uses [Coordinated Universal Time](https://en.wikipedia.org/wiki/Coordi To fully customize the scheduling of your job, choose the **Custom cron schedule** option and use the cron syntax. With this syntax, you can specify the minute, hour, day of the month, month, and day of the week, allowing you to set up complex schedules like running a job on the first Monday of each month. - + Use tools such as [crontab.guru](https://crontab.guru/) to generate the correct cron syntax. This tool allows you to input cron snippets and returns their plain English translations. diff --git a/website/docs/docs/deploy/deployment-tools.md b/website/docs/docs/deploy/deployment-tools.md index 64fcb1dadae..cca2368f38a 100644 --- a/website/docs/docs/deploy/deployment-tools.md +++ b/website/docs/docs/deploy/deployment-tools.md @@ -19,8 +19,8 @@ If your organization is using [Airflow](https://airflow.apache.org/), there are Installing the [dbt Cloud Provider](https://airflow.apache.org/docs/apache-airflow-providers-dbt-cloud/stable/index.html) to orchestrate dbt Cloud jobs. This package contains multiple Hooks, Operators, and Sensors to complete various actions within dbt Cloud. - - + + @@ -71,7 +71,7 @@ If your organization is using [Prefect](https://www.prefect.io/), the way you wi - As jobs are executing, you can poll dbt to see whether or not the job completes without failures, through the [Prefect user interface (UI)](https://docs.prefect.io/ui/overview/). 
- + diff --git a/website/docs/docs/deploy/source-freshness.md b/website/docs/docs/deploy/source-freshness.md index 3c4866cd084..2f9fe6bc007 100644 --- a/website/docs/docs/deploy/source-freshness.md +++ b/website/docs/docs/deploy/source-freshness.md @@ -6,7 +6,7 @@ description: "Validate that data freshness meets expectations and alert if stale dbt Cloud provides a helpful interface around dbt's [source data freshness](/docs/build/sources#snapshotting-source-data-freshness) calculations. When a dbt Cloud job is configured to snapshot source data freshness, dbt Cloud will render a user interface showing you the state of the most recent snapshot. This interface is intended to help you determine if your source data freshness is meeting the service level agreement (SLA) that you've defined for your organization. - + ### Enabling source freshness snapshots @@ -15,7 +15,7 @@ dbt Cloud provides a helpful interface around dbt's [source data freshness](/doc - Select the **Generate docs on run** checkbox to automatically [generate project docs](/docs/collaborate/build-and-view-your-docs#set-up-a-documentation-job). - Select the **Run source freshness** checkbox to enable [source freshness](#checkbox) as the first step of the job. - + To enable source freshness snapshots, firstly make sure to configure your sources to [snapshot freshness information](/docs/build/sources#snapshotting-source-data-freshness). You can add source freshness to the list of commands in the job run steps or enable the checkbox. However, you can expect different outcomes when you configure a job by selecting the **Run source freshness** checkbox compared to adding the command to the run steps. @@ -27,7 +27,7 @@ Review the following options and outcomes: | **Add as a run step** | Add the `dbt source freshness` command to a job anywhere in your list of run steps. However, if your source data is out of date — this step will "fail", and subsequent steps will not run. 
dbt Cloud will trigger email notifications (if configured) based on the end state of this step.

You can create a new job to snapshot source freshness.

If you *do not* want your models to run if your source data is out of date, then it could be a good idea to run `dbt source freshness` as the first step in your job. Otherwise, we recommend adding `dbt source freshness` as the last step in the job, or creating a separate job just for this task. | - + ### Source freshness snapshot frequency diff --git a/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md b/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md index e6a50443837..f41bceab12d 100644 --- a/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md +++ b/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md @@ -16,11 +16,11 @@ New dbt Cloud accounts will automatically be created with a Development Environm To create a development environment, choose **Deploy** > **Environments** from the top left. Then, click **Create Environment**. - + Enter an environment **Name** that would help you identify it among your other environments (for example, `Nate's Development Environment`). Choose **Development** as the **Environment Type**. You can also select which **dbt Version** to use at this time. For compatibility reasons, we recommend that you select the same dbt version that you plan to use in your deployment environment. Finally, click **Save** to finish creating your development environment. - + ### Setting up developer credentials @@ -28,14 +28,14 @@ The IDE uses *developer credentials* to connect to your database. These develope New dbt Cloud accounts should have developer credentials created automatically as a part of Project creation in the initial application setup. - + New users on existing accounts *might not* have their development credentials already configured. To manage your development credentials: 1. 
Navigate to your **Credentials** under **Your Profile** settings, which you can access at `https://YOUR_ACCESS_URL/settings/profile#credentials`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. 2. Select the relevant project in the list. After entering your developer credentials, you'll be able to access the dbt IDE. - + ### Compiling and running SQL diff --git a/website/docs/faqs/API/rotate-token.md b/website/docs/faqs/API/rotate-token.md index 0b808fa9176..4470de72d5a 100644 --- a/website/docs/faqs/API/rotate-token.md +++ b/website/docs/faqs/API/rotate-token.md @@ -19,7 +19,7 @@ To automatically rotate your API key: 2. Select **API Access** from the lefthand side. 3. In the **API** pane, click `Rotate`. - + diff --git a/website/docs/faqs/Accounts/change-users-license.md b/website/docs/faqs/Accounts/change-users-license.md index ed12ba5dc14..8755b946126 100644 --- a/website/docs/faqs/Accounts/change-users-license.md +++ b/website/docs/faqs/Accounts/change-users-license.md @@ -10,10 +10,10 @@ To change the license type for a user from `developer` to `read-only` or `IT` in 1. From dbt Cloud, click the gear icon at the top right and select **Account Settings**. - + 2. In **Account Settings**, select **Users** under **Teams**. 3. Select the user you want to remove, and click **Edit** in the bottom of their profile. 4. For the **License** option, choose **Read-only** or **IT** (from **Developer**), and click **Save**. - + diff --git a/website/docs/faqs/Accounts/cloud-upgrade-instructions.md b/website/docs/faqs/Accounts/cloud-upgrade-instructions.md index ef2ff8e4cd3..d16651a944c 100644 --- a/website/docs/faqs/Accounts/cloud-upgrade-instructions.md +++ b/website/docs/faqs/Accounts/cloud-upgrade-instructions.md @@ -32,7 +32,7 @@ To unlock your account and select a plan, review the following guidance per plan 3. Confirm your plan selection on the pop up message. 4. 
This automatically unlocks your dbt Cloud account, and you can now enjoy the benefits of the Developer plan. 🎉 - + ### Team plan @@ -42,7 +42,7 @@ To unlock your account and select a plan, review the following guidance per plan 4. Enter your payment details and click **Save**. 5. This automatically unlocks your dbt Cloud account, and you can now enjoy the benefits of the Team plan. 🎉 - + ### Enterprise plan @@ -50,7 +50,7 @@ To unlock your account and select a plan, review the following guidance per plan 2. Click **Contact Sales** on the right. This opens a chat window for you to contact the dbt Cloud Support team, who will connect you to our Sales team. 3. Once you submit your request, our Sales team will contact you with more information. - + 4. Alternatively, you can [contact](https://www.getdbt.com/contact/) our Sales team directly to chat about how dbt Cloud can help you and your team. diff --git a/website/docs/faqs/Git/git-migration.md b/website/docs/faqs/Git/git-migration.md index 775ae3679e3..156227d59ae 100644 --- a/website/docs/faqs/Git/git-migration.md +++ b/website/docs/faqs/Git/git-migration.md @@ -16,7 +16,7 @@ To migrate from one git provider to another, refer to the following steps to avo 2. Go back to dbt Cloud and set up your [integration for the new git provider](/docs/cloud/git/connect-github), if needed. 3. Disconnect the old repository in dbt Cloud by going to **Account Settings** and then **Projects**. Click on the **Repository** link, then click **Edit** and **Disconnect**. - + 4. On the same page, connect to the new git provider repository by clicking **Configure Repository** - If you're using the native integration, you may need to OAuth to it. 
diff --git a/website/docs/faqs/Project/delete-a-project.md b/website/docs/faqs/Project/delete-a-project.md index 21f16cbfaec..5fde3fee9cd 100644 --- a/website/docs/faqs/Project/delete-a-project.md +++ b/website/docs/faqs/Project/delete-a-project.md @@ -9,10 +9,10 @@ To delete a project in dbt Cloud, you must be the account owner or have admin pr 1. From dbt Cloud, click the gear icon at the top right corner and select **Account Settings**. - + 2. In **Account Settings**, select **Projects**. Click the project you want to delete from the **Projects** page. 3. Click the edit icon in the lower right-hand corner of the **Project Details**. A **Delete** option will appear on the left side of the same details view. 4. Select **Delete**, then confirm the action. The project is deleted immediately after confirmation, with no additional password prompt. This action cannot be undone. - + diff --git a/website/docs/guides/adapter-creation.md b/website/docs/guides/adapter-creation.md index 12bda4726f9..28e0e8253ad 100644 --- a/website/docs/guides/adapter-creation.md +++ b/website/docs/guides/adapter-creation.md @@ -107,7 +107,7 @@ A set of *materializations* and their corresponding helper macros defined in dbt Below is a diagram of how dbt-postgres, the adapter at the center of dbt-core, works.
- + ## Prerequisites @@ -1225,17 +1225,17 @@ This can vary substantially depending on the nature of the release but a good ba Breaking this down: - Visually distinctive announcement - make it clear this is a release - + - Short written description of what is in the release - + - Links to additional resources - + - Implementation instructions: - + - Future plans - + - Contributor recognition (if applicable) - + ## Verify a new adapter diff --git a/website/docs/guides/bigquery-qs.md b/website/docs/guides/bigquery-qs.md index d961a27018a..4f461a3cf3a 100644 --- a/website/docs/guides/bigquery-qs.md +++ b/website/docs/guides/bigquery-qs.md @@ -56,7 +56,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen Click **Run**, then check for results from the queries. For example:
- +
2. Create new datasets from the [BigQuery Console](https://console.cloud.google.com/bigquery). For more information, refer to [Create datasets](https://cloud.google.com/bigquery/docs/datasets#create-dataset) in the Google Cloud docs. Datasets in BigQuery are equivalent to schemas in a traditional database. On the **Create dataset** page: - **Dataset ID** — Enter a name that fits the purpose. This name is used like schema in fully qualified references to your database objects such as `database.schema.table`. As an example for this guide, create one for `jaffle_shop` and another one for `stripe` afterward. @@ -64,7 +64,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen - **Enable table expiration** — Leave it unselected (the default). The default for the billing table expiration is 60 days. Because billing isn’t enabled for this project, GCP defaults to deprecating tables. - **Google-managed encryption key** — This option is available under **Advanced options**. Allow Google to manage encryption (the default).
- +
3. After you create the `jaffle_shop` dataset, create one for `stripe` with all the same values except for **Dataset ID**. diff --git a/website/docs/guides/codespace-qs.md b/website/docs/guides/codespace-qs.md index c399eb494a9..b28b0ddaacf 100644 --- a/website/docs/guides/codespace-qs.md +++ b/website/docs/guides/codespace-qs.md @@ -35,7 +35,7 @@ dbt Labs provides a [GitHub Codespace](https://docs.github.com/en/codespaces/ove 1. Click **Code** (at the top of the new repository’s page). Under the **Codespaces** tab, choose **Create codespace on main**. Depending on how you've configured your computer's settings, this either opens a new browser tab with the Codespace development environment with VSCode running in it or opens a new VSCode window with the codespace in it. 1. Wait for the codespace to finish building by waiting for the `postCreateCommand` command to complete; this can take several minutes: - + When this command completes, you can start using the codespace development environment. The terminal the command ran in will close and you will get a prompt in a brand new terminal. diff --git a/website/docs/guides/custom-cicd-pipelines.md b/website/docs/guides/custom-cicd-pipelines.md index b21fe13b19b..1778098f752 100644 --- a/website/docs/guides/custom-cicd-pipelines.md +++ b/website/docs/guides/custom-cicd-pipelines.md @@ -144,7 +144,7 @@ In Azure: - Click *OK* and then *Save* to save the variable - Save your new Azure pipeline - + diff --git a/website/docs/guides/databricks-qs.md b/website/docs/guides/databricks-qs.md index 98c215382f6..cb01daec394 100644 --- a/website/docs/guides/databricks-qs.md +++ b/website/docs/guides/databricks-qs.md @@ -41,7 +41,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 1. Use your existing account or sign up for a Databricks account at [Try Databricks](https://databricks.com/). Complete the form with your user information.
- +
2. For the purpose of this tutorial, you will select AWS as your cloud provider, but if you use Azure or GCP internally, choose that provider instead. The setup process is similar. @@ -49,28 +49,28 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 4. After setting up your password, you will be guided to choose a subscription plan. Select the `Premium` or `Enterprise` plan to access the SQL Compute functionality required for using the SQL warehouse for dbt. We have chosen `Premium` for this tutorial. Click **Continue** after selecting your plan.
- +
5. Click **Get Started** on the following page, and then click **Confirm** after you verify that you have everything you need.
- +
- +
6. Now it's time to create your first workspace. A Databricks workspace is an environment for accessing all of your Databricks assets. The workspace organizes objects like notebooks, SQL warehouses, and clusters into one place. Provide a name for your workspace, choose the appropriate AWS region, and click **Start Quickstart**. You might see a checkbox labeled **I have data in S3 that I want to query with Databricks**. You do not need to select it for this tutorial.
- +
7. By clicking on `Start Quickstart`, you will be redirected to AWS and asked to log in if you haven’t already. After logging in, you should see a page similar to this.
- +
:::tip @@ -79,16 +79,16 @@ If you get a session error and don’t get redirected to this page, you can go b 8. There is no need to change any of the pre-filled fields in the Parameters. Just add your Databricks password under **Databricks Account Credentials**. Select the acknowledgement checkbox and click **Create stack**.
- +
- +
10. Go back to the Databricks tab. You should see that your workspace is ready to use.
- +
11. Now let’s jump into the workspace. Click **Open** and log in to the workspace using the same credentials you used to log in to the account. @@ -101,7 +101,7 @@ If you get a session error and don’t get redirected to this page, you can go b 2. First, we need a SQL warehouse. Find the drop-down menu and toggle into the SQL space.
- +
3. Now let's set up a SQL warehouse. Select **SQL Warehouses** from the left-hand console. You will see that a default SQL Warehouse exists. @@ -109,12 +109,12 @@ If you get a session error and don’t get redirected to this page, you can go b 5. Once the SQL Warehouse is up, click **New** and then **File upload** from the drop-down menu.
- +
6. Let's load the Jaffle Shop Customers data first. Drop the `jaffle_shop_customers.csv` file into the UI.
- +
7. Update the Table Attributes at the top: @@ -128,7 +128,7 @@ If you get a session error and don’t get redirected to this page, you can go b - LAST_NAME = string
- +
8. Click **Create** at the bottom once you’re done. @@ -136,11 +136,11 @@ If you get a session error and don’t get redirected to this page, you can go b 9. Now let’s do the same for `Jaffle Shop Orders` and `Stripe Payments`.
- +
- +
10. Once that's done, make sure you can query the training data. Navigate to the `SQL Editor` through the left-hand menu. This will bring you to a query editor. @@ -153,7 +153,7 @@ If you get a session error and don’t get redirected to this page, you can go b ```
- +
12. To ensure that any users who might be working on your dbt project have access to your objects, run this command. diff --git a/website/docs/guides/dbt-python-snowpark.md b/website/docs/guides/dbt-python-snowpark.md index fce0ad692f6..110445344e9 100644 --- a/website/docs/guides/dbt-python-snowpark.md +++ b/website/docs/guides/dbt-python-snowpark.md @@ -51,19 +51,19 @@ Overall we are going to set up the environments, build scalable pipelines in dbt 1. Log in to your trial Snowflake account. You can [sign up for a Snowflake Trial Account using this form](https://signup.snowflake.com/) if you don’t have one. 2. Ensure that your account is set up using **AWS** in the **US East (N. Virginia)** region. We will be copying the data from a public AWS S3 bucket hosted by dbt Labs in the us-east-1 region. By ensuring our Snowflake environment setup matches our bucket region, we avoid any multi-region data copy and retrieval latency issues. - + 3. After creating your account and verifying it from your sign-up email, Snowflake will direct you back to the UI called Snowsight. 4. When Snowsight first opens, your window should look like the following, with you logged in as the ACCOUNTADMIN with demo worksheets open: - + 5. Navigate to **Admin > Billing & Terms**. Click **Enable > Acknowledge & Continue** to enable Anaconda Python Packages to run in Snowflake. - + - + 6. Finally, create a new Worksheet by selecting **+ Worksheet** in the upper right corner. @@ -80,7 +80,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 3. Rename the worksheet to `data setup script` since we will be placing code in this worksheet to ingest the Formula 1 data. Make sure you are still logged in as the **ACCOUNTADMIN** and select the **COMPUTE_WH** warehouse. - + 4. Copy the following code into the main body of the Snowflake worksheet.
You can also find this setup script under the `setup` folder in the [Git repository](https://github.com/dbt-labs/python-snowpark-formula1/blob/main/setup/setup_script_s3_to_snowflake.sql). The script is long since it brings in all of the data we'll need today! @@ -233,7 +233,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 5. Ensure all the commands are selected before running the query; an easy way to do this is to press Ctrl+A to highlight all of the code in the worksheet. Select **run** (blue triangle icon). Notice how the dot next to your **COMPUTE_WH** turns from gray to green as you run the query. The **status** table is the final table of all 8 tables loaded in. - + 6. Let’s break that long query down into its component parts. We ran this query to load our 8 Formula 1 tables from a public S3 bucket. To do this, we: - Created a new database called `formula1` and a schema called `raw` to place our raw (untransformed) data into. @@ -244,7 +244,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 7. Now let's take a look at some of our cool Formula 1 data we just loaded up! 1. Create a new worksheet by selecting the **+** then **New Worksheet**. - + 2. Navigate to **Database > Formula1 > RAW > Tables**. 3. Query the data using the following code. There are only 76 rows in the circuits table, so we don’t need to worry about limiting the amount of data we query. @@ -256,7 +256,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 5. Review the query results; you should see information about Formula 1 circuits, starting with Albert Park in Australia! 6. Finally, ensure you have all 8 tables starting with `CIRCUITS` and ending with `STATUS`. Now we are ready to connect to dbt Cloud! - + ## Configure dbt Cloud @@ -264,19 +264,19 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 2.
Navigate out of your worksheet by selecting **home**. 3. In Snowsight, confirm that you are using the **ACCOUNTADMIN** role. 4. Navigate to **Admin** **> Partner Connect**. Find **dbt** either by using the search bar or by navigating to **Data Integration**. Select the **dbt** tile. - + 5. You should now see a new window that says **Connect to dbt**. Select **Optional Grant** and add the `FORMULA1` database. This will grant your new dbt user role access to the FORMULA1 database. - + 6. Ensure `FORMULA1` is present in your optional grant before clicking **Connect**. This will create a dedicated dbt user, database, warehouse, and role for your dbt Cloud trial. - + 7. When you see the **Your partner account has been created** window, click **Activate**. 8. You should be redirected to a dbt Cloud registration page. Fill out the form. Make sure to save the password somewhere for future logins. - + 9. Select **Complete Registration**. You should now be redirected to your dbt Cloud account, complete with a connection to your Snowflake account, a deployment and a development environment, and a sample job. @@ -286,43 +286,43 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 1. First, we are going to change the name of the default schema where our dbt models will build. By default, the name is `dbt_`. We will change this to `dbt_` to create your own personal development schema. To do this, select **Profile Settings** from the gear icon in the upper right. - + 2. Navigate to the **Credentials** menu and select **Partner Connect Trial**, which will expand the credentials menu. - + 3. Click **Edit** and change the name of your schema from `dbt_` to `dbt_YOUR_NAME`, replacing `YOUR_NAME` with your initials and name (`hwatson` is used in the lab screenshots). Be sure to click **Save** for your changes! - + 4. We now have our own personal development schema, amazing!
When we run our first dbt models, they will build into this schema. 5. Let’s open up dbt Cloud’s Integrated Development Environment (IDE) and familiarize ourselves with it. Choose **Develop** at the top of the UI. 6. When the IDE is done loading, click **Initialize dbt project**. The initialization process creates a collection of files and folders necessary to run your dbt project. - + 7. After the initialization is finished, you can view the files and folders in the file tree menu. As we move through the workshop, we'll be sure to touch on a few key files and folders that we'll work with to build out our project. 8. Next, click **Commit and push** to commit the new files and folders from the initialize step. We always want our commit messages to be relevant to the work we're committing, so be sure to provide a message like `initialize project` and select **Commit Changes**. - + - + 9. [Committing](https://www.atlassian.com/git/tutorials/saving-changes/git-commit) your work here will save it to the managed git repository that was created during the Partner Connect signup. This initial commit is the only commit that will be made directly to our `main` branch, and from *here on out we'll be doing all of our work on a development branch*. This allows us to keep our development work separate from our production code. 10. There are a couple of key features to point out about the IDE before we get to work. It is a text editor, an SQL and Python runner, and a CLI with Git version control all baked into one package! This allows you to focus on editing your SQL and Python files, previewing the results with the SQL runner (it even runs Jinja!), and building models at the command line without having to move between different applications. The Git workflow in dbt Cloud allows Git beginners and experts alike to easily version control all of their work with a couple of clicks. - + 11. Let's run our first dbt models!
Two example models are included in your dbt project in the `models/examples` folder that we can use to illustrate how to run dbt at the command line. Type `dbt run` into the command line and press **Enter** on your keyboard. When the run bar expands, you'll be able to see the results of the run, where you should see that the run completed successfully. - + 12. The run results allow you to see the code that dbt compiles and sends to Snowflake for execution. To view the logs for this run, select one of the model tabs using the **>** icon and then **Details**. If you scroll down a bit, you'll be able to see the compiled code and how dbt interacts with Snowflake. Given that this run took place in our development environment, the models were created in your development schema. - + 13. Now let's switch over to Snowflake to confirm that the objects were actually created. Click on the three dots **…** above your database objects and then **Refresh**. Expand the **PC_DBT_DB** database and you should see your development schema. Select the schema, then **Tables** and **Views**. Now you should be able to see `MY_FIRST_DBT_MODEL` as a table and `MY_SECOND_DBT_MODEL` as a view. - + ## Create branch and set up project configs @@ -414,15 +414,15 @@ dbt Labs has developed a [project structure guide](/best-practices/how-we-struct 1. In your file tree, use your cursor and hover over the `models` subdirectory, click the three dots **…** that appear to the right of the folder name, then select **Create Folder**. We're going to add two new folders to the file path, `staging` and `formula1` (in that order), by typing `staging/formula1` into the file path. - - + + - If you click into your `models` directory now, you should see the new `staging` folder nested within `models` and the `formula1` folder nested within `staging`. 2. Create two additional folders the same way as in the last step. Within the `models` subdirectory, create the nested directories `marts/core`. 3.
We will need to create a few more folders and subfolders using the UI. After you create all the necessary folders, your folder tree should look like this: - + Remember you can always reference the entire project in [GitHub](https://github.com/dbt-labs/python-snowpark-formula1/tree/python-formula1) to view the complete folder and file structure. @@ -742,21 +742,21 @@ The next step is to set up the staging models for each of the 8 source tables. G After the source and all the staging models are complete for each of the 8 tables, your staging folder should look like this: - + 1. It’s a good time to delete our example folder since these two models are extraneous to our formula1 pipeline and `my_first_model` fails a `not_null` test that we won’t spend time investigating. dbt Cloud will warn us that this folder will be permanently deleted, and we are okay with that, so select **Delete**. - + 1. Now that the staging models are built and saved, it's time to create the models in our development schema in Snowflake. To do this, we're going to enter `dbt build` into the command line to run all of the models in our project, which includes the 8 new staging models and the existing example models. Your run should complete successfully and you should see green checkmarks next to all of your models in the run results. We built our 8 staging models as views and ran 13 source tests that we configured in the `f1_sources.yml` file with not that much code, pretty cool! - + Let's take a quick look in Snowflake, refresh database objects, open our development schema, and confirm that the new models are there. If you can see them, then we're good to go! - + Before we move on to the next section, be sure to commit your new models to your Git branch. Click **Commit and push** and give your commit a message like `profile, sources, and staging setup` before moving on. @@ -1055,7 +1055,7 @@ By now, we are pretty good at creating new files in the correct directories so w 1.
Let’s talk about our lineage so far. It’s looking good 😎. We’ve shown how SQL can be used to make data type and column name changes and to handle hierarchical joins really well, all while building out our automated lineage! - + 1. Time to **Commit and push** our changes and give your commit a message like `intermediate and fact models` before moving on. @@ -1128,7 +1128,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? - The `snowflake-snowpark-python` library has been picked up to execute our Python code. Even though this wasn’t explicitly stated, it is picked up by the dbt class object because we need our Snowpark package to run Python! Python models take a bit longer to run than SQL models; however, we could always speed this up by using [Snowpark-optimized Warehouses](https://docs.snowflake.com/en/user-guide/warehouses-snowpark-optimized.html) if we wanted to. Our data is sufficiently small, so we won’t worry about creating a separate warehouse for Python versus SQL files today. - + The rest of our **Details** output gives us information about how dbt and Snowpark for Python are working together to define class objects and apply a specific set of methods to run our models. @@ -1142,7 +1142,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? ``` and preview the output: - + Not only did Red Bull have the fastest average pit stops by nearly 40 seconds, they also had the smallest standard deviation, meaning they are both the fastest and the most consistent team in pit stops. By using the `.describe()` method, we were able to avoid verbose SQL requiring us to create a line of code per column and repetitively use the `PERCENTILE_CONT()` function. @@ -1187,7 +1187,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? in the command bar. 12. Once again, preview the output of our data using the same steps as for our `fastest_pit_stops_by_constructor` model.
- + We can see that it looks like lap times are getting consistently faster over time. Then in 2010 we see an increase occur! Using outside subject matter context, we know that significant rule changes were introduced to Formula 1 in 2010 and 2011, causing slower lap times. @@ -1314,7 +1314,7 @@ At a high level we’ll be: - The `.apply()` function in the pandas library is used to apply a function along an axis of a DataFrame, or to each element of a Series. In our case, the function we used was our lambda function! - The `.apply()` function takes the function to be applied as its first argument. On a Series, the function is applied to each element, that is, once per row. On a DataFrame, the `axis` argument sets the direction: the default `axis=0` passes each column to the function, while `axis=1` passes each row. 6. Let’s look at the preview of our clean dataframe after running our `ml_data_prep` model: - + ### Covariate encoding @@ -1565,7 +1565,7 @@ If you haven’t seen code like this before or use joblib files to save machine - Right now our model is only in memory, so we need to use our nifty function `save_file` to save our model file to our Snowflake stage. We save our model as a joblib file so Snowpark can easily call this model object back to create predictions. We really don’t need to know much else as a data practitioner unless we want to. It’s worth noting that joblib files can't be queried directly by SQL. To do this, we would need to transform the joblib file into an SQL-queryable format such as JSON or CSV (out of scope for this workshop). - Finally, we want to return our dataframe, but create a new column indicating which rows were used for training and which for testing. 5. View the output of this model: - + 6.
Let’s pop back over to Snowflake and check that our logistic regression model has been stored in our `MODELSTAGE` using the command: @@ -1573,10 +1573,10 @@ If you haven’t seen code like this before or use joblib files to save machine list @modelstage ``` - + 7. To investigate the commands run as part of the `train_test_position` script, navigate to the Snowflake query history under **Activity > Query History**. We can view the portions of the query that we wrote, such as `create or replace stage MODELSTAGE`, but we also see additional queries that Snowflake uses to interpret Python code. - + ### Predicting on new data @@ -1731,7 +1731,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod ### Generic tests 1. To implement the generic out-of-the-box tests that dbt comes with, we can use YAML files to specify information about our models. To add generic tests to our aggregates model, create a file called `aggregates.yml`, copy the code block below into the file, and save. - + ```yaml version: 2 @@ -1762,7 +1762,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod ### Using macros for testing 1. Under your `macros` folder, create a new file and name it `test_all_values_gte_zero.sql`. Copy the code block below and save the file. For clarity, “gte” is an abbreviation for greater than or equal to. - + ```sql {% macro test_all_values_gte_zero(table, column) %} @@ -1776,7 +1776,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod 3.
In the body of the macro, we see an SQL statement that is using the `ref` function to dynamically select the table and then the column. You can always view macros without having to run them by using `dbt run-operation`. You can learn more [here](https://docs.getdbt.com/reference/commands/run-operation). 4. Great, now we want to reference this macro as a test! Let’s create a new test file called `macro_pit_stops_mean_is_positive.sql` in our `tests` folder. - + 5. Copy the following code into the file and save: @@ -1805,7 +1805,7 @@ These tests are defined in `.sql` files, typically in your `tests` directory (as Let’s add a custom test that asserts that the moving average of the lap time over the last 5 years is greater than zero (it’s impossible to have time less than 0!). It is easy to assume if this is not the case the data has been corrupted. 1. Create a file `lap_times_moving_avg_assert_positive_or_null.sql` under the `tests` folder. - + 2. Copy the following code and save the file: @@ -1841,11 +1841,11 @@ Let’s add a custom test that asserts that the moving average of the lap time o dbt test --select fastest_pit_stops_by_constructor lap_times_moving_avg ``` - + 3. All 4 of our tests passed (yay for clean data)! To understand the SQL being run against each of our tables, we can click into the details of the test. 4. Navigating into the **Details** of the `unique_fastest_pit_stops_by_constructor_name`, we can see that each line `constructor_name` should only have one row. - + ## Document your dbt project @@ -1865,17 +1865,17 @@ To start, let’s look back at our `intermediate.md` file. We can see that we pr ``` This will generate the documentation for your project. Click the book button, as shown in the screenshot below to access the docs. - + 2. Go to our project area and view `int_results`. View the description that we created in our doc block. - + 3. View the mini-lineage that looks at the model we are currently selected on (`int_results` in this case). - + 4. 
In our `dbt_project.yml`, we configured `node_colors` depending on the file directory. Starting in dbt v1.3, we can see these colors in the lineage graph in our docs. Color coding your project can help you cluster together similar models or steps and troubleshoot more easily. - + ## Deploy your code @@ -1890,18 +1890,18 @@ Now that we've completed testing and documenting our work, we're ready to deploy 1. Before getting started, let's make sure that we've committed all of our work to our feature branch. If you still have work to commit, you'll be able to select **Commit and push**, provide a message, and then select **Commit** again. 2. Once all of your work is committed, the git workflow button will now appear as **Merge to main**. Select **Merge to main** and the merge process will automatically run in the background. - + 3. When it's completed, you should see the git button read **Create branch**, and the branch you're currently looking at will become **main**. 4. Now that all of our development work has been merged to the main branch, we can build our deployment job. Given that our production environment and production job were created automatically for us through Partner Connect, all we need to do here is update some default configurations to meet our needs. 5. In the menu, select **Deploy** **> Environments**. - + 6. You should see two environments listed. Select the **Deployment** environment, then **Settings** to modify it. 7. Before making any changes, let's touch on what is defined within this environment. The Snowflake connection shows the credentials that dbt Cloud is using for this environment, and in our case they are the same as what was created for us through Partner Connect. Our deployment job will build in our `PC_DBT_DB` database and use the default Partner Connect role and warehouse to do so. The deployment credentials section also uses the info that was created in our Partner Connect job to create the credential connection.
However, it is using the same default schema that we've been using for our development environment. 8. Let's update the schema to create a new schema specifically for our production environment. Click **Edit** to allow you to modify the existing field values. Navigate to **Deployment Credentials >** **schema.** 9. Update the schema name to **production**. Remember to select **Save** after you've made the change. - + 10. Updating the schema for our production environment to **production** ensures that our deployment job for this environment will build our dbt models in the **production** schema within the `PC_DBT_DB` database, as defined in the Snowflake Connection section. 11. Now let's switch over to our production job. Click the **Deploy** tab again and then select **Jobs**. You should see an existing, preconfigured **Partner Connect Trial Job**. Similar to the environment, click on the job, then select **Settings** to modify it. Let's take a look at the job to understand it before making changes. @@ -1912,11 +1912,11 @@ Now that we've completed testing and documenting our work, we're ready to deploy So, what are we changing then? Just the name! Click **Edit** to allow you to make changes. Then update the name of the job to **Production Job** to denote this as our production deployment job. After that's done, click **Save**. 12. Now let's run our job. Clicking on the job name in the path at the top of the screen will take you back to the job run history page, where you'll be able to click **Run now** to kick off the job. If you encounter any job failures, try running the job again before further troubleshooting. - - + + 13. Let's go over to Snowflake to confirm that everything built as expected in our production schema. Refresh the database objects in your Snowflake account, and you should see the production schema now within our default Partner Connect database.
If you click into the schema and everything ran successfully, you should be able to see all of the models we developed. - + ### Conclusion diff --git a/website/docs/guides/dremio-lakehouse.md b/website/docs/guides/dremio-lakehouse.md index c8a8c4cf83b..378ec857f6a 100644 --- a/website/docs/guides/dremio-lakehouse.md +++ b/website/docs/guides/dremio-lakehouse.md @@ -143,7 +143,7 @@ dremioSamples: Now that you have a running environment and a completed job, you can view the data in Dremio and expand your code. This is a snapshot of the project structure in an IDE: - + ## About the schema.yml @@ -156,7 +156,7 @@ The models correspond to both weather and trip data respectively and will be joi The sources can be found by navigating to the **Object Storage** section of the Dremio Cloud UI. - + ## About the models @@ -170,11 +170,11 @@ The sources can be found by navigating to the **Object Storage** section of the When you run the dbt job, it will create a **dev** space folder that has all the data assets created. This is what you will see in Dremio Cloud UI. Spaces in Dremio is a way to organize data assets which map to business units or data products. - + Open the **Application folder** and you will see the output of the simple transformation we did using dbt. - + ## Query the data diff --git a/website/docs/guides/manual-install-qs.md b/website/docs/guides/manual-install-qs.md index 082d23bc77e..fcd1e5e9599 100644 --- a/website/docs/guides/manual-install-qs.md +++ b/website/docs/guides/manual-install-qs.md @@ -67,7 +67,7 @@ $ pwd 5. Use a code editor like Atom or VSCode to open the project directory you created in the previous steps, which we named jaffle_shop. The content includes folders and `.sql` and `.yml` files generated by the `init` command.
- +
6. dbt provides the following values in the `dbt_project.yml` file: @@ -126,7 +126,7 @@ $ dbt debug ```
- +
### FAQs @@ -150,7 +150,7 @@ dbt run You should have an output that looks like this:
- +
## Commit your changes @@ -197,7 +197,7 @@ $ git checkout -b add-customers-model 4. From the command line, enter `dbt run`.
- +
When you return to the BigQuery console, you can `select` from this model. diff --git a/website/docs/guides/redshift-qs.md b/website/docs/guides/redshift-qs.md index 5f3395acb82..c81a4d247a5 100644 --- a/website/docs/guides/redshift-qs.md +++ b/website/docs/guides/redshift-qs.md @@ -43,17 +43,17 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 3. Click **Next** for each page until you reach the **Select acknowledgement** checkbox. Select **I acknowledge that AWS CloudFormation might create IAM resources with custom names** and click **Create Stack**. You should land on the stack page with a CREATE_IN_PROGRESS status. - + 4. When the stack status changes to CREATE_COMPLETE, click the **Outputs** tab on the top to view information that you will use throughout the rest of this guide. Save those credentials for later by keeping this open in a tab. 5. Type `Redshift` in the search bar at the top and click **Amazon Redshift**. - + 6. Confirm that your new Redshift cluster is listed in **Cluster overview**. Select your new cluster. The cluster name should begin with `dbtredshiftcluster-`. Then, click **Query Data**. You can choose the classic query editor or v2. We will be using the v2 version for the purpose of this guide. - + 7. You might be asked to Configure account. For this sandbox environment, we recommend selecting “Configure account”. @@ -63,9 +63,9 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen - **User name** — `dbtadmin` - **Password** — Use the autogenerated `RSadminpassword` from the output of the stack and save it for later. - + - + 9. Click **Create connection**. @@ -80,15 +80,15 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat 2. Now we are going to use the S3 bucket that you created with CloudFormation and upload the files. Go to the search bar at the top and type in `S3` and click on S3. 
There will be sample data in the bucket already, feel free to ignore it or use it for other modeling exploration. The bucket will be prefixed with `dbt-data-lake`. - + 3. Click on the `name of the bucket` S3 bucket. If you have multiple S3 buckets, this will be the bucket that was listed under “Workshopbucket” on the Outputs page. - + 4. Click **Upload**. Drag the three files into the UI and click the **Upload** button. - + 5. Remember the name of the S3 bucket for later. It should look like this: `s3://dbt-data-lake-xxxx`. You will need it for the next section. 6. Now let’s go back to the Redshift query editor. Search for Redshift in the search bar, choose your cluster, and select Query data. @@ -171,7 +171,7 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat - **Port** — `5439` - **Database** — `dbtworkshop`.
- +
5. Set your development credentials. These credentials will be used by dbt Cloud to connect to Redshift. Those credentials (as provided in your CloudFormation output) will be: @@ -179,7 +179,7 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat - **Password** — This is the autogenerated password that you used earlier in the guide - **Schema** — dbt Cloud automatically generates a schema name for you. By convention, this is `dbt_`. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Cloud IDE.
- +
6. Click **Test Connection**. This verifies that dbt Cloud can access your Redshift cluster. diff --git a/website/docs/guides/refactoring-legacy-sql.md b/website/docs/guides/refactoring-legacy-sql.md index b12baac95cd..a339e523020 100644 --- a/website/docs/guides/refactoring-legacy-sql.md +++ b/website/docs/guides/refactoring-legacy-sql.md @@ -44,7 +44,7 @@ While refactoring you'll be **moving around** a lot of logic, but ideally you wo To get going, you'll copy your legacy SQL query into your dbt project, by saving it in a `.sql` file under the `/models` directory of your project. - + Once you've copied it over, you'll want to `dbt run` to execute the query and populate the in your warehouse. @@ -76,7 +76,7 @@ If you're migrating multiple stored procedures into dbt, with sources you can se This allows you to consolidate modeling work on those base tables, rather than calling them separately in multiple places. - + #### Build the habit of analytics-as-code Sources are an easy way to get your feet wet using config files to define aspects of your transformation pipeline. diff --git a/website/docs/guides/set-up-ci.md b/website/docs/guides/set-up-ci.md index aa4811d9339..89d7c5a14fa 100644 --- a/website/docs/guides/set-up-ci.md +++ b/website/docs/guides/set-up-ci.md @@ -22,7 +22,7 @@ After that, there's time to get fancy, but let's walk before we run. In this guide, we're going to add a **CI environment**, where proposed changes can be validated in the context of the entire project without impacting production systems. We will use a single set of deployment credentials (like the Prod environment), but models are built in a separate location to avoid impacting others (like the Dev environment). Your git flow will look like this: - + ### Prerequisites @@ -309,7 +309,7 @@ The team at Sunrun maintained a SOX-compliant deployment in dbt while reducing t In this section, we will add a new **QA** environment. 
New features will branch off from and be merged back into the associated `qa` branch, and a member of your team (the "Release Manager") will create a PR against `main` to be validated in the CI environment before going live. The git flow will look like this: - + ### Advanced prerequisites @@ -323,7 +323,7 @@ As noted above, this branch will outlive any individual feature, and will be the See [Custom branch behavior](/docs/dbt-cloud-environments#custom-branch-behavior). Setting `qa` as your custom branch ensures that the IDE creates new branches and PRs with the correct target, instead of using `main`. - + ### 3. Create a new QA environment diff --git a/website/docs/guides/snowflake-qs.md b/website/docs/guides/snowflake-qs.md index 492609c9bcf..0401c37871f 100644 --- a/website/docs/guides/snowflake-qs.md +++ b/website/docs/guides/snowflake-qs.md @@ -143,35 +143,35 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno 1. In the Snowflake UI, click on the home icon in the upper left corner. In the left sidebar, select **Admin**. Then, select **Partner Connect**. Find the dbt tile by scrolling or by searching for dbt in the search bar. Click the tile to connect to dbt. - + If you’re using the classic version of the Snowflake UI, you can click the **Partner Connect** button in the top bar of your account. From there, click on the dbt tile to open up the connect box. - + 2. In the **Connect to dbt** popup, find the **Optional Grant** option and select the **RAW** and **ANALYTICS** databases. This will grant access for your new dbt user role to each database. Then, click **Connect**. - + - + 3. Click **Activate** when a popup appears: - + - + 4. After the new tab loads, you will see a form. If you already created a dbt Cloud account, you will be asked to provide an account name. If you haven't created account, you will be asked to provide an account name and password. - + 5. 
After you have filled out the form and clicked **Complete Registration**, you will be logged into dbt Cloud automatically. 6. From your **Account Settings** in dbt Cloud (using the gear menu in the upper right corner), choose the "Partner Connect Trial" project and select **snowflake** in the overview table. Select edit and update the fields **Database** and **Warehouse** to be `analytics` and `transforming`, respectively. - + - +
@@ -181,7 +181,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno 2. Enter a project name and click **Continue**. 3. For the warehouse, click **Snowflake** then **Next** to set up your connection. - + 4. Enter your **Settings** for Snowflake with: * **Account** — Find your account by using the Snowflake trial account URL and removing `snowflakecomputing.com`. The order of your account information will vary by Snowflake version. For example, Snowflake's Classic console URL might look like: `oq65696.west-us-2.azure.snowflakecomputing.com`. The AppUI or Snowsight URL might look more like: `snowflakecomputing.com/west-us-2.azure/oq65696`. In both examples, your account will be: `oq65696.west-us-2.azure`. For more information, see [Account Identifiers](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html) in the Snowflake docs. @@ -192,7 +192,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno * **Database** — `analytics`. This tells dbt to create new models in the analytics database. * **Warehouse** — `transforming`. This tells dbt to use the transforming warehouse that was created earlier. - + 5. Enter your **Development Credentials** for Snowflake with: * **Username** — The username you created for Snowflake. The username is not your email address and is usually your first and last name together in one word. @@ -201,7 +201,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno * **Target name** — Leave as the default. * **Threads** — Leave as 4. This is the number of simultaneous connects that dbt Cloud will make to build models concurrently. - + 6. Click **Test Connection**. This verifies that dbt Cloud can access your Snowflake account. 7. If the connection test succeeds, click **Next**. If it fails, you may need to check your Snowflake settings and credentials. 
diff --git a/website/docs/guides/starburst-galaxy-qs.md b/website/docs/guides/starburst-galaxy-qs.md index 9a6c44574cd..1822c83fa90 100644 --- a/website/docs/guides/starburst-galaxy-qs.md +++ b/website/docs/guides/starburst-galaxy-qs.md @@ -92,11 +92,11 @@ In addition to Amazon S3, Starburst Galaxy supports many other data sources. To The **Amazon S3** page should look similar to this, except for the **Authentication to S3** section which is dependant on your setup: - + 8. Click **Test connection**. This verifies that Starburst Galaxy can access your S3 bucket. 9. Click **Connect catalog** if the connection test passes. - + 10. On the **Set permissions** page, click **Skip**. You can add permissions later if you want. 11. On the **Add to cluster** page, choose the cluster you want to add the catalog to from the dropdown and click **Add to cluster**. @@ -113,7 +113,7 @@ In addition to Amazon S3, Starburst Galaxy supports many other data sources. To When done, click **Add privileges**. - + ## Create tables with Starburst Galaxy To query the Jaffle Shop data with Starburst Galaxy, you need to create tables using the Jaffle Shop data that you [loaded to your S3 bucket](#load-data-to-s3). You can do this (and run any SQL statement) from the [query editor](https://docs.starburst.io/starburst-galaxy/query/query-editor.html). @@ -121,7 +121,7 @@ To query the Jaffle Shop data with Starburst Galaxy, you need to create tables u 1. Click **Query > Query editor** on the left sidebar of the Starburst Galaxy UI. The main body of the page is now the query editor. 2. Configure the query editor so it queries your S3 bucket. In the upper right corner of the query editor, select your cluster in the first gray box and select your catalog in the second gray box: - + 3. Copy and paste these queries into the query editor. Then **Run** each query individually. @@ -181,7 +181,7 @@ To query the Jaffle Shop data with Starburst Galaxy, you need to create tables u ``` 4. 
When the queries are done, you can see the following hierarchy on the query editor's left sidebar: - + 5. Verify that the tables were created successfully. In the query editor, run the following queries: diff --git a/website/docs/reference/node-selection/graph-operators.md b/website/docs/reference/node-selection/graph-operators.md index 88d99d7b92a..8cba43e1b52 100644 --- a/website/docs/reference/node-selection/graph-operators.md +++ b/website/docs/reference/node-selection/graph-operators.md @@ -29,7 +29,7 @@ dbt run --select "3+my_model+4" # select my_model, its parents up to the ### The "at" operator The `@` operator is similar to `+`, but will also include _the parents of the children of the selected model_. This is useful in continuous integration environments where you want to build a model and all of its children, but the _parents_ of those children might not exist in the database yet. The selector `@snowplow_web_page_context` will build all three models shown in the diagram below. - + ```bash dbt run --models @my_model # select my_model, its children, and the parents of its children diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index a5198fd3487..8f323bc4236 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -379,7 +379,7 @@ models: - + ### Specifying tags BigQuery table and view *tags* can be created by supplying an empty string for the label value. 
diff --git a/website/docs/reference/resource-configs/persist_docs.md b/website/docs/reference/resource-configs/persist_docs.md index 481f25d4e95..15b1e0bdb40 100644 --- a/website/docs/reference/resource-configs/persist_docs.md +++ b/website/docs/reference/resource-configs/persist_docs.md @@ -186,8 +186,8 @@ models: Run dbt and observe that the created relation and columns are annotated with your descriptions: - - diff --git a/website/docs/reference/resource-configs/spark-configs.md b/website/docs/reference/resource-configs/spark-configs.md index 5c32fa5fc83..ce3b317f0f1 100644 --- a/website/docs/reference/resource-configs/spark-configs.md +++ b/website/docs/reference/resource-configs/spark-configs.md @@ -104,7 +104,7 @@ If no `partition_by` is specified, then the `insert_overwrite` strategy will ato - This strategy is not available when connecting via Databricks SQL endpoints (`method: odbc` + `endpoint`). - If connecting via a Databricks cluster + ODBC driver (`method: odbc` + `cluster`), you **must** include `set spark.sql.sources.partitionOverwriteMode DYNAMIC` in the [cluster Spark Config](https://docs.databricks.com/clusters/configure.html#spark-config) in order for dynamic partition replacement to work (`incremental_strategy: insert_overwrite` + `partition_by`). - + + If mixing images and text together, also consider using a docs block. diff --git a/website/docs/terms/dag.md b/website/docs/terms/dag.md index b108c68806a..c6b91300bfc 100644 --- a/website/docs/terms/dag.md +++ b/website/docs/terms/dag.md @@ -32,7 +32,7 @@ One of the great things about DAGs is that they are *visual*. You can clearly id Take this mini-DAG for an example: - + What can you learn from this DAG? Immediately, you may notice a handful of things: @@ -57,7 +57,7 @@ You can additionally use your DAG to help identify bottlenecks, long-running dat ...to name just a few. 
Understanding the factors impacting model performance can help you decide on [refactoring approaches](https://courses.getdbt.com/courses/refactoring-sql-for-modularity), [changing model materialization](https://docs.getdbt.com/blog/how-we-shaved-90-minutes-off-model#attempt-2-moving-to-an-incremental-model)s, replacing multiple joins with surrogate keys, or other methods. - + ### Modular data modeling best practices @@ -83,7 +83,7 @@ The marketing team at dbt Labs would be upset with us if we told you we think db Whether you’re using dbt Core or Cloud, dbt docs and the Lineage Graph are available to all dbt developers. The Lineage Graph in dbt Docs can show a model or source’s entire lineage, all within a visual frame. Clicking within a model, you can view the Lineage Graph and adjust selectors to only show certain models within the DAG. Analyzing the DAG here is a great way to diagnose potential inefficiencies or lack of modularity in your dbt project. - + The DAG is also [available in the dbt Cloud IDE](https://www.getdbt.com/blog/on-dags-hierarchies-and-ides/), so you and your team can refer to your lineage while you build your models. diff --git a/website/docs/terms/data-lineage.md b/website/docs/terms/data-lineage.md index 163047187ba..d0162c35616 100644 --- a/website/docs/terms/data-lineage.md +++ b/website/docs/terms/data-lineage.md @@ -69,7 +69,7 @@ Your is used to visually show upstream dependencies, the nodes Ultimately, DAGs are an effective way to see relationships between data sources, models, and dashboards. DAGs are also a great way to see visual bottlenecks, or inefficiencies in your data work (see image below for a DAG with...many bottlenecks). Data teams can additionally add [meta fields](https://docs.getdbt.com/reference/resource-configs/meta) and documentation to nodes in the DAG to add an additional layer of governance to their dbt project. 
- + :::tip Automatic > Manual diff --git a/website/snippets/quickstarts/intro-build-models-atop-other-models.md b/website/snippets/quickstarts/intro-build-models-atop-other-models.md index eeedec34892..1104461079b 100644 --- a/website/snippets/quickstarts/intro-build-models-atop-other-models.md +++ b/website/snippets/quickstarts/intro-build-models-atop-other-models.md @@ -2,4 +2,4 @@ As a best practice in SQL, you should separate logic that cleans up your data fr Now you can experiment by separating the logic out into separate models and using the [ref](/reference/dbt-jinja-functions/ref) function to build models on top of other models: - + diff --git a/website/src/components/lightbox/styles.module.css b/website/src/components/lightbox/styles.module.css index 3027a88f45a..36d59ad42a3 100644 --- a/website/src/components/lightbox/styles.module.css +++ b/website/src/components/lightbox/styles.module.css @@ -10,7 +10,7 @@ margin: 10px auto; padding-right: 10px; display: block; - max-width: 400px; + max-width: 100%; } :local(.collapsed) {