diff --git a/README.md b/README.md index c749fedf95a..d306651f545 100644 --- a/README.md +++ b/README.md @@ -62,18 +62,3 @@ You can click a link available in a Vercel bot PR comment to see and review your Advisory: - If you run into a `fatal error: 'vips/vips8' file not found` error when you run `npm install`, you may need to run `brew install vips`. Warning: this one will take a while -- go ahead and grab some coffee! - -## Running the Cypress tests locally - -Method 1: Utilizing the Cypress GUI -1. `cd` into the repo: `cd docs.getdbt.com` -2. `cd` into the `website` subdirectory: `cd website` -3. Install the required node packages: `npm install` -4. Run `npx cypress open` to open the Cypress GUI, and choose `E2E Testing` as the Testing Type, before finally selecting your browser and clicking `Start E2E testing in {browser}` -5. Click on a test and watch it run! - -Method 2: Running the Cypress E2E tests headlessly -1. `cd` into the repo: `cd docs.getdbt.com` -2. `cd` into the `website` subdirectory: `cd website` -3. Install the required node packages: `npm install` -4. Run `npx cypress run` diff --git a/contributing/developer-blog.md b/contributing/developer-blog.md deleted file mode 100644 index 0d9b3becba2..00000000000 --- a/contributing/developer-blog.md +++ /dev/null @@ -1,67 +0,0 @@ - -* [Contributing](#contributing) -* [Core Principles](#core-principles) - -## Contributing - -The dbt Developer Blog is a place where analytics practitioners can go to share their knowledge with the community. Analytics Engineering is a discipline we’re all building together. The developer blog exists to cultivate the collective knowledge that exists on how to build and scale effective data teams. - -We currently have editorial capacity for a few Community-contributed developer blogs per quarter - if we are oversubscribed, we suggest you post on another platform or hold off until the editorial team is ready to take on more posts. - -### What makes a good developer blog post? - -- The short answer: Practical, hands-on analytics engineering tutorials and stories - [Slim CI/CD with Bitbucket](https://docs.getdbt.com/blog/slim-ci-cd-with-bitbucket-pipelines) - [So You Want to Build a dbt Package](https://docs.getdbt.com/blog/so-you-want-to-build-a-package) - [Founding an Analytics Engineering Team](https://docs.getdbt.com/blog/founding-an-analytics-engineering-team-smartsheet) -- See the [Developer Blog Core Principles](#core-principles) - -### How do I submit a proposed post? - -To submit a proposed post, open a `Contribute to the dbt Developer Blog` issue on the [Developer Hub repo](https://github.com/dbt-labs/docs.getdbt.com/issues/new/choose). You will be asked for: - -- A short (one paragraph) summary of the post you’d like to publish -- An outline of the post - -You’ll hear back from a member of the dbt Labs team within 7 days with one of three responses: - -- The post looks good to go as is! We’ll ask you to start creating a draft based on the initial outline you submitted -- Proposed changes to the outline. This could be additional focus on a topic you mention that’s of high community interest or a tweak to the structure to help with narrative flow -- Not a fit for the developer blog right now. We hugely appreciate *any* interest in submitting to the Developer Blog - right now our biggest backlog is capacity to help folks get these published. See below on how we are thinking about and evaluating potential posts. - -### What is the process once my blog is accepted?
- -Once a blog is accepted, we’ll ask you for a date when we can expect the draft by. Typically we’ll ask that you can commit to having this ready within a month of submitting the issue. - -Once you submit a draft, we’ll return a first set of edits within 5 business days. - -The typical turnaround time from issue creation to going live on the developer blog is ~4 to 6 weeks. - -### What happens after my blog is published? - -We’ll share the blog on the dbt Labs social media channels! We also encourage you to share on the dbt Slack in #i-made-this. - -### What if my post doesn’t get approved? - -We want to publish as many community contributors as possible, but not every post will be a fit for the Developer Blog. That’s ok! There are many different reasons why we might not be able to publish a post right now and none of them reflect on the quality of the proposed post. - -- **dbt Labs capacity**: We’re committed to providing hands-on feedback and coaching throughout the process. Our goal is not just to generate great developer blogs - it’s to help build a community of great writers / practitioners who can share their knowledge with the community for years to come. This necessarily means we will be able to take on a lower absolute number of posts in the short term, but will hopefully be helpful for the community long term. -- **Focus on narrative / problem solving - not industry trends**: The developer blog exists, primarily, to tell the stories of analytics engineering practitioners and how they solve problems. The idea is that reading the developer blog gives a feel for what it is like to be a data practitioner on the ground today. This is not a hard and fast rule, but a good way to approach this is “How I/we solved X problem” rather than “How everyone should solve X problem”. - -We are very interested in stacks, new tools and integrations and will happily publish posts about this - with the caveat that the *focus* of the post should be solving real-world problems. Ideally, if you are writing about these, it is something that you have used yourself in a hands-on, production implementation. - -- **Right-sized scope**: We want to be able to cover a topic in-depth and dig into the nuances. Big topics like “How should you structure your data team” or “How to ensure data quality in your organization” will be tough to cover in the scope of a single post. If you have a big idea - try subdividing it! “How should you structure your data team” could become “How we successfully partnered with our RevOps team on improving lead tracking” and “How to ensure data quality in your organization” might be “How we cleaned up our UTM tracking”. - -### What if I need help / have questions? - -- Feel free to post any questions in #community-writers on the dbt Slack. - -## Core Principles - -- 🧑🏻‍🤝‍🧑🏾 The dbt Developer blog is **written by humans** - individual analytics professionals sharing their insight with the world. To the extent feasible, a community member posting on the developer blog is not staking an official organizational stance, but something that *they* have learned or believe based on their work. This is true for dbt Labs employees as well. -- 💍 Developer blog content is knowledge-rich - these are posts that readers share, bookmark and come back to time and time again. -- ⛹🏼‍♂️ Developer blog content is written by and for *practitioners* - end users of analytics tools (and sometimes people that work with practitioners).
-- ⭐ Developer blog content is best when it is *the story the author is uniquely positioned to tell.* Authors are encouraged to consider what insight they have that is specific to them and the work they have done. -- 🏎️ Developer blog content is actionable - readers walk away with a clear sense of how they can use this information to be a more effective practitioner. Posts include code snippets, Loom walkthroughs and hands-on, practical information that can be integrated into daily workflows. -- 🤏 Nothing is too small to share - what you think is simple has the potential to change someone's week. -- 🔮 Developer blog content is present-focused — posts tell a story of a thing that you've already done or are actively doing, not something that you may do in the future. diff --git a/contributing/single-sourcing-content.md b/contributing/single-sourcing-content.md index 537980ebdfb..6dc14d760b1 100644 --- a/contributing/single-sourcing-content.md +++ b/contributing/single-sourcing-content.md @@ -90,7 +90,7 @@ This component can be added directly to a markdown file in a similar way to other components. Both properties can be used together to set a range where the content should show. In the example below, this content will only show if the selected version is between **0.21** and **1.0**: ```markdown - + Versioned content here diff --git a/website/blog/2022-07-13-star-sql-love-letter.md b/website/blog/2022-07-13-star-sql-love-letter.md index a84750198de..0d5aec181a2 100644 --- a/website/blog/2022-07-13-star-sql-love-letter.md +++ b/website/blog/2022-07-13-star-sql-love-letter.md @@ -44,7 +44,7 @@ So what does this mean for the example from above? Instead of writing out all 55 ```sql select - {{ dbt_utils.star(from=ref('table_a'), except=['column_56'] }} + {{ dbt_utils.star(from=ref('table_a'), except=['column_56']) }} from {{ ref('table_a') }} ``` diff --git a/website/dbt-versions.js b/website/dbt-versions.js index 9007d719bc0..825af8ac6ee 100644 --- a/website/dbt-versions.js +++ b/website/dbt-versions.js @@ -15,7 +15,7 @@ */ exports.versions = [ { - version: "1.9.1", + version: "1.10", customDisplay: "Cloud (Versionless)", }, { @@ -28,11 +28,7 @@ exports.versions = [ }, { version: "1.7", - EOLDate: "2024-10-30", - }, - { - version: "1.6", - EOLDate: "2024-07-31", + EOLDate: "2024-11-01", }, ]; @@ -66,134 +62,6 @@ exports.versionedPages = [ page: "reference/global-configs/indirect-selection", firstVersion: "1.8", }, - { - page: "reference/resource-configs/store_failures_as", - firstVersion: "1.7", - }, - { - page: "docs/build/build-metrics-intro", - firstVersion: "1.6", - }, - { - page: "docs/build/sl-getting-started", - firstVersion: "1.6", - }, - { - page: "docs/build/about-metricflow", - firstVersion: "1.6", - }, - { - page: "docs/build/join-logic", - firstVersion: "1.6", - }, - { - page: "docs/build/validation", - firstVersion: "1.6", - }, - { - page: "docs/build/semantic-models", - firstVersion: "1.6", - }, - { - page: "docs/build/group-by", - firstVersion: "1.6", - }, - { - page: "docs/build/entities", - firstVersion: "1.6", - }, - { - page: "docs/build/metrics-overview", - firstVersion: "1.6", - }, - { - page: "docs/build/cumulative", - firstVersion: "1.6", - }, - { - page: "docs/build/derived", - firstVersion: "1.6", - }, - { - page: "docs/build/measure-proxy", - firstVersion: "1.6", - }, - { - page: "docs/build/ratio", - firstVersion: "1.6", - }, - { - page: "reference/commands/clone", - firstVersion: "1.6", - }, - { - page: "docs/collaborate/govern/project-dependencies", - firstVersion: "1.6",
- }, - { - page: "reference/dbt-jinja-functions/thread_id", - firstVersion: "1.6", - }, - { - page: "reference/resource-properties/deprecation_date", - firstVersion: "1.6", - }, - { - page: "reference/commands/retry", - firstVersion: "1.6", - }, - { - page: "docs/build/groups", - firstVersion: "1.5", - }, - { - page: "docs/collaborate/govern/model-contracts", - firstVersion: "1.5", - }, - { - page: "reference/commands/show", - firstVersion: "1.5", - }, - { - page: "docs/collaborate/govern/model-access", - firstVersion: "1.5", - }, - { - page: "docs/collaborate/govern/model-versions", - firstVersion: "1.5", - }, - { - page: "reference/programmatic-invocations", - firstVersion: "1.5", - }, - { - page: "reference/resource-configs/contract", - firstVersion: "1.5", - }, - { - page: "reference/resource-configs/group", - firstVersion: "1.5", - }, - { - page: "reference/resource-properties/access", - firstVersion: "1.5", - }, - { - page: "reference/resource-properties/constraints", - firstVersion: "1.5", - }, - { - page: "reference/resource-properties/latest_version", - firstVersion: "1.5", - }, - { - page: "reference/resource-properties/versions", - firstVersion: "1.5", - }, - { - page: "reference/resource-configs/on_configuration_change", - firstVersion: "1.6", - }, ]; /** @@ -206,12 +74,5 @@ exports.versionedPages = [ * @property {string} firstVersion The first version the category is visible in the sidebar */ exports.versionedCategories = [ - { - category: "Model governance", - firstVersion: "1.5", - }, - { - category: "Build your metrics", - firstVersion: "1.6", - }, + ]; diff --git a/website/docs/best-practices/how-we-structure/4-marts.md b/website/docs/best-practices/how-we-structure/4-marts.md index 21de31a9e0d..995dea7e96f 100644 --- a/website/docs/best-practices/how-we-structure/4-marts.md +++ b/website/docs/best-practices/how-we-structure/4-marts.md @@ -26,7 +26,8 @@ models/marts ✅ **Group by department or area of concern.** If you have fewer than 10 or so marts you may not have much need for subfolders, so as with the intermediate layer, don’t over-optimize too early. If you do find yourself needing to insert more structure and grouping though, use useful business concepts here. In our marts layer, we’re no longer worried about source-conformed data, so grouping by departments (marketing, finance, etc.) is the most common structure at this stage. -✅ **Name by entity.** Use plain English to name the file based on the concept that forms the grain of the mart `customers`, `orders`. Note that for pure marts, there should not be a time dimension (`orders_per_day`) here, that is typically best captured via metrics. +✅ **Name by entity.** Use plain English to name the file based on the concept that forms the grain of the mart: `customers`, `orders`. Marts that don't include any time-based rollups (pure marts) shouldn't have a time dimension (`orders_per_day`) here; that's typically best captured via metrics. + ❌ **Build the same concept differently for different teams.** `finance_orders` and `marketing_orders` are typically considered an anti-pattern. There are, as always, exceptions — a common pattern we see is that finance may have specific needs, for example reporting revenue to the government in a way that diverges from how the company as a whole measures revenue day-to-day. Just make sure that these are clearly designed and understandable as _separate_ concepts, not departmental views on the same concept: `tax_revenue` and `revenue` not `finance_revenue` and `marketing_revenue`.
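To make the naming guidance above concrete, here is a minimal sketch of a marts folder that follows it (the file names are hypothetical):

```
models/marts
├── finance
│   ├── orders.sql
│   └── tax_revenue.sql
└── marketing
    └── customers.sql
```

Each file is named for the entity or concept at its grain, and the two revenue concepts live apart as genuinely separate models rather than departmental views of one model.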
diff --git a/website/docs/community/spotlight/bruno-de-lima.md b/website/docs/community/spotlight/bruno-de-lima.md index f5ffaa6a970..3c373db06e8 100644 --- a/website/docs/community/spotlight/bruno-de-lima.md +++ b/website/docs/community/spotlight/bruno-de-lima.md @@ -2,42 +2,39 @@ id: bruno-de-lima title: Bruno de Lima description: | - Hi all! I'm a Data Engineer, deeply fascinated by the awesomeness of dbt. I love talking about dbt, creating content from daily tips to blogposts and engaging with this vibrant community! - - Started my career at the beginning of 2022 at Indicium as an Analytics Engineer, working with dbt from day 1. By 2023, my path took a global trajectory as I joined phData as a Data Engineer, expanding my experiences and forging connections beyond Brazil. While dbt is at the heart of my expertise, I've also delved into data warehouses such as Snowflake, Databricks, and BigQuery; visualization tools like Power BI and Tableau; and several minor modern data stack tools. - - I actively participate in the dbt community, having attended two dbt Meetups in Brazil organized by Indicium; writing about dbt-related topics in my Medium and LinkedIn profiles; contributing to the code; and frequently checking dbt Slack and Discourse, helping (and being helped by) other dbt practitioners. If you are a community member, you may have seen me around! -image: /img/community/spotlight/bruno-de-lima.jpg + Hey all! I was born and raised in Florianopolis, Brazil, and I'm a Senior Data Engineer at phData. I live with my fiancée and I enjoy music, photography, and powerlifting. + + I started my career in early 2022 at Indicium as an Analytics Engineer, working with dbt from day 1. By 2023, my path took a global trajectory as I joined phData as a Data Engineer, expanding my experiences and creating connections beyond Brazil. While dbt is my main expertise, because of my work in consultancy I have experience with a wide range of tools, especially those related to Snowflake, Databricks, AWS, and GCP; but I have also tried several other modern data stack tools. + + I actively participate in the dbt community, having organized dbt Meetups in Brazil (in Floripa and São Paulo); writing about dbt-related topics in my Medium and LinkedIn profiles; contributing to the dbt Core code and to the docs; and frequently checking dbt Slack and Discourse, helping (and being helped by) other dbt practitioners. If you are a community member, you may have seen me around! +image: /img/community/spotlight/bruno-souza-de-lima-newimage.jpg pronouns: he/him location: Florianópolis, Brazil -jobTitle: Data Engineer +jobTitle: Senior Data Engineer companyName: phData -organization: "" socialLinks: - name: LinkedIn link: https://www.linkedin.com/in/brunoszdl/ - name: Medium link: https://medium.com/@bruno.szdl -dateCreated: 2023-11-05 +dateCreated: 2024-11-03 hide_table_of_contents: true communityAward: true -communityAwardYear: 2023 +communityAwardYear: 2024 --- ## When did you join the dbt community and in what way has it impacted your career? -I was not truly happy with my academic life. My career took a new turn when I enrolled in the Analytics Engineer course by Indicium. That was my first contact with dbt, and I didn't realize how much it would transform my career. After that, I was hired at the company as an Analytics Engineer and worked extensively with dbt from day one. +I was not truly happy with my academic life. My career took a new turn when I enrolled in the Analytics Engineer course by Indicium.
That was my first contact with dbt, and I didn't realize how much it would transform my career. After that, I was hired at the company as an Analytics Engineer and worked extensively with dbt from day one. It took me some time to become an active member of the dbt community. I started working with dbt at the beginning of 2022 and became more involved towards the end of that year, encouraged by Daniel Avancini. I regret not doing this earlier, because being an active community member has been a game-changer for me, as my knowledge of dbt has grown exponentially just by participating in daily discussions on Slack. I have found the #advice-dbt-help and #advice-dbt-for-power-users channels particularly useful, as well as the various database-specific channels. Additionally, the #i-made-this and #i-read-this channels have allowed me to learn about the innovative things that community members are doing. Inspired by other members, especially Josh Devlin and Owen Prough, I began answering questions on Slack and Discourse. For questions I couldn't answer, I would try engaging in discussions about possible solutions or provide useful links. I also started posting dbt tips on LinkedIn to help practitioners learn about new features or to refresh their memories about existing ones. -By being more involved in the community, I felt more connected and supported. I received help from other members, and now, I could help others, too. I was happy with this arrangement, but more unexpected surprises came my way. My active participation in Slack, Discourse, and LinkedIn opened doors to new connections and career opportunities. I had the pleasure of meeting a lot of incredible people and receiving exciting job offers, including the one for working at phData. +By being more involved in the community, I felt more connected and supported. I received help from other members, and now, I could help others, too. I was happy with this arrangement, but more unexpected surprises came my way. My active participation in Slack, Discourse, and LinkedIn opened doors to new connections and career opportunities. I had the pleasure of meeting a lot of incredible people and receiving exciting job offers, including offers to work at phData and to teach at Zach Wilson's data engineering bootcamp. Thanks to the dbt community, I went from feeling uncertain about my career prospects to having a solid career and being surrounded by incredible people. -I would like to thank the Indicium folks for opening the first door for me for this career in data, and not just for me but for lots of people in Brazil trying to migrate from different fields who would not have this opportunity otherwise. - ## What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? I identify with Gwen Windflower and Joel Labes, or at least they are the kind of leader I admire. Their strong presence and continuous interaction with all types of dbt enthusiasts make everyone feel welcomed in the community. They uplift those who contribute to the community, whether it's through a LinkedIn post or answering a question, and provide constructive feedback to help them improve. And of course, they show very strong knowledge about dbt and data in general, which is reflected in their contributions.
diff --git a/website/docs/community/spotlight/christophe-oudar.md b/website/docs/community/spotlight/christophe-oudar.md new file mode 100644 index 00000000000..2381d88a381 --- /dev/null +++ b/website/docs/community/spotlight/christophe-oudar.md @@ -0,0 +1,35 @@ +--- +id: christophe-oudar +title: Christophe Oudar +description: | + I joined the dbt Community in November 2021 after discussing some issues on GitHub. I currently work as a staff engineer at a scaleup in the ad tech industry called Teads, which I joined 11 years ago as a new grad. I've been using dbt Core on BigQuery since then. I write about data engineering both on Medium and Substack. I contribute to dbt-bigquery. I wrote an article that was then featured on the Developer Blog called BigQuery ingestion-time partitioning and partition copy with dbt. +image: /img/community/spotlight/christophe-oudar.jpg +pronouns: he/him +location: Montpellier, France +jobTitle: Staff Engineer +companyName: Teads +socialLinks: + - name: X + link: https://x.com/Kayrnt + - name: LinkedIn + link: https://www.linkedin.com/in/christopheoudar/ + - name: Substack + link: https://smallbigdata.substack.com/ +dateCreated: 2024-11-08 +hide_table_of_contents: true +communityAward: true +communityAwardYear: 2024 +--- + +## When did you join the dbt community and in what way has it impacted your career? + +I joined the community in November 2021 as a way to explore how to move our in-house data modeling layer to dbt. The transition took over a year while we ensured we could cover all our bases and add missing features to dbt-bigquery. That project was one of the stepping stones that helped me move from senior to staff level at my current job. + +## What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? + +I identify with leaders who have strong convictions about how data engineering should move forward but remain open to innovation and ideas from everyone to bring the best to the field and make it as inclusive as possible to all cultures and profiles. I think that could mean people like Jordan Tigani or Mark Raasveldt. In the dbt community, my leadership has looked like helping people who are struggling and offering better ways to simplify one's day-to-day work when possible. + +## What have you learned from community members? What do you hope others can learn from you? + +I read a lot of articles about dbt, especially when I got started with it. It helped me a lot to build a proper Slim CI that could fit my company's ways of working. I also got to see how data pipelines were done in other companies and the pros and cons of my approaches. I hope I can share more of that knowledge for people to pick what's best for their needs. diff --git a/website/docs/community/spotlight/fabiyi-opeyemi.md b/website/docs/community/spotlight/fabiyi-opeyemi.md index 18a311fa437..b5b4bf8c9e0 100644 --- a/website/docs/community/spotlight/fabiyi-opeyemi.md +++ b/website/docs/community/spotlight/fabiyi-opeyemi.md @@ -2,13 +2,11 @@ id: fabiyi-opeyemi title: Opeyemi Fabiyi description: | - I'm an Analytics Engineer with Data Culture, a Data Consulting firm where I use dbt regularly to help clients build quality-tested data assets. I've also got a background in financial services and supply chain.
I'm passionate about helping organizations to become data-driven and I majorly use dbt for data modeling, while the other aspect of the stack is largely dependent on the client infrastructure I'm working for, so I often say I'm tool-agnostic. 😀 - - I'm the founder of Nigeria's Young Data Professional Community. I'm also the organizer of the Lagos dbt Meetup which I started, and one of the organizers of the DataFest Africa Conference. I became an active member of the dbt Community in 2021 & spoke at Coalesce 2022. + I’m an Analytics Engineer with Data Culture, a Data Consulting firm where I use dbt regularly to help clients build quality-tested data assets. Before Data Culture, I worked at Cowrywise, one of the leading Fintech companies in Nigeria, where I was a solo data team member, and that was my first introduction to dbt and Analytics Engineering. Before that, I was doing Data Science and Analytics at Deloitte Nigeria. It’s been an exciting journey since I started using dbt and joining the community. Outside of work, I’m very passionate about community building and data advocacy. I founded one of Nigeria’s most vibrant data communities, “The Young Data Professional Community.” I’m also the founder of the Lagos dbt Meetup and one of the organizers of the largest data conference in Africa, the DataFest Africa Conference. I became an active member of the dbt community in 2021 & spoke at Coalesce 2022. So when I’m not actively working, I’m involved in one community activity or another. image: /img/community/spotlight/fabiyi-opeyemi.jpg pronouns: he/him location: Lagos, Nigeria -jobTitle: Senior Analytics Engineer +jobTitle: Analytics Manager companyName: Data Culture organization: Young Data Professionals (YDP) socialLinks: @@ -16,10 +14,10 @@ socialLinks: link: https://twitter.com/Opiano_1 - name: LinkedIn link: https://www.linkedin.com/in/opeyemifabiyi/ -dateCreated: 2023-11-06 +dateCreated: 2024-11-02 hide_table_of_contents: true communityAward: true -communityAwardYear: 2023 +communityAwardYear: 2024 --- ## When did you join the dbt community and in what way has it impacted your career? @@ -40,4 +38,4 @@ I've learned how to show empathy as a data professional and be a great engineer ## Anything else interesting you want to tell us? -Maybe, I will consider DevRel as a career sometime because of my innate passion and love for community and people. Several folks tell me I'm a strong DevRel talent and a valuable asset for any product-led company. If you need someone to bounce ideas off of or discuss😃 your community engagement efforts, please feel free to reach out. +Maybe I will consider DevRel as a career sometime because of my innate passion and love for community and people. Several folks tell me I’m a strong DevRel talent and a valuable asset for any product-led company. If you need someone to bounce ideas off of or discuss your community engagement efforts, please feel free to reach out. On a side note, it was really exciting for me to attend Coalesce 2024 in Vegas in person, which allowed me not only to learn but, most importantly, to meet amazing people I’ve only interacted with online, like Bruno, Kuberjain, Dakota, and many more; shout-out to Zenlytic and Lightdash for making that possible and, most importantly, a huge shout-out to the dbt Labs community team: Amada, Natasha, and everyone else on the community team for their constant support in helping make the dbt Lagos (Nigeria) meetup a success.
diff --git a/website/docs/community/spotlight/jenna-jordan.md b/website/docs/community/spotlight/jenna-jordan.md new file mode 100644 index 00000000000..86f19f125f8 --- /dev/null +++ b/website/docs/community/spotlight/jenna-jordan.md @@ -0,0 +1,36 @@ +--- +id: jenna-jordan +title: Jenna Jordan +description: | + I am a Senior Data Management Consultant with Analytics8, where I advise clients on dbt best practices (especially regarding dbt Mesh and the various shifts in governance and strategy that come with it). My experiences working within a dbt Mesh architecture and all of the difficulties organizations could run into with such a major paradigm shift inspired my peer exchange (role-playing/simulation game) at Coalesce 2024: "Governance co-lab: We the people, in order to govern data, do establish processes." I also experimented with bringing role-playing scenarios to data problems at the September 2024 Chicago dbt Meetup, hosted by Analytics8. I occasionally write long blog posts on my website, if you're up for the read. +image: /img/community/spotlight/jenna-jordan.jpg +pronouns: she/her +location: Asheville, USA +jobTitle: Senior Data Management Consultant +companyName: Analytics8 +socialLinks: + - name: LinkedIn + link: https://www.linkedin.com/in/jennajordan1/ + - name: Personal website + link: https://jennajordan.me/ +dateCreated: 2024-11-01 +hide_table_of_contents: true +communityAward: true +communityAwardYear: 2024 +--- + +## When did you join the dbt community and in what way has it impacted your career? + +My dbt learning journey kicked off with the CoRise (now Uplimit) course Analytics Engineering with dbt, with Emily Hawkins and Jake Hannan, in February 2022 – less than a month after starting as a data engineer with the City of Boston Analytics Team. About a year later, I spearheaded the adoption of dbt at the City and got to build the project and associated architecture from scratch – which is probably the best learning experience you could ask for! I saw the value dbt could bring to improving data management processes at the City, and I knew there were other cities and local governments that could benefit from dbt as well, which motivated me to find my fellow co-speakers Ian Rose and Laurie Merrell to give a talk at Coalesce 2023 called "From Coast to Coast: Implementing dbt in the public sector." As a part of our goal to identify and cultivate a community of dbt practitioners in the public (and adjacent) sectors, we also started the dbt Community Slack channel #industry-public-sector. That experience allowed me to continue to grow my career and find my current role - as well as connect with so many amazing data folks! + +## What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? + +There are many leaders in the dbt community that I admire and identify with – I won’t list them all out because I will invariably miss someone (but… you probably know who you are). Technical prowess is always enviable, but I most admire those who bring the human element to data work: those who aren’t afraid to be their authentic selves, cultivate a practice of empathy and compassion, and are driven by curiosity and a desire to help others. I’ve never set out to be a leader, and I still don’t really consider myself to be a leader – I’m much more comfortable in the role of a librarian. I just want to help people by connecting them to the information and resources that they may need. + +## What have you learned from community members? 
What do you hope others can learn from you? + +Pretty much everything I’ve learned about dbt and working in a mature analytics ecosystem I’ve learned from dbt community members. The dbt Community Slack is full of useful information and advice, and has also helped me identify experts about certain topics that I can chat with to learn even more. When I find someone sharing useful information, I usually try to find and follow them on social media so I can see more of their content. If there is one piece of advice I want to share, it is this: don’t be afraid to engage. Ask for help when you need it, but also offer help freely. Engage with the community with the same respect and grace you would offer your friends and coworkers. + +## Anything else interesting you want to tell us? + +Library Science is so much more than the Dewey Decimal System (seriously, ask a librarian about Dewey for a juicy rant). RDF triples (for knowledge graphs) are queried using SPARQL (pronounced “sparkle”). An antelope can be a document. The correct way to write a date/time is ISO-8601. The oldest known table (of the spreadsheet variety) is from 5,000 years ago – record-keeping predates literature by a significant margin. Zip codes aren’t polygons – they don’t contain an area or have boundaries. Computers don’t always return 0.3 when asked to add 0.1 + 0.2. SQL was the sequel to SQUARE. Before computers, people programmed looms (weaving is binary). What? You asked!! On a more serious note – data teams: start hiring librarians. No, seriously. No degree could have prepared me better for what I do in the data field than my M.S. in Library & Information Science. I promise, you want the skillset & mindset that a librarian will bring to your team. diff --git a/website/docs/community/spotlight/meagan-palmer.md b/website/docs/community/spotlight/meagan-palmer.md index ff45a3d6b7d..fffc2a6e0d6 100644 --- a/website/docs/community/spotlight/meagan-palmer.md +++ b/website/docs/community/spotlight/meagan-palmer.md @@ -3,8 +3,11 @@ id: meagan-palmer title: Meagan Palmer description: | I first started using dbt in 2016 or 2017 (I can't remember exactly). Since then, I have moved into data and analytics consulting and have dipped in and out of the dbt Community. + Late last year, I started leading dbt Cloud training courses and spending more time in the dbt Slack. + In consulting, I get to use a range of stacks. I've used dbt with Redshift, Snowflake, and Databricks in production settings with a range of loaders & reporting tools, and I've been enjoying using DuckDB for some home experimentation. + To share some of the experiences, I regularly post to LinkedIn and have recently started Analytics Engineering Today, a twice monthly newsletter about dbt in practice. image: /img/community/spotlight/Meagan-Palmer.png pronouns: she/her @@ -14,9 +17,10 @@ companyName: Altis Consulting socialLinks: - name: LinkedIn link: https://www.linkedin.com/in/meaganpalmer/ -dateCreated: 2024-07-29 +dateCreated: 2024-11-04 hide_table_of_contents: true -communityAward: false +communityAward: true +communityAwardYear: 2024 --- ## When did you join the dbt community and in what way has it impacted your career? @@ -27,9 +31,9 @@ I was fortunate that Jon Bradley at Nearmap had the vision to engage the then Fi Being in Australia, I often see replies from Jeremy Yeo to people in the dbt Slack. His clarity of communication is impressive. -For growth, I'm hoping that others can benefit from the wide range of experience I have. 
My newsletter, Analytics Engineering Today on LinkedIn aims to upskill the dbt Community and shed some light on some useful features that might not be well known. +For growth, I'm hoping that others can benefit from the wide range of experience I have. My LinkedIn newsletter, Analytics Engineering Today, aims to upskill the dbt Community and shed some light on some useful features that might not be well known. -I'll be at Coalesce and am doing some webinars/events later in the year. Come say hi, I love talking dbt and analytics engineering with people. +I was at Coalesce Online and am doing some webinars/events later in the year. Come say hi, I love talking dbt and analytics engineering with people. ## What have you learned from community members? What do you hope others can learn from you? diff --git a/website/docs/community/spotlight/mike-stanley.md b/website/docs/community/spotlight/mike-stanley.md new file mode 100644 index 00000000000..853b0e2f704 --- /dev/null +++ b/website/docs/community/spotlight/mike-stanley.md @@ -0,0 +1,30 @@ +--- +id: mike-stanley +title: Mike Stanley +description: | + I've split my time between financial services and the video games industry. Back when I wrote code every day, I worked in marketing analytics and marketing technology. I've been in the dbt community for about two years. I haven't authored any extensions to dbt's adapters yet, but I've given feedback on proposed changes! +image: /img/community/spotlight/mike-stanley.jpg +pronouns: he/him +location: London, United Kingdom +jobTitle: Manager, Data +companyName: Freetrade +socialLinks: + - name: LinkedIn + link: https://www.linkedin.com/in/mike-stanley-31616994/ +dateCreated: 2024-11-05 +hide_table_of_contents: true +communityAward: true +communityAwardYear: 2024 +--- + +## When did you join the dbt community and in what way has it impacted your career? + +I've led data teams for almost ten years now and it can be a challenge to stay current on new technology when you're spending a lot of time on leadership and management. I joined the dbt Community to learn how to get more from it, how to solve problems and use more advanced features, and to learn best practices. I find that answering questions is the way I learn best, so I started helping people! + +## Which dbt Community leader do you identify with? How are you looking to grow your leadership in the dbt community? + +I hope that we can all continue to level up our dbt skills and leave the data environments that we work in better than we found them. + +## What have you learned from community members? What do you hope others can learn from you? + +Everything! People share so much about their best practices and when and how to deviate from them, interesting extensions to dbt that they've worked on, common bugs and problems, and how to think in a "dbtish" way. I couldn't have learned any of that without the community!
diff --git a/website/docs/community/spotlight/original-dbt-athena-maintainers.md b/website/docs/community/spotlight/original-dbt-athena-maintainers.md new file mode 100644 index 00000000000..b3728a71d63 --- /dev/null +++ b/website/docs/community/spotlight/original-dbt-athena-maintainers.md @@ -0,0 +1,44 @@ +--- +id: original-dbt-athena-maintainers +title: The Original dbt-athena Maintainers +description: | + The original dbt-athena maintainers are a group of 5 people—Jérémy Guiselin, Mattia, Jesse Dobbelaere, Serhii Dimchenko, and Nicola Corda—who met via dbt Slack in the #db-athena channel, with the aim of making dbt-athena a production-ready adapter. + + In the first period, from Winter 2022 to Spring 2023, we focused on contributing directly to the adapter, adding relevant features like Iceberg and Lake Formation support, and stabilizing some internal behaviour. + + In a second iteration, our role was triaging, providing community support, and fixing bugs. We encouraged community members to make their first contributions, and helped them merge their PRs. +image: /img/community/spotlight/dbt-athena-groupheadshot.jpg +location: Europe +jobTitle: A group of data engineers +companyName: Mix of companies +organization: dbt-athena (since November 2022) +socialLinks: + - name: Jérémy's LinkedIn + link: https://www.linkedin.com/in/jrmyy/ + - name: Mattia's LinkedIn + link: https://www.linkedin.com/in/mattia-sappa/ + - name: Jesse's LinkedIn + link: https://www.linkedin.com/in/dobbelaerejesse/ + - name: Serhii's LinkedIn + link: https://www.linkedin.com/in/serhii-dimchenko-075b3061/ + - name: Nicola's LinkedIn + link: https://www.linkedin.com/in/nicolacorda/ +dateCreated: 2024-11-06 +hide_table_of_contents: true +communityAward: true +communityAwardYear: 2024 +--- + +## When did you join the dbt community and in what way has it impacted your career? + +The dbt community allowed the dbt-athena maintainers to meet each other, and share the common goal of making the dbt-athena adapter production-ready. + +## Which dbt Community leader do you identify with? How are you looking to grow your leadership in the dbt community? + +As we grow, we are looking to embody democratic leadership. + +## What have you learned from community members? What do you hope others can learn from you? + +We learned that the power of the community was endless. People started to share best practices, and some of those best practices were incorporated directly into dbt-athena, allowing people to run the adapter smoothly in their production environment. We reached a point where people started to ask for advice on their AWS architecture, which we found pretty awesome. diff --git a/website/docs/community/spotlight/ruth-onyekwe.md b/website/docs/community/spotlight/ruth-onyekwe.md new file mode 100644 index 00000000000..cf07e98a4f7 --- /dev/null +++ b/website/docs/community/spotlight/ruth-onyekwe.md @@ -0,0 +1,31 @@ +--- +id: ruth-onyekwe +title: Ruth Onyekwe +description: | + I've been working in the world of Data Analytics for over 5 years and have been part of the dbt community for the last 4. With a background in International Business and Digital Marketing, I experienced first-hand the need for reliable data to fuel business decisions. This inspired a career move into the technology space to be able to work with the tools and the people that were facilitating this process. Today I am leading teams to deliver data modernization projects, as well as helping grow the analytics arm of my company on a day-to-day basis.
I also have the privilege of organising the dbt Meetups in Barcelona, Spain - and am excited to continue to grow the community across Europe. +image: /img/community/spotlight/ruth-onyekwe.jpeg +pronouns: she/her +location: Madrid, Spain +jobTitle: Data Analytics Manager +companyName: Spaulding Ridge +socialLinks: + - name: LinkedIn + link: https://www.linkedin.com/in/ruth-onyekwe/ +dateCreated: 2024-11-07 +hide_table_of_contents: true +communityAward: true +communityAwardYear: 2024 +--- + +## When did you join the dbt community and in what way has it impacted your career? + +I joined the dbt community in 2021, after meeting dbt Labs reps at a conference. Through partnering with dbt Labs and learning the technology, we (Spaulding Ridge) were able to open a whole new offering in our service catalogue, and meet the growing needs of our customers. + +## Which dbt Community leader do you identify with? How are you looking to grow your leadership in the dbt community? + +I identify with the transparent leaders - those willing to share their learnings, knowledge, and experiences. I want to encourage other dbt enthusiasts to stretch themselves professionally and actively participate in the analytics community. + +## What have you learned from community members? What do you hope others can learn from you? + +I've learnt that most of us working in data have experienced the same struggles, be it searching for the best testing frameworks, or deciding how to build optimised and scalable models, or searching for the answers to non-technical questions like how to best organise teams or how to communicate with business stakeholders and translate their needs - we're all faced with the same dilemmas. And the great thing I've learned being in the dbt community is that if you're brave enough to share your stories, you'll connect with someone who has already gone through those experiences, and can help you reach a solution a lot faster than if you tried to start from scratch. diff --git a/website/docs/docs/build/dimensions.md b/website/docs/docs/build/dimensions.md index 170626ee7cc..5026f4c45cd 100644 --- a/website/docs/docs/build/dimensions.md +++ b/website/docs/docs/build/dimensions.md @@ -67,7 +67,7 @@ semantic_models: type: categorical ``` -Dimensions are bound to the primary entity of the semantic model they are defined in. For example the dimensoin `type` is defined in a model that has `transaction` as a primary entity. `type` is scoped to the `transaction` entity, and to reference this dimension you would use the fully qualified dimension name i.e `transaction__type`. +Dimensions are bound to the primary entity of the semantic model they are defined in. For example, the dimension `type` is defined in a model that has `transaction` as a primary entity. `type` is scoped to the `transaction` entity, and to reference this dimension you would use the fully qualified dimension name, i.e. `transaction__type`. MetricFlow requires that all semantic models have a primary entity. This is to guarantee unique dimension names. If your data source doesn't have a primary entity, you need to assign the entity a name using the `primary_entity` key. It doesn't necessarily have to map to a column in that table and assigning the name doesn't affect query generation. We recommend making these "virtual primary entities" unique across your semantic model.
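A minimal sketch of assigning a virtual primary entity, with hypothetical model and entity names; `primary_entity` is given a name directly because no column in the table uniquely identifies a row:

```yaml
semantic_models:
  - name: transactions
    model: ref('fct_transactions')
    # Virtual primary entity: it doesn't map to a column
    # and doesn't affect query generation
    primary_entity: transaction
    dimensions:
      - name: type
        type: categorical
```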
An example of defining a primary entity for a data source that doesn't have a primary entity column is below: diff --git a/website/docs/docs/build/environment-variables.md b/website/docs/docs/build/environment-variables.md index c26425401a7..b87786ac596 100644 --- a/website/docs/docs/build/environment-variables.md +++ b/website/docs/docs/build/environment-variables.md @@ -102,7 +102,7 @@ dbt Cloud has a number of pre-defined variables built in. Variables are set auto The following environment variable is set automatically for the dbt Cloud IDE: - `DBT_CLOUD_GIT_BRANCH` — Provides the development Git branch name in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). - - Available in dbt v 1.6 and later. + - Available in dbt v1.6 and later. - The variable changes when the branch is changed. - Doesn't require restarting the IDE after a branch change. - Currently not available in the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation). diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 38e0edddfd6..e1c39e6ae47 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -8,7 +8,7 @@ id: "incremental-microbatch" :::info Microbatch -The `microbatch` strategy is available in beta for [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) and dbt Core v1.9. We have been developing it behind a flag to prevent unintended interactions with existing custom incremental strategies. To enable this feature, set the environment variable `DBT_EXPERIMENTAL_MICROBATCH` to `True` in your dbt Cloud environments or wherever you're running dbt Core. +The `microbatch` strategy is available in beta for [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) and dbt Core v1.9. We have been developing it behind a flag to prevent unintended interactions with existing custom incremental strategies. To enable this feature, [set the environment variable](/docs/build/environment-variables#setting-and-overriding-environment-variables) `DBT_EXPERIMENTAL_MICROBATCH` to `True` in your dbt Cloud environments or wherever you're running dbt Core. Read and participate in the discussion: [dbt-core#10672](https://github.com/dbt-labs/dbt-core/discussions/10672) @@ -165,7 +165,7 @@ Several configurations are relevant to microbatch models, and some are required: | `event_time` | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A | | `begin` | Date (required) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | | `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | -| `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `0` | +| `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. 
| `1` | @@ -192,12 +192,15 @@ During standard incremental runs, dbt will process batches according to the current timestamp and the configured `lookback` (with one query per batch) Whether to fix erroneous source data or retroactively apply a change in business logic, you may need to reprocess a large amount of historical data. -Backfilling a microbatch model is as simple as selecting it to run or build, and specifying a "start" and "end" for `event_time`. As always, dbt will process the batches between the start and end as independent queries. +Backfilling a microbatch model is as simple as selecting it to run or build, and specifying a "start" and "end" for `event_time`. Note that `--event-time-start` and `--event-time-end` must be used together: if you specify one, you must also specify the other. + +As always, dbt will process the batches between the start and end as independent queries. ```bash dbt run --event-time-start "2024-09-01" --event-time-end "2024-09-04" ``` + ### Retry diff --git a/website/docs/docs/build/incremental-models.md b/website/docs/docs/build/incremental-models.md index 2968496290a..a56246addf3 100644 --- a/website/docs/docs/build/incremental-models.md +++ b/website/docs/docs/build/incremental-models.md @@ -212,11 +212,11 @@ Currently, `on_schema_change` only tracks top-level column changes. It does not ### Default behavior -This is the behavior if `on_schema_change: ignore`, which is set by default, and on older versions of dbt. +This is the behavior of `on_schema_change: ignore`, which is set by default. If you add a column to your incremental model, and execute a `dbt run`, this column will _not_ appear in your target table. -Similarly, if you remove a column from your incremental model, and execute a `dbt run`, this column will _not_ be removed from your target table. +If you remove a column from your incremental model and execute a `dbt run`, `dbt run` will fail. Instead, whenever the logic of your incremental model changes, execute a full-refresh run of both your incremental model and any downstream models.
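As a sketch of opting out of that default (the model and column names here are assumptions), `on_schema_change` can be set per model and accepts `ignore`, `fail`, `append_new_columns`, and `sync_all_columns`:

```sql
{{
    config(
        materialized='incremental',
        unique_key='id',
        on_schema_change='append_new_columns'
    )
}}

select * from {{ ref('stg_events') }}

{% if is_incremental() %}
-- only process rows that arrived since the last successful run
where event_time > (select max(event_time) from {{ this }})
{% endif %}
```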
diff --git a/website/docs/docs/build/incremental-strategy.md b/website/docs/docs/build/incremental-strategy.md index 30de135b09b..1fb35ba637c 100644 --- a/website/docs/docs/build/incremental-strategy.md +++ b/website/docs/docs/build/incremental-strategy.md @@ -27,7 +27,7 @@ Click the name of the adapter in the below table for more information about supp | Data platform adapter | `append` | `merge` | `delete+insert` | `insert_overwrite` | `microbatch` | |-----------------------|:--------:|:-------:|:---------------:|:------------------:|:-------------------:| | [dbt-postgres](/reference/resource-configs/postgres-configs#incremental-materialization-strategies) | ✅ | ✅ | ✅ | | ✅ | -| [dbt-redshift](/reference/resource-configs/redshift-configs#incremental-materialization-strategies) | ✅ | ✅ | ✅ | | | +| [dbt-redshift](/reference/resource-configs/redshift-configs#incremental-materialization-strategies) | ✅ | ✅ | ✅ | | ✅ | | [dbt-bigquery](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | | ✅ | | ✅ | ✅ | | [dbt-spark](/reference/resource-configs/spark-configs#incremental-models) | ✅ | ✅ | | ✅ | ✅ | | [dbt-databricks](/reference/resource-configs/databricks-configs#incremental-models) | ✅ | ✅ | | ✅ | | diff --git a/website/docs/docs/build/measures.md b/website/docs/docs/build/measures.md index 977b630fada..d60aa3f7e21 100644 --- a/website/docs/docs/build/measures.md +++ b/website/docs/docs/build/measures.md @@ -200,7 +200,7 @@ Parameters under the `non_additive_dimension` will specify dimensions that the m ```yaml semantic_models: - - name: subscription_id + - name: subscriptions description: A subscription table with one row per date for each active user and their subscription plans. model: ref('your_schema.subscription_table') defaults: @@ -209,7 +209,7 @@ semantic_models: entities: - name: user_id type: foreign - primary_entity: subscription_table + primary_entity: subscription dimensions: - name: subscription_date @@ -224,21 +224,21 @@ semantic_models: expr: user_id agg: count_distinct non_additive_dimension: - name: metric_time + name: subscription_date window_choice: max - name: mrr description: Aggregate by summing all users' active subscription plans expr: subscription_value agg: sum non_additive_dimension: - name: metric_time + name: subscription_date window_choice: max - name: user_mrr description: Group by user_id to achieve each user's MRR expr: subscription_value agg: sum non_additive_dimension: - name: metric_time + name: subscription_date window_choice: max window_groupings: - user_id @@ -255,15 +255,15 @@ We can query the semi-additive metrics using the following syntax: For dbt Cloud: ```bash -dbt sl query --metrics mrr_by_end_of_month --group-by metric_time__month --order metric_time__month -dbt sl query --metrics mrr_by_end_of_month --group-by metric_time__week --order metric_time__week +dbt sl query --metrics mrr_by_end_of_month --group-by subscription__subscription_date__month --order subscription__subscription_date__month +dbt sl query --metrics mrr_by_end_of_month --group-by subscription__subscription_date__week --order subscription__subscription_date__week ``` For dbt Core: ```bash -mf query --metrics mrr_by_end_of_month --group-by metric_time__month --order metric_time__month -mf query --metrics mrr_by_end_of_month --group-by metric_time__week --order metric_time__week +mf query --metrics mrr_by_end_of_month --group-by subscription__subscription_date__month --order subscription__subscription_date__month +mf query --metrics 
mrr_by_end_of_month --group-by subscription__subscription_date__week --order subscription__subscription_date__week ``` import SetUpPages from '/snippets/_metrics-dependencies.md'; diff --git a/website/docs/docs/build/metricflow-commands.md b/website/docs/docs/build/metricflow-commands.md index 466a61cc5bc..2da5618b86f 100644 --- a/website/docs/docs/build/metricflow-commands.md +++ b/website/docs/docs/build/metricflow-commands.md @@ -259,7 +259,7 @@ Create a new query with MetricFlow and execute it against your data platform. Th ```bash dbt sl query --metrics --group-by # In dbt Cloud -dbt sl query --saved-query # In dbt Cloud CLI +dbt sl query --saved-query # In dbt Cloud mf query --metrics --group-by # In dbt Core diff --git a/website/docs/docs/build/metricflow-time-spine.md b/website/docs/docs/build/metricflow-time-spine.md index 9932a35839c..5f16af38023 100644 --- a/website/docs/docs/build/metricflow-time-spine.md +++ b/website/docs/docs/build/metricflow-time-spine.md @@ -124,42 +124,6 @@ For an example project, refer to our [Jaffle shop](https://github.com/dbt-labs/j - - -```sql -{{ - config( - materialized = 'table', - ) -}} - -with days as ( - - {{ - dbt_utils.date_spine( - 'day', - "to_date('01/01/2000','mm/dd/yyyy')", - "to_date('01/01/2025','mm/dd/yyyy')" - ) - }} - -), - -final as ( - select cast(date_day as date) as date_day - from days -) - -select * from final --- filter the time spine to a specific range -where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) -``` - - - - - ```sql {{ config( @@ -186,45 +150,12 @@ final as ( select * from final where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) +and date_day < dateadd(day, 30, current_timestamp()) ``` - - ### Daily (BigQuery) Use this model if you're using BigQuery. BigQuery supports `DATE()` instead of `TO_DATE()`: - - - - -```sql -{{config(materialized='table')}} -with days as ( - {{dbt_utils.date_spine( - 'day', - "DATE(2000,01,01)", - "DATE(2025,01,01)" - ) - }} -), - -final as ( - select cast(date_day as date) as date_day - from days -) - -select * -from final --- filter the time spine to a specific range -where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) -``` - - - - - @@ -249,11 +180,10 @@ select * from final -- filter the time spine to a specific range where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) +and date_day < dateadd(day, 30, current_timestamp()) ``` - @@ -306,42 +236,6 @@ To create this table, you need to create a model in your dbt project called `met ### Daily - - - -```sql -{{ - config( - materialized = 'table', - ) -}} - -with days as ( - - {{ - dbt_utils.date_spine( - 'day', - "to_date('01/01/2000','mm/dd/yyyy')", - "to_date('01/01/2025','mm/dd/yyyy')" - ) - }} - -), - -final as ( - select cast(date_day as date) as date_day - from days -) - -select * from final --- filter the time spine to a specific range -where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) -``` - - - - @@ -371,47 +265,15 @@ final as ( select * from final where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) +and date_day < dateadd(day, 30, current_timestamp()) ``` - ### Daily (BigQuery) Use this model if you're using BigQuery. 
BigQuery supports `DATE()` instead of `TO_DATE()`: - - - - -```sql -{{config(materialized='table')}} -with days as ( - {{dbt_utils.date_spine( - 'day', - "DATE(2000,01,01)", - "DATE(2025,01,01)" - ) - }} -), - -final as ( - select cast(date_day as date) as date_day - from days -) - -select * -from final --- filter the time spine to a specific range -where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) -``` - - - - - ```sql @@ -434,11 +296,10 @@ select * from final -- filter the time spine to a specific range where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) +and date_day < dateadd(day, 30, current_timestamp()) ``` - You only need to include the `date_day` column in the table. MetricFlow can handle broader levels of detail, but finer grains are only supported in versions 1.9 and higher. diff --git a/website/docs/docs/build/packages.md b/website/docs/docs/build/packages.md index b4880d88794..49cd7e00b1c 100644 --- a/website/docs/docs/build/packages.md +++ b/website/docs/docs/build/packages.md @@ -83,11 +83,7 @@ packages: version: [">=0.7.0", "<0.8.0"] ``` - - -Beginning in v1.7, `dbt deps` "pins" each package by default. See ["Pinning packages"](#pinning-packages) for details. - - +`dbt deps` "pins" each package by default. See ["Pinning packages"](#pinning-packages) for details. Where possible, we recommend installing packages via dbt Hub, since this allows dbt to handle duplicate dependencies. This is helpful in situations such as: * Your project uses both the dbt-utils and Snowplow packages, and the Snowplow package _also_ uses the dbt-utils package. @@ -146,18 +142,8 @@ packages: revision: 4e28d6da126e2940d17f697de783a717f2503188 ``` - - -We **strongly recommend** ["pinning" your packages](#pinning-packages) to a specific release by specifying a release name. - - - - - By default, `dbt deps` "pins" each package. See ["Pinning packages"](#pinning-packages) for details. - - ### Internally hosted tarball URL Some organizations have security requirements to pull resources only from internal services. To address the need to install packages from hosted environments such as Artifactory or cloud storage buckets, dbt Core enables you to install packages from internally-hosted tarball URLs. @@ -319,18 +305,6 @@ When you remove a package from your `packages.yml` file, it isn't automatically ### Pinning packages - - -We **strongly recommend** "pinning" your package to a specific release by specifying a tagged release name or a specific commit hash. - -If you do not provide a revision, or if you use the main branch, then any updates to the package will be incorporated into your project the next time you run `dbt deps`. While we generally try to avoid making breaking changes to these packages, they are sometimes unavoidable. Pinning a package revision helps prevent your code from changing without your explicit approval. - -To find the latest release for a package, navigate to the `Releases` tab in the relevant GitHub repository. For example, you can find all of the releases for the dbt-utils package [here](https://github.com/dbt-labs/dbt-utils/releases). - - - - - Beginning with v1.7, running [`dbt deps`](/reference/commands/deps) "pins" each package by creating or updating the `package-lock.yml` file in the _project_root_ where `packages.yml` is recorded. - The `package-lock.yml` file contains a record of all packages installed. 
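For illustration, a lock file generated this way might record entries along the following lines. This is a sketch only; the exact fields depend on your `packages.yml` and your dbt version:

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1
  - git: https://github.com/dbt-labs/dbt-codegen.git
    revision: 4e28d6da126e2940d17f697de783a717f2503188
# dbt also records a hash of the packages config so it can detect changes
sha1_hash: <generated hash of the packages configuration>
```

Because this file is intended to be checked into version control, every environment that runs `dbt deps` installs the same resolved set of packages.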
@@ -338,8 +312,6 @@ Beginning with v1.7, running [`dbt deps`](/reference/commands/deps) "pins" each For example, if you use a branch name, the `package-lock.yml` file pins to the head commit. If you use a version range, it pins to the latest release. In either case, subsequent commits or versions will **not** be installed. To get new commits or versions, run `dbt deps --upgrade` or add `package-lock.yml` to your .gitignore file. - - As of v0.14.0, dbt will warn you if you install a package using the `git` syntax without specifying a revision (see below). ### Configuring packages diff --git a/website/docs/docs/build/saved-queries.md b/website/docs/docs/build/saved-queries.md index 649885f9506..ed56d13dcc9 100644 --- a/website/docs/docs/build/saved-queries.md +++ b/website/docs/docs/build/saved-queries.md @@ -154,8 +154,6 @@ saved_queries: - - #### Project-level saved queries To enable saved queries at the project level, you can set the `saved-queries` configuration in the [`dbt_project.yml` file](/reference/dbt_project.yml). This saves you time in configuring saved queries in each file: @@ -171,7 +169,6 @@ saved-queries: For more information on `dbt_project.yml` and config naming conventions, see the [dbt_project.yml reference page](/reference/dbt_project.yml#naming-convention). - To build `saved_queries`, use the [`--resource-type` flag](/reference/global-configs/resource-type) and run the command `dbt build --resource-type saved_query`. diff --git a/website/docs/docs/build/semantic-models.md b/website/docs/docs/build/semantic-models.md index d683d7cd020..609d7f1ff8d 100644 --- a/website/docs/docs/build/semantic-models.md +++ b/website/docs/docs/build/semantic-models.md @@ -119,8 +119,6 @@ semantic_models: type: categorical ``` - - Semantic models support [`meta`](/reference/resource-configs/meta), [`group`](/reference/resource-configs/group), and [`enabled`](/reference/resource-configs/enabled) [`config`](/reference/resource-properties/config) property in either the schema file or at the project level: - Semantic model config in `models/semantic.yml`: @@ -148,8 +146,6 @@ Semantic models support [`meta`](/reference/resource-configs/meta), [`group`](/r For more information on `dbt_project.yml` and config naming conventions, see the [dbt_project.yml reference page](/reference/dbt_project.yml#naming-convention). - - ### Name Define the name of the semantic model. You must define a unique name for the semantic model. The semantic graph will use this name to identify the model, and you can update it at any time. Avoid using double underscores (\_\_) in the name as they're not supported. diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index f5321aa626a..dd7a44fd48c 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -390,29 +390,6 @@ snapshots: -## Snapshot query best practices - -This section outlines some best practices for writing snapshot queries: - -- #### Snapshot source data - Your models should then select from these snapshots, treating them like regular data sources. As much as possible, snapshot your source data in its raw form and use downstream models to clean up the data - -- #### Use the `source` function in your query - This helps when understanding data lineage in your project. - -- #### Include as many columns as possible - In fact, go for `select *` if performance permits! 
Even if a column doesn't feel useful at the moment, it might be better to snapshot it in case it becomes useful – after all, you won't be able to recreate the column later. - -- #### Avoid joins in your snapshot query - Joins can make it difficult to build a reliable `updated_at` timestamp. Instead, snapshot the two tables separately, and join them in downstream models. - -- #### Limit the amount of transformation in your query - If you apply business logic in a snapshot query, and this logic changes in the future, it can be impossible (or, at least, very difficult) to apply the change in logic to your snapshots. - -Basically – keep your query as simple as possible! Some reasonable exceptions to these recommendations include: -* Selecting specific columns if the table is wide. -* Doing light transformation to get data into a reasonable shape, for example, unpacking a blob to flatten your source data into columns. - ## Snapshot meta-fields Snapshot tables will be created as a clone of your source dataset, plus some additional meta-fields*. @@ -498,7 +475,9 @@ Snapshot results: -This section is for users on dbt versions 1.8 and earlier. To configure snapshots in versions 1.9 and later, refer to [Configuring snapshots](#configuring-snapshots). The latest versions use an updated snapshot configuration syntax that optimizes performance. +For information about configuring snapshots in dbt versions 1.8 and earlier, select **1.8** from the documentation version picker, and it will appear in this section. + +To configure snapshots in versions 1.9 and later, refer to [Configuring snapshots](#configuring-snapshots). The latest versions use an updated snapshot configuration syntax that optimizes performance. diff --git a/website/docs/docs/cloud/account-settings.md b/website/docs/docs/cloud/account-settings.md index 3b2632c8747..aaad9b28e5c 100644 --- a/website/docs/docs/cloud/account-settings.md +++ b/website/docs/docs/cloud/account-settings.md @@ -45,6 +45,6 @@ To use, select the **Enable partial parsing between deployment runs** option fro To use Advanced CI features, your dbt Cloud account must have access to them. Ask your dbt Cloud administrator to enable Advanced CI features on your account, which they can do by choosing the **Enable account access to Advanced CI** option from the account settings. -Once enabled, the **Run compare changes** option becomes available in the CI job settings for you to select. +Once enabled, the **dbt compare** option becomes available in the CI job settings for you to select. - \ No newline at end of file + diff --git a/website/docs/docs/cloud/connect-data-platform/about-connections.md b/website/docs/docs/cloud/connect-data-platform/about-connections.md index 6f2f140b724..6497e86de89 100644 --- a/website/docs/docs/cloud/connect-data-platform/about-connections.md +++ b/website/docs/docs/cloud/connect-data-platform/about-connections.md @@ -20,9 +20,12 @@ dbt Cloud can connect with a variety of data platform providers including: - [Starburst or Trino](/docs/cloud/connect-data-platform/connect-starburst-trino) - [Teradata](/docs/cloud/connect-data-platform/connect-teradata) -You can connect to your database in dbt Cloud by clicking the gear in the top right and selecting **Account Settings**. From the Account Settings page, click **+ New Project**. +To connect to your database in dbt Cloud: - +1. Click your account name at the bottom of the left-side menu and click **Account settings** +2. 
Select **Projects** from the top left, and from there click **New Project** + + These connection instructions provide the basic fields required for configuring a data platform connection in dbt Cloud. For more detailed guides, which include demo project data, read our [Quickstart guides](https://docs.getdbt.com/guides) @@ -41,7 +44,7 @@ Connections created with APIs before this change cannot be accessed with the [la Warehouse connections are an account-level resource. As such, you can find them under **Account Settings** > **Connections**: - + Warehouse connections can be re-used across projects. If multiple projects all connect to the same warehouse, you should re-use the same connection to streamline your management operations. Connections are assigned to a project via an [environment](/docs/dbt-cloud-environments). @@ -85,7 +88,7 @@ Please consider the following actions, as the steps you take will depend on the - Normalization - - Undertsand how new connections should be created to avoid local overrides. If you currently use extended attributes to override the warehouse instance in your production environment - you should instead create a new connection for that instance, and wire your production environment to it, removing the need for the local overrides + - Understand how new connections should be created to avoid local overrides. If you currently use extended attributes to override the warehouse instance in your production environment - you should instead create a new connection for that instance, and wire your production environment to it, removing the need for the local overrides - Create new connections, update relevant environments to target these connections, removing now unnecessary local overrides (which may not be all of them!) - Test the new wiring by triggering jobs or starting IDE sessions diff --git a/website/docs/docs/cloud/connect-data-platform/connect-amazon-athena.md b/website/docs/docs/cloud/connect-data-platform/connect-amazon-athena.md index 0b2f844ccac..f1009f61274 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-amazon-athena.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-amazon-athena.md @@ -5,7 +5,7 @@ description: "Configure the Amazon Athena data platform connection in dbt Cloud. sidebar_label: "Connect Amazon Athena" --- -# Connect Amazon Athena +# Connect Amazon Athena Your environment(s) must be on ["Versionless"](/docs/dbt-versions/versionless-cloud) to use the Amazon Athena connection. diff --git a/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md b/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md index 4719095b87f..5be802cae77 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md @@ -118,7 +118,7 @@ Once the connection is saved, a public key will be generated and displayed for t To configure the SSH tunnel in dbt Cloud, you'll need to provide the hostname/IP of your bastion server, username, and a port of your choosing that dbt Cloud will connect to. Review the following steps: - Verify the bastion server has its network security rules set up to accept connections from the [dbt Cloud IP addresses](/docs/cloud/about-cloud/access-regions-ip-addresses) on whatever port you configured.
-- Set up the user account by using the bastion servers instance's CLI, The following example uses the username `dbtcloud:` +- Set up the user account by using the bastion server instance's CLI. The following example uses the username `dbtcloud`: ```shell sudo groupadd dbtcloud sudo useradd -m -g dbtcloud dbtcloud sudo su - dbtcloud mkdir ~/.ssh chmod 700 ~/.ssh touch ~/.ssh/authorized_keys chmod 600 ~/.ssh/authorized_keys ``` diff --git a/website/docs/docs/cloud/git/authenticate-azure.md b/website/docs/docs/cloud/git/authenticate-azure.md index 42028bf993b..5278c134f72 100644 --- a/website/docs/docs/cloud/git/authenticate-azure.md +++ b/website/docs/docs/cloud/git/authenticate-azure.md @@ -13,9 +13,9 @@ If you use the dbt Cloud IDE or dbt Cloud CLI to collaborate on your team's Azur Connect your dbt Cloud profile to Azure DevOps using OAuth: -1. Click the gear icon at the top right and select **Profile settings**. -2. Click **Linked Accounts**. -3. Next to Azure DevOps, click **Link**. +1. Click your account name at the bottom of the left-side menu and click **Account settings**. +2. Scroll down to **Your profile** and select **Personal profile**. +3. Go to the **Linked accounts** section in the middle of the page. 4. Once you're redirected to Azure DevOps, sign into your account. diff --git a/website/docs/docs/cloud/manage-access/about-access.md b/website/docs/docs/cloud/manage-access/about-access.md index b970b0d5763..b9d23b28add 100644 --- a/website/docs/docs/cloud/manage-access/about-access.md +++ b/website/docs/docs/cloud/manage-access/about-access.md @@ -79,7 +79,7 @@ Refer to [role-based access control](#role-based-access-control) for more inform ## Grant access -dbt Cloud users have both a license (individually or by group) and permissions (by group only) that determine what actions they can take. Licenses are account-wide, and permissions provide more granular access or restrictions to specific features. +dbt Cloud users have both a license (assigned to an individual user or by group membership) and permissions (by group membership only) that determine what actions they can take. Licenses are account-wide, and permissions provide more granular access or restrictions to specific features. ### Licenses diff --git a/website/docs/docs/cloud/manage-access/audit-log.md b/website/docs/docs/cloud/manage-access/audit-log.md index 4d07afe2cde..a7be86a7f99 100644 --- a/website/docs/docs/cloud/manage-access/audit-log.md +++ b/website/docs/docs/cloud/manage-access/audit-log.md @@ -62,7 +62,7 @@ The audit log supports various events for different objects in dbt Cloud. You wi | Auth Provider Changed | auth_provider.Changed | Authentication provider settings changed | | Credential Login Succeeded | auth.CredentialsLoginSucceeded | User successfully logged in with username and password | | SSO Login Failed | auth.SsoLoginFailed | User login via SSO failed | -| SSO Login Succeeded | auth.SsoLoginSucceeded | User successfully logged in via SSO +| SSO Login Succeeded | auth.SsoLoginSucceeded | User successfully logged in via SSO | ### Environment @@ -93,7 +93,7 @@ The audit log supports various events for different objects in dbt Cloud. You wi | ------------- | ----------------------------- | ------------------------------ | | Group Added | user_group.Added | New Group successfully created | | Group Changed | user_group.Changed | Group settings changed | -| Group Removed | user_group.Changed | Group successfully removed | +| Group Removed | user_group.Removed | Group successfully removed | ### User @@ -149,12 +149,65 @@ The audit log supports various events for different objects in dbt Cloud.
You wi ### Credentials -| Event Name | Event Type | Description | -| -------------------------------- | ----------------------------- | -------------------------------- | +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| | Credentials Added to Project | credentials.Added | Project credentials added | | Credentials Changed in Project | credentials.Changed | Credentials changed in project | | Credentials Removed from Project | credentials.Removed | Credentials removed from project | + +### Git integration + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| GitLab Application Changed | gitlab_application.changed | GitLab configuration in dbt Cloud changed | + +### Webhooks + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| Webhook Subscriptions Added | webhook_subscription.added | New webhook configured in settings | +| Webhook Subscriptions Changed | webhook_subscription.changed | Existing webhook configuration altered | +| Webhook Subscriptions Removed | webhook_subscription.removed | Existing webhook deleted | + + +### Semantic Layer + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| Semantic Layer Config Added | semantic_layer_config.added | Semantic Layer config added | +| Semantic Layer Config Changed | semantic_layer_config.changed | Semantic Layer config (not related to credentials) changed | +| Semantic Layer Config Removed | semantic_layer_config.removed | Semantic Layer config removed | +| Semantic Layer Credentials Added | semantic_layer_credentials.added | Semantic Layer credentials added | +| Semantic Layer Credentials Changed| semantic_layer_credentials.changed | Semantic Layer credentials changed. 
Does not trigger semantic_layer_config.changed| +| Semantic Layer Credentials Removed| semantic_layer_credentials.removed | Semantic Layer credentials removed | + +### Extended attributes + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| Extended Attribute Added | extended_attributes.added | Extended attribute added to a project | +| Extended Attribute Changed | extended_attributes.changed | Extended attribute changed or removed | + + +### Account-scoped personal access token + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| Account Scoped Personal Access Token Created | account_scoped_pat.created | An account-scoped PAT was created | +| Account Scoped Personal Access Token Deleted | account_scoped_pat.deleted | An account-scoped PAT was deleted | + +### IP restrictions + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| IP Restrictions Toggled | ip_restrictions.toggled | IP restrictions feature enabled or disabled | +| IP Restrictions Rule Added | ip_restrictions.rule.added | IP restriction rule created | +| IP Restrictions Rule Changed | ip_restrictions.rule.changed | IP restriction rule edited | +| IP Restrictions Rule Removed | ip_restrictions.rule.removed | IP restriction rule deleted | + + + ## Searching the audit log You can search the audit log to find a specific event or actor, which is limited to the ones listed in [Events in audit log](#events-in-audit-log). The audit log successfully lists historical events spanning the last 90 days. You can search for an actor or event using the search bar, and then narrow your results using the time window. diff --git a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md index 66d821b90d0..f814d58777a 100644 --- a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md +++ b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md @@ -124,9 +124,7 @@ Great work! After completing these steps, your dbt Cloud user count and billing ## Managing license types -Licenses can be assigned manually, or automatically based on IdP configuration -(enterprise only). By default, new users in an account will be assigned a -Developer license. +Licenses can be assigned to users individually or through group membership. To assign a license via group membership, you can manually add a user to a group during the invitation process or assign them to a group after they’ve enrolled in dbt Cloud. Alternatively, with [SSO configuration](/docs/cloud/manage-access/sso-overview) and [role-based access control](/docs/cloud/manage-access/about-user-access#role-based-access-control-) (Enterprise only), users can be automatically assigned to groups. By default, new users in an account are assigned a Developer license. ### Manual configuration @@ -142,16 +140,9 @@ change. -### Mapped configuration +### Mapped configuration -**Note:** This feature is only available on the Enterprise plan. - -If your account is connected to an Identity Provider (IdP) for [Single Sign -On](/docs/cloud/manage-access/sso-overview), you can automatically map IdP user -groups to specific license types in dbt Cloud. To configure license mappings, -navigate to the Account Settings > Team > License Mappings page. 
From -here, you can create or edit SSO mappings for both Read-Only and Developer -license types. +If your account is connected to an Identity Provider (IdP) for [Single Sign On](/docs/cloud/manage-access/sso-overview), you can automatically map IdP user groups to specific groups in dbt Cloud and assign license types to those groups. To configure license mappings, navigate to the **Account Settings** > **Groups & Licenses** > **License Mappings** page. From here, you can create or edit SSO mappings for both Read-Only and Developer license types. By default, all new members of a dbt Cloud account will be assigned a Developer license. To assign Read-Only licenses to certain groups of users, create a new diff --git a/website/docs/docs/cloud/manage-access/mfa.md b/website/docs/docs/cloud/manage-access/mfa.md index a06251e6468..bcddc04f072 100644 --- a/website/docs/docs/cloud/manage-access/mfa.md +++ b/website/docs/docs/cloud/manage-access/mfa.md @@ -7,6 +7,13 @@ sidebar: null # Multi-factor authentication +:::important + + +dbt Cloud enforces multi-factor authentication (MFA) for all users with username and password credentials. If MFA is not set up, you will see a notification bar prompting you to configure one of the supported methods when you log in. If you don't set it up then, you will be required to configure MFA on a subsequent login, and you won't be able to access dbt Cloud until you do. + +::: + dbt Cloud provides multiple options for multi-factor authentication (MFA). MFA provides an additional layer of security to username and password logins for Developer and Team plan accounts. The available MFA methods are: - SMS verification code (US-based phone numbers only) diff --git a/website/docs/docs/collaborate/govern/project-dependencies.md b/website/docs/docs/collaborate/govern/project-dependencies.md index c054d1b27b7..7813e25efcb 100644 --- a/website/docs/docs/collaborate/govern/project-dependencies.md +++ b/website/docs/docs/collaborate/govern/project-dependencies.md @@ -18,7 +18,7 @@ This year, dbt Labs is introducing an expanded notion of `dependencies` across m ## Prerequisites - Available in [dbt Cloud Enterprise](https://www.getdbt.com/pricing). If you have an Enterprise account, you can unlock these features by designating a [public model](/docs/collaborate/govern/model-access) and adding a [cross-project ref](#how-to-write-cross-project-ref). -- Use a supported version of dbt (v1.6, v1.7, or go versionless with "[Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless)") for both the upstream ("producer") project and the downstream ("consumer") project. +- Use a supported version of dbt (v1.6 or newer, or go versionless with "[Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless)") for both the upstream ("producer") project and the downstream ("consumer") project. - Define models in an upstream ("producer") project that are configured with [`access: public`](/reference/resource-configs/access). You need at least one successful job run after defining their `access`. - Define a deployment environment in the upstream ("producer") project [that is set to be your Production environment](/docs/deploy/deploy-environments#set-as-production-environment), and ensure it has at least one successful job run in that environment. - If the upstream project has a Staging environment, run a job in that Staging environment to ensure the downstream cross-project ref resolves.
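To make the last prerequisite concrete, here is a minimal sketch of a cross-project ref in the downstream ("consumer") project. The project and model names are placeholders:

```sql
with upstream_orders as (

    -- Two-argument ref: the producer project's name, then one of its public models
    select * from {{ ref('producer_project', 'fct_orders') }}

)

select * from upstream_orders
```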
diff --git a/website/docs/docs/collaborate/project-recommendations.md b/website/docs/docs/collaborate/project-recommendations.md index 12007c6b88b..c9499579e54 100644 --- a/website/docs/docs/collaborate/project-recommendations.md +++ b/website/docs/docs/collaborate/project-recommendations.md @@ -20,7 +20,7 @@ The Recommendations overview page includes two top-level metrics measuring the t - **Model test coverage** — The percent of models in your project (models not from a package or imported via dbt Mesh) with at least one dbt test configured on them. - **Model documentation coverage** — The percent of models in your project (models not from a package or imported via dbt Mesh) with a description. - + ## List of rules The following table lists the rules currently defined in the `dbt_project_evaluator` [package](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/). diff --git a/website/docs/docs/core/connect-data-platform/about-core-connections.md b/website/docs/docs/core/connect-data-platform/about-core-connections.md index 461aeea2e87..221f495d054 100644 --- a/website/docs/docs/core/connect-data-platform/about-core-connections.md +++ b/website/docs/docs/core/connect-data-platform/about-core-connections.md @@ -32,8 +32,6 @@ If you're using dbt from the command line (CLI), you'll need a profiles.yml file For detailed info, you can refer to the [Connection profiles](/docs/core/connect-data-platform/connection-profiles). - - ## Adapter features The following table lists the features available for adapters: @@ -55,5 +53,3 @@ For adapters that support it, you can partially build the catalog. This allows t ### Source freshness You can measure source freshness using the warehouse metadata tables on supported adapters. This allows for calculating source freshness without using the [`loaded_at_field`](/reference/resource-properties/freshness#loaded_at_field) and without querying the table directly. This is faster and more flexible (though it might sometimes be inaccurate, depending on how the warehouse tracks altered tables). You can override this with the `loaded_at_field` in the [source config](/reference/source-configs). If the adapter doesn't support this, you can still use the `loaded_at_field`. - - diff --git a/website/docs/docs/core/connect-data-platform/snowflake-setup.md b/website/docs/docs/core/connect-data-platform/snowflake-setup.md index 266840cafae..b692ba5c0d6 100644 --- a/website/docs/docs/core/connect-data-platform/snowflake-setup.md +++ b/website/docs/docs/core/connect-data-platform/snowflake-setup.md @@ -211,7 +211,7 @@ my-snowflake-db: -### SSO Authentication +### SSO authentication To use SSO authentication for Snowflake, omit a `password` and instead supply an `authenticator` config to your target. `authenticator` can be one of 'externalbrowser' or a valid Okta URL. @@ -332,7 +332,7 @@ my-snowflake-db: -### SSO Authentication +### SSO authentication To use SSO authentication for Snowflake, omit a `password` and instead supply an `authenticator` config to your target. `authenticator` can be one of 'externalbrowser' or a valid Okta URL. @@ -421,6 +421,30 @@ my-snowflake-db: Refer to the [Snowflake docs](https://docs.snowflake.com/en/sql-reference/parameters.html#label-allow-id-token) for info on how to enable this feature in your account. +### OAuth authorization + +To learn how to configure OAuth in Snowflake, refer to their [documentation](https://docs.snowflake.com/en/user-guide/oauth-snowflake-overview). 
Your Snowflake admin needs to generate an [OAuth token](https://community.snowflake.com/s/article/HOW-TO-OAUTH-TOKEN-GENERATION-USING-SNOWFLAKE-CUSTOM-OAUTH) for your configuration to work. + +Provide the OAUTH_REDIRECT_URI in Snowflake: `http://localhost:PORT_NUMBER`. For example, `http://localhost:8080`. + +Once your Snowflake admin has configured OAuth, add the following to your `profiles.yml` file: + +```yaml + +my-snowflake-db: + target: dev + outputs: + dev: + type: snowflake + account: [account id] + + # The following fields are retrieved from the Snowflake configuration + authenticator: oauth + oauth_client_id: [OAuth client id] + oauth_client_secret: [OAuth client secret] + token: [OAuth refresh token] +``` + ## Configurations The "base" configs for Snowflake targets are shown below. Note that you should also specify auth-related configs specific to the authentication method you are using as described above. diff --git a/website/docs/docs/core/connect-data-platform/spark-setup.md b/website/docs/docs/core/connect-data-platform/spark-setup.md index 01318211c8f..611642e91b7 100644 --- a/website/docs/docs/core/connect-data-platform/spark-setup.md +++ b/website/docs/docs/core/connect-data-platform/spark-setup.md @@ -197,14 +197,9 @@ connect_retries: 3 - - - - ### Server side configuration Spark can be customized using [Application Properties](https://spark.apache.org/docs/latest/configuration.html). Using these properties, the execution can be customized, for example, to allocate more memory to the driver process. Also, the Spark SQL runtime can be set through these properties. For example, this allows the user to [set a Spark catalog](https://spark.apache.org/docs/latest/configuration.html#spark-sql). - ## Caveats diff --git a/website/docs/docs/core/connect-data-platform/teradata-setup.md b/website/docs/docs/core/connect-data-platform/teradata-setup.md index 7b964b23b3d..f4ffbe37f35 100644 --- a/website/docs/docs/core/connect-data-platform/teradata-setup.md +++ b/website/docs/docs/core/connect-data-platform/teradata-setup.md @@ -8,7 +8,7 @@ meta: github_repo: 'Teradata/dbt-teradata' pypi_package: 'dbt-teradata' min_core_version: 'v0.21.0' - cloud_support: Not Supported + cloud_support: Supported min_supported_version: 'n/a' slack_channel_name: '#db-teradata' slack_channel_link: 'https://getdbt.slack.com/archives/C027B6BHMT3' @@ -18,6 +18,7 @@ meta: Some core functionality may be limited. If you're interested in contributing, check out the source code in the repository listed in the next section.
+ import SetUpPages from '/snippets/_setup-pages-intro.md'; @@ -26,17 +27,17 @@ import SetUpPages from '/snippets/_setup-pages-intro.md'; ## Python compatibility -| Plugin version | Python 3.9 | Python 3.10 | Python 3.11 | -| -------------- | ----------- | ----------- | ------------ | -|1.0.0.x | ✅ | ❌ | ❌ -|1.1.x.x | ✅ | ✅ | ❌ -|1.2.x.x | ✅ | ✅ | ❌ -|1.3.x.x | ✅ | ✅ | ❌ -|1.4.x.x | ✅ | ✅ | ✅ -|1.5.x | ✅ | ✅ | ✅ -|1.6.x | ✅ | ✅ | ✅ -|1.7.x | ✅ | ✅ | ✅ -|1.8.x | ✅ | ✅ | ✅ +| Plugin version | Python 3.9 | Python 3.10 | Python 3.11 | Python 3.12 | +|----------------|------------|-------------|-------------|-------------| +| 1.0.0.x | ✅ | ❌ | ❌ | ❌ | +| 1.1.x.x | ✅ | ✅ | ❌ | ❌ | +| 1.2.x.x | ✅ | ✅ | ❌ | ❌ | +| 1.3.x.x | ✅ | ✅ | ❌ | ❌ | +| 1.4.x.x | ✅ | ✅ | ✅ | ❌ | +| 1.5.x | ✅ | ✅ | ✅ | ❌ | +| 1.6.x | ✅ | ✅ | ✅ | ❌ | +| 1.7.x | ✅ | ✅ | ✅ | ❌ | +| 1.8.x | ✅ | ✅ | ✅ | ✅ | ## dbt dependent packages version compatibility @@ -46,6 +47,8 @@ import SetUpPages from '/snippets/_setup-pages-intro.md'; | 1.6.7 | 1.6.7 | 1.1.1 | 1.1.1 | | 1.7.x | 1.7.x | 1.1.1 | 1.1.1 | | 1.8.x | 1.8.x | 1.1.1 | 1.1.1 | +| 1.8.x | 1.8.x | 1.2.0 | 1.2.0 | +| 1.8.x | 1.8.x | 1.3.0 | 1.3.0 | ### Connecting to Teradata diff --git a/website/docs/docs/dbt-cloud-apis/discovery-api.md b/website/docs/docs/dbt-cloud-apis/discovery-api.md index ca84347ffad..db6819a5e09 100644 --- a/website/docs/docs/dbt-cloud-apis/discovery-api.md +++ b/website/docs/docs/dbt-cloud-apis/discovery-api.md @@ -32,7 +32,7 @@ Use the API to look at historical information like model build time to determine You can use, for example, the [model timing](/docs/deploy/run-visibility#model-timing) tab to help identify and optimize bottlenecks in model builds: - + @@ -50,7 +50,7 @@ Use the API to find and understand dbt assets in integrated tools using informat Data producers must manage and organize data for stakeholders, while data consumers need to quickly and confidently analyze data on a large scale to make informed decisions that improve business outcomes and reduce organizational overhead. The API is useful for discovery data experiences in catalogs, analytics, apps, and machine learning (ML) tools. It can help you understand the origin and meaning of datasets for your analysis. - + @@ -65,7 +65,6 @@ Use the API to review who developed the models and who uses them to help establi Use the API to review dataset changes and uses by examining exposures, lineage, and dependencies. From the investigation, you can learn how to define and build more effective dbt projects. For more details, refer to [Development](/docs/dbt-cloud-apis/discovery-use-cases-and-examples#development). - diff --git a/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md b/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md index b99853cd547..e095374343f 100644 --- a/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md +++ b/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md @@ -25,7 +25,7 @@ For performance use cases, people typically query the historical or latest appli It’s helpful to understand how long it takes to build models (tables) and tests to execute during a dbt run. Longer model build times result in higher infrastructure costs and fresh data arriving later to stakeholders. Analyses like these can be in observability tools or ad-hoc queries, like in a notebook. - +
Example query with code diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 3a9832dd706..d9ce3bf4fd1 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -56,7 +56,7 @@ The Semantic Layer JDBC API has built-in metadata calls which can provide a user Expand the following toggles for examples and metadata commands: - + You can use this query to fetch all defined metrics in your dbt project: @@ -65,9 +65,9 @@ select * from {{ semantic_layer.metrics() }} ``` - + - + You can use this query to fetch all dimensions for a metric. @@ -77,9 +77,9 @@ Note, metrics is a required argument that lists one or multiple metrics in it. select * from {{ semantic_layer.dimensions(metrics=['food_order_amount'])}} ``` - + - + You can use this query to fetch dimension values for one or multiple metrics and a single dimension. @@ -89,9 +89,9 @@ Note, metrics is a required argument that lists one or multiple metrics, and a s select * from {{ semantic_layer.dimension_values(metrics=['food_order_amount'], group_by=['customer__customer_name'])}} ``` - + - + You can use this query to fetch queryable granularities for a list of metrics. @@ -103,9 +103,9 @@ select * from {{ semantic_layer.queryable_granularities(metrics=['food_order_amount', 'order_gross_profit'])}} ``` - + - + You can use this query to fetch available metrics given dimensions. This command is essentially the opposite of getting dimensions given a list of metrics. @@ -117,9 +117,9 @@ select * from {{ }} ``` - + - + You can use this example query to fetch available granularities for all time dimensions (the similar queryable granularities API call only returns granularities for the primary time dimensions for metrics). @@ -133,9 +133,9 @@ select NAME, QUERYABLE_GRANULARITIES from {{ }} ``` - + - + It may be useful in your application to expose the names of the time dimensions that represent metric_time or the common thread across all metrics. @@ -147,9 +147,44 @@ select * from {{ }} ``` - + + + + +You can filter your metrics to include only those that contain a specific substring (a sequence of characters contained within a larger string). Use the `search` argument to specify the substring you want to match. + +```sql +select * from {{ semantic_layer.metrics(search='order') }} +``` + +If no substring is provided, the query returns all metrics. - + + + + +When you don't want to return the full result set from a metadata call, you can paginate the results for both `semantic_layer.metrics()` and `semantic_layer.dimensions()` calls using the `page_size` and `page_number` parameters. + +- `page_size`: An optional parameter that sets the number of records per page. If left as `None`, there is no page limit. +- `page_number`: An optional parameter that specifies the page number to retrieve. Defaults to `1` (the first page) if not specified. + +Examples: + +```sql +-- Retrieves the 5th page with a page size of 10 metrics +select * from {{ semantic_layer.metrics(page_size=10, page_number=5) }} + +-- Retrieves the 1st page with a page size of 10 metrics +select * from {{ semantic_layer.metrics(page_size=10) }} + +-- Retrieves all metrics without pagination +select * from {{ semantic_layer.metrics() }} +``` + +You can use the same pagination parameters for `semantic_layer.dimensions(...)`. + + + You can use this example query to list all available saved queries in your dbt project.
@@ -165,7 +200,7 @@ select * from semantic_layer.saved_queries() | NAME | DESCRIPTION | LABEL | METRICS | GROUP_BY | WHERE_FILTER | ``` - + - - The run_results.json includes three attributes related to the `applied` state that complement `unique_id`: - `compiled`: Boolean entry of the node compilation status (`False` after parsing, but `True` after compiling). @@ -195,5 +193,3 @@ Here's a printed snippet from the `run_results.json`: } ], ``` - - diff --git a/website/docs/reference/commands/cmd-docs.md b/website/docs/reference/commands/cmd-docs.md index f20da08a4ae..03e11ae89f0 100644 --- a/website/docs/reference/commands/cmd-docs.md +++ b/website/docs/reference/commands/cmd-docs.md @@ -20,8 +20,6 @@ The command is responsible for generating your project's documentation website b dbt docs generate ``` - - Use the `--select` argument to limit the nodes included within `catalog.json`. When this flag is provided, step (3) will be restricted to the selected nodes. All other nodes will be excluded. Step (2) is unaffected. **Example**: @@ -30,8 +28,6 @@ Use the `--select` argument to limit the nodes included within `catalog.json`. W dbt docs generate --select +orders ``` - - Use the `--no-compile` argument to skip re-compilation. When this flag is provided, `dbt docs generate` will skip step (2) described above. **Example**: diff --git a/website/docs/reference/commands/deps.md b/website/docs/reference/commands/deps.md index 85c103e6337..0cb8e50f7a6 100644 --- a/website/docs/reference/commands/deps.md +++ b/website/docs/reference/commands/deps.md @@ -58,8 +58,6 @@ Updates available for packages: ['tailsdotcom/dbt_artifacts', 'dbt-labs/snowplow Update your versions in packages.yml, then run dbt deps ``` - - ## Predictable package installs Starting in dbt Core v1.7, dbt generates a `package-lock.yml` file in the root of your project. This contains the complete set of resolved packages based on the `packages` configuration in `dependencies.yml` or `packages.yml`. Each subsequent invocation of `dbt deps` will install from the _locked_ set of packages specified in this file. Storing the complete set of required packages (with pinned versions) in version-controlled code ensures predictable installs in production and consistency across all developers and environments. @@ -97,5 +95,3 @@ dbt deps --add-package https://github.com/fivetran/dbt_amplitude@v0.3.0 --source # add package from local dbt deps --add-package /opt/dbt/redshift --source local ``` - - diff --git a/website/docs/reference/commands/init.md b/website/docs/reference/commands/init.md index 8945eb823db..112fff63a38 100644 --- a/website/docs/reference/commands/init.md +++ b/website/docs/reference/commands/init.md @@ -17,15 +17,10 @@ Then, it will: - Create a new folder with your project name and sample files, enough to get you started with dbt - Create a connection profile on your local machine. The default location is `~/.dbt/profiles.yml`. Read more in [configuring your profile](/docs/core/connect-data-platform/connection-profiles). - - When using `dbt init` to initialize your project, include the `--profile` flag to specify an existing `profiles.yml` as the `profile:` key to use instead of creating a new one. For example, `dbt init --profile profile_name`. - - If the profile does not exist in `profiles.yml` or the command is run inside an existing project, the command raises an error. 
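As a quick sketch of that flag in use (the project and profile names are placeholders):

```shell
# Scaffold a new project that reuses an existing profile from ~/.dbt/profiles.yml
dbt init jaffle_shop --profile my_existing_profile
```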
- ## Existing project diff --git a/website/docs/reference/commands/version.md b/website/docs/reference/commands/version.md index 2ed14117828..3847b3cd593 100644 --- a/website/docs/reference/commands/version.md +++ b/website/docs/reference/commands/version.md @@ -13,7 +13,7 @@ The `--version` command-line flag returns information about the currently instal ## Versioning To learn more about release versioning for dbt Core, refer to [How dbt Core uses semantic versioning](/docs/dbt-versions/core#how-dbt-core-uses-semantic-versioning). -If using [versionless dbt Cloud](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless), then `dbt_version` uses the latest (continuous) release version. This also follows semantic versioning guidelines, using the `YYYY.xx.yy` format, where the year is the major version (for example, `2024.04.1234`) +If using [versionless dbt Cloud](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless), then `dbt_version` uses the latest (continuous) release version. This also follows semantic versioning guidelines, using the `YYYY.MM.DD+` format. The year, month, and day represent the date the version was built (for example, `2024.10.28+996c6a8`). The suffix provides an additional unique identifier for each build. ## Example usages diff --git a/website/docs/reference/dbt-commands.md b/website/docs/reference/dbt-commands.md index 8386cf61731..ca9a7725eb2 100644 --- a/website/docs/reference/dbt-commands.md +++ b/website/docs/reference/dbt-commands.md @@ -11,7 +11,7 @@ A key distinction with the tools mentioned, is that dbt Cloud CLI and IDE are de ## Parallel execution -dbt Cloud allows for parallel execution of commands, enhancing efficiency without compromising data integrity. This enables you to run multiple commands at the same time, however it's important to understand which commands can be run in parallel and which can't. +dbt Cloud allows for concurrent execution of commands, enhancing efficiency without compromising data integrity. This enables you to run multiple commands at the same time. However, it's important to understand which commands can be run in parallel and which can't. In contrast, [`dbt-core` _doesn't_ support](/reference/programmatic-invocations#parallel-execution-not-supported) safe parallel execution for multiple invocations in the same process, and requires users to manage concurrency manually to ensure data integrity and system stability.
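If you do need multiple invocations, one common pattern is to isolate each one in its own operating-system process. The sketch below assumes `dbt-core`'s `dbtRunner` programmatic entry point and placeholder selectors; concurrent runs may also need separate `--target-path` values to avoid artifact collisions:

```python
from multiprocessing import Process

def run_dbt(cli_args):
    # Import inside the child so each invocation gets its own
    # interpreter state and global variables
    from dbt.cli.main import dbtRunner
    dbtRunner().invoke(cli_args)

if __name__ == "__main__":
    procs = [
        Process(target=run_dbt, args=(["run", "--select", "staging"],)),
        Process(target=run_dbt, args=(["run", "--select", "marts"],)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```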
diff --git a/website/docs/reference/dbt_project.yml.md b/website/docs/reference/dbt_project.yml.md index e7cd5bbeb79..1bb9dd2cf9c 100644 --- a/website/docs/reference/dbt_project.yml.md +++ b/website/docs/reference/dbt_project.yml.md @@ -14,8 +14,6 @@ Every [dbt project](/docs/build/projects) needs a `dbt_project.yml` file — thi The following example is a list of all available configurations in the `dbt_project.yml` file: - - ```yml @@ -94,77 +92,6 @@ vars: ``` - - - - - - -```yml -[name](/reference/project-configs/name): string - -[config-version](/reference/project-configs/config-version): 2 -[version](/reference/project-configs/version): version - -[profile](/reference/project-configs/profile): profilename - -[model-paths](/reference/project-configs/model-paths): [directorypath] -[seed-paths](/reference/project-configs/seed-paths): [directorypath] -[test-paths](/reference/project-configs/test-paths): [directorypath] -[analysis-paths](/reference/project-configs/analysis-paths): [directorypath] -[macro-paths](/reference/project-configs/macro-paths): [directorypath] -[snapshot-paths](/reference/project-configs/snapshot-paths): [directorypath] -[docs-paths](/reference/project-configs/docs-paths): [directorypath] -[asset-paths](/reference/project-configs/asset-paths): [directorypath] - -[packages-install-path](/reference/project-configs/packages-install-path): directorypath - -[clean-targets](/reference/project-configs/clean-targets): [directorypath] - -[query-comment](/reference/project-configs/query-comment): string - -[require-dbt-version](/reference/project-configs/require-dbt-version): version-range | [version-range] - -[dbt-cloud](/docs/cloud/cloud-cli-installation): - [project-id](/docs/cloud/configure-cloud-cli#configure-the-dbt-cloud-cli): project_id # Required - [defer-env-id](/docs/cloud/about-cloud-develop-defer#defer-in-dbt-cloud-cli): environment_id # Optional - -[quoting](/reference/project-configs/quoting): - database: true | false - schema: true | false - identifier: true | false - -models: - [](/reference/model-configs) - -seeds: - [](/reference/seed-configs) - -snapshots: - [](/reference/snapshot-configs) - -sources: - [](source-configs) - -tests: - [](/reference/data-test-configs) - -vars: - [](/docs/build/project-variables) - -[on-run-start](/reference/project-configs/on-run-start-on-run-end): sql-statement | [sql-statement] -[on-run-end](/reference/project-configs/on-run-start-on-run-end): sql-statement | [sql-statement] - -[dispatch](/reference/project-configs/dispatch-config): - - macro_namespace: packagename - search_order: [packagename] - -[restrict-access](/docs/collaborate/govern/model-access): true | false - -``` - - - ## Naming convention diff --git a/website/docs/reference/global-configs/adapter-behavior-changes.md b/website/docs/reference/global-configs/adapter-behavior-changes.md index bd0ba9f7404..a755f8cfe50 100644 --- a/website/docs/reference/global-configs/adapter-behavior-changes.md +++ b/website/docs/reference/global-configs/adapter-behavior-changes.md @@ -14,10 +14,17 @@ Some adapters can display behavior changes when certain flags are enabled. The f
+ + + -
\ No newline at end of file + diff --git a/website/docs/reference/global-configs/behavior-changes.md b/website/docs/reference/global-configs/behavior-changes.md index fadf424c389..299674ae9c1 100644 --- a/website/docs/reference/global-configs/behavior-changes.md +++ b/website/docs/reference/global-configs/behavior-changes.md @@ -58,6 +58,7 @@ flags: require_model_names_without_spaces: False source_freshness_run_project_hooks: False restrict_direct_pg_catalog_access: False + require_yaml_configuration_for_mf_time_spines: False ``` @@ -66,12 +67,13 @@ When we use dbt Cloud in the following table, we're referring to accounts that h | Flag | dbt Cloud: Intro | dbt Cloud: Maturity | dbt Core: Intro | dbt Core: Maturity | |-----------------------------------------------------------------|------------------|---------------------|-----------------|--------------------| -| require_explicit_package_overrides_for_builtin_materializations | 2024.04 | 2024.06 | 1.6.14, 1.7.14 | 1.8.0 | -| require_resource_names_without_spaces | 2024.05 | TBD* | 1.8.0 | 1.9.0 | -| source_freshness_run_project_hooks | 2024.03 | TBD* | 1.8.0 | 1.9.0 | +| [require_explicit_package_overrides_for_builtin_materializations](#package-override-for-built-in-materialization) | 2024.04 | 2024.06 | 1.6.14, 1.7.14 | 1.8.0 | +| [require_resource_names_without_spaces](#no-spaces-in-resource-names) | 2024.05 | TBD* | 1.8.0 | 1.9.0 | +| [source_freshness_run_project_hooks](#project-hooks-with-source-freshness) | 2024.03 | TBD* | 1.8.0 | 1.9.0 | | [Redshift] [restrict_direct_pg_catalog_access](/reference/global-configs/redshift-changes#the-restrict_direct_pg_catalog_access-flag) | 2024.09 | TBD* | dbt-redshift v1.9.0 | 1.9.0 | -| skip_nodes_if_on_run_start_fails | 2024.10 | TBD* | 1.9.0 | TBD* | -| state_modified_compare_more_unrendered_values | 2024.10 | TBD* | 1.9.0 | TBD* | +| [skip_nodes_if_on_run_start_fails](#failures-in-on-run-start-hooks) | 2024.10 | TBD* | 1.9.0 | TBD* | +| [state_modified_compare_more_unrendered_values](#source-definitions-for-state) | 2024.10 | TBD* | 1.9.0 | TBD* | +| [require_yaml_configuration_for_mf_time_spines](#metricflow-time-spine-yaml) | 2024.10 | TBD* | 1.9.0 | TBD* | When the dbt Cloud Maturity is "TBD," it means we have not yet determined the exact date when these flags' default values will change. Affected users will see deprecation warnings in the meantime, and they will receive emails providing advance warning ahead of the maturity date. In the meantime, if you are seeing a deprecation warning, you can either: - Migrate your project to support the new behavior, and then set the flag to `True` to stop seeing the warnings. @@ -143,7 +145,7 @@ The names of dbt resources (models, sources, etc) should contain letters, number Set the `source_freshness_run_project_hooks` flag to `True` to include "project hooks" ([`on-run-start` / `on-run-end`](/reference/project-configs/on-run-start-on-run-end)) in the `dbt source freshness` command execution. -If you have specific project [`on-run-start` / `on-run-end`](/reference/project-configs/on-run-start-on-run-end) hooks that should not run before/after `source freshness` command, you can add a conditional check to those hooks: +If you have specific project [`on-run-start` / `on-run-end`](/reference/project-configs/on-run-start-on-run-end) hooks that should not run before/after the `source freshness` command, you can add a conditional check to those hooks: @@ -152,3 +154,13 @@ on-run-start: - '{{ ...
if flags.WHICH != 'freshness' }}' ``` + + +### MetricFlow time spine YAML +The `require_yaml_configuration_for_mf_time_spines` flag is set to `False` by default. + +In previous versions (dbt Core 1.8 and earlier), the MetricFlow time spine configuration was stored in a `metricflow_time_spine.sql` file. + +When the flag is set to `False`, dbt will continue to support the SQL file configuration but will raise a deprecation warning if it detects a MetricFlow time spine configured in a SQL file. When the flag is set to `True`, dbt requires the time spine to be configured in YAML. + +The MetricFlow YAML file should have the `time_spine:` field. Refer to [MetricFlow time spine](/docs/build/metricflow-time-spine) for more details. diff --git a/website/docs/reference/global-configs/cache.md b/website/docs/reference/global-configs/cache.md index 1a74fef8d30..03f33286aa4 100644 --- a/website/docs/reference/global-configs/cache.md +++ b/website/docs/reference/global-configs/cache.md @@ -6,7 +6,7 @@ sidebar: "Cache" ### Cache population -At the start of runs, dbt caches metadata about all the objects in all the schemas where it might materialize resources (such as models). By default, dbt populates the cache with information on all schemas related to the project. +At the start of runs, dbt caches metadata about all the objects in all the schemas where it might materialize resources (such as models). By default, dbt populates the relational cache with information on all schemas related to the project. There are two ways to optionally modify this behavior: - `POPULATE_CACHE` (default: `True`): Whether to populate the cache at all. To skip cache population entirely, use the `--no-populate-cache` flag or `DBT_POPULATE_CACHE: False`. Note that this does not _disable_ the cache; missed cache lookups will run queries, and update the cache afterward. @@ -26,3 +26,11 @@ Or, to improve speed and performance while focused on developing Salesforce mode dbt --cache-selected-only run --select salesforce ``` + +### Logging relational cache events + +import LogLevel from '/snippets/_log-relational-cache.md'; + + diff --git a/website/docs/reference/global-configs/databricks-changes.md b/website/docs/reference/global-configs/databricks-changes.md new file mode 100644 index 00000000000..ca24b822ae5 --- /dev/null +++ b/website/docs/reference/global-configs/databricks-changes.md @@ -0,0 +1,26 @@ +--- +title: "Databricks adapter behavior changes" +id: "databricks-changes" +sidebar: "Databricks" +--- + +The following are the current [behavior change flags](/reference/global-configs/behavior-changes#behavior-change-flags) that are specific to `dbt-databricks`: + +| Flag | `dbt-databricks`: Intro | `dbt-databricks`: Maturity | | ----------------------------- | ----------------------- | -------------------------- | | `use_info_schema_for_columns` | 1.9.0 | TBD | | `use_user_folder_for_python` | 1.9.0 | TBD | + +### Use information schema for columns + +The `use_info_schema_for_columns` flag is `False` by default. + +Setting this flag to `True` will use `information_schema` rather than `describe extended` to get column metadata for Unity Catalog tables. This setting helps you avoid issues where `describe extended` truncates information when the type is a complex struct. However, this setting is not yet the default behavior because of a performance impact: a Databricks metadata limitation requires running `REPAIR TABLE {{relation}} SYNC METADATA` before querying to ensure the `information_schema` is complete.
+ +This flag will become the default behavior when this additional query is no longer needed. + +### Use user's folder for Python model notebooks + +The `use_user_folder_for_python` flag is `False` by default and results in writing uploaded Python model notebooks to `/Shared/dbt_python_models/{{schema}}/`. Setting this flag to `True` will write notebooks to `/Users/{{current user}}/{{catalog}}/{{schema}}/`. Writing to the `Shared` folder is deprecated by Databricks as it does not align with governance best practices. + +We plan to promote this flag to maturity in v1.10.0. diff --git a/website/docs/reference/global-configs/logs.md b/website/docs/reference/global-configs/logs.md index 972a731854d..682b9fc8393 100644 --- a/website/docs/reference/global-configs/logs.md +++ b/website/docs/reference/global-configs/logs.md @@ -137,11 +137,11 @@ You can use either of these parameters to ensure clean output that's compatible ### Logging relational cache events -The `LOG_CACHE_EVENTS` config allows detailed logging for [relational cache](/reference/global-configs/cache) events, which are disabled by default. +import LogLevel from '/snippets/_log-relational-cache.md'; -```text -dbt --log-cache-events compile -``` +relational cache} +/> ### Color diff --git a/website/docs/reference/global-configs/resource-type.md b/website/docs/reference/global-configs/resource-type.md index 9e6ec82df06..431b6c049cb 100644 --- a/website/docs/reference/global-configs/resource-type.md +++ b/website/docs/reference/global-configs/resource-type.md @@ -24,20 +24,7 @@ The `--exclude-resource-type` flag is only available in dbt version 1.8 and high The available resource types are: - - -- [`analysis`](/docs/build/analyses) -- [`exposure`](/docs/build/exposures) -- [`metric`](/docs/build/metrics-overview) -- [`model`](/docs/build/models) -- [`seed`](/docs/build/seeds) -- [`snapshot`](/docs/build/snapshots) -- [`source`](/docs/build/sources) -- [`test`](/docs/build/data-tests) - - - - + - [`analysis`](/docs/build/analyses) - [`exposure`](/docs/build/exposures) - [`metric`](/docs/build/metrics-overview) - [`model`](/docs/build/models) - [`seed`](/docs/build/seeds) - [`snapshot`](/docs/build/snapshots) - [`source`](/docs/build/sources) - [`test`](/docs/build/data-tests) @@ -82,7 +69,6 @@ Instead of targeting specific resources, use the `--resource-flag` or `--exclude - - In this example, run the following command to include _all_ saved queries with the `--resource-type` flag: @@ -94,8 +80,6 @@ Instead of targeting specific resources, use the `--resource-flag` or `--exclude - - - In this example, use the following command to exclude _all_ unit tests from your dbt build process. Note that the `--exclude-resource-type` flag is only available in dbt version 1.8 and higher: diff --git a/website/docs/reference/macro-properties.md b/website/docs/reference/macro-properties.md index 91a616ded0d..69a66f308d9 100644 --- a/website/docs/reference/macro-properties.md +++ b/website/docs/reference/macro-properties.md @@ -19,6 +19,7 @@ macros: [description](/reference/resource-properties/description): [docs](/reference/resource-configs/docs): show: true | false + [meta](/reference/resource-configs/meta): {} arguments: - name: [type](/reference/resource-properties/argument-type): diff --git a/website/docs/reference/node-selection/methods.md b/website/docs/reference/node-selection/methods.md index 38484494e4b..7587a9fd2b1 100644 --- a/website/docs/reference/node-selection/methods.md +++ b/website/docs/reference/node-selection/methods.md @@ -310,10 +310,6 @@ dbt list --select "+semantic_model:orders" # list your semantic model named "or ``` ### The "saved_query" method - -Supported in v1.7 or newer.
-
-
 The `saved_query` method selects [saved queries](/docs/build/saved-queries).
@@ -322,8 +318,6 @@ dbt list --select "saved_query:*" # list all saved queries
 dbt list --select "+saved_query:orders_saved_query" # list your saved query named "orders_saved_query" and all upstream resources
 ```
 
-
-
 ### The "unit_test" method
diff --git a/website/docs/reference/programmatic-invocations.md b/website/docs/reference/programmatic-invocations.md
index 09e41b1789f..61250e6debb 100644
--- a/website/docs/reference/programmatic-invocations.md
+++ b/website/docs/reference/programmatic-invocations.md
@@ -25,9 +25,9 @@ for r in res.result:
 
 ## Parallel execution not supported
 
-[`dbt-core`](https://pypi.org/project/dbt-core/) doesn't support [safe parallel execution](/reference/dbt-commands#parallel-execution) for multiple invocations in the same process. This means it's not safe to run multiple dbt commands at the same time. It's officially discouraged and requires a wrapping process to handle sub-processes. This is because:
+[`dbt-core`](https://pypi.org/project/dbt-core/) doesn't support [safe parallel execution](/reference/dbt-commands#parallel-execution) for multiple invocations in the same process. This means it's not safe to run multiple dbt commands concurrently. It's officially discouraged and requires a wrapping process to handle sub-processes. This is because:
 
-- Running simultaneous commands can unexpectedly interact with the data platform. For example, running `dbt run` and `dbt build` for the same models simultaneously could lead to unpredictable results.
+- Running concurrent commands can unexpectedly interact with the data platform. For example, running `dbt run` and `dbt build` for the same models simultaneously could lead to unpredictable results.
 - Each `dbt-core` command interacts with global Python variables. To ensure safe operation, commands need to be executed in separate processes, which can be achieved using methods like spawning processes or using tools like Celery.
 
 To run [safe parallel execution](/reference/dbt-commands#available-commands), you can use the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) or [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud), both of which do that additional work to manage concurrency (multiple processes) on your behalf.
diff --git a/website/docs/reference/resource-configs/access.md b/website/docs/reference/resource-configs/access.md
index 0f67a454344..c73e09dd639 100644
--- a/website/docs/reference/resource-configs/access.md
+++ b/website/docs/reference/resource-configs/access.md
@@ -15,14 +15,6 @@ models:
 
-
-
-Access modifiers may be applied to models one-by-one in YAML properties. In v1.5 and v1.6, you are unable to configure `access` for multiple models at once. Upgrade to v1.7 for additional configuration options. A group or subfolder contains models with varying access levels, so when you designate a model with `access: public`, make sure you intend for this behavior.
-
-
-
-
-
 You can apply access modifiers in config files, including the `dbt_project.yml`, or to models one-by-one in `properties.yml`. Applying access configs to a subfolder modifies the default for all models in that subfolder, so make sure you intend for this behavior. When setting individual model access, keep in mind that a group or subfolder might contain a variety of access levels, so double-check before designating a model with `access: public`.
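For example, here is a minimal sketch of a subfolder-level default in `dbt_project.yml` — the `my_project` and `marts` names are assumptions for illustration:

```yml
# dbt_project.yml
models:
  my_project:
    marts:
      # every model under models/marts defaults to protected access
      +access: protected
```

Individual models can still override a folder-level default in their own YAML properties.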
There are multiple approaches to configuring access: @@ -83,8 +75,6 @@ There are multiple approaches to configuring access: ``` - - After you define `access`, rerun a production job to apply the change. ## Definition diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index b943f114861..9dd39c936b6 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -710,8 +710,6 @@ models: Views with this configuration will be able to select from objects in `project_1.dataset_1` and `project_2.dataset_2`, even when they are located elsewhere and queried by users who do not otherwise have access to `project_1.dataset_1` and `project_2.dataset_2`. - - ## Materialized views The BigQuery adapter supports [materialized views](https://cloud.google.com/bigquery/docs/materialized-views-intro) @@ -894,10 +892,6 @@ As with most data platforms, there are limitations associated with materialized Find more information about materialized view limitations in Google's BigQuery [docs](https://cloud.google.com/bigquery/docs/materialized-views-intro#limitations). - - - - ## Python models The BigQuery adapter supports Python models with the following additional configuration parameters: @@ -914,4 +908,3 @@ By default, this is set to `True` to support the default `intermediate_format` o ### The `intermediate_format` parameter The `intermediate_format` parameter specifies which file format to use when writing records to a table. The default is `parquet`. - diff --git a/website/docs/reference/resource-configs/contract.md b/website/docs/reference/resource-configs/contract.md index 2f52fc26e1f..fb25076b0d9 100644 --- a/website/docs/reference/resource-configs/contract.md +++ b/website/docs/reference/resource-configs/contract.md @@ -16,14 +16,6 @@ This is to ensure that the people querying your model downstream—both inside a ## Data type aliasing - - -The `data_type` defined in your YAML file must match a data type your data platform recognizes. dbt does not do any type aliasing itself. If your data platform recognizes both `int` and `integer` as corresponding to the same type, then they will return a match. - - - - - dbt uses built-in type aliasing for the `data_type` defined in your YAML. For example, you can specify `string` in your contract, and on Postgres/Redshift, dbt will convert it to `text`. If dbt doesn't recognize the `data_type` name among its known aliases, it will pass it through as-is. This is enabled by default, but you can opt-out by setting `alias_types` to `false`. Example for disabling: @@ -42,7 +34,6 @@ models: ``` - ## Size, precision, and scale diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md index 88014d0ac4d..f807b1c0d88 100644 --- a/website/docs/reference/resource-configs/databricks-configs.md +++ b/website/docs/reference/resource-configs/databricks-configs.md @@ -7,23 +7,7 @@ id: "databricks-configs" When materializing a model as `table`, you may include several optional configs that are specific to the dbt-databricks plugin, in addition to the standard [model configs](/reference/model-configs). - - - -| Option | Description | Required? 
| Model Support | Example | -|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------|---------------|--------------------------| -| file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` | -| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` | -| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` | -| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL | `date_day` | -| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` | -| buckets | The number of buckets to create while clustering. | Required if `clustered_by` is specified | SQL, Python | `8` | -| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table. | Optional | SQL | `{'this.is.my.key': 12}` | -| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` | - - - - + | Option | Description | Required? | Model Support | Example | @@ -42,7 +26,7 @@ We do not yet have a PySpark API to set tblproperties at table creation, so this - + 1.8 introduces support for [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) at the table level, in addition to all table configuration supported in 1.7. @@ -51,7 +35,7 @@ We do not yet have a PySpark API to set tblproperties at table creation, so this | file_format | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` | | location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` | | partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` | -| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL | `date_day` | +| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` | | clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. 
| Optional | SQL, Python | `country_code` |
 | buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
 | tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
@@ -67,6 +51,29 @@ We do not yet have a PySpark API to set tblproperties at table creation, so this
+dbt Core v1.9 and Versionless dbt Cloud support `table_format: iceberg`, in addition to all previous table configurations supported in 1.8.
+
+| Option | Description | Required? | Model Support | Example |
+|---------------------|-----------------------------|-------------------------------------------|-----------------|--------------------------|
+| table_format | Whether or not to provision [Iceberg](https://docs.databricks.com/en/delta/uniform.html) compatibility for the materialization | Optional | SQL, Python | `iceberg` |
+| file_format+ | The file format to use when creating tables (`parquet`, `delta`, `hudi`, `csv`, `json`, `text`, `jdbc`, `orc`, `hive` or `libsvm`). | Optional | SQL, Python | `delta` |
+| location_root | The created table uses the specified directory to store its data. The table alias is appended to it. | Optional | SQL, Python | `/mnt/root` |
+| partition_by | Partition the created table by the specified columns. A directory is created for each partition. | Optional | SQL, Python | `date_day` |
+| liquid_clustered_by | Cluster the created table by the specified columns. Clustering method is based on [Delta's Liquid Clustering feature](https://docs.databricks.com/en/delta/clustering.html). Available since dbt-databricks 1.6.2. | Optional | SQL, Python | `date_day` |
+| clustered_by | Each partition in the created table will be split into a fixed number of buckets by the specified columns. | Optional | SQL, Python | `country_code` |
+| buckets | The number of buckets to create while clustering | Required if `clustered_by` is specified | SQL, Python | `8` |
+| tblproperties | [Tblproperties](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-tblproperties.html) to be set on the created table | Optional | SQL, Python* | `{'this.is.my.key': 12}` |
+| databricks_tags | [Tags](https://docs.databricks.com/en/data-governance/unity-catalog/tags.html) to be set on the created table | Optional | SQL++, Python++ | `{'my_tag': 'my_value'}` |
+| compression | Set the compression algorithm. | Optional | SQL, Python | `zstd` |
+
+\* We do not yet have a PySpark API to set tblproperties at table creation, so this feature is primarily to allow users to annotate their Python-derived tables with tblproperties.
+\+ When `table_format` is `iceberg`, `file_format` must be `delta`.
+\++ `databricks_tags` are currently only supported at the table level, and applied via `ALTER` statements.
+
+
 
 
 ### Python submission methods
 
 In dbt v1.9 and higher, or in [Versionless](/docs/dbt-versions/versionless-cloud) dbt Cloud, you can use these four options for `submission_method`:
@@ -166,6 +173,7 @@ models:
 
+
 ## Incremental models
 
 The dbt-databricks plugin leans heavily on the [`incremental_strategy` config](/docs/build/incremental-strategy). This config tells the incremental materialization how to build models in runs beyond their first.
It can be set to one of four values:
@@ -176,6 +184,23 @@ dbt-databricks plugin leans heavily on the [`incremental_strategy` config](/docs/build/incremental-strategy). This config tells the incremental materialization how to build models in runs beyond their first. It can be set to one of four values:
 
 Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, `incremental_strategy` may be specified in `dbt_project.yml` or within a model file's `config()` block.
 
+
+
+
+
+## Incremental models
+
+The dbt-databricks plugin leans heavily on the [`incremental_strategy` config](/docs/build/incremental-strategy). This config tells the incremental materialization how to build models in runs beyond their first. It can be set to one of five values:
+ - **`append`**: Insert new records without updating or overwriting any existing data.
+ - **`insert_overwrite`**: If `partition_by` is specified, overwrite partitions in the table with new data. If no `partition_by` is specified, overwrite the entire table with new data.
+ - **`merge`** (default; Delta and Hudi file format only): Match records based on a `unique_key`, updating old records, and inserting new ones. (If no `unique_key` is specified, all new data is inserted, similar to `append`.)
+ - **`replace_where`** (Delta file format only): Match records based on `incremental_predicates`, replacing all records that match the predicates from the existing table with records matching the predicates from the new data. (If no `incremental_predicates` are specified, all new data is inserted, similar to `append`.)
+ - **`microbatch`** (Delta file format only): Implements the [microbatch strategy](/docs/build/incremental-microbatch) using `replace_where` with predicates generated based on `event_time`.
+
+Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, `incremental_strategy` may be specified in `dbt_project.yml` or within a model file's `config()` block.
+
+
 ### The `append` strategy
 
 Following the `append` strategy, dbt will perform an `insert into` statement with all new data. The appeal of this strategy is that it is straightforward and functional across all platforms, file types, connection methods, and Apache Spark versions. However, this strategy _cannot_ update, overwrite, or delete existing data, so it is likely to insert duplicate records for many data sources.
@@ -322,7 +347,7 @@ The `merge` incremental strategy requires:
 - Databricks Runtime 5.1 and above for delta file format
 - Apache Spark for hudi file format
 
-dbt will run an [atomic `merge` statement](https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html) which looks nearly identical to the default merge behavior on Snowflake and BigQuery. If a `unique_key` is specified (recommended), dbt will update old records with values from new records that match on the key column. If a `unique_key` is not specified, dbt will forgo match criteria and simply insert all new records (similar to `append` strategy).
+The Databricks adapter will run an [atomic `merge` statement](https://docs.databricks.com/spark/latest/spark-sql/language-manual/merge-into.html) similar to the default merge behavior on Snowflake and BigQuery. If a `unique_key` is specified (recommended), dbt will update old records with values from new records that match on the key column. If a `unique_key` is not specified, dbt will forgo match criteria and simply insert all new records (similar to `append` strategy).
 
 Specifying `merge` as the incremental strategy is optional since it's the default strategy used when none is specified.
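To make this concrete before the detailed examples below, here is a hedged sketch of setting folder-wide incremental defaults in `dbt_project.yml` — the `my_project` and `events` names (and the `event_id` key) are assumptions for illustration:

```yml
# dbt_project.yml
models:
  my_project:
    events:
      +materialized: incremental
      +incremental_strategy: merge  # the default; shown explicitly for clarity
      +unique_key: event_id
      +file_format: delta
```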
@@ -403,6 +428,123 @@ merge into analytics.merge_incremental as DBT_INTERNAL_DEST
 
+
+
+Beginning with 1.9, `merge` behavior can be modified with the following additional configuration options:
+
+- `target_alias`, `source_alias`: Aliases for the target and source to allow you to describe your merge conditions more naturally. These default to `DBT_INTERNAL_DEST` and `DBT_INTERNAL_SOURCE`, respectively.
+- `skip_matched_step`: If set to `true`, the 'matched' clause of the merge statement will not be included.
+- `skip_not_matched_step`: If set to `true`, the 'not matched' clause will not be included.
+- `matched_condition`: Condition to apply to the `WHEN MATCHED` clause. You should use the `target_alias` and `source_alias` to write a conditional expression, such as `DBT_INTERNAL_DEST.col1 = hash(DBT_INTERNAL_SOURCE.col2, DBT_INTERNAL_SOURCE.col3)`. This condition further restricts the matched set of rows.
+- `not_matched_condition`: Condition to apply to the `WHEN NOT MATCHED [BY TARGET]` clause. This condition further restricts the set of rows in the target that do not match the source that will be inserted into the merged table.
+- `not_matched_by_source_condition`: Condition used to further filter the `WHEN NOT MATCHED BY SOURCE` clause. Only used in conjunction with `not_matched_by_source_action: delete`.
+- `not_matched_by_source_action`: If set to `delete`, a `DELETE` clause is added to the merge statement for `WHEN NOT MATCHED BY SOURCE`.
+- `merge_with_schema_evolution`: If set to `true`, the merge statement includes the `WITH SCHEMA EVOLUTION` clause.
+
+For more details on the meaning of each merge clause, please see [the Databricks documentation](https://docs.databricks.com/en/sql/language-manual/delta-merge-into.html).
+
+The following is an example demonstrating the use of these new options:
+
+
+
+
+
+
+```sql
+{{ config(
+    materialized = 'incremental',
+    unique_key = 'id',
+    incremental_strategy='merge',
+    target_alias='t',
+    source_alias='s',
+    matched_condition='t.tech_change_ts < s.tech_change_ts',
+    not_matched_condition='s.attr1 IS NOT NULL',
+    not_matched_by_source_condition='t.tech_change_ts < current_timestamp()',
+    not_matched_by_source_action='delete',
+    merge_with_schema_evolution=true
+) }}
+
+select
+    id,
+    attr1,
+    attr2,
+    tech_change_ts
+from
+    {{ ref('source_table') }} as s
+```
+
+
+
+
+
+
+
+```sql
+create temporary view merge_incremental__dbt_tmp as
+
+    select
+        id,
+        attr1,
+        attr2,
+        tech_change_ts
+    from upstream.source_table
+;
+
+merge
+    with schema evolution
+into
+    target_table as t
+using (
+    select
+        id,
+        attr1,
+        attr2,
+        tech_change_ts
+    from
+        source_table as s
+)
+on
+    t.id <=> s.id
+when matched
+    and t.tech_change_ts < s.tech_change_ts
+    then update set
+        id = s.id,
+        attr1 = s.attr1,
+        attr2 = s.attr2,
+        tech_change_ts = s.tech_change_ts
+
+when not matched
+    and s.attr1 IS NOT NULL
+    then insert (
+        id,
+        attr1,
+        attr2,
+        tech_change_ts
+    ) values (
+        s.id,
+        s.attr1,
+        s.attr2,
+        s.tech_change_ts
+    )
+
+when not matched by source
+    and t.tech_change_ts < current_timestamp()
+    then delete
+```
+
+
+
+
+
+
+
+
 ### The `replace_where` strategy
 
 The `replace_where` incremental strategy requires:
@@ -492,7 +634,83 @@ insert into analytics.replace_where_incremental
 
-
+
+
+### The `microbatch` strategy
+
+The Databricks adapter implements the `microbatch` strategy using `replace_where`. Note the requirements and caution statements for `replace_where` above.
For more information about this strategy, see the [microbatch reference page](/docs/build/incremental-microbatch).
+
+In the following example, the upstream table `events` has been annotated with an `event_time` column called `ts` in its schema file.
+
+
+
+
+
+
+```sql
+{# Use 'date' as the grain for this microbatch table #}
+{{ config(
+    materialized='incremental',
+    file_format='delta',
+    incremental_strategy='microbatch',
+    event_time='date'
+) }}
+
+with new_events as (
+
+    select * from {{ ref('events') }}
+
+)
+
+select
+    user_id,
+    date,
+    count(*) as visits
+
+from new_events
+group by 1, 2
+```
+
+
+
+
+
+
+
+```sql
+create temporary view replace_where__dbt_tmp as
+
+    with new_events as (
+
+        select * from (select * from analytics.events where ts >= '2024-10-01' and ts < '2024-10-02')
+
+    )
+
+    select
+        user_id,
+        date,
+        count(*) as visits
+    from new_events
+    group by 1, 2
+;
+
+insert into analytics.replace_where_incremental
+    replace where CAST(date as TIMESTAMP) >= '2024-10-01' and CAST(date as TIMESTAMP) < '2024-10-02'
+    table `replace_where__dbt_tmp`
+```
+
+
+
+
+
+
+
+
 ## Selecting compute per model
@@ -682,8 +900,6 @@ def model(dbt, session):
 
 If your default compute is a SQL Warehouse, you will need to specify an all-purpose cluster `http_path` in this way.
 
-
-
 ## Persisting model descriptions
 
 Relation-level docs persistence is supported in dbt v0.17.0. For more
@@ -895,9 +1111,5 @@ One application of this feature is making `delta` tables compatible with `iceber
 ) }}
 ```
 
-
-
 `tblproperties` can be specified for Python models, but they will be applied via an `ALTER` statement after table creation. This is due to a limitation in PySpark.
-
-
diff --git a/website/docs/reference/resource-configs/enabled.md b/website/docs/reference/resource-configs/enabled.md
index febf1e50c88..b74d7250907 100644
--- a/website/docs/reference/resource-configs/enabled.md
+++ b/website/docs/reference/resource-configs/enabled.md
@@ -230,14 +230,6 @@ exposures:
 
-
-
-Support for disabling semantic models has been added in dbt Core v1.7
-
-
-
-
-
 
 ```yaml
@@ -259,20 +251,10 @@ semantic_models:
 
-
-
-
-
-Support for disabling saved queries has been added in dbt Core v1.7.
-
-
-
-
-
 
 ```yaml
@@ -294,8 +276,6 @@ saved_queries:
 
-
-
diff --git a/website/docs/reference/resource-configs/group.md b/website/docs/reference/resource-configs/group.md
index 717d7de89f5..cd0ad2683f5 100644
--- a/website/docs/reference/resource-configs/group.md
+++ b/website/docs/reference/resource-configs/group.md
@@ -218,14 +218,6 @@ metrics:
 
-
-
-Support for grouping semantic models has been added in dbt Core v1.7.
-
-
-
-
-
 
 ```yaml
@@ -247,20 +239,10 @@ semantic_models:
 
-
-
-
-
-Support for grouping saved queries has been added in dbt Core v1.7.
-
-
-
-
-
 
 ```yaml
@@ -282,8 +264,6 @@ saved_queries:
 
-
-
diff --git a/website/docs/reference/resource-configs/meta.md b/website/docs/reference/resource-configs/meta.md
index 2bcccdd4141..e1542bdbc82 100644
--- a/website/docs/reference/resource-configs/meta.md
+++ b/website/docs/reference/resource-configs/meta.md
@@ -56,7 +56,7 @@ See [configs and properties](/reference/configs-and-properties) for details.
 ```yml
 version: 2
 
-sources:
+[sources](/reference/source-properties):
   - name: model_name
     config:
       meta: {}
@@ -110,7 +110,7 @@ version: 2
 snapshots:
   - name: snapshot_name
     config:
-      meta: {}
+      [meta](/reference/snapshot-properties): {}
 
   columns:
     - name: column_name
@@ -147,7 +147,7 @@ The `meta` config is not currently supported for analyses.
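As a quick illustration of `meta` on a resource type where it is supported, here is a minimal sketch on a model — the model name and keys are assumptions for illustration:

```yml
# models/properties.yml
version: 2

models:
  - name: dim_customers
    config:
      meta:
        owner: "analytics"
        contains_pii: true
```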
```yml version: 2 -macros: +[macros](/reference/macro-properties): - name: macro_name meta: {} @@ -179,14 +179,6 @@ exposures: - - -Support for grouping semantic models was added in dbt Core v1.7 - - - - - ```yml @@ -201,8 +193,6 @@ semantic_models: The `meta` config can also be defined under the `semantic-models` config block in `dbt_project.yml`. See [configs and properties](/reference/configs-and-properties) for details. - - @@ -249,14 +239,6 @@ metrics: - - -Support for saved queries has been added in dbt Core v1.7. - - - - - ```yml @@ -268,8 +250,6 @@ saved_queries: - - @@ -307,7 +287,7 @@ models: ```yml version: 2 -sources: +[sources](/reference/source-properties): - name: salesforce tables: diff --git a/website/docs/reference/resource-configs/postgres-configs.md b/website/docs/reference/resource-configs/postgres-configs.md index 07cfc938f1c..f2bf90a93c0 100644 --- a/website/docs/reference/resource-configs/postgres-configs.md +++ b/website/docs/reference/resource-configs/postgres-configs.md @@ -185,20 +185,3 @@ It's worth noting that, unlike tables, dbt monitors this parameter for changes a This happens via a `DROP/CREATE` of the indexes, which can be thought of as an `ALTER` of the materialized view. Learn more about these parameters in Postgres's [docs](https://www.postgresql.org/docs/current/sql-creatematerializedview.html). - - - -### Limitations - -#### Changing materialization to and from "materialized_view" - -Swapping an already materialized model to a materialized view, and vice versa, is not supported. -The workaround is to manually drop the existing materialization in the data warehouse prior to calling `dbt run`. -Running with `--full-refresh` flag will not work to drop the existing table or view and create the materialized view (and vice versa). -This would only need to be done once as the existing object would then be a materialized view. - -For example,`my_model`, has already been materialized as a table in the underlying data platform via `dbt run`. -If the user changes the model's config to `materialized="materialized_view"`, they will get an error. -The solution is to execute `DROP TABLE my_model` on the data warehouse before trying the model again. - - diff --git a/website/docs/reference/resource-configs/redshift-configs.md b/website/docs/reference/resource-configs/redshift-configs.md index e7149ae484e..b033cd6267e 100644 --- a/website/docs/reference/resource-configs/redshift-configs.md +++ b/website/docs/reference/resource-configs/redshift-configs.md @@ -230,21 +230,6 @@ As with most data platforms, there are limitations associated with materialized Find more information about materialized view limitations in Redshift's [docs](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-create-sql-command.html#mv_CREATE_MATERIALIZED_VIEW-limitations). - - -#### Changing materialization from "materialized_view" to "table" or "view" - -Swapping a materialized view to a table or view is not supported. -You must manually drop the existing materialized view in the data warehouse before calling `dbt run`. -Normally, re-running with the `--full-refresh` flag would resolve this, but not in this case. -This would only need to be done once as the existing object would then be a materialized view. - -For example, assume that a materialized view, `my_mv.sql`, has already been materialized to the underlying data platform via `dbt run`. -If the user changes the model's config to `materialized="table"`, they will get an error. 
-The workaround is to execute `DROP MATERIALIZED VIEW my_mv CASCADE` on the data warehouse before trying the model again.
-
-
 
 ## Unit test limitations
diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md
index abb516d2258..7bef180e3d3 100644
--- a/website/docs/reference/resource-configs/snowflake-configs.md
+++ b/website/docs/reference/resource-configs/snowflake-configs.md
@@ -337,33 +337,6 @@ For dbt limitations, these dbt features are not supported:
 - [Model contracts](/docs/collaborate/govern/model-contracts)
 - [Copy grants configuration](/reference/resource-configs/snowflake-configs#copying-grants)
 
-
-
-#### Changing materialization to and from "dynamic_table"
-
-Version `1.6.x` does not support altering the materialization from a non-dynamic table be a dynamic table and vice versa.
-Re-running with the `--full-refresh` does not resolve this either.
-The workaround is manually dropping the existing model in the warehouse prior to calling `dbt run`.
-This only needs to be done once for the conversion.
-
-For example, assume for the example model below, `my_model`, has already been materialized to the underlying data platform via `dbt run`.
-If the model config is updated to `materialized="dynamic_table"`, dbt will return an error.
-The workaround is to execute `DROP TABLE my_model` on the data warehouse before trying the model again.
-
-
-
-```yaml
-
-{{ config(
-    materialized="table" # or any model type (e.g. view, incremental)
-) }}
-
-```
-
-
-
-
 ## Temporary tables
 
 Incremental table merges for Snowflake prefer to utilize a `view` rather than a `temporary table`. The reasoning is to avoid the database write step that a temporary table would initiate and save compile time.
@@ -705,3 +678,27 @@ Per the [Snowflake documentation](https://docs.snowflake.com/en/sql-reference/in
 >- DDL operations.
 >- DML operations (for tables only).
 >- Background maintenance operations on metadata performed by Snowflake.
+
+
+
+## Pagination for object results
+
+By default, when dbt encounters a schema with up to 100,000 objects, it will paginate the results from `show objects` at 10,000 per page for up to 10 pages.
+
+Environments with more than 100,000 objects in a schema can customize the number of results per page and the page limit using the following [flags](/reference/global-configs/about-global-configs) in the `dbt_project.yml`:
+
+- `list_relations_per_page` — The number of relations on each page (maximum 10,000, as this is the most Snowflake allows).
+- `list_relations_page_limit` — The maximum number of pages to include in the results.
+
+For example, to include 10,000 objects per page for up to 100 pages (1 million objects), configure the flags as follows:
+
+
+```yml
+
+flags:
+  list_relations_per_page: 10000
+  list_relations_page_limit: 100
+
+```
+
+
\ No newline at end of file
diff --git a/website/docs/reference/resource-properties/config.md b/website/docs/reference/resource-properties/config.md
index 8190c7dd8ca..1e1867dda04 100644
--- a/website/docs/reference/resource-properties/config.md
+++ b/website/docs/reference/resource-properties/config.md
@@ -170,14 +170,6 @@ exposures:
 
-
-
-Support for the `config` property on `semantic_models` was added in dbt Core v1.7
-
-
-
-
-
 
 ```yml
@@ -193,20 +185,10 @@ semantic_models:
 
-
-
-
-
-Support for the `config` property on `saved queries` was added in dbt Core v1.7.
- - - - - ```yml @@ -226,8 +208,6 @@ saved-queries: - - diff --git a/website/docs/reference/resource-properties/constraints.md b/website/docs/reference/resource-properties/constraints.md index 63582974040..6ba20db090f 100644 --- a/website/docs/reference/resource-properties/constraints.md +++ b/website/docs/reference/resource-properties/constraints.md @@ -65,7 +65,7 @@ models: - type: unique - type: foreign_key to: ref('other_model_name') - to_columns: other_model_column + to_columns: [other_model_column] - type: ... ``` diff --git a/website/docs/reference/resource-properties/deprecation_date.md b/website/docs/reference/resource-properties/deprecation_date.md index 70f150dc465..501fdc30237 100644 --- a/website/docs/reference/resource-properties/deprecation_date.md +++ b/website/docs/reference/resource-properties/deprecation_date.md @@ -55,7 +55,7 @@ Additionally, [`WARN_ERROR_OPTIONS`](/reference/global-configs/warnings) gives a | `DeprecatedReference` | Referencing a model with a past deprecation date | Producer and consumers | | `UpcomingReferenceDeprecation` | Referencing a model with a future deprecation date | Producer and consumers | -** Example ** +**Example** Example output for an `UpcomingReferenceDeprecation` warning: ``` diff --git a/website/docs/reference/resource-properties/freshness.md b/website/docs/reference/resource-properties/freshness.md index 03037e7b681..d68dee4fade 100644 --- a/website/docs/reference/resource-properties/freshness.md +++ b/website/docs/reference/resource-properties/freshness.md @@ -37,8 +37,6 @@ A freshness block is used to define the acceptable amount of time between the mo In the `freshness` block, one or both of `warn_after` and `error_after` can be provided. If neither is provided, then dbt will not calculate freshness snapshots for the tables in this source. - - In most cases, the `loaded_at_field` is required. Some adapters support calculating source freshness from the warehouse metadata tables and can exclude the `loaded_at_field`. If a source has a `freshness:` block, dbt will attempt to calculate freshness for that source: @@ -62,29 +60,9 @@ To exclude a source from freshness calculations, you have two options: - Don't add a `freshness:` block. - Explicitly set `freshness: null`. - - - - -Additionally, the `loaded_at_field` is required to calculate freshness for a table. If a `loaded_at_field` is not provided, then dbt will not calculate freshness for the table. - -Freshness blocks are applied hierarchically: -- A `freshness` and `loaded_at_field` property added to a source will be applied to all tables defined in that source -- A `freshness` and `loaded_at_field` property added to a source _table_ will override any properties applied to the source. - -This is useful when all of the tables in a source have the same `loaded_at_field`, as is often the case. - - ## loaded_at_field - -(Optional on adapters that support pulling freshness from warehouse metadata tables, required otherwise.) - - - -(Required) - - +Optional on adapters that support pulling freshness from warehouse metadata tables, required otherwise.

A column name (or expression) that returns a timestamp indicating freshness. If using a date field, you may have to cast it to a timestamp: diff --git a/website/docs/reference/resource-properties/schema.md b/website/docs/reference/resource-properties/schema.md index 017d93e3235..6b5ba66ff8f 100644 --- a/website/docs/reference/resource-properties/schema.md +++ b/website/docs/reference/resource-properties/schema.md @@ -10,7 +10,7 @@ datatype: schema_name ```yml version: 2 -sources: +[sources](/reference/source-properties): - name: database: schema: @@ -25,7 +25,7 @@ sources: ## Definition The schema name as stored in the database. -This parameter is useful if you want to use a source name that differs from the schema name. +This parameter is useful if you want to use a [source](/reference/source-properties) name that differs from the schema name. :::info BigQuery terminology diff --git a/website/docs/reference/snapshot-configs.md b/website/docs/reference/snapshot-configs.md index e867747dc96..144ecafde9d 100644 --- a/website/docs/reference/snapshot-configs.md +++ b/website/docs/reference/snapshot-configs.md @@ -57,7 +57,7 @@ snapshots: [+](/reference/resource-configs/plus-prefix)[strategy](/reference/resource-configs/strategy): timestamp | check [+](/reference/resource-configs/plus-prefix)[updated_at](/reference/resource-configs/updated_at): [+](/reference/resource-configs/plus-prefix)[check_cols](/reference/resource-configs/check_cols): [] | all - + [+](/reference/resource-configs/plus-prefix)[invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) : true | false ``` @@ -79,7 +79,7 @@ snapshots: [+](/reference/resource-configs/plus-prefix)[updated_at](/reference/resource-configs/updated_at): [+](/reference/resource-configs/plus-prefix)[check_cols](/reference/resource-configs/check_cols): [] | all [+](/reference/resource-configs/plus-prefix)[snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): {} - + [+](/reference/resource-configs/plus-prefix)[invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) : true | false ``` @@ -113,7 +113,7 @@ snapshots: [updated_at](/reference/resource-configs/updated_at): [check_cols](/reference/resource-configs/check_cols): [] | all [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): {} - + [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) : true | false ``` @@ -125,7 +125,7 @@ snapshots: -Configurations can be applied to snapshots using [YAML syntax](/docs/build/snapshots), available in Versionless and dbt v1.9 and higher, in the the `snapshot` directory file. +Configurations can be applied to snapshots using the [YAML syntax](/docs/build/snapshots), available in Versionless and dbt v1.9 and higher, in the `snapshot` directory file. @@ -140,7 +140,7 @@ Configurations can be applied to snapshots using [YAML syntax](/docs/build/snaps [strategy](/reference/resource-configs/strategy)="timestamp" | "check", [updated_at](/reference/resource-configs/updated_at)="", [check_cols](/reference/resource-configs/check_cols)=[""] | "all" - [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names)={} + [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) : true | false ) }} ``` @@ -236,7 +236,7 @@ snapshots: -Configurations can be applied to snapshots using [YAML syntax](/docs/build/snapshots), available in Versionless and dbt v1.9 and higher, in the the `snapshot` directory file. 
+Configurations can be applied to snapshots using [YAML syntax](/docs/build/snapshots), available in Versionless and dbt v1.9 and higher, in the `snapshot` directory file. diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js index b68e2e8ec5c..a6cb4e40628 100644 --- a/website/docusaurus.config.js +++ b/website/docusaurus.config.js @@ -82,7 +82,7 @@ var siteSettings = { "https://www.getdbt.com/resources/webinars/dbt-cloud-demos-with-experts/?utm_medium=i[…]ly-demos_aw&utm_content=biweekly-demos____&utm_term=all_all__", // Set community spotlight member on homepage // This is the ID for a specific file under docs/community/spotlight - communitySpotlightMember: "meagan-palmer", + communitySpotlightMember: "original-dbt-athena-maintainers", prism: { theme: (() => { var theme = themes.nightOwl; @@ -209,7 +209,7 @@ var siteSettings = { >