Commit

Spike out mesh guide
gwenwindflower committed Sep 14, 2023
1 parent 1527282 commit dfbf23f
Showing 4 changed files with 105 additions and 0 deletions.
35 changes: 35 additions & 0 deletions website/docs/guides/best-practices/how-we-mesh/mesh-1-intro.md
@@ -0,0 +1,35 @@
---
title: "Intro to dbt Mesh"
description: Getting started with dbt Mesh patterns
hoverSnippet: Learn how to get started with dbt Mesh
---

## What is dbt Mesh?

Historically, building data teams has involved two extremes: building a centralized team or using embedded analysts. More recently, hub-and-spoke models have become popular as a way to balance the tradeoffs: using a centralized platform team _and_ embedded analysts, allowing embeds to develop domain expertise while the central team focuses on building a strong operational foundation. A major difficulty of this model, though, is managing the complexity of dependencies, governance, and workflows between all groups, which creates friction in monorepos or complexity and silos in multi-repos. Ideally, you want teams to be able to work independently but also collaborate, sharing data, code, and best practices. dbt Mesh provides the tooling for teams to finally achieve this.

dbt Mesh is not a product, but a pattern, enabled by a convergence of several features in dbt Cloud. It’s inspired by dbt’s best practices and ideas from [data mesh](https://en.wikipedia.org/wiki/Data_mesh). These features include:

- Cross-project references - this is the core feature that enables a mesh structure. `ref`s now work across projects in dbt Cloud-enabled projects on Enterprise plans.
- Governance - dbt Cloud’s new governance features allow you to manage access and permissions across projects.
- Groups - groups allow you to organize the models within a project into named subsets.
- Access - access configs allow you to control who can view and reference models both within and across projects.
- Versioning - building a dbt Mesh involves treating your data models as stable APIs. To achieve this you need mechanisms to version your models and allow graceful adoption and deprecation of models as they evolve.
- Contracts - data contracts set strict expectations on the shape of the data to ensure data changes upstream of dbt or within a project's logic don't break downstream consumers.
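
As a rough illustration of how access, versioning, and contracts can combine on a single model, here is a minimal sketch of a model properties file. The model name `fct_orders`, its columns, and the version numbers are hypothetical, and exact property placement can vary between dbt versions, so treat this as a starting point rather than a definitive configuration.

```yaml
# models/marts/fct_orders.yml (hypothetical path and model name)
models:
  - name: fct_orders
    # Access: "public" allows other projects to reference this model.
    access: public
    # Versioning: consumers get v2 by default, while v1 remains available
    # during a deprecation window. By default dbt expects one SQL file per
    # version, e.g. fct_orders_v1.sql and fct_orders_v2.sql.
    latest_version: 2
    config:
      # Contracts: the build fails if the model's output does not match
      # the column names and data types declared below.
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: integer
        constraints:
          - type: not_null
      - name: ordered_at
        data_type: timestamp
    versions:
      - v: 1
      - v: 2
```

Together, these three settings are what let a downstream team treat the model like a stable API: they can see it, rely on its shape, and migrate between versions on their own schedule.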

## Who is dbt Mesh for?

dbt Mesh is not for every organization! If you're just starting your dbt journey, don't worry about building a dbt Mesh right away. It adds meta-complexity around managing your projects that could distract from building initial value in dbt. However, if you're already using dbt and your project has started to experience any of the following, you're likely ready to start exploring a dbt Mesh:

- The number of models in your project is degrading performance and slowing down development.
- Teams have developed separate workflows and need to decouple development.
- Security and governance requirements are increasing and would benefit from increased isolation.

dbt Cloud is designed to coordinate the features above and simplify the meta-complexities (such as scoped CI and multi-project lineage) to solve for these problems.

## Learning goals

- Understand the purpose and tradeoffs of building a dbt Mesh.
- Develop an intuition for various dbt Mesh patterns and how to design a dbt Mesh for your organization.
- Establish recommended steps to incrementally adopt a dbt Mesh pattern in your dbt implementation.
- Offer tooling to help you more quickly and easily implement your dbt Mesh plan.
@@ -0,0 +1,30 @@
---
title: "Deciding how to structure your mesh"
description: Getting started with dbt Mesh patterns
hoverSnippet: Learn how to get started with dbt Mesh
---

## Exploring mesh patterns

Building a mesh is not a one-size-fits-all process. In fact, it's the opposite: it's about customizing your project structure to fit _your_ team and _your_ data. Often we've had to fit the data team and project structure into our company's org chart, or manage everything in one project to handle the constraints of our data and warehouse. dbt Mesh allows us to mold our organizational knowledge graph to our organizational people graph, bringing people and data closer together rather than compromising one for the other.

## Vertical splits

Vertical splits are about separating out layers of transformation in the DAG order. For example, splitting up staging and mart layers. Often the vertical separation will be based around security and governance requirements, such as separating out PII data from non-PII data and restricting raw data access to a platform team that's responsible for landing and cleaning data.
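
To make a vertical split concrete, here is a minimal sketch of how a downstream marts project might declare its dependency on an upstream project that owns staging. The project name `platform` and the model name `stg_orders` are assumptions for illustration only.

```yaml
# dependencies.yml in the downstream (marts) project
# "platform" is a hypothetical upstream project that lands and cleans raw data.
projects:
  - name: platform

# A model in this project could then reference a public upstream model
# with a two-argument ref, for example:
#   select * from {{ ref('platform', 'stg_orders') }}
# The upstream model needs public access for this reference to resolve.
```

With this arrangement, raw data access stays inside the upstream project, and the downstream team only sees the models the platform team has chosen to expose.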

## Horizontal splits

Horizontal splits are about splitting up the data based on source or domain. Often the horizontal separation will be based around team consumption patterns, such as splitting out marketing data and financial data. Another common vector of horizontal splitting is data from different sources, such as click event data and transactional ecommerce data. These splits are often based around the shape and size of the data and how it's used, rather than the security or governance requirements.

## Combining these divisions

- These are not either/or techniques. You can and should combine them in any way that makes sense for your organization.

- **DRY applies to underlying data, not just code.** Regardless of your split, you should not be sourcing the same rows and columns into multiple meshes. Working within a mesh structure, it becomes increasingly important that we don’t duplicate work, which creates surface area for conflicts and erodes the single source of truth we're trying to create in our dbt project.

## Monorepo vs multi-repo

- A dbt Mesh can exist as multiple projects in a single repo (monorepo) or as multiple projects in their own repositories (multi-repo).
- Monorepos are often easier to get started with, but can become unwieldy as the number of models and teams grows.
- If you're a smaller team looking primarily to speed up and simplify development, a monorepo is likely the right choice.
- If you're a larger team with multiple groups, and need to decouple projects for security and enablement of different development styles and rhythms, a multi-repo is your best bet.
@@ -0,0 +1,29 @@
---
title: "Implementing your mesh plan"
description: Getting started with dbt Mesh patterns
hoverSnippet: Learn how to get started with dbt Mesh
---

## Implementing a dbt Mesh

Let's examine an outline of steps to start implementing a dbt Mesh in your organization.

### Research your current structure

- Look at your selectors to figure out how people are grouping models right now.
- Talk to teams about what sort of separation naturally exists right now.
- Are there various domains people are focused on?
- Are there various sizes, shapes, and sources of data that get handled separately (such as click event data)?
- Are there people focused on separate levels of transformation, such as landing and staging data or building marts?
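
Existing selectors are often a good hint of where boundaries already sit. As a hypothetical example, a selector like the one sketched below suggests that marketing models are already being built as a unit; the selector name, path, and tag are assumptions, not taken from any real project.

```yaml
# selectors.yml (hypothetical example of an existing selector)
selectors:
  - name: marketing_nightly
    description: Models the marketing team already builds together
    definition:
      union:
        # Everything under the marketing marts folder...
        - method: path
          value: models/marts/marketing
        # ...plus anything tagged as marketing elsewhere in the project.
        - method: tag
          value: marketing
```

A selector that already unions a folder with a tag like this is a candidate for becoming a group, and later possibly its own project.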

### Add groups and access

Once you have a sense of some initial groupings, implement group and access permissions within a project.

- Incrementally start building your jobs based on these groups to test whether you’ve drawn the lines in the right place. We recommend running these jobs in parallel with your production jobs until you’re sure about them.
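
A minimal sketch of what this step might look like inside a single project is shown below. The group name, owner details, and model names are hypothetical, and property placement may differ slightly depending on your dbt version.

```yaml
# models/marts/finance/_finance.yml (hypothetical path)
groups:
  - name: finance
    owner:
      name: Finance Analytics
      email: finance-data@example.com

models:
  # Private: only other models in the finance group can reference this.
  - name: int_payments_pivoted
    access: private
    config:
      group: finance
  # Protected (the default): any model in this project can reference this.
  - name: fct_revenue
    access: protected
    config:
      group: finance

# A trial job could then build only this group, for example:
#   dbt build --select group:finance
```

Running a group-scoped job like this alongside your production jobs surfaces any references that cross the line you've drawn, before you commit to splitting the project.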

### Do the splits

- When you’ve confirmed the right groups, use `dbt-meshify` to pull chunks out into their own projects.
- Do _one_ group at a time, using the groups as your selectors.
- Do _not_ refactor as you migrate, however tempting that may be. Focus on getting 1-to-1 parity and log any issues you find during the migration for later. Once you’ve fully landed the project, you can start optimizing it for its new life as part of the mesh.
@@ -0,0 +1,11 @@
---
title: "Conclusion"
description: Getting started with dbt Mesh patterns
hoverSnippet: Learn how to get started with dbt Mesh
---

## Conclusion

dbt Mesh is a powerful new pattern for data transformation. It helps adapt teams and their data towards each other, rather than making arbitrary decisions based on the constraints of either one. By creating alignment between your people and data flows, developers can move faster, analysts can be more productive, and data consumers can be more confident in the data they use.

You can incrementally adopt the ideas in this guide in your organization as you hit constraints. There's no pressure to adopt this as the _right pattern_ to build with. That said, familiarizing yourself with dbt Mesh concepts and thinking through how they can apply to your organization will help you make better decisions as you grow. We hope this guide has given you a good starting point to do that.
