From 75f070852acdace11f19496888b19803035e8c4f Mon Sep 17 00:00:00 2001
From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
Date: Mon, 25 Nov 2024 11:55:25 +0000
Subject: [PATCH 1/3] Update python-models.md

This PR addresses issue below which adds alternatives to python model debugging

Resolves #3327
---
 website/docs/docs/build/python-models.md | 28 ++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md
index 28136f91e9c..cea7fbc89fe 100644
--- a/website/docs/docs/build/python-models.md
+++ b/website/docs/docs/build/python-models.md
@@ -598,6 +598,34 @@ Python models have capabilities that SQL models do not. They also have some draw
 - **These capabilities are very new.** As data warehouses develop new features, we expect them to offer cheaper, faster, and more intuitive mechanisms for deploying Python transformations. **We reserve the right to change the underlying implementation for executing Python models in future releases.** Our commitment to you is around the code in your model `.py` files, following the documented capabilities and guidance we're providing here.
 - **Lack of `print()` support.** The data platform runs and compiles your Python model without dbt's oversight. This means it doesn't display the output of commands such as Python's built-in [`print()`](https://docs.python.org/3/library/functions.html#print) function in dbt's logs.
 
+
+
+The following explains other methods you can use for debugging, such as writing messages to a dataframe column:
+
+- Using platform logs: Use your data platform's logs to debug your Python models.
+- Return logs as a dataframe: Create a dataframe containing your logs and build it into the warehouse.
+- Develop locally with DuckDB: Test and debug your models locally using DuckDB before deploying them.
+
+Here's an example of debugging in a Python model:
+
+```python
+def model(dbt, session):
+    dbt.config(
+        materialized = "table"
+    )
+
+    df = dbt.ref("my_source_table").df()
+
+    # One option for debugging: write messages to temporary table column
+    # Pros: visibility
+    # Cons: won't work if table isn't building for some reason
+    msg = "something"
+    df["debugging"] = f"My debug message here: {msg}"
+
+    return df
+```
+
+
 As a general rule, if there's a transformation you could write equally well in SQL or Python, we believe that well-written SQL is preferable: it's more accessible to a greater number of colleagues, and it's easier to write code that's performant at scale. If there's a transformation you _can't_ write in SQL, or where ten lines of elegant and well-annotated Python could save you 1000 lines of hard-to-read Jinja-SQL, Python is the way to go.
 
 ## Specific data platforms {#specific-data-platforms}

From 685a0c3db7a0d478efc3cc8e128678242aead3a0 Mon Sep 17 00:00:00 2001
From: mirnawong1
Date: Mon, 25 Nov 2024 12:21:09 +0000
Subject: [PATCH 2/3] update schema config

---
 website/docs/reference/resource-configs/schema.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/website/docs/reference/resource-configs/schema.md b/website/docs/reference/resource-configs/schema.md
index 1e2ff47729c..b239e26bd87 100644
--- a/website/docs/reference/resource-configs/schema.md
+++ b/website/docs/reference/resource-configs/schema.md
@@ -108,7 +108,9 @@ This would result in the test results being stored in the `test_results` schema.
 Refer to [Usage](#usage) for more examples.
 
 ## Definition
-Optionally specify a custom schema for a [model](/docs/build/sql-models) or [seed](/docs/build/seeds). (To specify a schema for a [snapshot](/docs/build/snapshots), use the [`target_schema` config](/reference/resource-configs/target_schema)).
+Optionally specify a custom schema for a [model](/docs/build/sql-models), [seed](/docs/build/seeds), [snapshot](/docs/build/snapshots), [saved query](/docs/build/saved-queries), or [test](/docs/build/data-tests).
+
+For users on dbt Cloud v1.8 or earlier, use the [`target_schema` config](/reference/resource-configs/target_schema) to specify a custom schema for a snapshot.
 
 When dbt creates a relation (/) in a database, it creates it as: `{{ database }}.{{ schema }}.{{ identifier }}`, e.g. `analytics.finance.payments`
 

From eb2a69616b0aa1932186cb13a8e0a24f51cdfb52 Mon Sep 17 00:00:00 2001
From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
Date: Mon, 25 Nov 2024 13:11:07 +0000
Subject: [PATCH 3/3] Update python-models.md

---
 website/docs/docs/build/python-models.md | 50 ++++++++++++------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md
index cea7fbc89fe..2267da192a9 100644
--- a/website/docs/docs/build/python-models.md
+++ b/website/docs/docs/build/python-models.md
@@ -598,33 +598,33 @@ Python models have capabilities that SQL models do not. They also have some draw
 - **These capabilities are very new.** As data warehouses develop new features, we expect them to offer cheaper, faster, and more intuitive mechanisms for deploying Python transformations. **We reserve the right to change the underlying implementation for executing Python models in future releases.** Our commitment to you is around the code in your model `.py` files, following the documented capabilities and guidance we're providing here.
 - **Lack of `print()` support.** The data platform runs and compiles your Python model without dbt's oversight. This means it doesn't display the output of commands such as Python's built-in [`print()`](https://docs.python.org/3/library/functions.html#print) function in dbt's logs.
 
-
+-
 
-The following explains other methods you can use for debugging, such as writing messages to a dataframe column:
-
-- Using platform logs: Use your data platform's logs to debug your Python models.
-- Return logs as a dataframe: Create a dataframe containing your logs and build it into the warehouse.
-- Develop locally with DuckDB: Test and debug your models locally using DuckDB before deploying them.
-
-Here's an example of debugging in a Python model:
-
-```python
-def model(dbt, session):
-    dbt.config(
-        materialized = "table"
-    )
-
-    df = dbt.ref("my_source_table").df()
-
-    # One option for debugging: write messages to temporary table column
-    # Pros: visibility
-    # Cons: won't work if table isn't building for some reason
-    msg = "something"
-    df["debugging"] = f"My debug message here: {msg}"
+  The following explains other methods you can use for debugging, such as writing messages to a dataframe column:
+
+  - Using platform logs: Use your data platform's logs to debug your Python models.
+  - Return logs as a dataframe: Create a dataframe containing your logs and build it into the warehouse.
+  - Develop locally with DuckDB: Test and debug your models locally using DuckDB before deploying them.
+
+  Here's an example of debugging in a Python model:
 
-    return df
-```
-
+  ```python
+  def model(dbt, session):
+      dbt.config(
+          materialized = "table"
+      )
+
+      df = dbt.ref("my_source_table").df()
+
+      # One option for debugging: write messages to temporary table column
+      # Pros: visibility
+      # Cons: won't work if table isn't building for some reason
+      msg = "something"
+      df["debugging"] = f"My debug message here: {msg}"
+
+      return df
+  ```
+
 
 As a general rule, if there's a transformation you could write equally well in SQL or Python, we believe that well-written SQL is preferable: it's more accessible to a greater number of colleagues, and it's easier to write code that's performant at scale. If there's a transformation you _can't_ write in SQL, or where ten lines of elegant and well-annotated Python could save you 1000 lines of hard-to-read Jinja-SQL, Python is the way to go.
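
Note on the "Return logs as a dataframe" alternative listed in this patch series: the patch only shows the column-based approach, so the following is a minimal sketch of what the dataframe-based one might look like. The `DebugLog` helper and its column names (`logged_at`, `step`, `message`) are hypothetical and not part of dbt. The `dbt.ref("my_source_table").df()` call is copied from the patch's own example and assumes a DuckDB-style relation, and `pandas` is assumed to be available on the platform.

```python
from datetime import datetime, timezone


class DebugLog:
    """Hypothetical helper (not a dbt API): collects debug messages as rows."""

    def __init__(self):
        self.rows = []

    def add(self, step, message):
        # Store each message as one row, with a UTC timestamp for ordering.
        self.rows.append(
            {
                "logged_at": datetime.now(timezone.utc).isoformat(),
                "step": step,
                "message": str(message),
            }
        )


def model(dbt, session):
    # Imported inside the model so the helper above has no third-party dependency.
    import pandas as pd

    dbt.config(materialized="table")
    log = DebugLog()

    # Same source access as the patch's example; .df() assumes a DuckDB-style relation.
    df = dbt.ref("my_source_table").df()
    log.add("load", f"rows={len(df)}, columns={list(df.columns)}")

    # Return the log rows instead of the data so dbt builds them into a table.
    return pd.DataFrame(log.rows)
```

Because the model returns the log rows rather than the transformed data, the messages end up in a table you can query after the run. Like the column-based approach in the patch, this only helps if the model gets far enough to return.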