Update python-models.md #6531

Merged 3 commits on Nov 25, 2024
28 changes: 28 additions & 0 deletions website/docs/docs/build/python-models.md
@@ -598,6 +598,34 @@ Python models have capabilities that SQL models do not. They also have some draw
- **These capabilities are very new.** As data warehouses develop new features, we expect them to offer cheaper, faster, and more intuitive mechanisms for deploying Python transformations. **We reserve the right to change the underlying implementation for executing Python models in future releases.** Our commitment to you is around the code in your model `.py` files, following the documented capabilities and guidance we're providing here.
- **Lack of `print()` support.** The data platform runs and compiles your Python model without dbt's oversight. This means it doesn't display the output of commands such as Python's built-in [`print()`](https://docs.python.org/3/library/functions.html#print) function in dbt's logs.

- <Expandable alt_header="Alternatives to using print() in Python models">

The following explains other methods you can use for debugging, such as writing messages to a dataframe column:

- Use platform logs: Check your data platform's own logs to debug your Python models.
- Return logs as a dataframe: Create a dataframe containing your log messages and build it into the warehouse.
- Develop locally with DuckDB: Test and debug your models locally using DuckDB before deploying them.

Here's an example of debugging in a Python model:

```python
def model(dbt, session):
    dbt.config(
        materialized="table"
    )

    # .df() converts a DuckDB relation to a pandas DataFrame;
    # other platforms use a different call (for example, to_pandas() on Snowpark)
    df = dbt.ref("my_source_table").df()

    # One option for debugging: write messages to a temporary column in the table
    # Pros: visibility
    # Cons: won't work if the table isn't building for some reason
    msg = "something"
    df["debugging"] = f"My debug message here: {msg}"

    return df
```
</Expandable>
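
The "return logs as a dataframe" option from the list above could look something like the following. This is a minimal sketch, not a documented dbt pattern: the `DebugLog` helper and its column names are hypothetical, `my_source_table` is a placeholder, and `.df()` assumes a DuckDB-style relation as in the example above. It also assumes your platform accepts a pandas DataFrame as the model's return value.

```python
from datetime import datetime, timezone


class DebugLog:
    """Hypothetical helper: collects debug messages in memory as row dicts."""

    def __init__(self):
        self.records = []

    def add(self, step, message):
        # Store each message with a UTC timestamp and a short step label
        self.records.append(
            {
                "logged_at": datetime.now(timezone.utc).isoformat(),
                "step": step,
                "message": str(message),
            }
        )


def model(dbt, session):
    # Imported here so the DebugLog helper itself has no pandas dependency
    import pandas as pd

    dbt.config(materialized="table")
    log = DebugLog()

    # Placeholder upstream model; .df() assumes a DuckDB-style relation
    df = dbt.ref("my_source_table").df()
    log.add("load", f"read {len(df)} rows")
    log.add("columns", list(df.columns))

    # Build the collected messages into the warehouse instead of the data
    return pd.DataFrame(log.records)
```

Because the model returns the log rows rather than the transformed data, a sketch like this fits best as a temporary, separate debugging model that you query after the run and then remove.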

As a general rule, if there's a transformation you could write equally well in SQL or Python, we believe that well-written SQL is preferable: it's more accessible to a greater number of colleagues, and it's easier to write code that's performant at scale. If there's a transformation you _can't_ write in SQL, or where ten lines of elegant and well-annotated Python could save you 1000 lines of hard-to-read Jinja-SQL, Python is the way to go.

## Specific data platforms {#specific-data-platforms}