From 5e52b441947d58aadbeb907786e05943c09990aa Mon Sep 17 00:00:00 2001
From: Natalie Fiann
Date: Tue, 5 Nov 2024 13:51:13 +0000
Subject: [PATCH 1/8] Added section to Python models doc to discuss third party packages following this the thread: https://dbt-labs.slack.com/archives/C05FWBP9X1U/p1730272033637189

---
 website/docs/docs/build/python-models.md | 34 ++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md
index 811379a0d2c..f77864d4543 100644
--- a/website/docs/docs/build/python-models.md
+++ b/website/docs/docs/build/python-models.md
@@ -660,6 +660,40 @@ models:
 
 **Docs:** ["Developer Guide: Snowpark Python"](https://docs.snowflake.com/en/developer-guide/snowpark/python/index.html)
 
+#### Third-party snowflake packages
+
+To use a third-party Snowflake package that isn't available in Snowflake Anaconda, upload your package by following [this example](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages#importing-packages-through-a-snowflake-stage) then, configure `imports` in the dbt Python model to reference to the zip file in your Snowflake staging.
+
+Here’s a complete example configuration, including using `imports` in a Python model:
+
+```python
+
+import sys
+from snowflake.snowpark.types import StructType, FloatType, StringType, StructField
+
+def model( dbt, session):
+
+    dbt.config(
+        materialized='table',
+        imports = ['@dbt_integration_test/iris.csv'],
+        use_anonymous_sproc = False
+)
+schema_for_data_file = StructType([
+    StructField("length1", FloatType()),
+    StructField("width1", FloatType()),
+    StructField("length2", FloatType()),
+    StructField("width2", FloatType()),
+    StructField("variety", StringType()),
+])
+df = session.read.schema(schema_for_data_file).option("field_delimiter", ",").schema(schema_for_data_file).csv("@dbt_integration_test/iris.csv")
+return df
+
+```
+
+In this example, dbt is configured to locate the `iris.csv` file in the designated Snowflake stage, `@dbt_integration_test`.
+
+For more information on using this configuration, refer to [test_python_model.py](https://github.com/dbt-labs/dbt-snowflake/blob/1d299923e34c96f2e96a5215ac196658f86ce1d1/tests/functional/adapter/test_python_model.py#L90).
+
From e277d2de127a88d6c314914ea386f5c844785171 Mon Sep 17 00:00:00 2001
From: nataliefiann <120089939+nataliefiann@users.noreply.github.com>
Date: Tue, 5 Nov 2024 14:56:36 +0000
Subject: [PATCH 2/8] Update website/docs/docs/build/python-models.md

Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
---
 website/docs/docs/build/python-models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md
index f77864d4543..9b4cd5f6d74 100644
--- a/website/docs/docs/build/python-models.md
+++ b/website/docs/docs/build/python-models.md
@@ -690,7 +690,7 @@ return df
 
 ```
 
-In this example, dbt is configured to locate the `iris.csv` file in the designated Snowflake stage, `@dbt_integration_test`.
+This example uses `imports = ['@dbt_integration_test/iris.csv'],`, which tells dbt to locate the `iris.csv` file in the designated Snowflake stage, `@dbt_integration_test`.
 
 For more information on using this configuration, refer to [test_python_model.py](https://github.com/dbt-labs/dbt-snowflake/blob/1d299923e34c96f2e96a5215ac196658f86ce1d1/tests/functional/adapter/test_python_model.py#L90).
 
From 66c83c0c9fa3aaa97abae0478e8208d6d2757025 Mon Sep 17 00:00:00 2001
From: nataliefiann <120089939+nataliefiann@users.noreply.github.com>
Date: Tue, 5 Nov 2024 14:56:45 +0000
Subject: [PATCH 3/8] Update website/docs/docs/build/python-models.md

Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
---
 website/docs/docs/build/python-models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md
index 9b4cd5f6d74..e7ddc56a8d5 100644
--- a/website/docs/docs/build/python-models.md
+++ b/website/docs/docs/build/python-models.md
@@ -660,7 +660,7 @@ models:
 
 **Docs:** ["Developer Guide: Snowpark Python"](https://docs.snowflake.com/en/developer-guide/snowpark/python/index.html)
 
-#### Third-party snowflake packages
+#### Third-party Snowflake packages
 
 To use a third-party Snowflake package that isn't available in Snowflake Anaconda, upload your package by following [this example](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages#importing-packages-through-a-snowflake-stage) then, configure `imports` in the dbt Python model to reference to the zip file in your Snowflake staging.
 
From 6823f969d6be03a57035d5399b0bf97c0ab8e0d1 Mon Sep 17 00:00:00 2001
From: Natalie Fiann
Date: Mon, 11 Nov 2024 13:00:20 +0000
Subject: [PATCH 4/8] Updated code

---
 website/docs/docs/build/python-models.md | 37 ++++++++++++------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md
index f77864d4543..5a6b52d6ff5 100644
--- a/website/docs/docs/build/python-models.md
+++ b/website/docs/docs/build/python-models.md
@@ -668,31 +668,30 @@ Here’s a complete example configuration, including using `imports` in a Python
 
 ```python
 
-import sys
-from snowflake.snowpark.types import StructType, FloatType, StringType, StructField
+def model(dbt, session):
+    dbt.config(materialized = "table", packages = ["nltk", "pandas"], imports = ['@DBT_DEPS/nltk_data/sentiment/vader_lexicon.zip'])
+
+    df_reviews = dbt.ref("stg_reviews")
+
+    move_files()
+
+    nltk.data.path.append(BASE_TEMP_DIR)
+
+    pandas_df = df_reviews.to_pandas()
+
+    sia = SentimentIntensityAnalyzer()
+
+    pandas_df["REVIEW_POSITIVE"] = pandas_df["REVIEW_TEXT"].apply(lambda x:sia.polarity_scores(x)['compound'] > 0)
 
-def model( dbt, session):
+    final_df = session.write_pandas(pandas_df, "write_pandas_table", auto_create_table=True, table_type="temp")
 
-    dbt.config(
-        materialized='table',
-        imports = ['@dbt_integration_test/iris.csv'],
-        use_anonymous_sproc = False
-)
-schema_for_data_file = StructType([
-    StructField("length1", FloatType()),
-    StructField("width1", FloatType()),
-    StructField("length2", FloatType()),
-    StructField("width2", FloatType()),
-    StructField("variety", StringType()),
-])
-df = session.read.schema(schema_for_data_file).option("field_delimiter", ",").schema(schema_for_data_file).csv("@dbt_integration_test/iris.csv")
-return df
+    return final_df.select(col("order_id"), col("review_text"), col("review_positive"))
 
 ```
 
-In this example, dbt is configured to locate the `iris.csv` file in the designated Snowflake stage, `@dbt_integration_test`.
+In this example, dbt is configured to locate the `vader_lexicon.zip` file in the designated Snowflake stage, `@DBT_DEPS`.
 
-For more information on using this configuration, refer to [test_python_model.py](https://github.com/dbt-labs/dbt-snowflake/blob/1d299923e34c96f2e96a5215ac196658f86ce1d1/tests/functional/adapter/test_python_model.py#L90).
+For more information on using this configuration, refer to [Snowflake's documentation](https://community.snowflake.com/s/article/how-to-use-other-python-packages-in-snowpark) on uploading and using other python packages in Snowpark not published on Snowflake's Anaconda channel.
From 439bfc019fdb9430d9eccdd936cc4c3bad7c2f74 Mon Sep 17 00:00:00 2001
From: Natalie Fiann
Date: Tue, 12 Nov 2024 11:34:00 +0000
Subject: [PATCH 5/8] Updated code example

---
 website/docs/docs/build/python-models.md | 38 +++++++++++++-----------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md
index d1002948f6c..d84438c771a 100644
--- a/website/docs/docs/build/python-models.md
+++ b/website/docs/docs/build/python-models.md
@@ -669,30 +669,32 @@ Here’s a complete example configuration, including using `imports` in a Python
 ```python
 
 def model(dbt, session):
-    dbt.config(materialized = "table", packages = ["nltk", "pandas"], imports = ['@DBT_DEPS/nltk_data/sentiment/vader_lexicon.zip'])
+    # Configure the model
+    dbt.config(
+        materialized="table",
+        imports=["@mystage/mycustompackage.zip"], # Specify the external package location
+    )
 
-    df_reviews = dbt.ref("stg_reviews")
-
-    move_files()
-
-    nltk.data.path.append(BASE_TEMP_DIR)
-
-    pandas_df = df_reviews.to_pandas()
-
-    sia = SentimentIntensityAnalyzer()
-
-    pandas_df["REVIEW_POSITIVE"] = pandas_df["REVIEW_TEXT"].apply(lambda x:sia.polarity_scores(x)['compound'] > 0)
-
-    final_df = session.write_pandas(pandas_df, "write_pandas_table", auto_create_table=True, table_type="temp")
-
-    return final_df.select(col("order_id"), col("review_text"), col("review_positive"))
+    # Example data transformation using the imported package
+    # (Assuming `some_external_package` has a function we can call)
+    data = {
+        "name": ["Alice", "Bob", "Charlie"],
+        "score": [85, 90, 88]
+    }
+    df = pd.DataFrame(data)
+
+    # Process data with the external package
+    df["adjusted_score"] = df["score"].apply(lambda x: some_external_package.adjust_score(x))
+
+    # Return the DataFrame as the model output
+    return df
 
 ```
 
-In this example, dbt is configured to locate the `vader_lexicon.zip` file in the designated Snowflake stage, `@DBT_DEPS`.
-
 For more information on using this configuration, refer to [Snowflake's documentation](https://community.snowflake.com/s/article/how-to-use-other-python-packages-in-snowpark) on uploading and using other python packages in Snowpark not published on Snowflake's Anaconda channel.
 
+To use external libraries, you can also use the [`zip`](https://github.com/phdata/dbt_snowpark_sentiment_example/blob/2c5528278e14dba678fb7773cca2d47f8adbeb4d/models/reviews.py#L30) approach.
+
From 3eff31498b635619e6883650d3fad33793da99f7 Mon Sep 17 00:00:00 2001
From: nataliefiann <120089939+nataliefiann@users.noreply.github.com>
Date: Tue, 12 Nov 2024 13:02:49 +0000
Subject: [PATCH 6/8] Update website/docs/docs/build/python-models.md

Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
---
 website/docs/docs/build/python-models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md
index d84438c771a..74529739e14 100644
--- a/website/docs/docs/build/python-models.md
+++ b/website/docs/docs/build/python-models.md
@@ -662,7 +662,7 @@ models:
 
 #### Third-party Snowflake packages
 
-To use a third-party Snowflake package that isn't available in Snowflake Anaconda, upload your package by following [this example](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages#importing-packages-through-a-snowflake-stage) then, configure `imports` in the dbt Python model to reference to the zip file in your Snowflake staging.
+To use a third-party Snowflake package that isn't available in Snowflake Anaconda, upload your package by following [this example](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages#importing-packages-through-a-snowflake-stage), and then configure the `imports` setting in the dbt Python model to reference to the zip file in your Snowflake staging.
 
 Here’s a complete example configuration, including using `imports` in a Python model:
 
From d5d0305b342bf26c9501ef3612778f26cde37f22 Mon Sep 17 00:00:00 2001
From: nataliefiann <120089939+nataliefiann@users.noreply.github.com>
Date: Tue, 12 Nov 2024 13:03:15 +0000
Subject: [PATCH 7/8] Update website/docs/docs/build/python-models.md

Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
---
 website/docs/docs/build/python-models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md
index 74529739e14..27f900badb5 100644
--- a/website/docs/docs/build/python-models.md
+++ b/website/docs/docs/build/python-models.md
@@ -664,7 +664,7 @@ models:
 
 To use a third-party Snowflake package that isn't available in Snowflake Anaconda, upload your package by following [this example](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages#importing-packages-through-a-snowflake-stage), and then configure the `imports` setting in the dbt Python model to reference to the zip file in your Snowflake staging.
 
-Here’s a complete example configuration, including using `imports` in a Python model:
+Here’s a complete example configuration using a zip file, including using `imports` in a Python model:
 
 ```python
 
From c9872d86746273190eafdcfd1cbd0bb17d514eed Mon Sep 17 00:00:00 2001
From: nataliefiann <120089939+nataliefiann@users.noreply.github.com>
Date: Tue, 12 Nov 2024 13:03:56 +0000
Subject: [PATCH 8/8] Update website/docs/docs/build/python-models.md

---
 website/docs/docs/build/python-models.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md
index 27f900badb5..28136f91e9c 100644
--- a/website/docs/docs/build/python-models.md
+++ b/website/docs/docs/build/python-models.md
@@ -693,7 +693,6 @@ def model(dbt, session):
 
 For more information on using this configuration, refer to [Snowflake's documentation](https://community.snowflake.com/s/article/how-to-use-other-python-packages-in-snowpark) on uploading and using other python packages in Snowpark not published on Snowflake's Anaconda channel.
 
-To use external libraries, you can also use the [`zip`](https://github.com/phdata/dbt_snowpark_sentiment_example/blob/2c5528278e14dba678fb7773cca2d47f8adbeb4d/models/reviews.py#L30) approach.
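
The final example in this series refers to `@mystage/mycustompackage.zip` and calls an `adjust_score` function without showing how such a zip might be laid out. The following is a minimal local sketch, not the documented dbt or Snowflake workflow: it builds a zip with the package directory at its top level and imports from it using Python's standard zip import support. The names `mycustompackage` and `adjust_score`, and the `+ 5` adjustment, are hypothetical and only mirror the placeholders in the patch; it is an assumption here that Snowflake makes a staged zip listed in `imports` importable in a comparable way.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

# Hypothetical names for illustration only; they are not part of dbt or Snowflake.
PACKAGE_NAME = "mycustompackage"
PACKAGE_SOURCE = "def adjust_score(score):\n    return score + 5\n"


def build_package_zip(target_dir: Path) -> Path:
    """Write a zip whose top level contains the package directory."""
    zip_path = target_dir / f"{PACKAGE_NAME}.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr(f"{PACKAGE_NAME}/__init__.py", PACKAGE_SOURCE)
    return zip_path


def import_from_zip(zip_path: Path):
    """Import the package from the zip by temporarily adding it to sys.path."""
    sys.path.insert(0, str(zip_path))
    try:
        importlib.invalidate_caches()
        return importlib.import_module(PACKAGE_NAME)
    finally:
        sys.path.remove(str(zip_path))


with tempfile.TemporaryDirectory() as tmp:
    package = import_from_zip(build_package_zip(Path(tmp)))
    # Same sample scores as the model in the patch.
    adjusted_scores = [package.adjust_score(score) for score in [85, 90, 88]]
```

If the zip were uploaded to a stage and referenced through `imports`, the model body would presumably still need its own `import mycustompackage` and `import pandas as pd` statements, which the example in the patch leaves out.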