Commit
Python models Dataproc Serverless setup with packages (#5920)
Add a description of how to set up Python models with Dataproc Serverless using a custom image in order to use third-party packages.

## What are you changing in this pull request and why?

In the context of running Python models in Spark using Dataproc, the documentation ([python-models.md](https://github.com/dbt-labs/docs.getdbt.com/blob/current/website/docs/docs/build/python-models.md)) says:

> Installing packages: If you are using a Dataproc Cluster (as opposed to Dataproc Serverless), you can add third-party packages while creating the cluster.

I dug into this and found that it is possible to run Python models that use third-party packages in Dataproc Serverless. It requires a custom Docker image, which is well documented on GCP's end. We currently run this in production without any issues. I added this to the documentation. Let me know if you need more details on how to set this up.

## Checklist

- [x] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
- [x] For [docs versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning), review how to [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [x] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."

Adding or removing pages (delete if not applicable): N/A

---------

Co-authored-by: Matt Shaver <[email protected]>
Co-authored-by: Leona B. Campbell <[email protected]>
Co-authored-by: Mirna Wong <[email protected]>