Python models Dataproc Serverless setup with packages #5920

Open
wants to merge 5 commits into
base: current

@LouisAuneau commented Aug 12, 2024

Add a description of how to set up Python models with Dataproc Serverless using a custom image in order to use third-party packages.

What are you changing in this pull request and why?

In the context of running Python models in Spark using Dataproc, the documentation (python-models.md) says:

Installing packages: If you are using a Dataproc Cluster (as opposed to Dataproc Serverless), you can add third-party packages while creating the cluster.

After some digging, I found that it is possible to run Python models that use third-party packages on Dataproc Serverless. It requires a custom Docker image, which is well documented on GCP's side. We currently run this in production without any issues. I added this to the documentation. Let me know if you need more details on how to set it up.
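
For illustration, here is a minimal sketch of the setting involved. It assumes the profile's `dataproc_batch` block is passed through to the Dataproc Batch API, where `runtime_config.container_image` names the custom image. The helper name and the image URI are hypothetical, not taken from this PR.

```python
# Hypothetical Artifact Registry URI for a custom image that has the
# third-party packages installed (placeholder, not a real image).
IMAGE_URI = "europe-west1-docker.pkg.dev/my-project/my-repo/dbt-python:1.0.0"


def dataproc_batch_config(container_image: str) -> dict:
    """Build a Batch-shaped dict that points Dataproc Serverless at a custom image.

    Assumption: the resulting mapping is what would sit under `dataproc_batch`
    in the BigQuery profile (written as YAML there).
    """
    if not container_image:
        raise ValueError("container_image must be a non-empty image URI")
    return {"runtime_config": {"container_image": container_image}}


config = dataproc_batch_config(IMAGE_URI)
print(config)
```

The same nesting would be written as YAML in `profiles.yml`; the dict form is only used here to make the structure explicit.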

Checklist

Adding or removing pages (delete if not applicable):
N/A

Add description of how to set up Dataproc Serverless with a custom image in order to use third-party packages.
@LouisAuneau requested a review from a team as a code owner August 12, 2024 21:04

welcome bot commented Aug 12, 2024

Hello!👋 Thanks for contributing to the dbt product documentation and opening this pull request! ✨
We use Markdown and some HTML to write the dbt product documentation. When writing content, you can use our style guide and content types to understand our writing standards and how we organize information in the dbt product docs.
We'll review your contribution and respond as soon as we can. 😄


vercel bot commented Aug 12, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| docs-getdbt-com | 🛑 Canceled (Inspect) | | Sep 19, 2024 0:17am |

@runleonarun added the "new contributor" label Aug 12, 2024
@github-actions bot added the "content" and "size: small" labels Aug 12, 2024

vercel bot commented Aug 14, 2024

@matthewshaver is attempting to deploy a commit to the dbt-labs Team on Vercel.

A member of the Team first needs to authorize it.

@matthewshaver
Contributor

Thank you, @LouisAuneau! We're just waiting for an SME on our side to review. We hope to have that shortly.

Labels
content (Improvements or additions to content)
new contributor (Label for first-time contributors)
size: small (This change will take 1 to 2 days to address)
3 participants