📖 Athena Spark Initial Testing #6640

Open
3 tasks
jhpyke opened this issue Jan 31, 2025 · 0 comments

User Story

As a maintainer of CaDeT Deployments
I need to explore costs and mechanisms involved with Python Models via Athena Spark deployments
So that I can understand whether to enable these for end users.

Value / Purpose

Some tasks, such as fuzzy matching or natural language processing, are not easy to complete using pure SQL-based transformations. To enable these tasks, dbt supports 'Python models', which allow users to build models from Python code rather than SQL. To run them, queries must be submitted to an Athena Spark workgroup, which includes several packages by default and allows more to be imported as pure-Python zip files (no compiled C extensions).
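For orientation, a dbt Python model is a file that defines a `model(dbt, session)` function and returns a DataFrame. The sketch below is a minimal illustration only: the file name and the upstream model name `stg_customers` are hypothetical, and the deduplication step stands in for whatever non-SQL logic (fuzzy matching, NLP) a real model would carry.

```python
# Hypothetical file: models/example_python_model.py
# Assumption: "stg_customers" is a placeholder for an existing upstream model.


def model(dbt, session):
    # dbt calls this function; `session` is the Spark session provided by the adapter.
    dbt.config(materialized="table")

    # dbt.ref returns the upstream model as a DataFrame.
    customers = dbt.ref("stg_customers")

    # Placeholder transformation; dropDuplicates is a standard Spark DataFrame method.
    return customers.dropDuplicates()
```

The function takes everything it needs through `dbt` and `session`, so the same file can be exercised locally with stand-in objects before it is submitted to a Spark workgroup.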

Useful Contacts

@jhpyke

User Types

No response

Hypothesis

If we build a test pipeline of python models
we will be able to validate the costs and compute times associated with python models.

Proposal

Suggested order of tasks:

Additional Information

No response

Definition of Done

  • An Athena Spark workgroup exists
  • We are able to successfully build models using it
  • We are able to monitor and compare relative costs via cost-explorer or similar.
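As a rough aid for the cost comparison, the arithmetic could be sketched as below. This assumes Spark sessions are billed by DPU-hours and SQL queries by data scanned. All function names are hypothetical, the rates are inputs rather than real prices, and the numbers in the usage lines are placeholders. Actual figures would come from Cost Explorer or the current pricing page.

```python
BYTES_PER_TB = 10**12  # Assumption: decimal terabytes; adjust if billing uses 2**40.


def spark_cost(dpu_hours: float, price_per_dpu_hour: float) -> float:
    """Estimated cost of a Spark run from its DPU-hours (rate supplied by caller)."""
    return dpu_hours * price_per_dpu_hour


def sql_cost(bytes_scanned: float, price_per_tb: float) -> float:
    """Estimated cost of a SQL run from bytes scanned (rate supplied by caller)."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


def relative_cost(spark: float, sql: float) -> float:
    """How many times more (or less) the Spark run cost than the SQL run."""
    if sql == 0:
        raise ValueError("SQL cost is zero; the ratio is undefined")
    return spark / sql


# Placeholder inputs, not measured values or real prices.
example_spark = spark_cost(dpu_hours=2.0, price_per_dpu_hour=0.5)
example_sql = sql_cost(bytes_scanned=5 * 10**11, price_per_tb=4.0)
example_ratio = relative_cost(example_spark, example_sql)
```

Keeping the rates as parameters means the same comparison can be rerun if pricing or region changes.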
Status: 👀 TODO