📖 Athena Spark Initial Testing #6640

jhpyke · 2025-01-31T10:20:24Z

User Story

As a maintainer of CaDeT Deployments
I need to explore costs and mechanisms involved with Python Models via Athena Spark deployments
So that I can understand whether to enable these for end users.

Value / Purpose

Some tasks, such as fuzzy matching or natural language processing, are hard to express as pure SQL transformations. To support them, dbt offers 'Python Models', which let users build models from Python code rather than SQL. To run these, queries must be submitted to an Athena Spark workgroup, which includes several packages by default and allows more to be imported as zip files of pure Python code (no compiled C extensions).
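
For illustration, a dbt Python model is a file that defines a `model(dbt, session)` function and returns a DataFrame. The sketch below is a minimal example, not a tested CaDeT model: the file name, the upstream model `stg_people` and the column `person_id` are all hypothetical, and it assumes the object returned by `dbt.ref` is a PySpark DataFrame when run on Athena Spark.

```python
# models/people_deduplicated.py  (hypothetical file name)

def model(dbt, session):
    # Python models support table (and incremental) materialisations, not views.
    dbt.config(materialized="table")

    # "stg_people" and "person_id" are placeholder names for illustration only.
    people = dbt.ref("stg_people")

    # dropDuplicates is a standard PySpark DataFrame method; the returned
    # DataFrame is what dbt writes out as the model's table.
    return people.dropDuplicates(["person_id"])
```

The `session` argument is unused here; a model that needs Spark-specific features (for example a fuzzy-matching UDF) would reach them through it.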

Useful Contacts

@jhpyke

User Types

No response

Hypothesis

If we build a test pipeline of Python models
we will be able to validate the costs and compute times associated with Python models.

Proposal

Suggested order of tasks:

Create a new Athena Workgroup for Athena Spark testing
Update the profiles file to reference the new workgroup
Build some basic models using Python, following the dbt docs on Python Models
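
The first task could be scripted roughly as follows. This is a sketch under assumptions rather than the agreed approach: it uses the boto3 Athena `create_work_group` call, the function names are made up for illustration, the engine version string `"PySpark engine version 3"` is assumed to be the value Athena expects for Spark workgroups, and the workgroup name, role ARN and S3 location in the usage note are placeholders. In practice the workgroup may be better managed through existing infrastructure-as-code.

```python
def spark_workgroup_request(name, execution_role_arn, output_location):
    """Build keyword arguments for Athena's CreateWorkGroup call (illustrative helper)."""
    return {
        "Name": name,
        "Description": "Athena Spark testing for dbt Python models",
        "Configuration": {
            # Selecting a PySpark engine is what makes the workgroup Spark-enabled.
            # The exact version string is an assumption; check the Athena console.
            "EngineVersion": {"SelectedEngineVersion": "PySpark engine version 3"},
            # IAM role that Spark sessions in this workgroup run as.
            "ExecutionRole": execution_role_arn,
            # S3 location for calculation results.
            "ResultConfiguration": {"OutputLocation": output_location},
        },
    }


def create_spark_workgroup(name, execution_role_arn, output_location):
    """Create the workgroup; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the request builder works without AWS access

    athena = boto3.client("athena")
    request = spark_workgroup_request(name, execution_role_arn, output_location)
    return athena.create_work_group(**request)
```

A call would look like `create_spark_workgroup("athena-spark-test", "arn:aws:iam::123456789012:role/example-role", "s3://example-bucket/athena-spark/")`, with every argument replaced by real values.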

Additional Information

No response

Definition of Done

An Athena Spark workgroup exists
We are able to successfully build models using it
We are able to monitor and compare relative costs via AWS Cost Explorer or similar.
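
One possible way to check the cost criterion is to pull Athena spend from the Cost Explorer API and total it per usage type, so that Spark and SQL usage can be compared side by side. The sketch below is an assumption-laden starting point: the function names are hypothetical, `"Amazon Athena"` is assumed to be the service name Cost Explorer reports, and pagination (`NextPageToken`) is not handled.

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_group(response, metric="UnblendedCost"):
    """Sum a grouped GetCostAndUsage response per group key across all periods."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            key = "/".join(group["Keys"])
            # Amounts are returned as strings, so parse them exactly.
            totals[key] += Decimal(group["Metrics"][metric]["Amount"])
    return dict(totals)


def fetch_athena_costs(start, end):
    """Fetch daily Athena costs grouped by usage type; needs boto3 and credentials."""
    import boto3  # imported lazily so cost_by_group works without AWS access

    ce = boto3.client("ce")
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},  # ISO dates, e.g. "2025-02-01"
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
        # Service name is an assumption; confirm it in the Cost Explorer console.
        Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Athena"]}},
        GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    )
```

Calling `cost_by_group(fetch_athena_costs(start, end))` would give one total per usage type for the chosen period; which usage types correspond to Spark sessions would need confirming against the real billing data.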

jhpyke added the story label Jan 31, 2025

jhpyke added this to Analytical Platform Jan 31, 2025

jhpyke added the 📊 CaDeT label Jan 31, 2025

github-project-automation bot moved this to 👀 TODO in Analytical Platform Jan 31, 2025

github-actions bot mentioned this issue Feb 1, 2025

Monthly issue metrics report #6643

Open
