
Provide a Spark Streaming job construct on EMR Serverless #386

Open
vgkowski opened this issue Feb 1, 2024 · 1 comment

vgkowski (Contributor) commented Feb 1, 2024

Currently, implementing a Spark Streaming job on EMR Serverless requires additional tooling to follow streaming best practices. We can provide a construct similar to the SparkEmrServerlessJob but dedicated to streaming. The main features it should support are:

  • Checkpointing the Spark state on resilient storage (see the sketch below this list)
  • Graceful update of the Spark Streaming application: when deploying a new version of the Spark code, the construct should gracefully shut down the current Spark Streaming job and then start the new one from the same checkpoint
  • Automatic retry of the Spark Streaming job when a failure is detected. The retry mechanism should support a maximum number of retries, exponential backoff, and alerting
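
On the Spark side, the checkpointing requirement only needs the streaming query to point its checkpoint location at durable storage such as S3. A minimal PySpark sketch (the bucket, Kafka broker, and topic names are placeholders):

```python
# Minimal sketch of the checkpointing requirement: the streaming query persists
# its offsets and state to durable storage (an S3 path here) so that a restarted
# job resumes from where the previous run stopped.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-job").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder source
    .option("subscribe", "events")                      # placeholder topic
    .load()
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://my-bucket/output/")                     # placeholder sink
    .option("checkpointLocation", "s3://my-bucket/checkpoints/")  # resilient checkpoint storage
    .start()
)

query.awaitTermination()
```
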
vgkowski added the new-feature label Feb 1, 2024
vgkowski self-assigned this Feb 12, 2024

omar-diop commented Apr 5, 2024

Hi @vgkowski

I'm currently exploring how to implement Spark Structured Streaming on EMR Serverless and want to incorporate best practices, including a job retry mechanism.

Initially, I planned to keep a single job running continuously to handle the streaming. However, I've realized that to properly implement retry policies I need to handle them manually and figure out a solution.
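
For reference, the manual approach I have in mind is roughly the loop below: a boto3 sketch that watches the EMR Serverless job run and resubmits it with exponential backoff when it fails (the application ID, execution role ARN, and entry point are placeholders):

```python
import time

import boto3

# Placeholders for the real application and job configuration
APPLICATION_ID = "00example"
EXECUTION_ROLE_ARN = "arn:aws:iam::123456789012:role/emr-serverless-job-role"
ENTRY_POINT = "s3://my-bucket/jobs/streaming_job.py"
MAX_RETRIES = 5

emr = boto3.client("emr-serverless")


def start_streaming_job() -> str:
    """Submit the streaming job and return its job run ID."""
    response = emr.start_job_run(
        applicationId=APPLICATION_ID,
        executionRoleArn=EXECUTION_ROLE_ARN,
        jobDriver={"sparkSubmit": {"entryPoint": ENTRY_POINT}},
    )
    return response["jobRunId"]


def wait_for_terminal_state(job_run_id: str) -> str:
    """Poll the job run until it reaches a terminal state."""
    while True:
        state = emr.get_job_run(
            applicationId=APPLICATION_ID, jobRunId=job_run_id
        )["jobRun"]["state"]
        if state in ("SUCCESS", "FAILED", "CANCELLED"):
            return state
        time.sleep(60)


retries = 0
while retries <= MAX_RETRIES:
    final_state = wait_for_terminal_state(start_streaming_job())
    if final_state != "FAILED":
        break
    retries += 1
    # Exponential backoff before resubmitting, capped at one hour
    time.sleep(min(60 * 2 ** retries, 3600))
```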

You mentioned that additional tooling is required. Have you found solutions or alternative approaches for implementing retry policies within the AWS ecosystem, perhaps using a service such as AWS Step Functions to manage the retries?

Thank you!
