Currently, implementing a Spark Streaming job on EMR Serverless requires additional tooling to follow streaming best practices. We could provide a construct similar to `SparkEmrServerlessJob`, but for streaming. Main features it should support:
- Checkpointing the Spark state on resilient storage
- Graceful update of the Spark Streaming application: when a new version of the Spark code is deployed, the construct should gracefully shut down the current Spark Streaming job and then start the new one from the same checkpoint
- Automatic retry of the Spark Streaming job when a failure is detected. The retry mechanism should support a maximum number of retries, exponential backoff, and alerting
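The retry feature above could be driven by a small supervisor loop. Here is a minimal sketch of the backoff logic only (the `start_job`, `is_transient`, and `on_alert` callables are illustrative placeholders, not part of any existing DSF or EMR Serverless API — in practice `start_job` would submit and monitor the EMR Serverless job run):

```python
import time


def run_with_retries(start_job, is_transient, max_retries=3,
                     base_delay=60.0, backoff=2.0, on_alert=None):
    """Run a job, retrying transient failures with exponential backoff.

    start_job:    callable that runs the job and raises on failure.
    is_transient: callable deciding whether an exception is retryable.
    on_alert:     optional callable(exc, attempt) fired when giving up.
    """
    attempt = 0
    while True:
        try:
            return start_job()
        except Exception as exc:
            attempt += 1
            # Give up on non-retryable errors or when the budget is spent.
            if not is_transient(exc) or attempt > max_retries:
                if on_alert:
                    on_alert(exc, attempt)
                raise
            # Exponential backoff: base_delay, base_delay*backoff, ...
            time.sleep(base_delay * backoff ** (attempt - 1))
```

A real construct would likely implement this with managed services (e.g. Step Functions or EventBridge rules) rather than a long-running loop, but the retry semantics would be the same.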
I'm currently exploring implementing Spark Structured Streaming on EMR Serverless and seeking to incorporate best practices, including job retry mechanisms.
Initially, I planned to keep a single job running continuously to handle streaming. However, I've realized that to properly implement retry policies I need to build them manually and figure out a solution myself.
You mentioned that additional tooling is required. I'm curious if you've discovered solutions or alternative approaches for implementing retry policies within the AWS ecosystem, perhaps utilizing services such as AWS Step Functions to efficiently manage repeated attempts.
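For what it's worth, Step Functions does support retries with exponential backoff natively through a Task state's `Retry` field. A rough sketch of what a state machine around a job submission could look like (the state names, parameters, and the AWS SDK integration resource shown here are assumptions to illustrate the shape, not a verified configuration):

```json
{
  "StartAt": "StartStreamingJob",
  "States": {
    "StartStreamingJob": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:emrserverless:startJobRun",
      "Parameters": {
        "ApplicationId.$": "$.applicationId",
        "ExecutionRoleArn.$": "$.executionRoleArn",
        "JobDriver.$": "$.jobDriver"
      },
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 60,
          "MaxAttempts": 5,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        { "ErrorEquals": ["States.ALL"], "Next": "Alert" }
      ],
      "End": true
    },
    "Alert": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn.$": "$.alertTopicArn",
        "Message": "Spark Streaming job failed after retries"
      },
      "End": true
    }
  }
}
```

Note that the AWS SDK integration here only submits the job run; monitoring a long-running streaming job and detecting its failure would need an additional polling or event-driven step.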