I'm currently using Spring Batch with remote partitioning and facing resilience challenges on the manager side.
The setup is straightforward: I have a job composed of multiple partitioned steps, distributed to workers via RabbitMQ using persistent queues. On the worker side, resilience is simple to achieve: if a pod crashes, its partition message remains in the queue and is picked up by another healthy pod. Each reader is restartable, thanks to the use of CounterReader, and everything works smoothly.
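For context, the readers persist their position roughly along these lines (a simplified sketch, not the actual CounterReader implementation; it assumes the reader tracks an item count in the step ExecutionContext):

```java
import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStreamReader;

// Simplified sketch of a counter-based restartable reader: the current
// position is stored in the step ExecutionContext on every chunk commit,
// so a worker that picks up the partition after a crash resumes from the
// last committed item rather than from the beginning.
public class CounterReader<T> implements ItemStreamReader<T> {

    private static final String COUNT_KEY = "counter.reader.count";

    private final List<T> items; // illustrative source; the real reader pages from a datastore
    private long count = 0;

    public CounterReader(List<T> items) {
        this.items = items;
    }

    @Override
    public void open(ExecutionContext executionContext) {
        // On restart, resume from the last committed position
        if (executionContext.containsKey(COUNT_KEY)) {
            count = executionContext.getLong(COUNT_KEY);
        }
    }

    @Override
    public T read() {
        return count < items.size() ? items.get((int) count++) : null;
    }

    @Override
    public void update(ExecutionContext executionContext) {
        // Invoked before each chunk commit; persists the counter
        executionContext.putLong(COUNT_KEY, count);
    }

    @Override
    public void close() {
        // nothing to release in this sketch
    }
}
```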
On the manager side, job execution is also triggered via a persistent RabbitMQ queue, as part of a broader job lifecycle. When a manager pod crashes mid-execution and the trigger message is redelivered to another pod, the relaunch often fails with a JobExecutionAlreadyRunningException, which is expected: the crashed pod's execution is still marked as STARTED in the job repository.
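To make the failure mode concrete, the trigger path looks roughly like this (an illustrative sketch; the listener class, queue name, and parameter names are made up, not my actual code):

```java
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;
import org.springframework.stereotype.Component;

@Component
public class JobTriggerListener {

    private final JobLauncher jobLauncher;
    private final Job partitionedJob;

    public JobTriggerListener(JobLauncher jobLauncher, Job partitionedJob) {
        this.jobLauncher = jobLauncher;
        this.partitionedJob = partitionedJob;
    }

    // The queue is persistent, so after a manager pod dies the unacked
    // trigger message is redelivered to another pod, which re-launches the
    // job with the same identifying parameters.
    @RabbitListener(queues = "job-trigger-queue")
    public void onJobTrigger(String runId) throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addString("runId", runId) // identifying parameter -> same JobInstance
                .toJobParameters();
        try {
            jobLauncher.run(partitionedJob, params);
        } catch (JobExecutionAlreadyRunningException e) {
            // The crashed pod's JobExecution is still marked STARTED in the
            // job repository, so Spring Batch refuses to launch it again.
        }
    }
}
```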
My question is: what's the recommended strategy for handling this scenario?
I've seen discussions like this, but I'm hesitant to resort to hardcoded database queries outside the Spring Batch API to resolve the issue.
I understand what’s happening internally and can manually clean things up when necessary. But what I really need is a way to tell Spring Batch: “Take back control of this job, I promise no one else is running it.”
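For reference, the kind of manual cleanup I mean looks roughly like this, staying on the Spring Batch APIs rather than raw SQL (a sketch only; the class, method, and wiring are illustrative, and it assumes I can guarantee no other pod is actually running the execution):

```java
import java.time.LocalDateTime;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.stereotype.Component;

@Component
public class StaleExecutionRecovery {

    private final JobExplorer jobExplorer;
    private final JobRepository jobRepository;
    private final JobOperator jobOperator;

    public StaleExecutionRecovery(JobExplorer jobExplorer, JobRepository jobRepository, JobOperator jobOperator) {
        this.jobExplorer = jobExplorer;
        this.jobRepository = jobRepository;
        this.jobOperator = jobOperator;
    }

    // Marks executions left in STARTED by a dead pod as FAILED, then restarts.
    // Note: setEndTime uses the Spring Batch 5.x LocalDateTime signature; 4.x takes a java.util.Date.
    public Long recoverAndRestart(String jobName) throws Exception {
        for (JobExecution execution : jobExplorer.findRunningJobExecutions(jobName)) {
            // Fail any step executions the dead pod left running
            for (StepExecution stepExecution : execution.getStepExecutions()) {
                if (stepExecution.getStatus().isRunning()) {
                    stepExecution.setStatus(BatchStatus.FAILED);
                    stepExecution.setEndTime(LocalDateTime.now());
                    jobRepository.update(stepExecution);
                }
            }
            // Fail the job execution itself so the job becomes restartable
            execution.setStatus(BatchStatus.FAILED);
            execution.setExitStatus(ExitStatus.FAILED);
            execution.setEndTime(LocalDateTime.now());
            jobRepository.update(execution);

            // Restart resumes each step from its last committed state
            return jobOperator.restart(execution.getId());
        }
        return null; // nothing stale found
    }
}
```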
If Spring Batch doesn’t support this natively, what's the cleanest, most robust way to achieve it without breaking encapsulation or relying on custom SQL?