Running 50 cttsov2 samples simultaneously is causing the copy jobs step to fall over.
A quick workaround is to put a Lambda in front of the copy step that queries how many copy SFN executions are currently running, loops until that count drops below a threshold, and only then allows the copy to start.
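A minimal sketch of that gate Lambda, assuming Python and boto3's Step Functions client. The event keys `copy_state_machine_arn` and `max_running`, and the default limit of 10, are hypothetical names chosen for illustration. The handler only reports whether a new copy may start; the loop itself is assumed to live in the calling step function (for example a Wait state followed by a Choice state), so the Lambda is not billed for sleeping.

```python
def count_running_executions(sfn_client, state_machine_arn):
    """Count RUNNING executions of one state machine, following pagination."""
    running = 0
    kwargs = {"stateMachineArn": state_machine_arn, "statusFilter": "RUNNING"}
    while True:
        page = sfn_client.list_executions(**kwargs)
        running += len(page.get("executions", []))
        next_token = page.get("nextToken")
        if not next_token:
            return running
        kwargs["nextToken"] = next_token


def handler(event, context=None, sfn_client=None):
    """Report whether another copy execution may start.

    The event keys used here are assumptions made for this sketch.
    """
    if sfn_client is None:
        import boto3  # imported lazily so the logic can be tested with a fake client

        sfn_client = boto3.client("stepfunctions")
    running = count_running_executions(sfn_client, event["copy_state_machine_arn"])
    max_running = int(event.get("max_running", 10))  # hypothetical default
    return {"running": running, "can_start": running < max_running}
```

Several callers can pass this check at the same moment, so the limit is only approximate. That is acceptable for a stopgap but is one reason to prefer the queue-based approach.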
A more comprehensive solution would be to centralise all copy jobs and place the step function behind an SQS queue with a max concurrency limit.
The complication here is that the copy job would need to hang, rather than complete, once the job has been launched.
This may be resolved by placing the 'waitForTaskToken' step in another step function that writes the initial task token to a database.
A second SQS queue is then needed to listen for copy job completion events, find the corresponding task token in the database, and unlock the initial copy SFN. Failed jobs would also need to be requeued. The initial SFN would then also need to release the step function that initialised the SQS queue.
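The token bookkeeping described above could look roughly like the sketch below. It assumes Python, a DynamoDB table resource keyed on a hypothetical `job_id` attribute, and boto3's Step Functions client. The function names, the item layout, and the `requeue` callback are illustrative and not part of any existing code.

```python
import json


def store_task_token(table, job_id, task_token):
    """Record the waiting execution's task token under the copy job id."""
    table.put_item(Item={"job_id": job_id, "task_token": task_token})


def on_copy_job_finished(table, sfn_client, job_id, succeeded, requeue=None):
    """Release, fail, or requeue the execution waiting on this copy job."""
    item = table.get_item(Key={"job_id": job_id}).get("Item")
    if item is None:
        return "unknown_job"  # no execution is waiting on this job

    if succeeded:
        sfn_client.send_task_success(
            taskToken=item["task_token"],
            output=json.dumps({"job_id": job_id, "status": "SUCCEEDED"}),
        )
    elif requeue is not None:
        # Keep the token so the retried job can still release the waiter.
        requeue(job_id)
        return "requeued"
    else:
        sfn_client.send_task_failure(
            taskToken=item["task_token"],
            error="CopyJobFailed",
            cause=f"Copy job {job_id} failed",
        )

    table.delete_item(Key={"job_id": job_id})
    return "released"
```

In a deployment, `table` would be something like `boto3.resource("dynamodb").Table(...)`, `sfn_client` would be `boto3.client("stepfunctions")`, and `requeue` would send the job back to the first queue. A retry limit is left out here and would be needed to stop a permanently failing job from cycling forever.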