Fix the job jitter/delay around sidekiq retries #695

Draft
wants to merge 1 commit into
base: main

Conversation

gitstart-app[bot]
Contributor

@gitstart-app gitstart-app bot commented Dec 12, 2024

This PR was created by GitStart to address the requirements from this ticket: TRAB-687.


What does this PR achieve?

  • Adds retryable_job.rb
  • Adds a test suite, retryable_job_spec.rb, for the new module
  • Updates find_build_job.rb
  • Updates find_build_job_spec.rb by adding more tests and leaving the previous ones unchanged
  • Migrates all other jobs and tests that use sidekiq_retry_in, sidekiq_retries_exhausted and sidekiq_retry_in_block

@gitstart-tramline

gitstart-tramline commented Dec 12, 2024

Hi @kitallis

We’d like your guidance on testing this PR. Could you help validate the scenario provided in the attached screenshot, or are there any additional cases we should consider to ensure the fix is thoroughly tested?

Screenshot 2024-12-11 at 6 59 37 PM

Additionally, are there specific benchmarks or indicators we should monitor to confirm the changes are working as intended?

@kitallis
Member

> Hi @kitallis
>
> We’d like your guidance on testing this PR. Could you help validate the scenario provided in the attached screenshot, or are there any additional cases we should consider to ensure the fix is thoroughly tested?
>
> Screenshot 2024-12-11 at 6 59 37 PM
>
> Additionally, are there specific benchmarks or indicators we should monitor to confirm the changes are working as intended?

I don't think this can be tested reliably at the unit level. We'll have to do some sort of real-time test. What we can do is issue hundreds of retries for a particular job, perhaps StoreRollouts::AppStore::FindLiveReleaseJob since that has seen the most issues due to jitter, and ensure that it runs on the correct ticks of time. Under controlled circumstances (like a local env), with the correct backoffs set, the timing error should not be off by more than a minute or so.
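
As a rough illustration of the check described above, here is a minimal sketch of how the drift could be measured once the run times have been collected (for example from logs). Everything in it is hypothetical: `RetryDrift`, its method names, and the 60-second default tolerance are not part of this PR or of Sidekiq. It also assumes each retry is expected one backoff interval after the previous run, which may not match how the jobs here are actually scheduled.

```ruby
# Hypothetical helper for a manual timing test; not code from this PR.
module RetryDrift
  module_function

  # Returns the signed drift (in seconds) of each retry.
  # Assumption: retry N is expected `backoffs[N]` seconds after the previous
  # run (or after `enqueued_at` for the first one). Positive means it ran late.
  def drifts(enqueued_at, backoffs, run_times)
    raise ArgumentError, "need one run time per backoff" unless backoffs.size == run_times.size

    previous = enqueued_at
    backoffs.zip(run_times).map do |backoff, ran_at|
      drift = (ran_at - previous) - backoff
      previous = ran_at
      drift
    end
  end

  # True when every retry ran within `tolerance` seconds of its expected tick.
  def within_tolerance?(enqueued_at, backoffs, run_times, tolerance: 60)
    drifts(enqueued_at, backoffs, run_times).all? { |d| d.abs <= tolerance }
  end
end

# Example with made-up timestamps: three retries with a 120-second backoff.
start = Time.at(0)
runs = [Time.at(125), Time.at(250), Time.at(460)]
puts RetryDrift.drifts(start, [120, 120, 120], runs).inspect
puts RetryDrift.within_tolerance?(start, [120, 120, 120], runs)
```

Measuring each retry against the previous run rather than against the original enqueue time keeps one late retry from inflating the drift of every retry after it.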

@kitallis
Member

Separately, I think the idea for this change is generally fine, but the implementation seems too complicated. If this is your final approach, we should stop and re-evaluate because I think there are issues with it that won't scale.
