Possible enhancements for --from-quarantine #238

Open
philbudne opened this issue Feb 12, 2024 · 0 comments
Labels
enhancement New feature or request
Milestone

Comments

@philbudne
Contributor

The --from-quarantine option on a worker (queue reader) causes input to be taken from the PROG-quar queue instead of the usual PROG-in queue. This allows re-processing of stories that were "kicked out" of normal processing, either for some extraordinary reason or because of excessive retries.

There are two issues/inconveniences:

  1. If the issue that caused the quarantine for some (or all) stories still exists, the stories will be re-written to the quarantine queue, and the program will loop endlessly.
  2. On the other hand, if all stories are processed successfully, the program will eventually block waiting on an empty queue. Typing ^C will exit with a KeyboardInterrupt exception, which in production or staging will cause a sentry.io alert!

The first could be fixed by adding one or more additional x-mc-.... RabbitMQ headers in Worker._exc_headers that identify the hostname and process ID where the exception was caught. If a message's headers show that

  1. x-mc-when is >= the start time of the current process, AND
  2. x-mc-host & x-mc-pid (or a combined x-mc-host-pid string) match the current process,

then all messages that were in the quarantine queue at program start have been checked, and any that remain are still problematic.
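
A minimal sketch of that check, under some assumptions: the x-mc-host and x-mc-pid headers are the ones proposed above (they do not exist yet), x-mc-when is assumed to hold a numeric Unix timestamp, and the function name and module-level constants are hypothetical, not part of the existing Worker code.

```python
import os
import socket
import time
from typing import Any, Mapping, Optional

# Captured once at program start (hypothetical names).
START_TIME = time.time()
HOSTNAME = socket.gethostname()
PID = os.getpid()


def requarantined_by_this_process(
    headers: Optional[Mapping[str, Any]],
    *,
    start_time: float = START_TIME,
    host: str = HOSTNAME,
    pid: int = PID,
) -> bool:
    """Return True if the headers say this very process quarantined the
    message after it started, i.e. the queue has been fully cycled once."""
    if not headers:
        return False
    try:
        # ASSUMPTION: x-mc-when is a numeric Unix timestamp.
        when = float(headers["x-mc-when"])
        msg_pid = int(headers["x-mc-pid"])
    except (KeyError, TypeError, ValueError):
        # Missing or malformed headers: treat as not yet seen.
        return False
    return when >= start_time and headers.get("x-mc-host") == host and msg_pid == pid
```

A worker running with --from-quarantine could call this on each incoming message and stop consuming the first time it returns True, leaving that message and everything behind it in the quarantine queue.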

The second could be handled by setting a Unix "alarm" signal (reset each time a message is received), such that when the alarm fires (due to no messages received in the timeout period) the input loop is exited (the indexer/scripts/qutil.py program does this by calling basic_cancel).
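
The idle-timeout idea might look roughly like the following. This is only a sketch and not the qutil.py code: it uses a plain Python iterable as a stand-in for the queue consumer, and the alarm handler raises a hypothetical IdleTimeout exception where a real worker would presumably call basic_cancel. It is Unix-only and must run in the main thread.

```python
import signal
from typing import Any, Callable, Iterable


class IdleTimeout(Exception):
    """Raised by the alarm handler when no message arrived in time."""


def _on_alarm(signum: int, frame: Any) -> None:
    raise IdleTimeout()


def consume_until_idle(
    messages: Iterable[Any],
    handle: Callable[[Any], None],
    idle_seconds: float,
) -> int:
    """Handle messages until the source is idle for idle_seconds.

    Returns the number of messages handled.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    count = 0
    it = iter(messages)
    try:
        while True:
            # (Re)arm the timer each time we go back to waiting.
            signal.setitimer(signal.ITIMER_REAL, idle_seconds)
            try:
                msg = next(it)  # blocks until a message arrives
            except StopIteration:
                break
            # Disarm while working so a slow handler is not interrupted.
            signal.setitimer(signal.ITIMER_REAL, 0)
            handle(msg)
            count += 1
    except IdleTimeout:
        pass  # normal exit: the queue went quiet
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
    return count
```

Because the loop ends by its own timeout rather than by ^C, no KeyboardInterrupt is raised and nothing should be reported as an error.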

@philbudne philbudne added the enhancement New feature or request label Feb 12, 2024
@rahulbot rahulbot added this to the Production Beta 4 milestone Feb 14, 2024
@rahulbot rahulbot modified the milestones: Production Beta 4, long-term Mar 13, 2024