Ability to resend events to Kafka that errored while producing #354

Closed
2 tasks
robrap opened this issue Jul 10, 2023 · 3 comments
Labels: event-bus (Work related to the Event Bus.)

@robrap
Contributor

robrap commented Jul 10, 2023

If Kafka were to fail and events couldn't be produced, we currently end up with log messages, but we do not yet have a way to easily grab these messages from the logs and send them to Kafka.

A/C:

  • a procedure for resending failed events to the event bus
  • documentation (probably a runbook) on how to do this.

Implementation Details:

  • The idea for implementation is to create a Splunk or New Relic search that will isolate the failed events and allow us to download/export the details of these events.
  • Then we could run a script to resend those events to the event bus.
  • Have a strategy for dealing with API keys/permissions to send data to event bus.
  • For testing: can we replicate a breakage in stage or another environment, and then test resending the events?
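
A minimal sketch of the export-then-resend idea above, in Python. Everything here is an assumption for illustration: the export format (one JSON object per line with a `time` field in a sortable ISO format), the function names, and the `send` callable, which stands in for whatever producer call the event bus actually exposes. It is not taken from an existing script.

```python
import json
from typing import Callable, Iterable


def parse_failed_events(lines: Iterable[str]) -> list[dict]:
    """Parse exported log lines into event dicts, oldest first.

    Assumes (hypothetically) one JSON object per line, each with a
    "time" field whose string form sorts chronologically.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        events.append(json.loads(line))
    events.sort(key=lambda event: event["time"])
    return events


def resend(events: list[dict], send: Callable[[dict], None]) -> list[dict]:
    """Send events in order; stop at the first failure.

    Returns the events that were not sent (empty if all succeeded),
    so a later run can pick up where this one stopped.
    """
    for index, event in enumerate(events):
        try:
            send(event)
        except Exception:
            return events[index:]
    return []
```

Stopping at the first failure, rather than skipping it, keeps the remaining events in their original order for the next attempt. How `send` authenticates against the event bus is left open here, as it is in the list above.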
@robrap robrap added the event-bus Work related to the Event Bus. label Jul 10, 2023
@rgraber rgraber self-assigned this Jul 17, 2023
@robrap
Contributor Author

robrap commented Jul 21, 2023

We've discussed the fact that order matters, so this can't be as simple as re-sending failed events, because there might be successful events that need to be sent afterwards.

One idea is to cobble together scripts, how-to, and a management command that can take a batch of events to resend.

  • For lifecycle events (e.g. create, update, delete), a failed create event may require resending the create event and any following update or delete events.
    • Can we recover this data from the data on the topic itself? Or would we just look up the data in the DB and send the final update event? The answer may change based on what is simplest and good enough for each event type.
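
One way to picture the ordering concern for lifecycle events is the sketch below. It assumes each event record carries a hypothetical `key` (the entity it is about), a `seq` (its original order), and a `failed` flag; none of these field names come from the event bus itself. For every entity with at least one failed event, it selects the first failed event and everything that followed it for that entity.

```python
from collections import defaultdict


def events_to_resend(events: list[dict]) -> list[dict]:
    """Select events to replay so per-entity order is preserved.

    Each event is a dict with hypothetical fields:
      "key":    identifier of the entity the event is about
      "seq":    original production order (any sortable value)
      "failed": True if producing the event to Kafka failed
    """
    by_key = defaultdict(list)
    for event in sorted(events, key=lambda e: e["seq"]):
        by_key[event["key"]].append(event)

    selected = []
    for entity_events in by_key.values():
        # Index of the first failed event for this entity, if any.
        first_failed = next(
            (i for i, e in enumerate(entity_events) if e["failed"]), None
        )
        if first_failed is not None:
            # Replay the failed event and every later event for the entity,
            # including ones that were originally produced successfully.
            selected.extend(entity_events[first_failed:])

    # Return in original global order.
    selected.sort(key=lambda e: e["seq"])
    return selected
```

Replaying events that already succeeded means consumers may see duplicates, so this only fits event types whose consumers tolerate that. The alternative raised above, sending a single final update built from the DB, avoids the replay but needs per-event-type logic.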

Note: This ticket is meant to be a quick fix (if there is one) to hold over before investing in the outbox pattern, which is ticketed separately. See: openedx/openedx-events#251.

@robrap
Contributor Author

robrap commented Jul 27, 2023

I know this is already groomed, but we should also communicate with existing owners of events being produced about what they need to be aware of.

@rgraber
Contributor

rgraber commented Aug 25, 2023

@rgraber rgraber closed this as completed Aug 25, 2023