Service using shuttle was stuck for a day, without any errors. #2157

Open
chetankashetti opened this issue Jul 11, 2024 · 0 comments
Labels
s-triage Needs to be reviewed, designed and prioritized

Comments

@chetankashetti
chetankashetti commented Jul 11, 2024

What is the bug?
Shuttle service was stuck for a day without any error logs or exceptions.

How can it be reproduced?
We run 3 shards for the live subscription. Two of them, shard-0 and shard-2, were stuck.

We noticed that we were no longer receiving data from shuttle, and when we checked the logs there were no error logs.
The metrics we looked at (CPU and memory for the hubs, CPU and memory for the service, and RDS) all looked fine; in fact, everything was underutilised.
The screenshots show no interaction: the service stayed in a hanging state for a while, and we are not sure whether the connection was even still open.
(two screenshots attached in the original issue)
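
Since the hang produced no error logs, one way to make it visible is an idle-time check per shard. The following is a minimal sketch, not part of shuttle: `StallDetector`, `maxIdleMs`, and the injected clock are hypothetical names introduced here for illustration, and the idle threshold would need tuning to the real event rate.

```typescript
// Hypothetical helper, not part of shuttle: tracks how long it has been
// since the last event was seen and reports when that exceeds a threshold.
class StallDetector {
  private lastEventAt: number;
  private readonly maxIdleMs: number;
  private readonly now: () => number;

  // `now` is injectable so the detector can be tested with a fake clock.
  constructor(maxIdleMs: number, now: () => number = Date.now) {
    this.maxIdleMs = maxIdleMs;
    this.now = now;
    this.lastEventAt = now();
  }

  // Call this from the event handler every time an event arrives.
  recordEvent(): void {
    this.lastEventAt = this.now();
  }

  // Milliseconds elapsed since the last recorded event.
  idleMs(): number {
    return this.now() - this.lastEventAt;
  }

  // True once the idle time is strictly greater than the threshold.
  isStalled(): boolean {
    return this.idleMs() > this.maxIdleMs;
  }
}
```

A service could keep one detector per shard, call `recordEvent()` in its event handler, and poll `isStalled()` from a timer to log an error or exit so that the orchestrator restarts the pod. This only detects the symptom; it does not explain why the subscription stopped delivering events.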

After it had been stuck for a day, the first thing we did was restart the pod. It then started syncing from the eventId where it had been stuck, which took a few hours. Once it was live again, I noticed that a cast I had made an hour earlier had not been indexed. Shouldn't it have been indexed, given that the live stream holds data for 3 days? It missed my cast, so it may have missed others as well.

So, to summarise, we would like to know a couple of things:

  1. Why was the service stuck at an eventId without any error, even though all components looked healthy?
  2. Does the live event subscription cover all events if it was stopped for an hour or two, or for any period shorter than 3 days?
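
To make the second question concrete, here is a small sketch of the expectation behind it. It assumes, as stated above and not confirmed by the maintainers, that the hub retains events for 3 days; `planResume`, `ResumePlan`, and `ASSUMED_RETENTION_MS` are hypothetical names used only for illustration.

```typescript
// Assumption taken from this report: events are retained for 3 days.
const ASSUMED_RETENTION_MS = 3 * 24 * 60 * 60 * 1000;

type ResumePlan = "resume-from-last-event" | "backfill-required";

// Decide whether resuming from the stored eventId should be enough,
// based only on how long the subscriber has been down.
function planResume(
  lastProcessedAtMs: number,
  nowMs: number,
  retentionMs: number = ASSUMED_RETENTION_MS,
): ResumePlan {
  const downtimeMs = nowMs - lastProcessedAtMs;
  return downtimeMs < retentionMs
    ? "resume-from-last-event"
    : "backfill-required";
}
```

Under this assumption, a one-day outage falls inside the window, so we expected resuming from the stuck eventId to replay every event, including the cast that was missed.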

We have not been able to reproduce the issue; we have observed it only once.
Additional context

@github-actions github-actions bot added the s-triage Needs to be reviewed, designed and prioritized label Jul 11, 2024