Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not working RocketChat when primary mongo from PSA setup goes down. #34875

Open
dusatvoj opened this issue Jan 3, 2025 · 3 comments
Open

Not working RocketChat when primary mongo from PSA setup goes down. #34875

dusatvoj opened this issue Jan 3, 2025 · 3 comments

Comments

@dusatvoj
Copy link

dusatvoj commented Jan 3, 2025

Description:

Today I did some upgrade tasks and I had to shut down for a while one of cluster's hypervisors which has at local ZVols (due to performance reasons) database masters.

I have MongoDB in PSA setup. When I shut down MongoDB primary, secondary was elected to new primary and RocketChat crashed. All of my 3 instances went unhealthy and K8s starts killing them (CrashLoopBackOff).

I have another app consuming data from cluster and the data in second application could be read. Can't get why RC was broken.

So outage happend during a day - it's not acceptable for us but I did that many times recently and there were no problem. RocketChat connected to MongoDB, did some queries, start server but does not start responding (as it was degraded, ingress doesn't route traffic to them) -> 503 error.

Steps to reproduce:

  1. Deploy HA RC with Mongo in PSA setup.
  2. Shut down master of mongodb instances
  3. RocketChat went down.

Expected behavior:

RocketChat stays alive.

Actual behavior:

503, RC pods unhealthy, repeating System not healthy due to no real time data received recently but started from crashloopbackoff

Server Setup Information:

  • Version of Rocket.Chat Server: 6.13.1
  • License Type: Enterprise
  • Number of Users: 100+
  • Operating System: Linux
  • Deployment Method: Kubernetes via https://artifacthub.io/packages/helm/rocketchat-server/rocketchat (v6.11.5)
  • Number of Running Instances: 3
  • DB Replicaset Oplog: yes
  • NodeJS Version: idk, your image
  • MongoDB Version: 5.0.23

Additional context

Relevant logs:

(node:1) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
LocalStore: store created at
LocalStore: store created at
LocalStore: store created at
{"level":40,"time":"2025-01-03T13:13:12.434Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"VoIPAsteriskService","msg":"Voip is not enabled. Cant start the service"}
{"level":51,"time":"2025-01-03T13:13:12.476Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"Migrations","msg":"Not migrating, already at version 304"}
[2025-01-03T13:13:12.487Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Moleculer v0.14.34 is starting...
[2025-01-03T13:13:12.487Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Namespace: <not defined>
[2025-01-03T13:13:12.487Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Node ID: 2a1b6d25-99dd-44e9-b8cc-2b74cff6638e
[2025-01-03T13:13:12.489Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/REGISTRY: Strategy: RoundRobinStrategy
[2025-01-03T13:13:12.489Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/REGISTRY: Discoverer: LocalDiscoverer
[2025-01-03T13:13:12.491Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Serializer: EJSONSerializer
[2025-01-03T13:13:12.519Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Validator: FastestValidator
[2025-01-03T13:13:12.522Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Registered 13 middleware(s).
[2025-01-03T13:13:12.524Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: Transporter: TcpTransporter
{"level":51,"time":"2025-01-03T13:13:12.560Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"DatabaseWatcher","msg":"Using change streams"}
[2025-01-03T13:13:12.574Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/TRANSIT: Connecting to the transporter...
[2025-01-03T13:13:12.584Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/TRANSPORTER: TCP server is listening on port 32795
[2025-01-03T13:13:12.585Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/TRANSPORTER: UDP Discovery is disabled.
[2025-01-03T13:13:12.585Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/TRANSPORTER: TCP Transporter started.
{"level":51,"time":"2025-01-03T13:13:13.350Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"License","msg":"License installed","version":"3.0","hash":"Wo6UKZeI"}
[2025-01-03T13:13:13.485Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/REGISTRY: '$node' service is registered.
[2025-01-03T13:13:13.486Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/REGISTRY: 'matrix' service is registered.
[2025-01-03T13:13:13.487Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/$NODE: Service '$node' started.
[2025-01-03T13:13:13.487Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/MATRIX: Service 'matrix' started.
[2025-01-03T13:13:13.488Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: ✔ ServiceBroker with 2 service(s) started successfully in 940ms.
(node:1) [MONGODB DRIVER] Warning: cursor.count is deprecated and will be removed in the next major version, please use `collection.estimatedDocumentCount` or `collection.countDocuments` instead
+---------------------------------------------+
|                SERVER RUNNING               |
+---------------------------------------------+
|                                             |
|  Rocket.Chat Version: 6.13.1                |
|       NodeJS Version: 14.21.3 - x64         |
|      MongoDB Version: 5.0.23                |
|       MongoDB Engine: wiredTiger            |
|             Platform: linux                 |
|         Process Port: 3000                  |
|             Site URL: https://im.example.com/  |
|     ReplicaSet OpLog: Enabled               |
|          Commit Hash: 39d3b2e16c            |
|        Commit Branch: HEAD                  |
|                                             |
+---------------------------------------------+
{"level":51,"time":"2025-01-03T13:13:22.155Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"License","msg":"License installed","version":"3.0","hash":"ZFm6DOKo"}
{"level":50,"time":"2025-01-03T13:13:29.845Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:13:32.564Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"No real time data received recently"}
{"level":50,"time":"2025-01-03T13:13:44.672Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
[2025-01-03T13:13:49.354Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/REGISTRY: Node '4431cc14-4780-4f06-8799-8b174f044702' reconnected.
{"level":50,"time":"2025-01-03T13:13:52.565Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"No real time data received recently"}
{"level":50,"time":"2025-01-03T13:13:54.248Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:13:59.670Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:13:59.672Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:12.565Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"No real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:14.670Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:14.672Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:29.248Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:29.670Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:29.672Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
{"level":50,"time":"2025-01-03T13:14:30.205Z","pid":1,"hostname":"rocketchat-rocketchat-58946bb66b-cx5bt","name":"System","msg":"System not healthy due to no real time data received recently"}
[2025-01-03T13:14:30.333Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/$NODE: Service '$node' stopped.
[2025-01-03T13:14:30.334Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/MATRIX: Service 'matrix' stopped.
[2025-01-03T13:14:30.336Z] INFO  2a1b6d25-99dd-44e9-b8cc-2b74cff6638e/BROKER: ServiceBroker is stopped. Good bye.

It says it's connected to mongo but it doesn't work.

In Mongo, there's no relevant errors (only info about success connection, etc. and a bit of queries up to 700ms, nothing really wrong).

@reetp
Copy link

reetp commented Jan 3, 2025

License Type: Enterprise

You should open a support issue dirct with the team - it's what you pay for!

You should get a faster response there.

@dusatvoj
Copy link
Author

dusatvoj commented Jan 4, 2025

@reetp , yeah, I did that but I'm noting it publicly. In desk ticket I referenced this issue.

@reetp
Copy link

reetp commented Jan 4, 2025

Ok thanks.

(No need to @ anyone - it gets read)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants