Our snapshots to S3 stopped from 11/22 till the database crashed on 11/29, and then it restored the data from 11/22
To Reproduce
Not sure how to make it reproducible.
Expected behavior
Snapshots should be taken every 5 minutes, and added to the S3 Bucket.
Environment (please complete the following information):
Kernel: Linux dragonfly-agent-device-0-primary-0 5.10.228-219.884.amzn2.aarch64 #1 SMP Wed Oct 23 17:17:31 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
Containerized?: EKS
Dragonfly Version: 1.24
Additional context
In Grafana Logs, I see these lines, where it should have logged the snapshot.
I20241122 11:25:02.677474 12 rdb_save.cc:1270] Channel write took 1112 ms while writing 35393/35393
I20241122 11:25:04.081243 13 rdb_save.cc:1270] Channel write took 1623 ms while writing 34972/34972
I20241122 11:25:05.775197 11 rdb_save.cc:1270] Channel write took 2414 ms while writing 32815/32815
Also, we have a Primary and a Replica. Do snapshots need to be taken on both the Primary and the Replica, or just on the Primary?
Regarding the crash, I don't see anything relevant in the logs.
@wernermorgenstern I appreciate you filing the issue, but unfortunately it does not have enough information to identify the root cause of this problem. The persistence section of the "info" response has information on how recent the last save was. You may want to monitor this.
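As a rough illustration of that monitoring suggestion, here is a minimal Python sketch that parses the text of an "info persistence" reply and flags a snapshot that is older than expected. The field name `last_save` and its format (a Unix timestamp in seconds) are assumptions, not confirmed for this Dragonfly version; check the actual output of your instance and pass the right field name. The helper names `parse_info` and `snapshot_is_stale` are hypothetical.

```python
import time


def parse_info(text):
    """Parse the 'key:value' lines of an INFO reply into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Persistence".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def snapshot_is_stale(info_text, max_age_sec, field="last_save", now=None):
    """Return True if the last save is older than max_age_sec.

    ASSUMPTION: `field` holds a Unix timestamp in seconds. The default
    name "last_save" is a guess; verify it against your server's output.
    """
    now = time.time() if now is None else now
    fields = parse_info(info_text)
    if field not in fields:
        raise KeyError(f"field {field!r} not found in INFO reply")
    return (now - int(fields[field])) > max_age_sec
```

With snapshots scheduled every 5 minutes, calling `snapshot_is_stale(reply_text, max_age_sec=900)` on a periodic basis and alerting on `True` would have surfaced the gap well before a week had passed. How the reply text is fetched (for example with a Redis-compatible client) is left out on purpose.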
In addition (and I do not know if it is related), there is a problem in the DNS resolve code, and that is why your periodic snapshotting stops: it just gets stuck there. To help me understand what happens, can you please run the dragonfly process with --vmodule=dns_resolve=1? It will print a bunch of logs that may help identify the issue.
It could be an issue with our code that handles DNS resolution, but I cannot say for certain. If you see the snapshots stop working again, I can explain how to provide more information. In any case, this comment #4244 (comment) gives advice on how to increase the verbosity around the DNS resolution code.
Describe the bug
We have Snapshots enabled every 5 minutes, to S3.
Here is our configuration for the Snapshotting:
Our EKS Resources are: