Standby resyncs with Primary Node at every Restart #858
Comments
I had a similar problem. When I reboot my primary, e.g. to update the Linux kernel, the secondary is promoted to primary. To "fix" this, I pause the service before the reboot and unpause it after the reboot.
Pause service (execute on ONE node of the cluster):
Unpause/continue service (execute on ONE node of the cluster):
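A minimal sketch of that pause, restart, unpause sequence, written as a small Python wrapper. It assumes repmgr 5.2 or later, where the subcommand is `repmgr service pause` / `unpause` / `status` (older releases call it `repmgr daemon pause`), and it assumes the config path used by the bitnami/postgresql-repmgr image. The function names and the restart command are placeholders, not something taken from this thread.

```python
import shlex
import subprocess

# Assumption: repmgr.conf location inside the bitnami/postgresql-repmgr image.
REPMGR_CONF = "/opt/bitnami/repmgr/conf/repmgr.conf"


def maintenance_plan(restart_cmd, conf=REPMGR_CONF):
    """Return the commands to run, in order, as argv lists."""
    repmgr = ["repmgr", "-f", conf, "service"]
    return [
        repmgr + ["pause"],        # ask repmgrd on all nodes to stop failing over
        repmgr + ["status"],       # lets you confirm every node reports as paused
        shlex.split(restart_cmd),  # your own restart or redeploy step
        repmgr + ["unpause"],      # re-enable automatic failover afterwards
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "systemctl restart postgresql" is a placeholder for the real restart step.
    run_plan(maintenance_plan("systemctl restart postgresql"))
```

With `dry_run=True` the script only prints the commands, which makes it easy to review the order first. If the restart step reboots the machine, the plan has to be driven from a node that stays up, since the unpause can only run once the cluster is reachable again.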
Hello @JP95Git, I tried the steps you suggested, but they did not work for me.
The reason for all this testing is the underlying issue: one of the standby nodes shows the status "running as primary", although all connections are still routed to the original primary.
I am looking for a way to solve this without requesting any downtime. I tried another method to fix it:
But the same issue occurred again: every time the pgsql service is started, both replica nodes start to re-sync with the primary node, even though I scaled only one replica node. Is some configuration setting causing this re-sync every time the services are deployed, or am I performing a wrong step?
@aviralsingh21 I have never used Docker, so I can't help you with that part. But something went wrong during "stack rm" and "stack deploy", because some of your nodes were paused and some were not. The message "running as primary" indicates a "split brain", which means that you have 2 primaries, each holding some part of your database. I also faced this situation and had to restore from a backup. I keep repmgr in pause mode until I find a solution for the "split brain". I think it is normal for the standby to sync with the primary again at restart. I set up WAL archiving using Barman, which allows the standby to catch up from the WAL archive very quickly, without doing a full sync.
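One way to check for more than one primary independently of repmgr's own metadata is to ask each data node whether it is in recovery. The sketch below assumes `psql` is installed and can reach every node without a password prompt; the host names, user and database are placeholders. The witness node has to be left out, because its PostgreSQL instance is never in recovery and would look like a primary.

```python
import subprocess


def is_primary(host, user="postgres", dbname="postgres", port=5432):
    """Ask one node whether it is in recovery; 'f' means it acts as a primary."""
    out = subprocess.run(
        ["psql", "-h", host, "-p", str(port), "-U", user, "-d", dbname,
         "-Atc", "SELECT pg_is_in_recovery()"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return out == "f"


def find_primaries(roles):
    """roles maps node name -> True if that node reports itself as primary."""
    return sorted(name for name, primary in roles.items() if primary)


def diagnose(roles):
    """Summarise the cluster state from the per-node answers."""
    primaries = find_primaries(roles)
    if len(primaries) == 1:
        return f"ok: single primary {primaries[0]}"
    if not primaries:
        return "no primary: every node is in recovery"
    return "more than one primary: " + ", ".join(primaries)


if __name__ == "__main__":
    # Placeholder host names; list only the data nodes, not the witness.
    nodes = ["node-1", "node-2"]
    print(diagnose({n: is_primary(n) for n in nodes}))
```

If two data nodes both answer `f`, they are both accepting writes, and the data on them may already have diverged.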
@JP95Git Regarding the concern you raised about some nodes being paused and some not: I checked the service status of the nodes, and they were all paused before I stopped the service. Status after re-syncing was complete: Logs from the Standby Node at the time of re-syncing:
Logs after re-syncing was complete:
I have a Docker Swarm HA architecture with 3 PostgreSQL nodes, 1 pgpool-II service and various other services.
PostgreSQL is set up as an HA cluster using the Replication Manager (repmgr) tool: 1 Primary Node + 1 Standby Node + 1 Witness Node
Docker Image Used: bitnami/postgresql-repmgr:16.3.0
Issue: Standby resyncs with Primary Node at every Restart of docker services.
My plan was to shut down the PostgreSQL database gracefully and then stop the container. But as soon as the database on the primary node (node-1) was shut down, the container exited, and the database came back up in a new container (new container ID) with the standby role and started to re-sync with the new primary (node-2). I assumed this was normal. Since the container restarted on every attempt to shut down the database, I thought it would be better to stop the repmgr daemon first so that the database would stay down. But this didn't help.
I did not find a reliable way to shut down the database gracefully before stopping the PostgreSQL Docker service. While trying, I discovered another issue: whenever I restart the PostgreSQL Docker service, the standby node (node-1) re-syncs (performs cloning) with the primary node (node-2) every single time.
PostgreSQL Logs from Standby Node:
I also compared these logs with those of the standby node in another, identical environment that does not have this issue. The logs are the same as above, except that the 'Rejoining Node...' line does not appear there.
Additional information:
I have already reviewed other relevant issues, such as #52213 and #34986. I configured pg_rewind and enabled wal_log_hints, but the situation is still the same.
I tested with the bitnami/postgresql-repmgr:12.4.0 Docker image. The same situation occurs there too.
I also deleted the volume, deployed the PostgreSQL service with a fresh volume, and restored the database again. This time I stopped the Docker service directly instead of stopping the database first, but I still face the same issue.
Database size used for testing: around 60 GB.
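On the pg_rewind point above: pg_rewind only works when the target server has `wal_log_hints = on` or was initialised with data checksums, and when `full_page_writes` is on. A change to `wal_log_hints` also needs a server restart before it takes effect, so it is worth checking what the running server actually reports. The helper below is a hypothetical sketch that evaluates the values printed by `SHOW wal_log_hints`, `SHOW data_checksums` and `SHOW full_page_writes`; collecting those values is left to you.

```python
def rewind_blockers(settings):
    """Return reasons pg_rewind would refuse to run; an empty list means none.

    settings maps a parameter name to the value printed by SHOW ('on' or 'off').
    A parameter missing from the mapping is treated as 'off'.
    """
    def on(name):
        return settings.get(name, "off").strip().lower() == "on"

    blockers = []
    if not (on("wal_log_hints") or on("data_checksums")):
        blockers.append("enable wal_log_hints or use data checksums")
    if not on("full_page_writes"):
        blockers.append("set full_page_writes = on")
    return blockers


if __name__ == "__main__":
    # Example values as they would be read from the running standby.
    current = {"wal_log_hints": "off", "data_checksums": "off",
               "full_page_writes": "on"}
    for reason in rewind_blockers(current):
        print("pg_rewind not possible:", reason)
```

An empty result only says that the preconditions are met. It does not show whether the image actually attempts a rewind before falling back to a full clone.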
How can I tackle this situation? Can anyone please help me with it?