Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pgpool Version 4.4.4 #64

Open
pradeepp-ai opened this issue Aug 9, 2024 · 3 comments
Open

Pgpool Version 4.4.4 #64

pradeepp-ai opened this issue Aug 9, 2024 · 3 comments
Assignees

Comments

@pradeepp-ai
Copy link

I Need help or suggestions on the issue below pgpool-II.

Example:-
Node1 -- Primary (Postgres & pgpool installed)
Node2 -- Standby (Postgres & pgpool installed)
Node3 -- Witness (Pgpool installed)

Incident:-

  1. In the pgpool cluster we have node1 is primary, node2 is secondary, and node3 is witness.
  2. node1 becomes unresponsive then failover is triggered and then node2 becomes the new primary.
  3. After 15 minutes node1 became responsive again and node1 is in primary state and we have node 2 also in primary so a split brain situation happened.
  4. Node 1 itself is decided as leader/coordinator based on the node 1 pgpool record and more number of connected standby nodes (based on old state of information) , now i am the true leader, it is decided as the true leader and asks node 2 to step down.
  5. In this case for 15 minutes data written to node 2 new primary after node 1 responsive data gets started to return on node 1 again so here data gets inconsistent.

Please provide a suggestion in this case.

NOTE:- We are using only for auto-failover not for auto-failback so auto-failback is disabled.

Thanks & Regards,
Pradeep Prajapati

@tatsuo-ishii
Copy link
Collaborator

tatsuo-ishii commented Aug 13, 2024

What do you mean by "witness"? I am asking because there's no such an idea "witness" in pgpool.
Can you share pgpool.conf? (you can hide sensitive data like password or IP address)

@pradeepp-ai
Copy link
Author

pradeepp-ai commented Aug 13, 2024

Hi tatsuo-ishii,

Thank you for your reply.

Ok, there is no such witness node. I am briefing the server information and sharing the pgpool configuration file for your reference.

So, we have 3 servers we can refer it as node 1, node 2 & node 3.

Node 1 have following services installed like postgresql-15.4 and pgpool-II 4.4.4
Node 2 also have following services installed like postgresql-15.4 and pgpool-II 4.4.4
But, In node 3 we have installed only pgpool-II 4.4.4

We are using 3 pgpool-II 4.4.4 and 2 postgresql-15.4 Database so that pgpool will use consensus to elect the leader if two pgpool agrees.

We have assigned the pgpool_node_id as follow for Node 1 = 0, Node 2 = 1 and Node 3 = 2

Kindly find the all 3 nodes pgpool.conf file.

Please let me know as per my configuration in how much time failover will complete or share me the formula to calculate the failover time.

FYI, We are using pgpool for auto-failover and manual recovery instead of auto-recovery that why we have disable the auto-failback and we are not using load balancing for the time being.

Please let me know if you have any query related to setup and configuration.

FYI, All nodes pgpool.conf file is same except wd_priority.
We have given the wd_priority as follow
for Node 1 = 1 ,
Node 2 = 2 and
Node 3 = 0 (Because we don't want VIP IP to established on node 3)

node3-pgpool-conf.txt
node2-pgpool-conf.txt
node1-pgpool-conf.txt

@tatsuo-ishii
Copy link
Collaborator

tatsuo-ishii commented Aug 17, 2024

There's nothing unexpected in the pgpool.conf files and I need more info.
If the incident above can be reproduced, can you provide the output of pcp_watchdog_info and pcp_node_info on node1 and node
when this happened:

node1 becomes unresponsive then failover is triggered and then node2 becomes the new primary.

Also I would like to confirm that you did not restart any of node1, node2, node3 and PostgreSQL during 1-5.

Please let me know as per my configuration in how much time failover will complete or share me the formula to calculate the failover time.

I guess you are asking how long it will take before failover is triggered when some errors occurred.
The formula for this is roughly:
(health_check_period + health_check_timeout + health_check_retry_delay + connect_timeout) * health_check_max_retries

Note that this could be shorter if health_check_timeout or connect_timeout checking immediately reports back the failure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants