Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connections hang if the database limit max_size is overtaken #471

Open
fluca1978 opened this issue Sep 23, 2024 · 3 comments
Open

Connections hang if the database limit max_size is overtaken #471

fluca1978 opened this issue Sep 23, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@fluca1978
Copy link
Collaborator

While inspecting #456 I found this behavior that applies also to master branch 40acff6:

% pgagroal-cli conf get max_connections
15

% pgagroal-cli conf get limit.pgbench.max_size
12

% pgagroal-cli conf get limit.pgbench.min_size
5

so the system is allowing 15 max connections and 12 are possible at max against the pgbench database.
Every run with a number of client less than 12 (pgbench max connections) works:

% time /usr/pgsql-16/bin/pgbench -c 5 -n  -T 60 -h localhost -p 54322 -U pgbench pgbench
...
/usr/pgsql-16/bin/pgbench -c 5 -n -T 60 -h localhost -p 54322 -U pgbench   0.11s user 0.19s system 0% cpu 1:01.39 total

% time /usr/pgsql-16/bin/pgbench -c 10 -n  -T 60 -h localhost -p 54322 -U pgbench pgbench
...
/usr/pgsql-16/bin/pgbench -c 10 -n -T 60 -h localhost -p 54322 -U pgbench   0.12s user 0.27s system 0% cpu 1:00.53 total

% time /usr/pgsql-16/bin/pgbench -c 12 -n  -T 60 -h localhost -p 54322 -U pgbench pgbench
...
/usr/pgsql-16/bin/pgbench -c 12 -n -T 60 -h localhost -p 54322 -U pgbench   0.12s user 0.27s system 0% cpu 1:00.66 total


then when the number of clients is greater than max connections for the database, the system seems to hang the connections and the running never ends:

% time /usr/pgsql-16/bin/pgbench -c 13 -n  -T 60 -h localhost -p 54322 -U pgbench pgbench
pgbench (16.4)
^C
/usr/pgsql-16/bin/pgbench -c 13 -n -T 60 -h localhost -p 54322 -U pgbench   0.00s user 0.01s system 0% cpu 1:07.47 total


% time /usr/pgsql-16/bin/pgbench -c 15 -n  -T 60 -h localhost -p 54322 -U pgbench pgbench
pgbench (16.4)
^C
/usr/pgsql-16/bin/pgbench -c 15 -n -T 60 -h localhost -p 54322 -U pgbench   0.00s user 0.01s system 0% cpu 3:57.05 total

in the meantime, observing the status reports the connections to be in replica mode:

% pgagroal-cli status
Status:              Running
Active connections:  12
Total connections:   13
Max connections:     15
[luca@rachel]~% pgagroal-cli status details
Status:              Running
Active connections:  12
Total connections:   13
Max connections:     15
---------------------
Server:              rachel
Host:                127.0.0.1
Port:                5432
State:               Primary
---------------------
Database:            pgbench
Username:            pgbench
Active connections:  12
Max connections:     12
Initial connections: 5
Min connections:     5
---------------------
Database:            testdb
Username:            luca
Active connections:  0
Max connections:     2
Initial connections: 1
Min connections:     1
---------------------
Connection    1:     Replica         2024-09-23 17:19:59 165147 15     pgbench pgbench pgbench
Connection    2:     Replica         2024-09-23 17:19:59 165148 16     pgbench pgbench pgbench
Connection    3:     Replica         2024-09-23 17:19:59 165149 17     pgbench pgbench pgbench
Connection    4:     Replica         2024-09-23 17:19:59 165150 18     pgbench pgbench pgbench
Connection    5:     Replica         2024-09-23 17:19:59 165151 19     pgbench pgbench pgbench
Connection    6:     Primary         2024-09-23 17:19:58        20     luca testdb 
Connection    7:     Replica         2024-09-23 17:19:59 165152 5      pgbench pgbench pgbench
Connection    8:     Replica         2024-09-23 17:19:59 165154 5      pgbench pgbench pgbench
Connection    9:     Replica         2024-09-23 17:19:59 165156 5      pgbench pgbench pgbench
Connection   10:     Replica         2024-09-23 17:19:59 165158 5      pgbench pgbench pgbench
Connection   11:     Replica         2024-09-23 17:19:59 165160 5      pgbench pgbench pgbench
Connection   12:     Replica         2024-09-23 17:19:59 165162 5      pgbench pgbench pgbench
Connection   13:     Replica         2024-09-23 17:19:59 165164 5      pgbench pgbench pgbench
Connection   14:     Not init                                            
Connection   15:     Not init   
@fluca1978 fluca1978 added the bug Something isn't working label Sep 23, 2024
@fluca1978
Copy link
Collaborator Author

With regard to the replica status of connections, the initial connections are reported as primary (database limit initial_size), once the connection is running it is reported as replica:

% pgagroal-cli status details
-> TRACE cli.c:691 <status details>
-> DEBUG management.c:1852 pgagroal-cli version 20000
Status:              Running
Active connections:  0
Total connections:   6
Max connections:     15
---------------------
Server:              rachel
Host:                127.0.0.1
Port:                5432
State:               Primary
---------------------
Database:            pgbench
Username:            pgbench
Active connections:  0
Max connections:     12
Initial connections: 5
Min connections:     5
---------------------
Database:            testdb
Username:            luca
Active connections:  0
Max connections:     2
Initial connections: 1
Min connections:     1
---------------------
Connection    1:     Primary         2024-09-23 17:27:56        15     pgbench pgbench 
Connection    2:     Primary         2024-09-23 17:27:56        16     pgbench pgbench 
Connection    3:     Primary         2024-09-23 17:27:56        17     pgbench pgbench 
Connection    4:     Primary         2024-09-23 17:27:56        18     pgbench pgbench 
Connection    5:     Primary         2024-09-23 17:27:56        19     pgbench pgbench 
Connection    6:     Primary         2024-09-23 17:27:56        20     luca testdb 
Connection    7:     Not init                                            
Connection    8:     Not init                                            
Connection    9:     Not init                                            
Connection   10:     Not init                                            
Connection   11:     Not init                                            
Connection   12:     Not init                                            
Connection   13:     Not init                                            
Connection   14:     Not init                                            
Connection   15:     Not init                                            



# run pgbench here

% pgagroal-cli status details
-> TRACE cli.c:691 <status details>
-> DEBUG management.c:1852 pgagroal-cli version 20000
Status:              Running
Active connections:  7
Total connections:   8
Max connections:     15
---------------------
Server:              rachel
Host:                127.0.0.1
Port:                5432
State:               Primary
---------------------
Database:            pgbench
Username:            pgbench
Active connections:  7
Max connections:     12
Initial connections: 5
Min connections:     5
---------------------
Database:            testdb
Username:            luca
Active connections:  0
Max connections:     2
Initial connections: 1
Min connections:     1
---------------------
Connection    1:     Replica         2024-09-23 17:28:13 165971 15     pgbench pgbench pgbench
Connection    2:     Replica         2024-09-23 17:28:13 165972 16     pgbench pgbench pgbench
Connection    3:     Replica         2024-09-23 17:28:13 165973 17     pgbench pgbench pgbench
Connection    4:     Replica         2024-09-23 17:28:13 165974 18     pgbench pgbench pgbench
Connection    5:     Replica         2024-09-23 17:28:13 165975 19     pgbench pgbench pgbench
Connection    6:     Primary         2024-09-23 17:27:56        20     luca testdb 
Connection    7:     Replica         2024-09-23 17:28:13 165976 5      pgbench pgbench pgbench
Connection    8:     Replica         2024-09-23 17:28:13 165978 5      pgbench pgbench pgbench
Connection    9:     Not init                                            
Connection   10:     Not init                                            
Connection   11:     Not init                                            
Connection   12:     Not init                                            
Connection   13:     Not init                                            
Connection   14:     Not init                                            
Connection   15:     Not init    

then they switch back to primary, hence this is not the problem.

@fluca1978
Copy link
Collaborator Author

Checking the status of the connections on the PostgreSQL side I've:

  1. before starting the pgbench
pgbench=# select application_name, state, now() - backend_start from pg_stat_activity where application_name = 'pgbench';
 application_name | state |    ?column?     
------------------+-------+-----------------
 pgbench          | idle  | 00:11:57.499258
 pgbench          | idle  | 00:11:57.483905
 pgbench          | idle  | 00:11:57.474856
 pgbench          | idle  | 00:11:57.465395
 pgbench          | idle  | 00:11:57.456646
 pgbench          | idle  | 00:10:17.432109
 pgbench          | idle  | 00:10:17.4212
 pgbench          | idle  | 00:10:17.410675
 pgbench          | idle  | 00:10:17.398096
 pgbench          | idle  | 00:10:17.385822
 pgbench          | idle  | 00:10:17.372623
 pgbench          | idle  | 00:10:17.358922
  1. I launch pgbench
% time /usr/pgsql-16/bin/pgbench -c 15 -n  -T 60 -h localhost -p 54322 -U pgbench pgbench
pgbench (16.4)
^C
/usr/pgsql-16/bin/pgbench -c 15 -n -T 60 -h localhost -p 54322 -U pgbench   0.00s user 0.00s system 0% cpu 3:02.72 total
  1. and after more than the time I've set pgbench to run:
pgbench=# select application_name, state, now() - backend_start from pg_stat_activity where application_name = 'pgbench';
 application_name | state |    ?column?     
------------------+-------+-----------------
 pgbench          | idle  | 00:15:01.105466
 pgbench          | idle  | 00:15:01.090113
 pgbench          | idle  | 00:15:01.081064
 pgbench          | idle  | 00:15:01.071603
 pgbench          | idle  | 00:15:01.062854
 pgbench          | idle  | 00:13:21.038317
 pgbench          | idle  | 00:13:21.027408
 pgbench          | idle  | 00:13:21.016883
 pgbench          | idle  | 00:13:21.004304
 pgbench          | idle  | 00:13:20.99203
 pgbench          | idle  | 00:13:20.978831
 pgbench          | idle  | 00:13:20.96513

Then, I quit pgagroal and started it over, running again the pgbench, obtaining much less connections than before, but still not getting it completed in one minute:

pgbench=# select application_name, state, now() - backend_start from pg_stat_activity where application_name = 'pgbench';
 application_name | state |    ?column?     
------------------+-------+-----------------
 pgbench          | idle  | 00:04:11.263507
 pgbench          | idle  | 00:04:11.251645
 pgbench          | idle  | 00:04:11.236199
 pgbench          | idle  | 00:04:11.223282
 pgbench          | idle  | 00:04:11.210829
 pgbench          | idle  | 00:04:11.197331
 pgbench          | idle  | 00:04:11.185038

it ran for 4 minutes instead of 60 seconds.


Then I noted a strange thing:

pgbench=# select datname, count(*) from pg_stat_activity group by datname;
datname | count
----------+-------
| 5
testdb | 1
pgbench | 94
bibliodb | 2
(4 rows)

pgbench=# show max_connections ;
max_connections

100
(1 row)

pgbench=# select application_name, state, now() - backend_start from pg_stat_activity where datname = 'pgbench';
application_name | state | ?column?
------------------+--------+-----------------
pgagroal | idle | 00:05:57.203986
pgbench | idle | 00:04:12.28093
pgagroal | idle | 00:05:51.044541
pgbench | idle | 00:04:12.264843
pgbench | idle | 00:04:12.25655
pgagroal | idle | 00:05:57.495641
pgagroal | idle | 00:05:57.440158
pgagroal | idle | 00:05:57.430005
pgagroal | idle | 00:05:57.419923
pgagroal | idle | 00:05:57.340037
pgagroal | idle | 00:05:57.307005
pgagroal | idle | 00:05:57.22609
pgagroal | idle | 00:05:56.865153
pgagroal | idle | 00:05:57.181119
pgagroal | idle | 00:05:57.132652
pgagroal | idle | 00:05:57.059128
pgagroal | idle | 00:05:56.946148
pgagroal | idle | 00:05:56.922899
pgagroal | idle | 00:05:56.511467
pgagroal | idle | 00:05:56.84327
pgagroal | idle | 00:05:56.726024
pgagroal | idle | 00:05:56.631186
pgagroal | idle | 00:05:55.601123
pgagroal | idle | 00:05:56.427405
pgagroal | idle | 00:05:56.368246
pgagroal | idle | 00:05:56.345935
pgagroal | idle | 00:05:56.32225
pgagroal | idle | 00:05:56.31235
pgagroal | idle | 00:05:56.302237
pgagroal | idle | 00:05:56.162064
pgagroal | idle | 00:05:56.022244
pgagroal | idle | 00:05:56.012134
pgagroal | idle | 00:05:55.753131
pgagroal | idle | 00:05:55.706314
pgagroal | idle | 00:05:55.68315
pgagroal | idle | 00:05:55.4753
pgagroal | idle | 00:05:55.556058
pgagroal | idle | 00:05:55.307282
pgagroal | idle | 00:05:55.452211
pgagroal | idle | 00:05:55.442067
pgagroal | idle | 00:05:55.39693
pgagroal | idle | 00:05:55.38674
pgagroal | idle | 00:05:55.364754
pgagroal | idle | 00:05:54.793307
pgagroal | idle | 00:05:55.275956
pgagroal | idle | 00:05:54.613983
pgagroal | idle | 00:05:54.636393
pgagroal | idle | 00:05:54.517555
pgagroal | idle | 00:05:54.590629
pgagroal | idle | 00:05:54.580725
pgagroal | idle | 00:05:54.483401
pgagroal | idle | 00:05:54.415498
pgagroal | idle | 00:05:54.473048
pgagroal | idle | 00:05:53.675977
pgagroal | idle | 00:05:54.240092
pgagroal | idle | 00:05:54.218311
pgagroal | idle | 00:05:54.183009
pgagroal | idle | 00:05:54.126042
pgagroal | idle | 00:05:54.090936
pgagroal | idle | 00:05:54.068769
pgagroal | idle | 00:05:54.011918
pgagroal | idle | 00:05:53.930814
pgagroal | idle | 00:05:53.885552
pgagroal | idle | 00:05:53.778212
pgagroal | idle | 00:05:53.406539
pgagroal | idle | 00:05:53.619185
pgagroal | idle | 00:05:53.56211
pgagroal | idle | 00:05:53.552125
pgagroal | idle | 00:05:53.529931
pgagroal | idle | 00:05:53.508004
pgagroal | idle | 00:05:53.464309
pgagroal | idle | 00:05:53.183035
pgagroal | idle | 00:05:53.384223
pgagroal | idle | 00:05:53.327012
pgagroal | idle | 00:05:53.271237
pgagroal | idle | 00:05:53.260539
pgagroal | idle | 00:05:53.250636
pgagroal | idle | 00:05:53.205172
pgagroal | idle | 00:05:53.1271
pgagroal | idle | 00:05:52.357268
pgagroal | idle | 00:05:53.093096
pgagroal | idle | 00:05:53.047396
pgagroal | idle | 00:05:53.037179
pgagroal | idle | 00:05:53.01488
pgagroal | idle | 00:05:52.935622
pgagroal | idle | 00:05:52.700898
pgagroal | idle | 00:05:52.678142
pgagroal | idle | 00:05:52.644102
pgagroal | idle | 00:05:52.505012
pgagroal | idle | 00:05:52.482957
pgagroal | idle | 00:05:52.240204
pgagroal | idle | 00:05:52.18427
pgagroal | idle | 00:05:52.229701
psql | active | 00:03:44.47224


and clearly, `pgagroal` sets `application_name` to `pgagroal`, not `pgbench` therefore my queries were wrong and the problem was about `pgbench` opening local connections (how?).

I then killed `pgagroal`, and the result was:

pgbench=# select datname, count(*) from pg_stat_activity group by datname;
datname | count
----------+-------
| 5
pgbench | 1
bibliodb | 2


I then restarted `pgagroal` obtaining the *initial size* connections:

pgbench=# select datname, count(*) from pg_stat_activity group by datname;
datname | count
----------+-------
| 5
testdb | 1
pgbench | 6
bibliodb | 2


then run again `pgbench` with `-c 15` and get:

pgbench=# select application_name, state, now() - backend_start from pg_stat_activity where datname = 'pgbench';
application_name | state | ?column?
------------------+--------+-----------------
pgagroal | idle | 00:01:49.709488
pgagroal | idle | 00:01:49.672817
pgagroal | idle | 00:01:49.66682
pgagroal | idle | 00:01:49.661123
pgagroal | idle | 00:01:49.656639
pgbench | idle | 00:00:59.811691
pgbench | idle | 00:00:59.801635
pgbench | idle | 00:00:59.790721
pgbench | idle | 00:00:59.77807
pgbench | idle | 00:00:59.765371
pgbench | idle | 00:00:59.752487
pgbench | idle | 00:00:59.738717
psql | active | 00:09:28.30725
(13 rows)

pgbench=# select datname, count(*) from pg_stat_activity group by datname; datname | count
----------+-------
| 5
testdb | 1
pgbench | 13
bibliodb | 2


Killing `pgagroal` and restarting produces the same results, I don't know how to reproduce the increasing number of connections.
However, the problem remains: asking for `15` clients with a max size of `12` seems to never end.

@fluca1978
Copy link
Collaborator Author

I've noted also another behaviour that I'm not sure is neither regular nor related to this issue: if I kill the pgbench process that is running (and never ending), pgagroal resets all the connections without keeping the min pool connections:

# before

Connection    1:     Active          2024-09-26 15:51:30 262104 15     pgbench pgbench pgbench
Connection    2:     Active          2024-09-26 15:51:30 262105 16     pgbench pgbench pgbench
Connection    3:     Active          2024-09-26 15:51:30 262106 17     pgbench pgbench pgbench
Connection    4:     Active          2024-09-26 15:51:30 262107 18     pgbench pgbench pgbench
Connection    5:     Active          2024-09-26 15:51:30 262108 19     pgbench pgbench pgbench
Connection    6:     Free            2024-09-26 15:51:21        20     luca testdb 
Connection    7:     Active          2024-09-26 15:51:30 262109 5      pgbench pgbench pgbench
Connection    8:     Active          2024-09-26 15:51:30 262111 5      pgbench pgbench pgbench
Connection    9:     Active          2024-09-26 15:51:30 262113 5      pgbench pgbench pgbench
Connection   10:     Active          2024-09-26 15:51:30 262115 5      pgbench pgbench pgbench
Connection   11:     Active          2024-09-26 15:51:30 262117 5      pgbench pgbench pgbench
Connection   12:     Active          2024-09-26 15:51:30 262119 5      pgbench pgbench pgbench
Connection   13:     Active          2024-09-26 15:51:30 262121 5      pgbench pgbench pgbench


# after killing pgbench

Connection    1:     Not initialized                                     
Connection    2:     Not initialized                                     
Connection    3:     Not initialized                                     
Connection    4:     Not initialized                                     
Connection    5:     Not initialized                                     
Connection    6:     Free            2024-09-26 15:51:21        20     luca testdb 
Connection    7:     Not initialized                                     
Connection    8:     Not initialized                                     
Connection    9:     Not initialized                                     
Connection   10:     Not initialized                                     
Connection   11:     Not initialized                                     
Connection   12:     Not initialized                                     
Connection   13:     Not initialized                                     
Connection   14:     Not initialized                                     
Connection   15:     Not initialized                      

but

% pgagroal-cli conf get limit.pgbench.min_size       
5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant