bump to pg_net 0.19.5 #1728


Open
wants to merge 1 commit into base: develop

Conversation

@steve-chavez (Member) commented Jul 30, 2025

  1. No longer executes unnecessary queries, preventing pollution of pg_stat_statements (feat: process the queue on demand, pg_net#199)
    • This also fixes the memory leak problem with pg_stat_monitor
  2. Fixes the problem on 0.19.3 where DROP DATABASE could not complete (fix: preventing DROP DATABASE from completing, pg_net#217)
  3. Adds http_delete with a request body (feat: support net.http_delete with request body, pg_net#173)

Bumping from 0.14 to 0.19.4 might seem like a big change, but many of the minor version bumps happened only because of the addition of testing utilities (and of DEBUG traces, which perhaps shouldn't have warranted minor versions) and pg v18 compatibility. The main changes here are 1 and 3.

Release notes:

@steve-chavez steve-chavez requested review from a team as code owners July 30, 2025 21:41
@soedirgo (Member)

Please ensure local infra testing is done this time, to prevent a repeat of the mishap in #1704. Our current automated tests are not sufficient.

@steve-chavez (Member, Author) commented Jul 31, 2025

@soedirgo Are you sure the manual tests would reproduce the DROP DATABASE problem?

From:

It looks like they wouldn't have prevented this case anyway. This happens when DROP DATABASE is done on a different database than the current one being used.

@soedirgo (Member)

We need to err on the side of caution and do E2E testing even if it's manual. It's clear in hindsight that the blast radius of this change is bigger than initially expected.

You can try reproducing the problem on the new AMI through local infra testing; if the same issue doesn't show up, you're good to go.

@za-arthur (Contributor)

I'm not sure that supabase/pg_net#217 will fix the issue with DROP DATABASE.

The command DROP DATABASE ... WITH (force) sends SIGTERM:
https://github.com/postgres/postgres/blob/REL_17_5/src/backend/storage/ipc/procarray.c#L3925

pg_net already calls SetLatch when handling SIGTERM:
https://github.com/supabase/pg_net/blob/b9f39d0bbdd8a62d249bce67102d84cc5ce948ba/src/worker.c#L128

By default, background workers simply die if the SIGTERM handler is not overridden:
https://github.com/postgres/postgres/blob/REL_17_5/src/backend/postmaster/bgworker.c#L709
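For context, the usual PostgreSQL background-worker pattern (as in the worker_spi example) overrides the SIGTERM handler so the main loop can notice the signal and exit cleanly. A minimal sketch of that pattern, not pg_net's actual code (the worker_main name is hypothetical):

static volatile sig_atomic_t got_sigterm = false;

static void
handle_sigterm(SIGNAL_ARGS)
{
    int save_errno = errno;

    got_sigterm = true;
    SetLatch(MyLatch);      /* wake the worker's WaitLatch() below */
    errno = save_errno;
}

void
worker_main(Datum main_arg)
{
    pqsignal(SIGTERM, handle_sigterm);
    BackgroundWorkerUnblockSignals();

    while (!got_sigterm)
    {
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                         -1L, PG_WAIT_EXTENSION);
        ResetLatch(MyLatch);
        /* ... process work ... */
    }
}

If the worker instead waits on a separately defined latch, a SIGTERM that only sets MyLatch won't wake it, which matches the symptom being discussed here.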

@steve-chavez (Member, Author) commented Jul 31, 2025

@za-arthur Many thanks for your review, this is helpful.

I can confirm that prior to the fix (on this previous commit supabase/pg_net@a78be08), the with (force) variant also never finished:

postgres=# drop database foo with (force);
...
2025-07-31 15:26:15.687 -05 [573129] LOG:  still waiting for backend with PID 573119 to accept ProcSignalBarrier
2025-07-31 15:26:15.687 -05 [573129] STATEMENT:  drop database foo with (force);
2025-07-31 15:26:20.692 -05 [573129] LOG:  still waiting for backend with PID 573119 to accept ProcSignalBarrier
2025-07-31 15:26:20.692 -05 [573129] STATEMENT:  drop database foo with (force);

However, this only happens on pg >= 15. On earlier pg versions, both DROP DATABASE and DROP DATABASE WITH (force) worked fine (they didn't block).

The command DROP DATABASE ... WITH (force) sends SIGTERM:

So I guess this behavior changed on pg 15 and SIGUSR1 is sent first?

Edit: added a new test for this supabase/pg_net#219

@za-arthur (Contributor)

@steve-chavez indeed, I see now that v0.19.4 should fix the issue with DROP DATABASE. I mixed up the original issue and thought it was a slightly different one.

I think if pg_net used MyLatch instead of defining a new latch, there wouldn't be such an issue. It would make sense to use WaitLatch(MyLatch, ...). Defining an additional latch makes sense only when you need one in addition to MyLatch. I wonder if there might be other similar issues with the handling of other signals.

You don't need to define a new latch to synchronize between processes. You can assign the worker's latch to the shared structure:

static void net_shmem_startup(void) {
    ...
    worker_state->latch = MyLatch;
    // or
    worker_state->latch = &MyProc->procLatch;
    ...
}

@za-arthur (Contributor)

@steve-chavez BTW I also noticed that pg_net doesn't initialize shared memory fully correctly. I created an issue:
supabase/pg_net#220

@steve-chavez (Member, Author) commented Aug 1, 2025

I think if pg_net would use MyLatch instead of defining new latch then there wouldn't be such issue. It would make sense to use WaitLatch(MyLatch, ...). Defining additional latch makes sense when you need additional latch in addition to MyLatch

@za-arthur It needs a shared latch so other backends can wake the background worker when doing net.http_get and other HTTP requests. The reasoning for this change is given in supabase/pg_net#199 (essentially, to avoid polling the request queue table).

You don't need to define new signal to synchronize between processes. You can assign a worker latch to the shared structure by:

Btw, UNIX signals could have been used instead of a shared latch, but shared latches are much faster than signals since the communication happens in memory; we needed the fastest option to solve supabase/pg_net#175.
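As a rough illustration of the on-demand scheme from pg_net#199 (worker_state and its latch field are hypothetical names here, not pg_net's actual identifiers): a backend that enqueues a request sets the shared latch, and the worker sleeps on that latch instead of polling the queue table on a timer:

/* Backend side: after inserting a row into the request queue,
 * wake the background worker immediately. */
SetLatch(worker_state->latch);

/* Worker side: block until woken instead of polling on a timer. */
(void) WaitLatch(worker_state->latch,
                 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                 -1L, PG_WAIT_EXTENSION);
ResetLatch(worker_state->latch);
/* ... drain the queue table and issue the HTTP requests ... */

As za-arthur suggests, the shared latch field can simply point at the worker's own MyLatch rather than a separately defined latch.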

@steve-chavez BTW I also noticed that pg_net initializes shared memory not fully correctly. I created an issue:
supabase/pg_net#220

Thanks, will take a look. Converting the PR to draft until the issue is closed.

@za-arthur (Contributor)

It needs a shared latch so other backends can wake the background worker when doing net.http_get and other http requests.

I think MyLatch is a shared latch. At least this is exactly how I used it before. See:
https://github.com/postgres/postgres/blob/REL_17_5/src/backend/storage/lmgr/proc.c#L450
https://github.com/postgres/postgres/blob/REL_17_5/src/backend/storage/lmgr/proc.c#L228

The code snippet for net_shmem_startup that I sent earlier is wrong. You can save the worker's latch when the worker starts:

void pg_net_worker(__attribute__ ((unused)) Datum main_arg) {
    ...
    worker_state->latch = MyLatch;
    // or
    worker_state->latch = &MyProc->procLatch;
    ...
}

@steve-chavez (Member, Author)

I think MyLatch is a shared latch. At least this is exactly how I used it before. See:
https://github.com/postgres/postgres/blob/REL_17_5/src/backend/storage/lmgr/proc.c#L450
https://github.com/postgres/postgres/blob/REL_17_5/src/backend/storage/lmgr/proc.c#L228

@za-arthur Thanks for the hint! I've addressed that in supabase/pg_net#221. I'm considering it a refactor since there were no known bugs caused by the previous behavior, but it's still good to avoid the extra latch.

@steve-chavez steve-chavez marked this pull request as draft August 1, 2025 15:38
- No longer executes unnecessary queries, preventing pollution of pg_stat_statements supabase/pg_net#199
- Fixes the memory leak problem with `pg_stat_monitor` supabase/pg_net#199
- Fixes the previous problem on 0.19.3 with DROP DATABASE supabase/pg_net#217
- `http_delete` with request body added in supabase/pg_net#173
@steve-chavez steve-chavez changed the title bump to pg_net 0.19.4 bump to pg_net 0.19.5 Aug 5, 2025
@steve-chavez steve-chavez marked this pull request as ready for review August 5, 2025 17:39
@steve-chavez (Member, Author)

All of Artur's feedback has been addressed, and I've updated this PR to the new pg_net 0.19.5 patch version.
