
feat: process the queue on demand #199


Open
wants to merge 3 commits into master from new-worker-loop

Conversation


steve-chavez commented Jun 25, 2025

Avoid polling the queue table and instead wait for a wake signal that is sent whenever net.http_get and friends are called. The worker will be "awake" as long as the request queue has rows or as long as there are expired responses.

If the worker is restarted while it is "awake", it will keep processing the queue and expired responses once it starts again.
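
For illustration, here is a minimal sketch of the polling-free loop this describes, using the standard Postgres latch API. It is not the actual pg_net code; process_queue_and_expired_responses() is a hypothetical placeholder.

/* Illustrative sketch only: the worker blocks on its latch instead of
 * sleeping and polling, and net.http_get and friends set that latch to
 * wake it up. */
#include "postgres.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/latch.h"

static void
worker_loop(void)
{
    for (;;)
    {
        /* Block until SetLatch() is called on MyLatch (the "wake" signal),
         * with no timeout: the worker only runs on demand. */
        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                         -1L,
                         PG_WAIT_EXTENSION);
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();

        /* Stay "awake": keep draining the request queue and deleting expired
         * responses until both are empty, then block again. */
        /* process_queue_and_expired_responses(); */
    }
}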

This causes some behaviors to change:

  • pg_net.ttl: previously the TTL was fulfilled within a few seconds, but now expired responses are only deleted while the worker is awake, so deletion is not guaranteed to happen soon after the TTL expires. The ttl is still honored and its raison d'être is maintained: it prevents the response table from growing too much.

  • net.http_request_queue: direct insertions will no longer cause new requests to fire. This should be fine since this table was never meant to be used that way; users should only rely on the net schema functions to queue new requests.

Closes #164: no unnecessary queries are issued anymore, so we no longer pollute pg_stat_statements.

Closes #175: requests no longer always wait one second; the queue is processed as soon as new requests arrive.

steve-chavez force-pushed the new-worker-loop branch 3 times, most recently from f4a2a11 to 4c0c3a0 on June 25, 2025 at 10:21
@@ -55,4 +56,4 @@ def test_query_stat_statements(sess):
"""
)).fetchone()

-    assert new_calls > old_calls
+    assert new_calls == old_calls
Member Author


This proves #164 is solved

steve-chavez marked this pull request as ready for review on June 25, 2025 at 10:26
steve-chavez force-pushed the new-worker-loop branch 2 times, most recently from 9334484 to 98d41ec on June 25, 2025 at 10:49

steve-chavez commented Jun 25, 2025

The loadtest summary also shows that performance is maintained; in fact it's a bit better since there's less waiting.

sql/pg_net.sql Outdated


It would be necessary to add these changes into a new migration file

Member Author


Thanks for the review. Will do that.

I tend to leave the actual migration files for a bump version commit (usually another PR, because a new feat doesn't necessarily imply a new release), but can do it here.

steve-chavez force-pushed the new-worker-loop branch 3 times, most recently from c1c0c5e to f04dacb on June 26, 2025 at 19:06

steve-chavez commented Jun 26, 2025

There was a tricky bug where the worker wasn't being woken on a net.http_delete, so the test waited forever for the request to complete: https://github.com/supabase/pg_net/actions/runs/15874330566/job/44758196662?pr=199

This was due to a missing pg_write_barrier, fixed in 5b088d9. That makes sense: the SetLatch wasn't guaranteed to happen after worker_state->should_work was set, and the barrier fixes that.

I couldn't reproduce it locally; it needed a weakly-ordered CPU such as ARM.

Edit: While this problem is fixed, I think we can just remove this should_work state and prevent new types of bugs. I'll add a commit for this here.
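
For reference, a hedged sketch of the ordering pattern described above (not the actual pg_net code; WorkerState, wake_worker and worker_has_work are illustrative stand-ins):

/* Hedged illustration: on a weakly-ordered CPU (e.g. ARM) the store to
 * should_work can become visible after the latch wake-up unless a write
 * barrier sits between them. */
#include "postgres.h"
#include "port/atomics.h"       /* pg_write_barrier(), pg_read_barrier() */
#include "storage/latch.h"

typedef struct WorkerState      /* lives in shared memory */
{
    Latch latch;                /* the worker waits on this latch */
    bool  should_work;          /* "the queue has something to do" flag */
} WorkerState;

/* Caller side (e.g. net.http_delete): flag the work, then wake the worker. */
static void
wake_worker(WorkerState *worker_state)
{
    worker_state->should_work = true;

    /* Make the store above visible before the worker can observe the latch
     * being set; without it the worker may wake, read should_work == false,
     * and go back to sleep while the request sits in the queue. */
    pg_write_barrier();

    SetLatch(&worker_state->latch);
}

/* Worker side: pair the write barrier with a read barrier after waking up. */
static bool
worker_has_work(WorkerState *worker_state)
{
    pg_read_barrier();
    return worker_state->should_work;
}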

Comment on lines +165 to +188
def test_new_requests_get_attended_asap(sess):
    """new requests get attended as soon as possible"""

    sess.execute(text("select net.wait_until_running();"))

    sess.execute(text(
    """
        select net.http_get('http://localhost:8080/pathological?status=200') from generate_series(1,10);
    """
    ))

    sess.commit()

    # less than a second
    time.sleep(0.1)

    (status_code, count) = sess.execute(text(
    """
        select status_code, count(*) from net._http_response group by status_code;
    """
    )).fetchone()

    assert status_code == 200
    assert count == 10
Member Author


Proves #175 is solved.

Successfully merging this pull request may close these issues:

  • Database Webhooks get fired >1s after record inserted
  • Frequent queries pollute pg_stat_statements