InvalidWorkerCreation: Edge functions cannot handle concurrent requests #408

Open
2 tasks done
nathanaeng opened this issue Sep 15, 2024 · 4 comments · May be fixed by #382

Comments

@nathanaeng

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

Making concurrent requests to a Supabase edge function will result in InvalidWorkerCreation errors or 502 errors.

To Reproduce

Steps to reproduce the behavior, please provide code snippets or a repository:

  1. Using the Supabase CLI, create a new function with supabase functions new test_concurrency. Here is an example of a function I have (I realize the createClient is not used):
import "jsr:@supabase/functions-js/edge-runtime.d.ts"
import { createClient } from 'jsr:@supabase/supabase-js@2';

console.log("Hello from Functions!")

Deno.serve(async (req) => {
  const supabaseClient = createClient(
    Deno.env.get('SUPABASE_URL') ?? '',
    Deno.env.get('SUPABASE_SERVICE_ROLE_KEY') ?? '',
  );
  const { name } = await req.json()
  const data = {
    message: `Hello ${name}!`,
  }

  return new Response(
    JSON.stringify(data),
    { headers: { "Content-Type": "application/json" } },
  )
})
  2. Run supabase functions serve

  3. In a new terminal tab, execute this bash script, which sends 200 concurrent requests, replacing SERVICE_ROLE_KEY with your service role key:

#!/bin/bash

seq 1 200 | xargs -n1 -P0 -I{} curl -L -X POST 'http://localhost:54321/functions/v1/test_concurrency' -H 'Authorization: Bearer SERVICE_ROLE_KEY' --data '{"name":"Example"}'
  4. Notice that roughly the first 100 requests succeed, after which the supabase functions serve tab reports this error:
InvalidWorkerCreation: worker did not respond in time
    at async UserWorker.create (ext:sb_user_workers/user_workers.js:145:15)
    at async Object.handler (file:///root/index.ts:154:22)
    at async respond (ext:sb_core_main_js/js/http.js:163:14) {
  name: "InvalidWorkerCreation"
}

with the following error message on the tab that executes the test script:

{"code":"BOOT_ERROR","message":"Worker failed to boot (please check logs)"}
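
For readers who would rather drive the load from TypeScript than from xargs and curl, the request burst in the steps above can be sketched as follows. This is a minimal sketch and not part of the original report: fireConcurrent, FetchLike, and PostInit are hypothetical names, and the fetch function is passed in as a parameter so the helper can be tried without a running server. The URL, header, and body are meant to mirror the curl command.

```typescript
// Hypothetical helper (not from the original report): fires `count` POST
// requests at once and tallies the outcomes by HTTP status code.
interface PostInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal shape of the fetch function this sketch needs; the global `fetch` fits it.
type FetchLike = (url: string, init: PostInit) => Promise<{ status: number }>;

async function fireConcurrent(
  url: string,
  count: number,
  serviceRoleKey: string,
  fetchFn: FetchLike,
): Promise<Map<number | "network_error", number>> {
  // Start every request before awaiting any of them, like `xargs -P0`.
  const requests = Array.from({ length: count }, () =>
    fetchFn(url, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${serviceRoleKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ name: "Example" }),
    }),
  );

  // allSettled waits for all requests, including ones that reject.
  const results = await Promise.allSettled(requests);

  const tally = new Map<number | "network_error", number>();
  for (const result of results) {
    const key = result.status === "fulfilled" ? result.value.status : "network_error";
    tally.set(key, (tally.get(key) ?? 0) + 1);
  }
  return tally;
}
```

Against a local stack, a call such as `await fireConcurrent("http://localhost:54321/functions/v1/test_concurrency", 200, key, fetch)` should return a tally keyed by status code, which makes it easy to see how many of the 200 requests succeeded and how many came back with an error status.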

Expected behavior

I would expect the edge function to be able to handle concurrent requests to this degree.

System information

  • OS: macOS, M3 Max
  • Version of supabase-js: 1.192.5, using supabase-edge-runtime-1.58.2 (compatible with Deno v1.45.2)
  • Version of Node.js: 18

Additional context

From my understanding, edge functions can be used to serve API routes, and in a production application it is perfectly reasonable that you would have 200 users hit the same endpoint at the same time. This example uses an edge function with minimal computations. If you add database reads, a text embedding call using Supabase.ai gte-small, and a database write, it can handle even fewer concurrent requests (around 40 from my testing). I noticed this issue at first because I wanted to generate text embeddings on seed data consisting of only 40 users (which gets triggered on inserts to a table) but it failed to work for every user.

I'm not entirely sure how edge functions work. Maybe a worker is being reused to handle multiple requests and then hits a CPU limit or similar, resulting in failures. But I thought the idea of edge functions is to scale up with requests, and a mere 200 requests is nothing.

At first I thought that this could be a problem with local Supabase running in Docker, but I also confirmed that it occurs on a remote Supabase project (hosted by Supabase), where I get 502 errors after roughly the first 50-100 requests.

nathanaeng added the "bug" (Something isn't working) label on Sep 15, 2024

@ethan-dinh

I have encountered a similar issue when trying to call an edge function multiple times concurrently. In my case, making a lot of calls resulted in InvalidWorkerCreation errors or 502 errors. It seems that the scaling ability of edge functions might be limited and this significantly impacts performance when concurrent requests spike.

I feel like other serverless functions can handle concurrent requests with ease, yet edge functions can't even handle 50? Is Supabase not equipped to handle more than 50 concurrent requests? It seems as if the edge function is attempting to create a worker for every single request rather than queuing or using some implementation to resolve concurrency on a large scale.
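
The queuing idea raised above concerns the runtime itself, which callers cannot change. A different, caller-side technique is to cap how many requests are in flight at once, for example when generating embeddings for a batch of seed rows. The following is a minimal sketch under that assumption: mapWithLimit is a hypothetical helper, not a Supabase API, and it does not change how the edge runtime schedules workers.

```typescript
// Hypothetical caller-side helper: runs `task` over `items` with at most
// `limit` tasks in flight at any moment, preserving result order.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }

  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims one item at a time until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  // Start up to `limit` runners; together they drain the list.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

With a helper like this, 40 seed users could be processed with, say, 5 function calls in flight at a time instead of 40 at once. Whether that stays under the failure threshold described in this thread would need to be measured.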

@nyannyacha
Collaborator

Hello @nathanaeng and @ethan-dinh

I am not a member of the Supabase team that works on Supabase Edge Functions, but as the edge runtime maintainer, I'm sorry I didn't meet your expectations 😞

Given the function code and bash script you posted in the description, and assuming you're using the default edge runtime policy settings in supabase/cli, I can explain why the edge runtime is showing such low request throughput.

The edge runtime has three main scheduling policies for workers (per_worker, per_request, oneshot). For developers' convenience, supabase/cli defaults to the one that Supabase Edge Functions does not use: the oneshot policy.

Unlike the other policies, the oneshot policy does not reuse workers. It creates a new worker for each request and forwards the request to it, even when requests share the same service path.
supabase/cli chose this policy as the default because developers can change the source code at any time, and a fresh worker means the next request reflects the changed source code.
It is not used in production (including Supabase Edge Functions) because creating a worker for every request is highly inefficient.
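
To make the difference concrete, here is a toy model of the two policies as described above. It is only an illustration: ToyWorker, ToyScheduler, and workersCreated are hypothetical names, this is not the edge runtime's implementation, and per_request is left out because its behavior is not described in this thread.

```typescript
// Toy model only: hypothetical classes, not the edge runtime's real code.
type Policy = "oneshot" | "per_worker";

class ToyWorker {
  servicePath: string;
  handled = 0; // requests this worker has served

  constructor(servicePath: string) {
    this.servicePath = servicePath;
  }

  handle(): void {
    this.handled += 1;
  }
}

class ToyScheduler {
  created = 0; // number of workers booted so far
  private policy: Policy;
  private pool = new Map<string, ToyWorker>();

  constructor(policy: Policy) {
    this.policy = policy;
  }

  dispatch(servicePath: string): ToyWorker {
    // Only per_worker looks for an existing worker on the same service path.
    let worker = this.policy === "per_worker" ? this.pool.get(servicePath) : undefined;
    if (worker === undefined) {
      // oneshot always lands here, so every request pays the boot cost.
      worker = new ToyWorker(servicePath);
      this.created += 1;
      if (this.policy === "per_worker") this.pool.set(servicePath, worker);
    }
    worker.handle();
    return worker;
  }
}

// Counts how many workers get booted for `requests` calls to one function.
function workersCreated(policy: Policy, requests: number): number {
  const scheduler = new ToyScheduler(policy);
  for (let i = 0; i < requests; i++) scheduler.dispatch("/test_concurrency");
  return scheduler.created;
}
```

In this model, 200 requests to the same function boot 200 workers under oneshot and a single worker under per_worker, which is the cost difference the explanation above points at.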

If you change the policy, I think you'll probably get a different result.

I was able to reproduce your issue exactly locally on the oneshot policy using your code, but I was also able to confirm that the per_worker policy is not affected by this issue.

Of course, my experience doesn't guarantee that you won't have the same issue with Supabase Edge Functions.

Today, I came across an author on Reddit discussing this same topic, and it seemed that the author was also experiencing these issues with Supabase Edge Functions.

My expectation is that these issues should be handled well by the per_worker policy, but it looks like it sometimes fails to forward heavy request traffic to the workers properly and just gives up. (Forgive me, I have very limited visibility into Edge Functions because I am not a member of the Supabase team.)

I have opened PR-382 to better handle this situation, and once this is merged, they will be able to implement more specific request scheduling on top of the per_worker policy, which I believe will mitigate these issues.

I will put this on my watchlist and will let you guys know if there are any updates on this issue in the future.

Have a great day!

@nathanaeng
Author

nathanaeng commented Sep 15, 2024

Thanks for the detailed response! Yep, I have looked into the per_worker policy, and while it might work fine for the simple edge function I provided above, it was failing for a more complex edge function that performs a read, text embedding, and write. I can't recall how many concurrent requests it was able to handle. It might have been a bit more than oneshot, but it was still underwhelming, unfortunately. Additionally, I was able to replicate this error on my remote DB (Supabase hosted), which makes me think it's not just a local hosting issue. Thanks for helping though!

@thurahtetaung

thurahtetaung commented Sep 17, 2024

Hello @nyannyacha, thanks for your detailed response. As someone who self-hosts edge functions separately (not together with the Supabase Docker Compose setup), how should I go about changing the policies you mentioned? I suspect it is in the main function index.ts, with forceCreate = true or false, but I am not sure, and I am still getting those 502 errors after 30-50 concurrent requests even with forceCreate = false. Can you help me find other configuration in the main function that I can tune for better scaling performance? I am running multiple replicas in my K8s deployment, but the replicas still cannot pass the load test because the edge runtime container stops responding to requests and returns 502 with the above error after a few concurrent requests.
