When executing a run, we make an HTTP request to the client's endpoint URL and keep that request open until we get a response. Clients are expected to respond in the following situations:
1. The run experiences an uncaught exception
2. The run completes
3. io.wait() or io.yield() is called
4. There is an error inside of a Task callback
5. The serverless function times out
For 3, 4, and 5, we'll retry the run execution at a later time, using the cached outputs from completed tasks to prevent duplicate task executions, and to attempt to get to a complete state of the run.
For cases other than a function timeout, we have complete control over the timing and can make pretty strong guarantees that tasks are executed exactly once.
Unfortunately, we don't control the point at which a function times out, which causes our task execution guarantees to become at-least-once if the function times out after the task is executed but before we're able to update the task status/output on the server.
This can lead to situations where tasks are executed twice which, depending on the work being done in the task, can cause unwanted behavior. We mitigate this with integrations that support idempotency keys (e.g. our stripe.com integration), as well as by letting you provide idempotency keys when using io.runTask(), but very few integrations actually support idempotency keys.
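For example, a task that calls Stripe directly can pass an idempotency key through to Stripe so that a duplicate execution of the task replays the same request instead of creating a second charge. This is just a sketch, assuming the v2-style defineJob/io.runTask(cacheKey, callback) signatures and that the task object passed to the callback exposes an idempotency key (shown here as task.idempotencyKey); check the SDK docs for the exact shape:

```ts
import { TriggerClient, eventTrigger } from "@trigger.dev/sdk";
import Stripe from "stripe";

const client = new TriggerClient({ id: "my-app" });
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

client.defineJob({
  id: "charge-customer",
  name: "Charge customer",
  version: "1.0.0",
  trigger: eventTrigger({ name: "customer.charge" }),
  run: async (payload, io, ctx) => {
    return io.runTask("create-charge", async (task) => {
      return stripe.charges.create(
        { amount: 2000, currency: "usd", customer: payload.customerId },
        // Reusing the task's idempotency key means that if this task runs twice
        // (e.g. after a function timeout), Stripe returns the original charge
        // instead of creating a new one. `task.idempotencyKey` is an assumption
        // here; verify the property name against the SDK.
        { idempotencyKey: task.idempotencyKey }
      );
    });
  },
});
```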
This will be an ongoing area of improvement for a system like Trigger.dev, and there are multiple fronts where we can make improvements, but one thing we can do in the short term, without too many changes or risks, is Auto Execution Yielding: yielding execution right before a function is going to time out.
Auto Execution Yielding
If we can detect the value of a function's execution limit, we could provide that information when executing a run and yield execution before we reach the limit, in between task executions. This would make it much less likely that a task is executed twice, because we'd almost never hit the function execution timeout. It would also be better from a DX standpoint: you wouldn't get 504 timeout errors in your function execution logs (e.g. on Vercel), which can add monitoring noise.
Detecting function execution limit
Since Vercel does not provide the function execution limit in environment variables, we could detect it by implementing "execution limit probes": make a request to our endpoint and wait, timing how long it takes to receive a 504 timeout error (probably with our own request timeout to handle cases where the limit is very high). We'd do this periodically to detect any changes to the function execution limit. Once we have this data, we can save it to the Endpoint table and pass it down to the client as an optional piece of data in the run execution request.
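A rough sketch of what such a probe might look like, assuming a dedicated probe action on the endpoint that deliberately never responds (the header and status handling below are assumptions, not an existing API):

```ts
// Time how long it takes the hosting platform (e.g. Vercel) to kill the
// function and return a gateway timeout, which approximates the execution limit.
async function probeExecutionLimit(
  endpointUrl: string,
  maxProbeMs = 15 * 60 * 1000 // our own cap, for platforms with very high limits
): Promise<number | null> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), maxProbeMs);
  const startedAt = Date.now();

  try {
    const response = await fetch(endpointUrl, {
      method: "POST",
      // Hypothetical action header telling the endpoint to hang on purpose.
      headers: { "x-trigger-action": "PROBE_EXECUTION_TIMEOUT" },
      signal: controller.signal,
    });

    // A 504 (or 502) tells us roughly how long the function was allowed to run.
    if (response.status === 504 || response.status === 502) {
      return Date.now() - startedAt;
    }

    return null; // endpoint responded normally; nothing learned from this probe
  } catch {
    return null; // aborted by our own cap (or a network error): treat the limit as unknown
  } finally {
    clearTimeout(timer);
  }
}
```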
Auto Yielding
Once the client gets the execution limit in a run execution request, it can measure the time elapsed since the function started executing and yield execution if there isn't enough time left at any of the following points (a sketch of this check follows the list):
Before a task is executed
After a task is executed, but before the server is updated with task output/status
After the server is updated with task output/status
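A minimal sketch of that check, assuming the detected limit is passed down as an optional field on the run execution request (all of the names below, including yieldExecution(), are hypothetical):

```ts
type RunExecutionContext = {
  startedAt: number;           // when this function invocation began (ms since epoch)
  executionTimeoutMs?: number; // detected limit passed down by the server, if known
};

const SAFETY_MARGIN_MS = 1_000; // headroom to send the response before the platform kills us

function shouldYield(ctx: RunExecutionContext, estimatedNextStepMs = 0): boolean {
  if (!ctx.executionTimeoutMs) return false; // no limit known: behave as today

  const elapsed = Date.now() - ctx.startedAt;
  const remaining = ctx.executionTimeoutMs - elapsed - SAFETY_MARGIN_MS;
  return remaining <= estimatedNextStepMs;
}

// Called at each of the three points above, e.g.:
//
//   if (shouldYield(ctx)) {
//     yieldExecution(); // respond to the run request now and resume in a new invocation
//   }
```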
A further iteration of this feature would include historical measurement of task execution times, so we can better predict when a task is likely to take longer than the time remaining. But that can build on the initial work described here.
Temporary Workarounds
To work around this issue before this feature is available, you can use io.yield() to force the current function execution to exit and resume the run in a new function execution, picking up where it left off. You can put these at strategic points in your job run, using your knowledge of your own function timeout limit, to make it less likely that tasks are executed more than once:
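For example, something along these lines, assuming the v2-style defineJob/io.runTask API (the job id, the doStep* helpers, and the keys passed to io.yield() are placeholders):

```ts
import { TriggerClient, eventTrigger } from "@trigger.dev/sdk";

const client = new TriggerClient({ id: "my-app" });

// Placeholder steps standing in for your real work.
async function doStepOne(payload: any) {}
async function doStepTwo(payload: any) {}
async function doStepThree(payload: any) {}

client.defineJob({
  id: "long-running-job",
  name: "Long running job",
  version: "1.0.0",
  trigger: eventTrigger({ name: "long.job" }),
  run: async (payload, io, ctx) => {
    await io.runTask("step-1", async () => doStepOne(payload));

    // If your platform limit is e.g. 60s and step 1 can eat most of that,
    // yield here so step 2 starts in a fresh invocation with a full time budget.
    await io.yield("after-step-1");

    await io.runTask("step-2", async () => doStepTwo(payload));

    await io.yield("after-step-2");

    await io.runTask("step-3", async () => doStepThree(payload));
  },
});
```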