Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(api): delete image race condition #6607

Merged
merged 2 commits into from
Jul 13, 2024

Conversation

psychedelicious
Copy link
Collaborator

Summary

fix(api): deleting large images fails

This issue is caused by a race condition. When a large image is served to the client, it is done using a streaming FileResponse. This concurrently serves the image straight from disk. The file is kept open by FastAPI until the image is fully served.

When a user deletes an image before the file is done serving, the delete fails because the file is still held by FastAPI.

To reproduce the issue:

  • Create a very large image (8k reliably creates the issue).
  • Create a smaller image, so that the first image in the gallery is not the large image.
  • Refresh the app. The small image should be selected.
  • Select the large image and immediately delete it. You have to be fast, to delete it before it finishes loading.
  • In the terminal, we expect to see an error saying Failed to delete image file, and the image does not disappear from the UI.
  • After a short wait, once the image has fully loaded, try deleting it again. We expect this to work.

The workaround is to instead serve the image from memory.

Loading the image to memory is very fast, so there is only a tiny window in which we could create the race condition, but it technically could still occur, because FastAPI is asynchronous and handles requests concurrently.

Once we load the image into memory, deletions of that image will work. Then we return a normal Response object with the image bytes. This is essentially what FileResponse does - except it uses anyio.open_file, which is async.

The tradeoff is that the server thread is blocked while opening the file. I think this is a fair tradeoff.

A future enhancement could be to implement soft deletion of images (db is already set up for this), and then clean up deleted image files on startup/shutdown. We could move back to using the async FileResponse for best responsiveness in the server without any risk of race conditions.

fix(app): windows indefinite hang while finding port

For some reason, I started getting this indefinite hang when the app checks if port 9090 is available. After some fiddling around, I found that adding a timeout resolves the issue.

I confirmed that the util still works by starting the app on 9090, then starting a second instance. The second instance correctly saw 9090 in use and moved to 9091.

Related Issues / Discussions

Closes #6598 (for real this time)

QA Instructions

See description for a repro.

Merge Plan

I will do RC1 release after this merges

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)

For some reason, I started getting this indefinite hang when the app checks if port 9090 is available. After some fiddling around, I found that adding a timeout resolves the issue.

I confirmed that the util still works by starting the app on 9090, then starting a second instance. The second instance correctly saw 9090 in use and moved to 9091.
This issue is caused by a race condition. When a large image is served to the client, it is done using a streaming `FileResponse`. This concurrently serves the image straight from disk. The file is kept open by FastAPI until the image is fully served.

When a user deletes an image before the file is done serving, the delete fails because the file is still held by FastAPI.

To reproduce the issue:
- Create a very large image (8k reliably creates the issue).
- Create a smaller image, so that the first image in the gallery is not the large image.
- Refresh the app. The small image should be selected.
- Select the large image and immediately delete it. You have to be fast, to delete it before it finishes loading.
- In the terminal, we expect to see an error saying `Failed to delete image file`, and the image does not disappear from the UI.
- After a short wait, once the image has fully loaded, try deleting it again. We expect this to work.

The workaround is to instead serve the image from memory.

Loading the image to memory is very fast, so there is only a tiny window in which we could create the race condition, but it technically could still occur, because FastAPI is asynchronous and handles requests concurrently.

Once we load the image into memory, deletions of that image will work. Then we return a normal `Response` object with the image bytes. This is essentially what `FileResponse` does - except it uses `anyio.open_file`, which is async.

The tradeoff is that the server thread is blocked while opening the file. I think this is a fair tradeoff.

A future enhancement could be to implement soft deletion of images (db is already set up for this), and then clean up deleted image files on startup/shutdown. We could move back to using the async `FileResponse` for best responsiveness in the server without any risk of race conditions.
@psychedelicious psychedelicious merged commit 7c0dfd7 into main Jul 13, 2024
14 checks passed
@psychedelicious psychedelicious deleted the psyche/fix/delete-image-race-condition branch July 13, 2024 04:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api python PRs that change python files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug]: Failed to delete image file
2 participants