
Clean up transaction management for file_complete handler #930

Draft
wants to merge 4 commits into
base: develop

Contributor

@BenGalewsky BenGalewsky commented Nov 25, 2024

Problem

The TransformerFileComplete resource handler is the most critical code in the entire stack. It responds to each file in the dataset being transformed and is responsible for updating the total number of files processed, whether successfully or with a failure. These two counters are how we determine whether the transform is complete. The endpoint will be hit repeatedly by all of the running transformers. Consequently, database transaction handling is very important to avoid missing files.

The current implementation uses implicit transactions and doesn't manage locks or flushing to the DB. It's possible that this allows files to be lost during big transform requests.

Approach

  1. Use the DB session to manage transactions explicitly.
  2. Update record_file_complete to read the request with the with_for_update flag set, which locks the record in the DB.
  3. Handle the increments to the file counters in the same transaction.
  4. To make the file more readable, capture the retry call arguments in a single new decorator, file_complete_ops_retry.
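The locking pattern in steps 1 to 3 can be sketched roughly as follows. This is not the PR's code: the PR uses the SQLAlchemy session and with_for_update, whereas this sketch swaps in the standard-library sqlite3 module and `BEGIN IMMEDIATE` (SQLite's closest stand-in for a row lock) so it runs without a database server. The table layout, `connect`, `transformer`, `run_demo`, and the body of `record_file_complete` are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical scratch database; the real service uses its own DB.
DB_PATH = os.path.join(tempfile.mkdtemp(), "transform.db")


def connect():
    # isolation_level=None turns off sqlite3's implicit BEGIN, so every
    # transaction boundary below is written out explicitly.
    return sqlite3.connect(DB_PATH, timeout=30, isolation_level=None)


def init_db():
    conn = connect()
    conn.execute("DROP TABLE IF EXISTS requests")
    conn.execute(
        "CREATE TABLE requests ("
        "id TEXT PRIMARY KEY, "
        "files_completed INTEGER NOT NULL, "
        "files_failed INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO requests VALUES ('req-1', 0, 0)")
    conn.close()


def record_file_complete(conn, request_id, succeeded):
    # Take the write lock *before* reading, so no other worker can read
    # the same counter values and overwrite this increment.
    conn.execute("BEGIN IMMEDIATE")
    try:
        completed, failed = conn.execute(
            "SELECT files_completed, files_failed FROM requests WHERE id = ?",
            (request_id,),
        ).fetchone()
        if succeeded:
            completed += 1
        else:
            failed += 1
        # The read and the increment share one transaction.
        conn.execute(
            "UPDATE requests SET files_completed = ?, files_failed = ? "
            "WHERE id = ?",
            (completed, failed, request_id),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def transformer(n_files):
    # Each simulated transformer uses its own connection.
    conn = connect()
    for i in range(n_files):
        record_file_complete(conn, "req-1", succeeded=(i % 5 != 0))
    conn.close()


def run_demo(workers=4, files_per_worker=25):
    init_db()
    threads = [
        threading.Thread(target=transformer, args=(files_per_worker,))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    conn = connect()
    totals = conn.execute(
        "SELECT files_completed, files_failed FROM requests WHERE id = 'req-1'"
    ).fetchone()
    conn.close()
    return totals
```

Calling `run_demo()` returns the two counters after all simulated transformers have reported. Because every read-modify-write happens under the lock, the counters should sum to `workers * files_per_worker`, which is the property the handler needs in order to decide that a transform is complete.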

With this decorator, I ran into a problem with the unit tests: importing the module caused the current_app.logger expression to be evaluated, which throws RuntimeError: Working outside of application context. I worked around this in the decorator by accessing that logger only when we are inside the Flask app.
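One way to get that behaviour is to resolve the logger when a retry actually happens rather than when the module is imported. The sketch below is only an illustration of that idea, not the PR's decorator: `retry_with_lazy_logger` and `_resolve_logger` are hypothetical names, and the hand-rolled retry loop stands in for whatever retry mechanism the PR configures. It falls back to a standard-library logger when Flask is not installed or no application context is active.

```python
import functools
import logging
import time


def _resolve_logger():
    """Return Flask's app logger only when an application context is active."""
    try:
        from flask import current_app, has_app_context
    except ImportError:
        # Flask is not installed: use a plain module logger.
        return logging.getLogger(__name__)
    if has_app_context():
        return current_app.logger
    # Imported outside the app (e.g. in a unit test): never touch current_app.
    return logging.getLogger(__name__)


def retry_with_lazy_logger(attempts=3, delay=0.0, exceptions=(Exception,)):
    """Retry the wrapped call; all parameters here are illustrative."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    # The logger is looked up here, at call time, so
                    # applying the decorator at import time is safe.
                    _resolve_logger().warning(
                        "attempt %d of %d failed: %s", attempt, attempts, exc
                    )
                    time.sleep(delay)
        return wrapper
    return decorator
```

Applying `@retry_with_lazy_logger(attempts=3)` to a function at module level evaluates nothing Flask-related, so the module can be imported in a test without an application context.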

… transaction

We are moving from implicit transactions to more fine-grained control. We don't
want to completely turn off implicit transactions during this migration, so any
database interaction can start a transaction, which can spoil things later on.
This decorator was causing a transaction to be started for any user-facing REST endpoint.
Change this so it runs in its own transaction and commits it before entering the
endpoint code.
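The idea in this commit message can be illustrated with a small sketch. It is an assumption-laden stand-in, not the project's decorator: it uses the standard-library sqlite3 module (which, like the setup described above, opens a transaction implicitly on the first write), and `record_access`, `get_status`, and the `users` table are hypothetical. The decorator does its own database work, commits it, and only then calls the endpoint, so the endpoint starts with no transaction open.

```python
import functools
import sqlite3

# In-memory database with sqlite3's default implicit-transaction behaviour.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, hits INTEGER NOT NULL)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")
conn.commit()


def record_access(endpoint):
    """Hypothetical decorator that touches the DB before the endpoint runs."""
    @functools.wraps(endpoint)
    def wrapper(user, *args, **kwargs):
        # The UPDATE implicitly opens a transaction. `with conn` commits it
        # on success (or rolls back on error) before the endpoint is entered.
        with conn:
            conn.execute(
                "UPDATE users SET hits = hits + 1 WHERE name = ?", (user,)
            )
        return endpoint(user, *args, **kwargs)
    return wrapper


@record_access
def get_status(user):
    # The endpoint can now begin its own transaction from a clean state.
    return {"user": user, "clean_session": not conn.in_transaction}
```

Without the commit inside the decorator, `conn.in_transaction` would still be true when the endpoint body runs, which is the leftover-transaction situation the commit message describes.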
@BenGalewsky BenGalewsky marked this pull request as draft November 25, 2024 21:22
Base automatically changed from delete_fixes to develop November 26, 2024 04:33