Reduce the time to generate the OCR field #30
Merged
The following changes were applied to this PR to reduce the time to generate the OCR field.
The following picture shows that most of the time spent generating the full-text document goes into generating the OCR field.
I used seven documents to measure the time to generate the OCR field. Before this PR, generating the OCR field took about 0.38 seconds; after this PR, it takes about 0.16 seconds, a reduction of roughly 58%.
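A timing comparison like the one above can be reproduced with a small harness built on `time.perf_counter`. This is only a sketch, not the script used to obtain the numbers in this PR. `fake_generate_ocr_field` is a hypothetical stand-in for whatever function builds the OCR field in `document_generator`, and the sample documents are made up.

```python
import statistics
import time
from typing import Callable, Sequence


def mean_generation_time(
    generate: Callable[[str], str],
    documents: Sequence[str],
    repeats: int = 3,
) -> float:
    """Return the mean wall-clock seconds per call of `generate` over `documents`.

    Each document is processed `repeats` times and every timing is averaged.
    """
    timings = []
    for doc in documents:
        for _ in range(repeats):
            start = time.perf_counter()
            generate(doc)
            timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def relative_reduction(before: float, after: float) -> float:
    """Fraction of time saved, e.g. 0.38 s -> 0.16 s gives about 0.58."""
    return (before - after) / before


# Hypothetical stand-in for the real OCR-field builder; replace it with the
# project's own function to time the actual code path.
def fake_generate_ocr_field(doc: str) -> str:
    # Collapse runs of whitespace, a cheap placeholder for real text processing.
    return " ".join(doc.split())


if __name__ == "__main__":
    docs = [f"page {i}  text   with   spaces" for i in range(7)]  # seven sample documents
    print(f"mean: {mean_generation_time(fake_generate_ocr_field, docs):.6f} s")
    print(f"reduction: {relative_reduction(0.38, 0.16):.0%}")
```

Running the same harness against the code before and after a change, on the same documents, keeps the comparison fair; averaging several repeats smooths out noise from caching and process warm-up.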
Steps to test this PR:
docker build -t document_generator .
docker compose up document_generator -d
docker compose exec document_generator pytest document_generator ht_document ht_queue_service ht_utils