
Investigate improving recognition performance for a single image using threads #27

Open
robertknight opened this issue May 30, 2022 · 1 comment

@robertknight (Owner)

There are several ways that threads could be used to speed up recognition:

  1. For a multi-page document, create a pool of OCRClients and distribute pages across them. This is already possible today, but it won't help for a single document image, and it duplicates the engine and model in each worker, so it is not memory-efficient (see the sketch after this list).
  2. Split a large image into smaller sub-images and distribute them across a pool of OCRClients. This is also possible today, but it again duplicates a lot of resources, and it requires working out how to split the image without cutting through text.
  3. Tesseract supports OpenMP, but according to comments in its GitHub repo, this isn't very effective at present, as the parallelism is probably applied at too low a level.
  4. Since text recognition is the expensive step, a middle ground would be to run detection in one thread, then use multiple threads to recognize the detected text lines in parallel. A comment in the Tesseract code suggests this could be a significant win.
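
For reference, a minimal sketch of option 1 using the OCRClient API this repo already exposes (loadModel, loadImage, getText, destroy). The recognizePages helper, the drain loop, and the pool-size default are illustrative, not part of the library:

```ts
import { OCRClient } from "tesseract-wasm";

// Illustrative helper: OCR a list of page images using a small pool of
// OCRClient workers. Each client duplicates the engine and model, which
// is why this approach trades memory for throughput.
async function recognizePages(
  pages: ImageBitmap[],
  modelUrl: string,
  poolSize = 2,
): Promise<string[]> {
  const clients = Array.from({ length: poolSize }, () => new OCRClient());
  await Promise.all(clients.map((c) => c.loadModel(modelUrl)));

  const results: string[] = new Array(pages.length);
  let next = 0;

  // Each client repeatedly pulls the next unprocessed page. Calls to any
  // single client stay sequential, so its loadImage/getText pairs never
  // interleave.
  async function drain(client: OCRClient) {
    while (next < pages.length) {
      const index = next++;
      await client.loadImage(pages[index]);
      results[index] = await client.getText();
    }
  }

  await Promise.all(clients.map(drain));
  clients.forEach((c) => c.destroy());
  return results;
}
```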
@robertknight (Owner, Author)

> Since text recognition is the expensive step, a middle ground would be to do detection in one thread, then use multiple threads to perform recognition of different text lines that were found in parallel.

Rough plan for this:

  • Add an option to OCRClient to use auxiliary workers for text recognition. The option balances speed against memory usage. One worker would be designated the main worker; the others would be recognition workers. The recognition workers might need to be created on-demand.
  • loadModel would load the model into all workers, or, if recognition workers are created on-demand, save a copy of the model data to transfer to recognition workers when they are created later.
  • loadImage would load the image into the main worker only.
  • OCRClient's text recognition methods would query the main worker to fetch text line images (TBD: full color? greyscale? binarized?). Batches of line images would then be distributed to the recognition workers, which would run recognition on them, treating each input image as a single line, and return the results (text + bounding boxes). The coordinates would then be adjusted to reflect each line's position in the original image. A sketch of how this could be approximated follows below.
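
A rough sketch of how this design could be approximated from the outside with today's API, assuming getBoundingBoxes("line") performs layout detection without recognition as documented. The recognizeLinesInParallel helper and the strided work split are illustrative only; a real implementation would live inside OCRClient, reuse the saved model data instead of reloading it per worker, and tell Tesseract to treat each crop as a single line:

```ts
import { OCRClient } from "tesseract-wasm";

// Illustrative approximation of the plan above: one main client performs
// layout detection, then line crops are recognized in parallel by
// auxiliary clients.
async function recognizeLinesInParallel(
  image: ImageBitmap,
  modelUrl: string,
  workerCount = 2,
) {
  // Main worker: load the image and run layout detection only.
  const main = new OCRClient();
  await main.loadModel(modelUrl);
  await main.loadImage(image);
  const lines = await main.getBoundingBoxes("line");

  // Auxiliary recognition workers. In the proposed design these might be
  // created on-demand from a saved copy of the model data instead.
  const workers = Array.from({ length: workerCount }, () => new OCRClient());
  await Promise.all(workers.map((w) => w.loadModel(modelUrl)));

  const results = new Array(lines.length);

  // Each worker handles a strided subset of lines sequentially, so the
  // loadImage/getText calls on any one worker never interleave.
  await Promise.all(
    workers.map(async (worker, w) => {
      for (let i = w; i < lines.length; i += workerCount) {
        const { left, top, right, bottom } = lines[i].rect;
        const crop = await createImageBitmap(
          image,
          left,
          top,
          right - left,
          bottom - top,
        );
        await worker.loadImage(crop);
        // This re-runs full OCR on each crop; the proposed design would
        // instead instruct Tesseract to treat the input as a single line.
        const text = await worker.getText();
        // The detected box is already in the original image's coordinate
        // frame; an in-engine implementation would map line-local boxes
        // back to page coordinates as described above.
        results[i] = { rect: lines[i].rect, text };
      }
    }),
  );

  workers.forEach((w) => w.destroy());
  main.destroy();
  return results;
}
```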
