A lot of memory used on newspaper pages? #110

Open
2 tasks
mikegerber opened this issue Feb 22, 2024 · 6 comments
@mikegerber
Collaborator

mikegerber commented Feb 22, 2024

In OCR-D/quiver-benchmarks#22, @stweil mentions 118 GB being used for newspaper pages.

  • Reproduce
  • Can we test for this somehow
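One possible way to test for this, as a minimal sketch rather than anything that exists in the project: run the workload in a child process and assert that its peak RSS stays under a budget. The helper names `peak_child_rss_mib` and `assert_memory_budget`, the 2048 MiB budget, and the stand-in workload are all hypothetical; a real test would run the actual recognition step on a segmented newspaper page instead. This relies on the Unix-only `resource` module.

```python
import resource
import subprocess
import sys


def peak_child_rss_mib(cmd):
    """Run cmd to completion and return the peak RSS (in MiB) of child processes.

    RUSAGE_CHILDREN reports the high-water mark over all children this
    process has waited for so far, so use a fresh process per measurement.
    """
    subprocess.run(cmd, check=True)
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return maxrss / divisor


def assert_memory_budget(cmd, budget_mib):
    """Fail if the child's peak RSS exceeds budget_mib; return the peak otherwise."""
    peak = peak_child_rss_mib(cmd)
    assert peak <= budget_mib, (
        f"peak RSS {peak:.0f} MiB exceeds budget {budget_mib} MiB"
    )
    return peak


if __name__ == "__main__":
    # Hypothetical stand-in workload: a child that allocates about 64 MiB.
    workload = [sys.executable, "-c", "data = b'x' * (64 * 1024 * 1024)"]
    peak = assert_memory_budget(workload, budget_mib=2048)
    print(f"peak RSS: {peak:.0f} MiB")
```

Measuring a child process rather than the test runner itself keeps the number independent of whatever the runner has already loaded.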
mikegerber self-assigned this Feb 22, 2024
@mikegerber
Collaborator Author

Might have overlooked this because a. our servers have a lot of memory and b. I didn't process a lot of newspapers.

  1. I asked @stweil for the input data. Need to check if I have some newspaper pages readily segmented.

  2. Options used seem to be -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint_dir qurator-gt4histocr-1.0

mikegerber added the bug (Something isn't working) label Feb 22, 2024
mikegerber changed the title from "A lot of memory used on newspaper pages" to "A lot of memory used on newspaper pages?" Feb 27, 2024
@mikegerber
Collaborator Author

I don't have the data for the issue mentioned in OCR-D/quiver-benchmarks#22, tried to produce something similar but failed due to an unrelated issue.

→ Trying with some other data supplied by @cneud

@mikegerber
Collaborator Author

Yeah, ran into another unrelated issue first: OCR-D/core#1179

@mikegerber
Collaborator Author

The page I used only had 365 lines, didn't see anything more than 1.8 GB RSS ("not great, not terrible").
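Since the line count seems to matter here, a small sketch for checking how many lines a page has before comparing memory numbers: it counts `TextLine` elements in a PAGE XML document. The function name `count_text_lines` and the sample document are made up for illustration; matching on the local tag name is an assumption so that it works regardless of the PAGE namespace version.

```python
import xml.etree.ElementTree as ET


def count_text_lines(page_xml):
    """Count TextLine elements in a PAGE XML string, ignoring the namespace."""
    root = ET.fromstring(page_xml)
    # Tags look like "{namespace}TextLine"; compare only the local name.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "TextLine")


# Made-up minimal PAGE document with two lines in one region.
sample = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.png">
    <TextRegion id="r1">
      <TextLine id="l1"/>
      <TextLine id="l2"/>
    </TextRegion>
  </Page>
</PcGts>"""

print(count_text_lines(sample))  # prints 2
```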

There is something else wrong, though: it seems to use the raw (RGB) images for some lines, which does not make sense. But the XML may not be 100% correct, as I imported it, etc.

@mikegerber
Collaborator Author

> There is something else wrong, though: it seems to use the raw (RGB) images for some lines, which does not make sense. But the XML may not be 100% correct, as I imported it, etc.

The workspace also showed signs of OCR-D/core#1195, so I'll try again first, with METS caching disabled.

@mikegerber
Collaborator Author

I've redone the segmentation, no "raw image" problem anymore. The problem was probably just there because I couldn't figure out how to fix up the XML so it works properly with the AlternativeImage logic.
