tesseract ocr big size pic dump #3885
GerHobbelt added a commit to GerHobbelt/tesseract that referenced this issue on Feb 27, 2023
… input images. Available to both userland and tesseract internal code, these can be used to report and fail early on images which are too large to fit in memory. Some very lenient defaults are used for the memory pressure allowance (1.5 GByte for 32-bit builds, 64 GByte for 64-bit builds) but this can be tweaked to your liking and local machine shop via the Tesseract Global Variable `allowed_image_memory_capacity` (DOUBLE type). NOTE: the allowance limit can be effectively removed by setting this variable to an 'insane' value, e.g. `1.0e30`. HOWEVER, the CheckAndReportIfImageTooLarge() API will still fire for images with either width or height dimension >= TDIMENSION_MAX, which in the default build is the classic INT16_MAX (32767px); when compiled with defined(LARGE_IMAGES), the width/height limit is raised to 24 bits, i.e. ~16.7 million px per side, which would then tolerate images smaller than 16777216 x 16777216px. (This latter part is a work-in-progress.) Related: - tesseract-ocr#3184 - tesseract-ocr#3885 - tesseract-ocr#3435 (pullreq by @stweil -- WIP) # Conflicts: # src/api/baseapi.cpp # src/ccmain/tesseractclass.h # src/ccmain/thresholder.cpp # src/ccutil/params.h # src/textord/tordmain.cpp
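The two checks described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: `image_too_large`, `kMaxDimension`, and the 4-bytes-per-pixel estimate are assumptions made for the example. Only the INT16_MAX per-side limit and the idea of a configurable byte allowance come from the message itself.

```cpp
#include <cstdint>

// Assumed for illustration: the default per-side limit named in the commit
// message (the classic INT16_MAX, i.e. 32767).
constexpr std::int64_t kMaxDimension = INT16_MAX;

// Hypothetical helper: returns true if the image should be rejected early.
// allowed_bytes plays the role of allowed_image_memory_capacity (a double).
bool image_too_large(std::int64_t width, std::int64_t height,
                     double allowed_bytes) {
  // The dimension check fires regardless of the memory allowance.
  if (width >= kMaxDimension || height >= kMaxDimension) {
    return true;
  }
  // Assumption: 4 bytes per pixel (32 bpp). Real usage depends on the image
  // depth and on intermediate images created during processing.
  const double estimated_bytes =
      static_cast<double>(width) * static_cast<double>(height) * 4.0;
  return estimated_bytes > allowed_bytes;
}
```

With this sketch, passing a huge allowance such as `1.0e30` disables the memory check but leaves the dimension check in place, which matches the behavior the message describes.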
IMO, we should undo 57b79742920c
That would not fix the issue here which is caused by missing error handling.
Hi,
The API function void TessBaseAPI::SetImage(Pix *pix) crashes with a core dump when handling a very large image (when the system does not have enough memory).
void TessBaseAPI::SetImage(Pix *pix) {
  if (InternalSetImage()) {
    if (pixGetSpp(pix) == 4 && pixGetInputFormat(pix) == IFF_PNG) {
      // remove alpha channel from png
      Pix *p1 = pixRemoveAlpha(pix);
      pixSetSpp(p1, 3);
      (void)pixCopy(pix, p1);  // <---- bug: return value is ignored
      pixDestroy(&p1);
    }
    thresholder_->SetImage(pix);
    SetInputImage(thresholder_->GetPixRect());
  }
}
In Leptonica, pixCopy(pix, p1) returns pixd, or NULL on error,
so the return value of pixCopy needs to be checked.
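The checked-return pattern described above can be sketched in isolation. This is a minimal, self-contained illustration that does not link against Leptonica or Tesseract: `FakePix`, `fake_copy`, and `set_image_checked` are hypothetical stand-ins, and the only behavior assumed from the real pixCopy is what is stated above, namely that it returns the destination pointer on success and NULL on error.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Leptonica's Pix; not the real type.
struct FakePix {
  std::vector<unsigned char> data;
};

// Hypothetical stand-in for pixCopy: returns dst on success, nullptr on error.
// The max_bytes limit simulates an allocation failure on a system that is
// short on memory.
FakePix *fake_copy(FakePix *dst, const FakePix *src, std::size_t max_bytes) {
  if (dst == nullptr || src == nullptr || src->data.size() > max_bytes) {
    return nullptr;
  }
  dst->data = src->data;
  return dst;
}

// Returns true if the converted image was copied into pix, false if the copy
// failed. The caller decides how to report the failure.
bool set_image_checked(FakePix *pix, const FakePix *converted,
                       std::size_t max_bytes) {
  if (fake_copy(pix, converted, max_bytes) == nullptr) {
    return false;  // stop here instead of continuing with a failed copy
  }
  return true;
}
```

The point of the sketch is only that the result of the copy is inspected before any later step uses the image; how the real API should surface the failure is left open here.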
Environment
Current Behavior:
Tesseract crashes with a core dump.
Expected Behavior:
Tesseract does not crash (no core dump).
Suggested Fix:
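void TessBaseAPI::SetImage(Pix *pix) {
  if (InternalSetImage()) {
    if (pixGetSpp(pix) == 4 && pixGetInputFormat(pix) == IFF_PNG) {
      // remove alpha channel from png
      Pix *p1 = pixRemoveAlpha(pix);
      pixSetSpp(p1, 3);
      // pixCopy returns NULL on error, so check the result
      Pix *copied = pixCopy(pix, p1);
      pixDestroy(&p1);
      if (copied == nullptr) {
        return;  // bail out instead of continuing with a failed copy
      }
    }
    thresholder_->SetImage(pix);
    SetInputImage(thresholder_->GetPixRect());
  }
}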
Thanks.