Reduced tesseract capabilities because of missing libraries for build process #63

Open
1 task done
stweil opened this issue May 14, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@stweil

stweil commented May 14, 2024

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

The Tesseract package is built without libarchive and libcurl:

tesseract --version
tesseract 5.3.4
 leptonica-1.83.1
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.39 : libtiff 4.6.0 : zlib 1.2.13 : libwebp 1.3.2 : libopenjp2 2.5.0
 Found NEON

Therefore some functionality (for example running OCR with an image URL) is missing.
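A quick way to check a given build for the two optional libraries is to scan the `tesseract --version` banner for its `Found libarchive` and `Found libcurl` lines. The following is a minimal sketch, not part of the feedstock; the helper names `optional_libs` and `read_version_output` are hypothetical, and the pattern assumes the banner format quoted in this thread.

```python
import re
import subprocess

# Matches banner lines such as " Found libarchive 3.7.4 ..." and
# " Found libcurl/8.8.0 ..." (format assumed from the output in this thread).
OPTIONAL_LIB_RE = re.compile(r"Found (libarchive|libcurl)[ /]([\w.]+)")


def optional_libs(version_output: str) -> dict:
    """Return the reported version of each optional library, or None if it is not listed."""
    found = {"libarchive": None, "libcurl": None}
    for line in version_output.splitlines():
        match = OPTIONAL_LIB_RE.search(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def read_version_output() -> str:
    """Run `tesseract --version`; both streams are joined in case the banner goes to stderr."""
    result = subprocess.run(
        ["tesseract", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout + result.stderr


# Banner quoted in this report: neither library is listed.
REPORTED = """tesseract 5.3.4
 leptonica-1.83.1
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.39 : libtiff 4.6.0 : zlib 1.2.13 : libwebp 1.3.2 : libopenjp2 2.5.0
 Found NEON
"""

print(optional_libs(REPORTED))  # {'libarchive': None, 'libcurl': None}
```

On a machine with the package installed, `optional_libs(read_version_output())` would give the same report for the local build.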

Installed packages

not relevant

Environment info

not relevant
stweil added the bug label on May 14, 2024
stweil changed the title from "Missing libraries for build process" to "Reduced tesseract capabilities because of missing libraries for build process" on May 14, 2024
@carlodri
Contributor

@stweil can you open a PR adding the missing libraries?

stweil added a commit to stweil/tesseract-feedstock that referenced this issue Jun 12, 2024
@stweil
Author

stweil commented Jun 12, 2024

See #66 which adds libcurl. As this is my first PR here I might have missed something.

libarchive was already in the package list, but the build process did not find it. I still have no solution for this.

@scw

scw commented Jun 13, 2024

@stweil Thanks for adding libcurl! This package still fails to import on Windows because libcurl is missing there. libarchive does show up once I manually add libcurl to the environment on Windows:

> tesseract --version
tesseract 5.4.1
 leptonica-1.83.1 (Oct 11 2023, 07:58:44) [MSC v.1937 LIB Release x64]
  libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.2.13 : libopenjp2 2.5.2
 Found AVX
 Found SSE4.1
 Found libarchive 3.7.4 zlib/1.2.13 liblzma/5.2.6 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.5.6
 Found libcurl/8.8.0 Schannel zlib/1.3.1 libssh2/1.11.0

However, on macOS I also see no mention of libarchive or libcurl. If you build this locally, the configuration step should show where it tries to identify the packages contained in the build environment. On Windows, this looks like:

-- Found LibArchive: C:/conda/conda-bld/tesseract/_h_env/Library/lib/archive.lib (found version "3.6.2")

On Windows, this package is built with CMake, but on POSIX systems it uses configure / make. Perhaps it can switch to CMake on Linux, or you can try passing configure flags like --with-curl. In CMake, all the library / include locations are set to reference the environment itself, so it should be possible to do the same for configure-based builds.
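To see which packages the configuration step actually detected, a saved build log can be scanned for `-- Found ...` lines like the one quoted above. This is only a sketch: the function name `found_packages` is hypothetical, and the pattern assumes every relevant line follows the shape of that single quoted example, with the version part treated as optional.

```python
import re

# Shape assumed from the one line quoted in this thread:
#   -- Found <Name>: <location> (found version "<version>")
FOUND_RE = re.compile(r'^-- Found (\w+): (.+?)(?: \(found version "([^"]+)"\))?$')


def found_packages(configure_log: str) -> dict:
    """Map each detected package name to a (location, version) tuple; version may be None."""
    packages = {}
    for line in configure_log.splitlines():
        match = FOUND_RE.match(line.strip())
        if match:
            name, location, version = match.groups()
            packages[name] = (location, version)
    return packages


# The line from the Windows build quoted above.
LOG = '-- Found LibArchive: C:/conda/conda-bld/tesseract/_h_env/Library/lib/archive.lib (found version "3.6.2")'

detected = found_packages(LOG)
print(detected["LibArchive"][1])  # 3.6.2
print("CURL" in detected)         # False
```

A package that is absent from the resulting dictionary was either not found or reported in a different format, so a missing entry is a hint to read the log, not proof.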

@stweil
Author

stweil commented Jun 13, 2024

libarchive is less important (it supports zipped model files, but currently there are no such files as far as I know), so it would not matter much if some platforms don't find it.

libcurl is more important because it allows OCR with image URLs.
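Until a build includes libcurl, a caller can work around the gap by downloading the image first and handing Tesseract a local file. The sketch below is one possible workaround, not something the package provides: `is_url`, `run_tesseract` and `ocr` are hypothetical helper names, and the `has_libcurl` flag is assumed to come from inspecting the `tesseract --version` banner.

```python
import os
import subprocess
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """True for http(s) URLs; local paths (including Windows drive paths) are not URLs."""
    return urlparse(source).scheme in ("http", "https")


def run_tesseract(image: str) -> str:
    """Run `tesseract <image> stdout` and return the recognized text."""
    result = subprocess.run(
        ["tesseract", image, "stdout"], capture_output=True, text=True, check=True
    )
    return result.stdout


def ocr(source: str, has_libcurl: bool) -> str:
    """OCR a file or URL, downloading the URL first when the build lacks libcurl."""
    if not is_url(source) or has_libcurl:
        return run_tesseract(source)
    suffix = Path(urlparse(source).path).suffix
    # Close the temporary file before Tesseract opens it (required on Windows).
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        with urllib.request.urlopen(source) as response:
            tmp.write(response.read())
    try:
        return run_tesseract(tmp.name)
    finally:
        os.unlink(tmp.name)
```

With libcurl available, `ocr` passes the URL straight through, so the same call works on both kinds of build.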

@cshaley

cshaley commented Aug 26, 2024

libgif is missing in the Windows build of 5.4.1.

@stweil
Author

stweil commented Aug 26, 2024

On which Windows build is it missing? I think libgif should be pulled in by Leptonica.

@cshaley

cshaley commented Aug 27, 2024

The latest one: win-64/tesseract-5.4.1-h5bc6e0e_0.conda

Looks like leptonica pulls in libjpeg, libpng, libtiff, zlib, and libopenjp2 - but no libgif.
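The difference between builds can be made explicit by parsing the image library line of each `tesseract --version` banner and comparing the names. This is a rough sketch using the two banner lines quoted earlier in this thread; the function name `image_libs` is hypothetical, and the parser assumes entries are separated by " : " as in those lines.

```python
def image_libs(version_output: str) -> dict:
    """Map each image library on the ' : '-separated banner line to its version string."""
    libs = {}
    for line in version_output.splitlines():
        if " : " not in line:
            continue  # only the Leptonica image library line uses this separator
        for entry in line.split(" : "):
            parts = entry.split()
            if len(parts) >= 2:
                libs[parts[0]] = parts[1]
    return libs


# Library line from the original report (5.3.4).
ORIGINAL = "  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.39 : libtiff 4.6.0 : zlib 1.2.13 : libwebp 1.3.2 : libopenjp2 2.5.0"
# Library line from the Windows output (5.4.1).
WINDOWS = "  libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.2.13 : libopenjp2 2.5.2"

missing = sorted(set(image_libs(ORIGINAL)) - set(image_libs(WINDOWS)))
print(missing)  # ['libgif', 'libwebp']
```

For these two banners the comparison reports libwebp as well as libgif as absent from the Windows line.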

@stweil
Author

stweil commented Aug 27, 2024

Then this looks like a missing build and runtime dependency for Leptonica. In https://github.com/conda-forge/leptonica-feedstock/blob/main/recipe/meta.yaml the dependency for giflib is marked with "not win". This was introduced 5 years ago because of missing GIF header files for Windows, so it would be worth checking again and possibly fixing it.
