Commit

Merge pull request #1005 from eikek/fix/973-jpn-ocr
Use different japanese train files for tesseract
mergify[bot] authored Aug 13, 2021
2 parents f79aa44 + 326cf1c commit 1d90095
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions docker/dockerfiles/joex.dockerfile
@@ -63,6 +63,12 @@ RUN wget ${joex_url:-https://github.com/eikek/docspell/releases/download/v$versi
rm docspell-joex-*.zip && \
ln -snf docspell-joex-* docspell-joex

# Using these data files for japanese, because they work better. See #973
RUN \
wget https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/master/jpn_vert.traineddata && \
wget https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/master/jpn.traineddata && \
mv jpn*.traineddata /usr/share/tessdata

COPY joex-entrypoint.sh /opt/joex-entrypoint.sh

ENTRYPOINT ["/opt/joex-entrypoint.sh", "-J-XX:+UseG1GC"]
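
The RUN step added above downloads two traineddata files and moves them into /usr/share/tessdata. One way to check that such a step produced usable files is sketched below. This is a minimal sketch, not part of the commit: the function name `check_traineddata` is hypothetical, and the check only confirms that each file exists and is non-empty, not that Tesseract can load it.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): verify that every requested
# language has a non-empty <lang>.traineddata file in the given directory.
# Usage: check_traineddata DIR LANG [LANG...]
check_traineddata() {
  dir="$1"
  shift
  for lang in "$@"; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$dir/$lang.traineddata" ]; then
      echo "missing or empty: $dir/$lang.traineddata" >&2
      return 1
    fi
  done
  return 0
}
```

Assuming the paths from the diff, it could be called as `check_traineddata /usr/share/tessdata jpn jpn_vert` inside the image. Running `tesseract --list-langs` is a complementary check that the languages are actually visible to Tesseract.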