diff --git a/en/setup.html b/en/setup.html
index 0be3d683..f054dd9b 100644
--- a/en/setup.html
+++ b/en/setup.html
@@ -481,12 +481,11 @@

System requirements

- Python 3.6 or 3.7
-
- - OCR-D's target Python version is currently Python 3.6 which we will continue to support until at least Q3 2022
- - Python 3.7 is also tested and supported
- - Python 3.8 and newer versions are not yet fully supported, since there are no pre-built Python packages for Tensorflow 2.5 and <2 and other related software. We expect to unconditionally support Python 3.8 once all processors and models are upgraded to work with a more recent Tensorflow.
-
+ Python 3.7
+
+ - OCR-D's target Python version is currently Python 3.7. Python 3.8 also works. Python < 3.7 is not supported.
+ - Python 3.9 and newer versions are not yet fully supported, since there are no pre-built Python packages for Tensorflow 2.5 and <2 and other related software. We expect to unconditionally support Python 3.9 once all processors and models are upgraded to work with a more recent Tensorflow.
+
Operating system: Ubuntu 18.04 (or Docker)
diff --git a/feed.xml b/feed.xml
index e3356cd5..19f2bd76 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,4 @@
-Jekyll2024-07-26T12:58:56+02:00https://ocr-d.de/feed.xmlOCR-DWrite an awesome description for your new site here. You can edit this line in _config.yml. It will appear in your document head meta (for Google search results) and in your feed.xml site description.OCR-D Phase III launched2021-08-06T00:00:00+02:002021-08-06T00:00:00+02:00https://ocr-d.de/de/2021/08/06/kick-off-phase3On 30 July we held our kick-off workshop, which ushered in Phase III of OCR-D.

+Jekyll2024-07-26T13:18:57+02:00https://ocr-d.de/feed.xmlOCR-DWrite an awesome description for your new site here. You can edit this line in _config.yml. It will appear in your document head meta (for Google search results) and in your feed.xml site description.OCR-D Phase III launched2021-08-06T00:00:00+02:002021-08-06T00:00:00+02:00https://ocr-d.de/de/2021/08/06/kick-off-phase3On 30 July we held our kick-off workshop, which ushered in Phase III of OCR-D.

The team gave an introduction to the goals and public communication channels of OCR-D in Phase III, to the status and plans of the OCR software and the web API, and to the handling of ground truth data in OCR-D. In addition, the coordination project gave an insight into current software development practice in OCR-D, including ways to contribute.
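
Review note on the `en/setup.html` hunk: the Python version policy it introduces (3.7 as target, 3.8 working, anything below 3.7 unsupported, 3.9 and newer not yet fully supported) could be checked before installation with something like the sketch below. This is only an illustration and not part of OCR-D; the function name `ocrd_python_support` and the returned labels are assumptions made up for this example.

```python
import sys


def ocrd_python_support(version=None):
    """Classify a Python version against the policy stated in the setup guide.

    `version` is any sequence starting with (major, minor); it defaults to
    the running interpreter. The labels returned are hypothetical names
    chosen for this sketch.
    """
    info = version if version is not None else sys.version_info
    major_minor = tuple(info)[:2]
    if major_minor < (3, 7):
        return "unsupported"          # "Python < 3.7 is not supported"
    if major_minor == (3, 7):
        return "target"               # the stated target version
    if major_minor == (3, 8):
        return "works"                # "Python 3.8 also works"
    return "not fully supported"      # 3.9 and newer, per the guide


if __name__ == "__main__":
    print("This interpreter is:", ocrd_python_support())
```

Comparing `(major, minor)` tuples rather than version strings avoids lexical pitfalls such as `"3.10" < "3.7"`.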

diff --git a/search-index.json b/search-index.json
index 018b34e6..ed7fd47a 100644
--- a/search-index.json
+++ b/search-index.json
@@ -556,7 +556,7 @@
 {
 "slug": "en-setup-html",
 "title": "OCR-D setup guide",
- "content" : "# OCR-D setup guideOCR-D's software is a modular collection of many projects (called _modules_)with many tools per module (called _processors_) that you can combine freelyto achieve the workflow best suited for OCRing your content.## System requirementsMinimum system requirements 8 GB RAM (more recommended) - The more RAM is available, the more concurrent processes can be run - Exceedingly large images (newspapers, folio-size books...) require a lot of RAM 20 GB free disk space for local installation (more recommended) - How much disk space is needed depends mainly on the individual purposes of the installation. In addition to the installation itself you will need space for various pretrained models, training and evaluation data for training, and data to process. Python 3.6 or 3.7 - OCR-D's target Python version is currently Python 3.6 which we will continue to support until at least Q3 2022 - Python 3.7 is also tested and supported - Python 3.8 and newer versions are not yet fully supported, since there are no pre-built Python packages for Tensorflow 2.5 and Operating system: Ubuntu 18.04 (or Docker) - For installation on Windows 10 (WSL) and macOS see the setup guides in the [OCR-D-Wiki](https://github.com/OCR-D/ocrd-website/wiki). - Ubuntu 18.04 is our target platform because it was the most up-to-date Ubuntu LTS release when we started developing and will be supported for the foreseeable future - Ubuntu 22.04 is now (2022) the current Ubuntu LTS, seems to work, too, and will be our next target platform. - Other Linux distributions or Ubuntu versions can also be used, though some instructions have to be adapted (e.g. 
package management, locations of some files) - With Windows Subsystem for Linux (WSL), a feature of Windows 10, it is also possible to set up an Ubuntu 18.04 installation within Microsoft Windows - OCR-D can be deployed on an Apple MacOSX machine using Homebrew## Installation### ocrd_all`ocrd_all` is the main way to distribute and install the OCR-D software.If you want to produce OCR output from image data,this is what you need. Tell me more about ocrd_allThe [`ocrd_all`](https://github.com/OCR-D/ocrd_all) project is an effort by theOCR-D community, now maintained by the OCR-D coordination team. It streamlinesthe native installation of OCR-D modules with a versatile Makefile approach.Besides allowing native installation of the full OCR-D stack (or any subset),it is also the base for the [`ocrd/all`](https://hub.docker.com/r/ocrd/all)Docker images available from DockerHub that contain the full stack (or certain subsets)of OCR-D modules ready for deployment.Technically, [`ocrd_all`](https://github.com/OCR-D/ocrd_all) is a Git repositorythat keeps all the necessary software as Git submodules at specific revisions.This way, the software tools are known to be at a stable version and guaranteed tobe interoperable with one another. ### Installation: Docker or NativeThere are two methods to install OCR-D: 1. **[Docker Installation of OCR-D](#ocrd_all-via-docker)** using the prebuilt `ocrd/all` [Docker images](https://hub.docker.com/r/ocrd/all) to install a module collection (**recommended**) 2. **[Native Installation of OCR-D](#ocrd_all-natively)** using the `ocrd_all` [git repository](https://github.com/OCR-D/ocrd_all) to install selected modules nativelyWe recommend using the prebuilt Docker images, since this does not require any changes tothe host system besides [installing Docker](https://hub.docker.com/r/ocrd/all). 
Installation of individual OCR-D modulesSometimes it can be useful to [install the modules individually](#individual-installation-experts-only), either via Docker or natively.Beware that we do not recommend installing modules individually, as it can be difficult to catch all dependencies,keep the software versions up-to-date and ensure that all components are at a usable and interoperable state. ## ocrd_all via Docker### PrerequisitesIf you want to use the OCR-D-via-Docker solution, [docker](https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-using-the-repository) and [docker compose](https://docs.docker.com/compose/install/) have to be installed.After installing docker you may have to set up and start the docker daemon and add your user to the `docker` group:```sh# Start docker daemon at startupsudo systemctl enable docker# Add user to group 'docker'sudo usermod -aG docker $USER``` Please log out and log in again.To test access to docker try the following command:```shdocker images```Now you should see an (empty) list of available images.### mini medi maxiThere are three versions of the[`ocrd/all`](https://hub.docker.com/r/ocrd/all) Docker image:`minimum`, `medium` and `maximum`. They differ in which modules are includedand hence the size of the image:* `minimum` is comprised of the essential OCR-D components, with Tesseract and OCRopus as OCR engines.* `medium` adds the Calamari OCR engine, as well as extra segmentation, pre- and postprocessing options.* `maximum` includes all modules for best performance and full flexibility, but requires the most disk space.We encourage the use of the relatively large but complete `maximum` image.The `minimum` or `medium` images should only be used when certain that none but the included OCR-Dmodules are needed. 
Click here for a table showing the modules included in each version| Module | `minimum` | `medium` | `maximum` || ----- | ---- | ---- | ---- || core | ☑ | ☑ | ☑ || ocrd_cis | ☑ | ☑ | ☑ || ocrd_fileformat | ☑ | ☑ | ☑ || ocrd_im6convert | ☑ | ☑ | ☑ || ocrd_pagetopdf | ☑ | ☑ | ☑ || ocrd_repair_inconsistencies | ☑ | ☑ | ☑ || ocrd_tesserocr | ☑ | ☑ | ☑ || ocrd_wrap | ☑ | ☑ | ☑ || tesserocr | ☑ | ☑ | ☑ || workflow-configuration | ☑ | ☑ | ☑ || cor-asv-ann | - | ☑ | ☑ || dinglehopper | - | ☑ | ☑ || docstruct | - | ☑ | ☑ || format-converters | - | ☑ | ☑ || ocrd_calamari | - | ☑ | ☑ || ocrd_keraslm | - | ☑ | ☑ || ocrd_olahd_client | ☑ | ☑ | ☑ || ocrd_olena | - | ☑ | ☑ || ocrd_segment | - | ☑ | ☑ || tesseract | - | ☑ | ☑ || ocrd_neat | - | ☑ | ☑ || ocrd_anybaseocr | - | - | ☑ || ocrd_detectron2 | - | - | ☑ || ocrd_doxa | - | - | ☑ || ocrd_kraken | - | - | ☑ || ocrd_ocropy | - | - | - || ocrd_pc_segmentation | - | - | - || ocrd_typegroups_classifier | - | - | ☑ || sbb_binarization | - | - | ☑ || cor-asv-fst | - | - | - |### Fetch Docker imageTo fetch the `maximum` version of the `ocrd/all` Docker image:(replace `maximum` accordingly if you want the `minimum` or `medium` version)```shdocker pull ocrd/all:maximum``` Docker and git imagesIf you want to keep the modules' git repos inside the Docker images – so you can keep makingfast updates, without waiting for a new pre-built image, but also without building an image yourself –then add the suffix `-git` to the image version, e.g. `maximum-git`. This will behave like the native installation,only inside the container. Yes, you can also [commit changes](https://rollout.io/blog/using-docker-commit-to-create-and-change-an-image/)made in containers back to your local Docker image.) 
### Testing the Docker installationTo start, download and extract a document from the [OCR-D GT Repo](https://ola-hd.ocr-d.de/search?q=&fulltextsearch=false&metadatasearch=false&isGT=true&perPageRecords=30):```shwget "https://ola-hd.ocr-d.de/api/export?id=21.11156/BFBAD520-65F4-430A-B4B2-C81A296C9E09&internalId=false" -O wundt_grundriss_1896.ocrd.zipunzip wundt_grundriss_1896.ocrd.zipcd data```Now, spin up the docker container:```shdocker run --user $(id -u) --workdir /data --volume $PWD:/data --rm -it ocrd/all bash```Your command line should start with something similar to:```shI have no name!@ade9a4692fcd:/data$```After spinning up the container, you can use the installation and call the processors the same way as in the native installation.Alternatively, you can [translate each command to a docker call](/en/user_guide#translating-native-commands-to-docker-calls).Let's segment the images in file group `OCR-D-IMG` from the zip file into regions, thereby creating aMETS file group `OCR-D-SEG-BLOCK-DOCKER`):```shocrd-tesserocr-segment-region -I OCR-D-IMG -O OCR-D-SEG-BLOCK-DOCKER```When you are finished using OCR-D commands, use this command to stop using docker interactively:```shexit```### Updating Docker imageTo update the Docker image to the latest version, just run the `docker pull` command:(replace `maximum` accordingly if you use the `minimum` or `medium` version)```shdocker pull ocrd/all:maximum```### Further readingWe recommend jumping to the [section about installing models at the bottom of this page](#installing-models) next.Alternatively, for instructions on how to proceed further with the processing of your data, please see the [user guide](/en/user_guide). 
Make sure to also read [the notes on translating native command line calls to docker calls](/en/user_guide#translating-native-commands-to-docker-calls).## ocrd_all nativelyThe `ocrd_all` project contains a sophisticated Makefile to install or compileprerequisites as necessary, set up a virtualenv, install the core software,install OCR-D modules and more. Detailed documentation [can be found in itsREADME](https://github.com/OCR-D/ocrd_all).### InstallationThere are some [system requirements](https://github.com/OCR-D/ocrd_all#system-packages) for ocrd_all.You need to have `make` installed to make use of `ocrd_all`:```shsudo apt install make```Clone the repository (still without submodules) and change into the `ocrd_all` directory:```shgit clone https://github.com/OCR-D/ocrd_allcd ocrd_all```You should now be in a directory called `ocrd_all`.It is easiest to install all the possible system requirements by calling `make deps-ubuntu` as root:```shsudo make deps-ubuntu```This will install all system requirements.Now you are ready for the final step which will actually install the OCR-D-Software.You can either install 1. all the software at once with the `all` target (equivalent to the [`maximum` Docker version](#mini-medi-maxi)), 2. modules individually by using an executable from that module as the target, or 3. a subset of modules by listing the project names in the `OCRD_MODULES` variable (equivalent to a custom selection of the [`medium` Docker version](#mini-medi-maxi)):```shmake all # Installs all the software (recommended)make ocrd-tesserocr-binarize # Install ocrd_tesserocr which contains ocrd-tesserocr-binarizemake ocrd-cis-ocropy-binarize # Install ocrd_cis which contains ocrd-cis-ocropy-binarizemake all OCRD_MODULES="core ocrd_tesserocr ocrd_cis" # Will install only ocrd_tesserocr and ocrd_cis```(Custom choices for `OCRD_MODULES` and other control variables (cf. 
`make help`) can also be made permanent by writing them into `local.mk`.)**Note:** Never run `make all` as root unless you know *exactly* what you are doing!Installation is incremental, i.e. failed/interrupted attempts can be continued, and modules can be installed one at a time as needed.Running `make` will also take care of cloning and updating all required submodules.Especially running `make all` will take a while (between 30 and 60 minutes or more on slower machines). In the end, it should say that the last processor was installed successfully.Having installed `ocrd_all` successfully, `ocrd --version` should give you the current version of [OCR-D/core](https://github.com/OCR-D/core).Activate the virtual Python environment, which was created in the directory `venv`, before running any OCR-D command.```shsource venv/bin/activateocrd --versionocrd, version 2.13.2 # your version should be 2.13.2 or later```### Testing the native installationFor example, let's fetch a document from the [OCR-D GT Repo](https://ocr-d-repo.scc.kit.edu/api/v1/metastore/bagit/):```shwget "https://ola-hd.ocr-d.de/api/export?id=21.11156/BFBAD520-65F4-430A-B4B2-C81A296C9E09&internalId=false" -O wundt_grundriss_1896.ocrd.zipsudo unzip wundt_grundriss_1896.ocrd.zipcd data```If you haven't done it already, activate your venv:```sh# Activate the venvsource /path/to/ocrd_all/venv/bin/activate```Let's segment the images in file group `OCR-D-IMG` from the zip file into regions (creating afirst [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) file group`OCR-D-SEG-BLOCK`):```shocrd-tesserocr-segment-region -I OCR-D-IMG -O OCR-D-SEG-BLOCK```### Updating the softwareAs `ocrd_all` is in [activedevelopment](https://github.com/OCR-D/ocrd_all/commits/master), it is wise toregularly update the repository and its submodules:```shgit pull```This will refresh the local clone of ocrd_all with the changes in the official ocrd_all GitHub repository.Now you can install the changes with```shmake 
all```This will run the installation process for all submodules which have been changed. In the end, it shouldsay that the last processor was installed successfully. `--version` for the processors which have been changedshould give you its current version.### Further readingWe recommend jumping to the [section about installing models at the bottom of this page](#installing-models) next.For instructions on how to process your own data, please see the [user guide](/en/user_guide).## Individual installation (experts only)For developing purposes it might be useful to install modules individually, either with Docker or natively.With all variants of individual module installation, it will be up to you tokeep the repositories up-to-date and installed. We therefore discourageindividual installation of modules and recommend using ocrd_all as outlined above..All [OCR-D modules](https://github.com/topics/ocr-d) follow the same[interface](https://ocr-d.github.io/cli) and common design patterns. So onceyou understand how to install and use one project, you know how to install anduse all of them.### Individual Docker containerThis is the best option if you want full control over which modules youactually intend to use while still profiting from the simple installation ofDocker containers.You need to have [Docker](https://docs.docker.com/install/linux/docker-ce/ubuntu/)Many OCR-D modules are also [published as Docker containers to DockerHub](https://hub.docker.com/u/ocrd). 
To find the Dockerimage for a module, replace the `ocrd_` prefix with `ocrd/`:```shdocker pull ocrd/tesserocr # Installs ocrd_tesserocrdocker pull ocrd/olena # Installs ocrd_olena```Now you can [test your installation](#testing-the-docker-installation).### Native installationInstalling each module into your system natively requires you to know and install all its _dependencies_ first.That can be _system packages_ (or even system package repositories) or _Python packages_.To learn about system dependencies, consult the module's README files. In contrast, Python dependencies shouldbe resolved automatically by using the Python package manager `pip`.> **NOTE**>> ocrd_tesserocr requires **tesseract-ocr >= 4.1.0**. But the Tesseract packages> bundled with **Ubuntu please enable [Alexander Pozdnyakov PPA](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr),> which has up-to-date builds of tesseract and its dependencies:>> ```sh> sudo add-apt-repository ppa:alex-p/tesseract-ocr> sudo apt-get update> ```Next subsections:- For Python you also first need [virtualenv](#virtualenv). Then you have two options:- installing [via PyPI](#from-pypi) or- installing [via local git clone](#from-git).#### virtualenv* **Always install python modules into a virtualenv*** **Never run `pip`/`pip3` as root**First install Python 3 and `venv`:```shsudo apt install python3 python3-venv``````sh# If you haven't created the venv yet:python3 -m venv ~/venv# Activate the venvsource ~/venv/bin/activate```Once you have activated the virtualenv, you should see `(venv)` prepended toyour shell prompt.#### From PyPIThis is the best option if you want to use the stable, released version of individual modules.However, many modules require a number of non-Python (system) packages. For theexact list of packages you need to look at the README of the module inquestion. 
(If you are not on Ubuntu >= 18.04, then your requirements maydeviate from that.)For example to install `ocrd_tesserocr` from PyPI:```shsudo apt-get install git python3 python3-pip python3-venv libtesseract-dev libleptonica-dev tesseract-ocr-eng tesseract-ocr wgetpip3 install ocrd_tesserocr```Many ocrd modules conventionally contain a Makefile with a `deps-ubuntu` target that can handle calls to `apt-get` for you:```shsudo make deps-ubuntu```Now you can [test your installation](#testing-the-native-installation).#### From gitThis is the best option if you want to change the source code or install the latest, unpublished changes.```shgit clone https://github.com/OCR-D/ocrd_tesserocrcd ocrd_tesserocrsudo make deps-ubuntu # or manually with apt-getmake deps # or pip3 install -r requirementsmake install # or pip3 install .```If you intend to develop a module, it is best to install the module editable:```shpip install -e .```This way, you won't have to reinstall after making changes.Now you can [test your installation](#testing-the-native-installation).## Installing modelsSeveral processors in OCR-D need pretrained models you have to install beforehand.Please consult our [instruction on models](/en/models) to get more information on how to download and install them.",
+ "content" : "# OCR-D setup guideOCR-D's software is a modular collection of many projects (called _modules_)with many tools per module (called _processors_) that you can combine freelyto achieve the workflow best suited for OCRing your content.## System requirementsMinimum system requirements 8 GB RAM (more recommended) - The more RAM is available, the more concurrent processes can be run - Exceedingly large images (newspapers, folio-size books...) require a lot of RAM 20 GB free disk space for local installation (more recommended) - How much disk space is needed depends mainly on the individual purposes of the installation. 
In addition to the installation itself you will need space for various pretrained models, training and evaluation data for training, and data to process. Python 3.7 - OCR-D's target Python version is currently Python 3.7. Python 3.8 also works. Python - Python 3.9 and newer versions are not yet fully supported, since there are no pre-built Python packages for Tensorflow 2.5 and Operating system: Ubuntu 18.04 (or Docker) - For installation on Windows 10 (WSL) and macOS see the setup guides in the [OCR-D-Wiki](https://github.com/OCR-D/ocrd-website/wiki). - Ubuntu 18.04 is our target platform because it was the most up-to-date Ubuntu LTS release when we started developing and will be supported for the foreseeable future - Ubuntu 22.04 is now (2022) the current Ubuntu LTS, seems to work, too, and will be our next target platform. - Other Linux distributions or Ubuntu versions can also be used, though some instructions have to be adapted (e.g. package management, locations of some files) - With Windows Subsystem for Linux (WSL), a feature of Windows 10, it is also possible to set up an Ubuntu 18.04 installation within Microsoft Windows - OCR-D can be deployed on an Apple MacOSX machine using Homebrew## Installation### ocrd_all`ocrd_all` is the main way to distribute and install the OCR-D software.If you want to produce OCR output from image data,this is what you need. Tell me more about ocrd_allThe [`ocrd_all`](https://github.com/OCR-D/ocrd_all) project is an effort by theOCR-D community, now maintained by the OCR-D coordination team. 
It streamlinesthe native installation of OCR-D modules with a versatile Makefile approach.Besides allowing native installation of the full OCR-D stack (or any subset),it is also the base for the [`ocrd/all`](https://hub.docker.com/r/ocrd/all)Docker images available from DockerHub that contain the full stack (or certain subsets)of OCR-D modules ready for deployment.Technically, [`ocrd_all`](https://github.com/OCR-D/ocrd_all) is a Git repositorythat keeps all the necessary software as Git submodules at specific revisions.This way, the software tools are known to be at a stable version and guaranteed tobe interoperable with one another. ### Installation: Docker or NativeThere are two methods to install OCR-D: 1. **[Docker Installation of OCR-D](#ocrd_all-via-docker)** using the prebuilt `ocrd/all` [Docker images](https://hub.docker.com/r/ocrd/all) to install a module collection (**recommended**) 2. **[Native Installation of OCR-D](#ocrd_all-natively)** using the `ocrd_all` [git repository](https://github.com/OCR-D/ocrd_all) to install selected modules nativelyWe recommend using the prebuilt Docker images, since this does not require any changes tothe host system besides [installing Docker](https://hub.docker.com/r/ocrd/all). Installation of individual OCR-D modulesSometimes it can be useful to [install the modules individually](#individual-installation-experts-only), either via Docker or natively.Beware that we do not recommend installing modules individually, as it can be difficult to catch all dependencies,keep the software versions up-to-date and ensure that all components are at a usable and interoperable state. 
## ocrd_all via Docker### PrerequisitesIf you want to use the OCR-D-via-Docker solution, [docker](https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-using-the-repository) and [docker compose](https://docs.docker.com/compose/install/) have to be installed.After installing docker you may have to set up and start the docker daemon and add your user to the `docker` group:```sh# Start docker daemon at startupsudo systemctl enable docker# Add user to group 'docker'sudo usermod -aG docker $USER``` Please log out and log in again.To test access to docker try the following command:```shdocker images```Now you should see an (empty) list of available images.### mini medi maxiThere are three versions of the[`ocrd/all`](https://hub.docker.com/r/ocrd/all) Docker image:`minimum`, `medium` and `maximum`. They differ in which modules are includedand hence the size of the image:* `minimum` is comprised of the essential OCR-D components, with Tesseract and OCRopus as OCR engines.* `medium` adds the Calamari OCR engine, as well as extra segmentation, pre- and postprocessing options.* `maximum` includes all modules for best performance and full flexibility, but requires the most disk space.We encourage the use of the relatively large but complete `maximum` image.The `minimum` or `medium` images should only be used when certain that none but the included OCR-Dmodules are needed. 
Click here for a table showing the modules included in each version| Module | `minimum` | `medium` | `maximum` || ----- | ---- | ---- | ---- || core | ☑ | ☑ | ☑ || ocrd_cis | ☑ | ☑ | ☑ || ocrd_fileformat | ☑ | ☑ | ☑ || ocrd_im6convert | ☑ | ☑ | ☑ || ocrd_pagetopdf | ☑ | ☑ | ☑ || ocrd_repair_inconsistencies | ☑ | ☑ | ☑ || ocrd_tesserocr | ☑ | ☑ | ☑ || ocrd_wrap | ☑ | ☑ | ☑ || tesserocr | ☑ | ☑ | ☑ || workflow-configuration | ☑ | ☑ | ☑ || cor-asv-ann | - | ☑ | ☑ || dinglehopper | - | ☑ | ☑ || docstruct | - | ☑ | ☑ || format-converters | - | ☑ | ☑ || ocrd_calamari | - | ☑ | ☑ || ocrd_keraslm | - | ☑ | ☑ || ocrd_olahd_client | ☑ | ☑ | ☑ || ocrd_olena | - | ☑ | ☑ || ocrd_segment | - | ☑ | ☑ || tesseract | - | ☑ | ☑ || ocrd_neat | - | ☑ | ☑ || ocrd_anybaseocr | - | - | ☑ || ocrd_detectron2 | - | - | ☑ || ocrd_doxa | - | - | ☑ || ocrd_kraken | - | - | ☑ || ocrd_ocropy | - | - | - || ocrd_pc_segmentation | - | - | - || ocrd_typegroups_classifier | - | - | ☑ || sbb_binarization | - | - | ☑ || cor-asv-fst | - | - | - |### Fetch Docker imageTo fetch the `maximum` version of the `ocrd/all` Docker image:(replace `maximum` accordingly if you want the `minimum` or `medium` version)```shdocker pull ocrd/all:maximum``` Docker and git imagesIf you want to keep the modules' git repos inside the Docker images – so you can keep makingfast updates, without waiting for a new pre-built image, but also without building an image yourself –then add the suffix `-git` to the image version, e.g. `maximum-git`. This will behave like the native installation,only inside the container. Yes, you can also [commit changes](https://rollout.io/blog/using-docker-commit-to-create-and-change-an-image/)made in containers back to your local Docker image.) 
### Testing the Docker installationTo start, download and extract a document from the [OCR-D GT Repo](https://ola-hd.ocr-d.de/search?q=&fulltextsearch=false&metadatasearch=false&isGT=true&perPageRecords=30):```shwget "https://ola-hd.ocr-d.de/api/export?id=21.11156/BFBAD520-65F4-430A-B4B2-C81A296C9E09&internalId=false" -O wundt_grundriss_1896.ocrd.zipunzip wundt_grundriss_1896.ocrd.zipcd data```Now, spin up the docker container:```shdocker run --user $(id -u) --workdir /data --volume $PWD:/data --rm -it ocrd/all bash```Your command line should start with something similar to:```shI have no name!@ade9a4692fcd:/data$```After spinning up the container, you can use the installation and call the processors the same way as in the native installation.Alternatively, you can [translate each command to a docker call](/en/user_guide#translating-native-commands-to-docker-calls).Let's segment the images in file group `OCR-D-IMG` from the zip file into regions, thereby creating aMETS file group `OCR-D-SEG-BLOCK-DOCKER`):```shocrd-tesserocr-segment-region -I OCR-D-IMG -O OCR-D-SEG-BLOCK-DOCKER```When you are finished using OCR-D commands, use this command to stop using docker interactively:```shexit```### Updating Docker imageTo update the Docker image to the latest version, just run the `docker pull` command:(replace `maximum` accordingly if you use the `minimum` or `medium` version)```shdocker pull ocrd/all:maximum```### Further readingWe recommend jumping to the [section about installing models at the bottom of this page](#installing-models) next.Alternatively, for instructions on how to proceed further with the processing of your data, please see the [user guide](/en/user_guide). 
Make sure to also read [the notes on translating native command line calls to docker calls](/en/user_guide#translating-native-commands-to-docker-calls).## ocrd_all nativelyThe `ocrd_all` project contains a sophisticated Makefile to install or compileprerequisites as necessary, set up a virtualenv, install the core software,install OCR-D modules and more. Detailed documentation [can be found in itsREADME](https://github.com/OCR-D/ocrd_all).### InstallationThere are some [system requirements](https://github.com/OCR-D/ocrd_all#system-packages) for ocrd_all.You need to have `make` installed to make use of `ocrd_all`:```shsudo apt install make```Clone the repository (still without submodules) and change into the `ocrd_all` directory:```shgit clone https://github.com/OCR-D/ocrd_allcd ocrd_all```You should now be in a directory called `ocrd_all`.It is easiest to install all the possible system requirements by calling `make deps-ubuntu` as root:```shsudo make deps-ubuntu```This will install all system requirements.Now you are ready for the final step which will actually install the OCR-D-Software.You can either install 1. all the software at once with the `all` target (equivalent to the [`maximum` Docker version](#mini-medi-maxi)), 2. modules individually by using an executable from that module as the target, or 3. a subset of modules by listing the project names in the `OCRD_MODULES` variable (equivalent to a custom selection of the [`medium` Docker version](#mini-medi-maxi)):```shmake all # Installs all the software (recommended)make ocrd-tesserocr-binarize # Install ocrd_tesserocr which contains ocrd-tesserocr-binarizemake ocrd-cis-ocropy-binarize # Install ocrd_cis which contains ocrd-cis-ocropy-binarizemake all OCRD_MODULES="core ocrd_tesserocr ocrd_cis" # Will install only ocrd_tesserocr and ocrd_cis```(Custom choices for `OCRD_MODULES` and other control variables (cf. 
`make help`) can also be made permanent by writing them into `local.mk`.)**Note:** Never run `make all` as root unless you know *exactly* what you are doing!Installation is incremental, i.e. failed/interrupted attempts can be continued, and modules can be installed one at a time as needed.Running `make` will also take care of cloning and updating all required submodules.Especially running `make all` will take a while (between 30 and 60 minutes or more on slower machines). In the end, it should say that the last processor was installed successfully.Having installed `ocrd_all` successfully, `ocrd --version` should give you the current version of [OCR-D/core](https://github.com/OCR-D/core).Activate the virtual Python environment, which was created in the directory `venv`, before running any OCR-D command.```shsource venv/bin/activateocrd --versionocrd, version 2.13.2 # your version should be 2.13.2 or later```### Testing the native installationFor example, let's fetch a document from the [OCR-D GT Repo](https://ocr-d-repo.scc.kit.edu/api/v1/metastore/bagit/):```shwget "https://ola-hd.ocr-d.de/api/export?id=21.11156/BFBAD520-65F4-430A-B4B2-C81A296C9E09&internalId=false" -O wundt_grundriss_1896.ocrd.zipsudo unzip wundt_grundriss_1896.ocrd.zipcd data```If you haven't done it already, activate your venv:```sh# Activate the venvsource /path/to/ocrd_all/venv/bin/activate```Let's segment the images in file group `OCR-D-IMG` from the zip file into regions (creating afirst [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) file group`OCR-D-SEG-BLOCK`):```shocrd-tesserocr-segment-region -I OCR-D-IMG -O OCR-D-SEG-BLOCK```### Updating the softwareAs `ocrd_all` is in [activedevelopment](https://github.com/OCR-D/ocrd_all/commits/master), it is wise toregularly update the repository and its submodules:```shgit pull```This will refresh the local clone of ocrd_all with the changes in the official ocrd_all GitHub repository.Now you can install the changes with```shmake 
all
```

This will run the installation process for all submodules which have been changed. In the end, it should
say that the last processor was installed successfully. Running a changed processor with `--version`
should give you its current version.

### Further reading

We recommend jumping to the [section about installing models at the bottom of this page](#installing-models) next.

For instructions on how to process your own data, please see the [user guide](/en/user_guide).

## Individual installation (experts only)

For development purposes it might be useful to install modules individually, either with Docker or natively.

With all variants of individual module installation, it will be up to you to
keep the repositories up-to-date and installed. We therefore discourage
individual installation of modules and recommend using ocrd_all as outlined above.

All [OCR-D modules](https://github.com/topics/ocr-d) follow the same
[interface](https://ocr-d.github.io/cli) and common design patterns. So once
you understand how to install and use one project, you know how to install and
use all of them.

### Individual Docker container

This is the best option if you want full control over which modules you
actually intend to use while still profiting from the simple installation of
Docker containers.

You need to have [Docker](https://docs.docker.com/install/linux/docker-ce/ubuntu/) installed.

Many OCR-D modules are also [published as Docker containers to DockerHub](https://hub.docker.com/u/ocrd). 
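
Before pulling any images, it can help to confirm that the `docker` command is actually available. The following is a minimal sketch rather than an official OCR-D step; the helper name `check_docker` is made up for illustration, and it only checks that the CLI is on your `PATH`, not that the Docker daemon is running:

```sh
# Hypothetical helper (not part of OCR-D): report whether the docker CLI is on PATH.
check_docker() {
    if command -v docker >/dev/null 2>&1; then
        # `docker --version` only queries the client, so no running daemon is needed
        echo "docker found: $(docker --version)"
    else
        echo "docker not found"
    fi
}

check_docker
```

If this reports that Docker is missing, install it first by following the Docker documentation linked above.
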
To find the Docker
image for a module, replace the `ocrd_` prefix with `ocrd/`:

```sh
docker pull ocrd/tesserocr # Installs ocrd_tesserocr
docker pull ocrd/olena # Installs ocrd_olena
```

Now you can [test your installation](#testing-the-docker-installation).

### Native installation

Installing each module into your system natively requires you to know and install all its _dependencies_ first.
These can be _system packages_ (or even system package repositories) or _Python packages_.

To learn about system dependencies, consult the module's README files. In contrast, Python dependencies should
be resolved automatically by using the Python package manager `pip`.

> **NOTE**
>
> ocrd_tesserocr requires **tesseract-ocr >= 4.1.0**. If the Tesseract packages
> bundled with your Ubuntu release are older than that, please enable the [Alexander Pozdnyakov PPA](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr),
> which has up-to-date builds of tesseract and its dependencies:
>
> ```sh
> sudo add-apt-repository ppa:alex-p/tesseract-ocr
> sudo apt-get update
> ```

Next subsections:

- For Python you also first need [virtualenv](#virtualenv). Then you have two options:
  - installing [via PyPI](#from-pypi) or
  - installing [via local git clone](#from-git).

#### virtualenv

* **Always install Python modules into a virtualenv**
* **Never run `pip`/`pip3` as root**

First install Python 3 and `venv`:

```sh
sudo apt install python3 python3-venv
```

Then create and activate the virtualenv:

```sh
# If you haven't created the venv yet:
python3 -m venv ~/venv

# Activate the venv
source ~/venv/bin/activate
```

Once you have activated the virtualenv, you should see `(venv)` prepended to
your shell prompt.

#### From PyPI

This is the best option if you want to use the stable, released version of individual modules.

However, many modules require a number of non-Python (system) packages. For the
exact list of packages you need to look at the README of the module in
question. 
(If you are not on Ubuntu >= 18.04, then your requirements may
deviate from that.)

For example, to install `ocrd_tesserocr` from PyPI:

```sh
sudo apt-get install git python3 python3-pip python3-venv libtesseract-dev libleptonica-dev tesseract-ocr-eng tesseract-ocr wget
pip3 install ocrd_tesserocr
```

Many OCR-D modules conventionally contain a Makefile with a `deps-ubuntu` target that can handle calls to `apt-get` for you:

```sh
sudo make deps-ubuntu
```

Now you can [test your installation](#testing-the-native-installation).

#### From git

This is the best option if you want to change the source code or install the latest, unpublished changes.

```sh
git clone https://github.com/OCR-D/ocrd_tesserocr
cd ocrd_tesserocr
sudo make deps-ubuntu # or manually with apt-get
make deps # or pip3 install -r requirements.txt
make install # or pip3 install .
```

If you intend to develop a module, it is best to install the module in editable mode:

```sh
pip install -e .
```

This way, you won't have to reinstall after making changes.

Now you can [test your installation](#testing-the-native-installation).

## Installing models

Several processors in OCR-D need pretrained models that you have to install beforehand.

Please consult our [instructions on models](/en/models) for more information on how to download and install them.", "url": " /en/setup.html" },