From 5a9d7ba1f697f47d452faf937b7f867381512e54 Mon Sep 17 00:00:00 2001
From: zdenop
Date: Wed, 27 Mar 2024 17:52:05 +0100
Subject: [PATCH] Update README.md

replace `frk` with `deu_latf`; improve wording and grammar
---
 README.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index e1554740..2cb0ad00 100644
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@ and more can be found in the [Tesseract User Manual](https://tesseract-ocr.githu
 1. Install the latest tesseract (e.g. from https://digi.bib.uni-mannheim.de/tesseract/), and make sure that tesseract is added to your PATH.
 2. Install [Python 3](https://www.python.org/downloads/)
-3. Install [Git SCM to Windows](https://gitforwindows.org/) - it provides a lot of linux utilities on Windows (e.g. `find`, `unzip`, `rm`) and put `C:\Program Files\Git\usr\bin` to the beginning of your PATH variable (temporarily you can do it in `cmd` with `set PATH=C:\Program Files\Git\usr\bin;%PATH%` - unfortunately there are several Windows tools with the same name as on linux (`find`, `sort`) with different behaviour/functionality and there is need to avoid them during training.
+3. Install [Git for Windows](https://gitforwindows.org/) - it provides many Linux utilities on Windows (e.g. `find`, `unzip`, `rm`). Put `C:\Program Files\Git\usr\bin` at the beginning of your PATH variable (in `cmd` you can do this temporarily with `set PATH=C:\Program Files\Git\usr\bin;%PATH%`). Unfortunately, several Windows tools have the same names as their Linux counterparts (`find`, `sort`) but behave differently, and they must be avoided during training.
 4. Install winget/[Windows Package Manager](https://github.com/microsoft/winget-cli/releases/) and then run `winget install GnuWin32.Make` and `winget install wget` to install missing tools.
 ### Python
@@ -36,18 +36,18 @@
 To fetch them:
 
     make tesseract-langdata
 
-(This step is only needed once and already included implicitly in the `training` target,
-but you might want to run explicitly it in advance.)
+(While this step is only needed once and implicitly included in the `training` target,
+you might want to run it explicitly beforehand.)
 
-## Choose model name
+## Choose the model name
 
 Choose a name for your model. By convention, Tesseract stack models including
 language-specific resources use (lowercase) three-letter codes defined in
 [ISO 639](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) with additional
 information separated by underscore. E.g., `chi_tra_vert` for **tra**ditional
 Chinese with **vert**ical typesetting. Language-independent (i.e. script-specific)
-models use the capitalized name of the script type as identifier. E.g.,
+models use the capitalized name of the script type as an identifier. E.g.,
 `Hangul_vert` for Hangul script with vertical typesetting.
 
 In the following, the model name is referenced by `MODEL_NAME`.
@@ -58,7 +58,7 @@ Place ground truth consisting of line images and transcriptions in the folder
 evaluation data, the ratio is defined by the `RATIO_TRAIN` variable.
 
 Images must be TIFF and have the extension `.tif` or PNG and have the
-extension `.png`, `.bin.png` or `.nrm.png`.
+extension `.png`, `.bin.png`, or `.nrm.png`.
 
 Transcriptions must be single-line plain text and have the same name as the
 line image but with the image extension replaced by `.gt.txt`.
@@ -79,7 +79,7 @@
 Run
 
     make training MODEL_NAME=name-of-the-resulting-model
 
-which is basically a shortcut for
+which is a shortcut for
 
     make unicharset lists proto-model tesseract-langdata training
@@ -143,10 +143,10 @@ you are running tesstrain from a script or other makefile), then you can use the
 When the training is finished, it will write a `traineddata` file which can be
 used for text recognition with Tesseract. Note that this file does not include a
-dictionary. The `tesseract` executable therefore prints an warning.
+dictionary. The `tesseract` executable therefore prints a warning.
 
 It is also possible to create additional `traineddata` files from intermediate
-training results (the so called checkpoints). This can even be done while the
+training results (the so-called checkpoints). This can even be done while the
 training is still running. Example:
 
     # Add MODEL_NAME and OUTPUT_DIR like for the training.
 
@@ -166,12 +166,12 @@ It is also possible to create models for selected checkpoints only. Examples:
     # Make traineddata for all checkpoint files with CER better than 1 %.
     make traineddata CHECKPOINT_FILES="$(ls data/foo/checkpoints/*[^1-9]0.*.checkpoint)"
 
-Add `MODEL_NAME` and `OUTPUT_DIR` and replace `data/foo` by the output directory if needed.
+Add `MODEL_NAME` and `OUTPUT_DIR` and replace `data/foo` with the output directory if needed.
 
 ## Plotting CER (experimental)
 
-Training and Evaluation CER can be plotted using matplotlib. A couple of scripts are provided
-as a starting point in `plot` subdirectory for plotting of different training scenarios. The training
+Training and Evaluation CER can be plotted using Matplotlib. A couple of scripts are provided
+as a starting point in the `plot` subdirectory for plotting different training scenarios. The training
 log is expected to be saved in `plot/TESSTRAIN.LOG`.
 
 As an example, use the training data provided in
@@ -179,7 +179,7 @@ As an example, use the training data provided in
 Plotting can be done while training is running also to depict the training status till then.
 ```
 unzip ocrd-testset.zip -d data/ocrd-ground-truth
-nohup make training MODEL_NAME=ocrd START_MODEL=frk TESSDATA=~/tessdata_best MAX_ITERATIONS=10000 > plot/TESSTRAIN.LOG &
+nohup make training MODEL_NAME=ocrd START_MODEL=deu_latf TESSDATA=~/tessdata_best MAX_ITERATIONS=10000 > plot/TESSTRAIN.LOG &
 ```
 ```
 cd ./plot