A neural network based detector for handwritten words.
- Download the trained model and place the unzipped files into the `model` directory
- Go to the `src` directory and execute `python infer.py`
- This opens a window showing the words detected in the test images (located in `data/test`)
- Required libraries: torch, numpy, sklearn, cv2, path, matplotlib
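Whether these libraries are importable can be verified with a short snippet like the one below; it is only an illustrative check (not part of the repository), and it tests import names, not pip package names (e.g. `cv2` is installed as opencv-python, `sklearn` as scikit-learn):

```python
# Check that the required libraries listed above are importable.
import importlib.util

required = ["torch", "numpy", "sklearn", "cv2", "path", "matplotlib"]
missing = [name for name in required
           if importlib.util.find_spec(name) is None]

if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```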
- The model is trained with the IAM dataset
- Download the forms and the XML files
- Create a dataset directory on your disk with two subdirectories: `gt` and `img`
- Put all form images into the `img` directory
- Put all XML files into the `gt` directory
- Go to `src` and execute `python train.py` with the following parameters specified (only the first one is required):
  - `--data_dir`: dataset directory containing a `gt` and an `img` directory
  - `--batch_size`: 27 images per batch are possible on an 8GB GPU
  - `--caching`: cache the dataset to avoid loading and decoding the PNG images; the cache file is stored in the dataset directory
  - `--pretrained`: initialize with saved model weights
  - `--val_freq`: speed up training by only validating every n-th epoch
  - `--early_stopping`: stop training after n validation steps without improvement
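The flags above map naturally onto an `argparse` parser; the sketch below mirrors the listed options, but the default values and help texts are assumptions, not taken from the repository:

```python
import argparse

# Illustrative parser mirroring the train.py flags listed above;
# only --data_dir is required, all default values are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument("--data_dir", required=True,
                    help="dataset directory containing gt/ and img/")
parser.add_argument("--batch_size", type=int, default=27,
                    help="27 images per batch fit on an 8GB GPU")
parser.add_argument("--caching", action="store_true",
                    help="cache decoded images in the dataset directory")
parser.add_argument("--pretrained", action="store_true",
                    help="initialize with saved model weights")
parser.add_argument("--val_freq", type=int, default=1,
                    help="validate only every n-th epoch")
parser.add_argument("--early_stopping", type=int, default=50,
                    help="stop after n validation steps without improvement")

# Example invocation equivalent to: python train.py --data_dir <dir>
args = parser.parse_args(["--data_dir", "/path/to/dataset"])
```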
- The model weights are saved every time the F1 score on the validation set increases
- A log is written into the `log` directory, which can be opened with TensorBoard
- Executing `python eval.py` evaluates the trained model
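The checkpointing and early-stopping behaviour described above can be sketched as follows; function and parameter names are illustrative, not the repository's actual code:

```python
# Sketch: save weights whenever the validation F1 score improves,
# stop after `patience` validation steps without improvement.
def train_loop(validate, save_weights, patience):
    best_f1 = 0.0
    bad_steps = 0
    while True:
        f1 = validate()          # run one train/validation cycle
        if f1 > best_f1:
            best_f1 = f1         # new best F1: checkpoint and reset counter
            bad_steps = 0
            save_weights()
        else:
            bad_steps += 1       # no improvement: count towards early stopping
            if bad_steps >= patience:
                break
    return best_f1

# Demo with simulated validation scores:
scores = iter([0.10, 0.30, 0.20, 0.25, 0.28])
checkpoints = []
best = train_loop(lambda: next(scores), lambda: checkpoints.append("saved"),
                  patience=3)
print(best, len(checkpoints))  # 0.3 2: weights were saved twice, then training stopped
```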
- The model classifies each pixel into one of three classes (see plot below):
  - Inner part of a word (plot: red)
  - Outer part of a word (plot: green)
  - Background (plot: blue)
- An axis-aligned bounding box is predicted for each inner-word pixel
- DBSCAN clusters the predicted bounding boxes
- The backbone of the neural network is based on the ResNet18 model (taken from torchvision, with modifications)
- The model is inspired by the ideas of Zhou and Axler
- See this article for more details
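The clustering step above can be illustrated with sklearn's DBSCAN. The sketch below clusters boxes by their pairwise Jaccard distance (1 − IoU) and merges each cluster with a per-corner median; the distance metric and the `eps` and `min_samples` values are assumptions, not the repository's exact settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def cluster_boxes(boxes, eps=0.5, min_samples=2):
    """Cluster boxes with DBSCAN on a precomputed 1-IoU distance matrix,
    then merge each cluster into one box via the median of its corners."""
    n = len(boxes)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = 1.0 - iou(boxes[i], boxes[j])
    labels = DBSCAN(eps=eps, min_samples=min_samples,
                    metric="precomputed").fit_predict(dist)
    # label -1 marks DBSCAN noise, i.e. isolated spurious boxes
    return [np.median(boxes[labels == k], axis=0)
            for k in set(labels) if k != -1]
```

With a scheme like this, the many per-pixel boxes overlapping the same word collapse into a single merged box, while isolated outlier boxes are discarded as DBSCAN noise.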