This repository allows you to build computational models that can tell human and machine-generated handwriting apart.
You can train the models from scratch with the following datasets:
- $1-GDS (5280 unistroke gestures, 16 classes): human data and synthetic data.
- $N-MMG (9600 unistroke and multistroke gestures, 16 classes): human data and synthetic data.
- Chars74k (3410 unistroke and multistroke gestures, 62 classes): human data and synthetic data.
You can try other datasets as long as you follow the expected CSV format.
This is how we trained our GRU classifier over the $1-GDS dataset:
~$ python3 main.py --human_dir csv-1dollar-human --synth_dir csv-1dollar-synth-best \
--model_type gru --epochs 400 --patience 40 --batch_size 32 --activation tanh
When this process ends, a directory is created (in /tmp by default) containing the log history and the trained model file in HDF5 (.h5) format.
There are many CLI options you might want to specify, such as --verbose 1 (to see more output info) or --out_dir somedir (to set the output directory).
To see all the available CLI options, run python3 main.py -h.
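If you want to launch several training runs from a script, the command line shown above can be assembled programmatically. The following is a minimal sketch and not part of the repository: build_train_command is a hypothetical helper, and it only uses flags that appear in this README. Any other flag name you pass is forwarded as-is, so check it against python3 main.py -h first.

```python
import shlex


def build_train_command(human_dir, synth_dir, model_type="gru", **options):
    """Return the argument list for a training run of main.py.

    Hypothetical helper: extra keyword arguments are turned into
    "--name value" pairs without any validation.
    """
    cmd = [
        "python3", "main.py",
        "--human_dir", str(human_dir),
        "--synth_dir", str(synth_dir),
        "--model_type", model_type,
    ]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Same settings as the GRU example over the $1-GDS dataset.
cmd = build_train_command(
    "csv-1dollar-human", "csv-1dollar-synth-best",
    epochs=400, patience=40, batch_size=32, activation="tanh",
    out_dir="somedir", verbose=1,
)
print(shlex.join(cmd))
# To actually start training, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids quoting problems when directory names contain spaces.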
If you already trained a model, you can evaluate it this way:
~$ python3 main.py --human_dir csv-1dollar-human --synth_dir csv-1dollar-synth-best \
--eval_model path/to/model.h5 --model_type gru
Again, run python3 main.py -h to see all the available CLI options.
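Since training writes the model file into an output directory, it can be handy to locate the most recent model before evaluating it. The sketch below is an illustration under assumptions, not the repository's own tooling: latest_model and build_eval_command are hypothetical helpers, the model file is assumed to carry the .h5 extension, and the layout of the output directory is not assumed (the search is recursive).

```python
from pathlib import Path


def latest_model(out_dir):
    """Return the most recently modified .h5 file under out_dir, or None."""
    root = Path(out_dir)
    if not root.is_dir():
        return None
    # Assumption: trained models are saved with the .h5 extension.
    candidates = sorted(root.rglob("*.h5"), key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None


def build_eval_command(model_path, human_dir, synth_dir, model_type="gru"):
    """Return the argument list for the evaluation command shown above."""
    return [
        "python3", "main.py",
        "--human_dir", str(human_dir),
        "--synth_dir", str(synth_dir),
        "--eval_model", str(model_path),
        "--model_type", model_type,
    ]


if __name__ == "__main__":
    # "somedir" stands for whatever was passed to --out_dir during training.
    model = latest_model("somedir")
    if model is None:
        print("No .h5 model found")
    else:
        print(" ".join(build_eval_command(
            model, "csv-1dollar-human", "csv-1dollar-synth-best")))
```

The --model_type value should match the one used for training, as in the evaluation command above.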
A preprint of our ICPR paper is publicly available: https://arxiv.org/abs/2010.13231
Please cite us using the following reference:
- L. A. Leiva, M. Diaz, M. A. Ferrer, R. Plamondon. Human or Machine? It Is Not What You Write, But How You Write It. Proc. ICPR, 2020.
@InProceedings{Leiva20_biometrics,
author = {Luis A. Leiva and Moises Diaz and Miguel A. Ferrer and Réjean Plamondon},
title = {Human or Machine? It Is Not What You Write, But How You Write It},
booktitle = {Proc. ICPR},
year = {2020},
}