Document image classification on the Tobacco-3482 dataset using multi-modal CNNs

mleimeister/document-image-classification

This Colab notebook shows how to use multi-modal convolutional neural networks in TensorFlow to classify document images from the Tobacco-3482 dataset. The dataset contains single-page scans of 10 different types of business documents, as seen in the following examples:

Samples from the Tobacco-3482 dataset

The architecture fuses an image path based on an ImageNet-pretrained VGG16 network with a text path based on a TF-IDF featurisation. The notebook shows how to create tf.data.Dataset pipelines for multi-modal (image + text) input, and how to use the TextVectorization layer from tf.keras.layers.experimental.preprocessing.
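As a rough illustration of such a pipeline, the sketch below builds a tf.data.Dataset that yields image and TF-IDF text features together. It is not the notebook's code: the toy arrays, the `featurize` helper, and the `max_tokens` value are assumptions made for this example. It also uses the non-experimental `tf.keras.layers.TextVectorization` alias available in recent TensorFlow releases; older releases expose the layer under the experimental path named above and may spell the output mode differently.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for document scans and their OCR text.
images = np.random.rand(4, 224, 224, 3).astype("float32")
texts = tf.constant([
    "invoice total amount due",
    "dear sir please find enclosed",
    "memo to all staff",
    "invoice number and amount",
])
labels = np.array([0, 1, 2, 0], dtype="int32")

# TF-IDF text featurisation; max_tokens=32 is an arbitrary choice for the demo.
vectorizer = tf.keras.layers.TextVectorization(max_tokens=32, output_mode="tf_idf")
# Learn the vocabulary and IDF weights from the text only.
vectorizer.adapt(tf.data.Dataset.from_tensor_slices(texts).batch(2))


def featurize(image, text, label):
    """Map one batch to ({"image": ..., "text": ...}, label)."""
    return {"image": image, "text": vectorizer(text)}, label


dataset = (
    tf.data.Dataset.from_tensor_slices((images, texts, labels))
    .batch(2)
    .map(featurize, num_parallel_calls=tf.data.AUTOTUNE)
    .prefetch(tf.data.AUTOTUNE)
)
```

Each element is a pair of a feature dictionary and a label batch, so a Keras model whose inputs are named `image` and `text` can consume the dataset directly. Image preprocessing for VGG16 is left out here for brevity.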

Network architecture
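A two-path network of this kind could be sketched with the Keras functional API as follows. This is a minimal sketch under assumptions, not the notebook's exact model: the README does not state how the paths are fused, so concatenation is assumed, and the layer sizes, dropout rate, and `TEXT_DIM` are hypothetical. The backbone is created with `weights=None` so the example runs without a download; passing `weights="imagenet"` would load the pretrained weights described above.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 10   # Tobacco-3482 has 10 document classes
TEXT_DIM = 2000    # assumed size of the TF-IDF vector (hypothetical)

# Image path: VGG16 convolutional base with global average pooling.
image_in = layers.Input(shape=(224, 224, 3), name="image")
backbone = tf.keras.applications.VGG16(
    include_top=False, weights=None, pooling="avg"
)
backbone.trainable = False  # keep the convolutional base frozen
image_feat = backbone(image_in)

# Text path: a dense layer on top of the TF-IDF vector.
text_in = layers.Input(shape=(TEXT_DIM,), name="text")
text_feat = layers.Dense(128, activation="relu")(text_in)

# Fusion by concatenation (an assumption), followed by a classifier head.
fused = layers.Concatenate()([image_feat, text_feat])
fused = layers.Dense(256, activation="relu")(fused)
fused = layers.Dropout(0.5)(fused)
out = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

model = tf.keras.Model(inputs={"image": image_in, "text": text_in}, outputs=out)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

Naming the inputs `image` and `text` lets the model accept dictionary-valued features from a tf.data.Dataset without extra glue code.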

On a random train/test split, the given network achieves around 86% accuracy on the 10 classes of the Tobacco-3482 dataset. This is close to the results reported in recent publications that use similar or more complex network architectures, such as [1] and [2].

Confusion matrix on the Tobacco-3482 test set
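For readers who want to see how overall accuracy relates to a confusion matrix, here is a small plain-Python sketch. The labels are made up for illustration and are not results from the notebook; the `confusion_matrix` and `accuracy` helpers are hypothetical names introduced for this example.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [
        [counts[(t, p)] for p in range(num_classes)]
        for t in range(num_classes)
    ]


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for three classes, for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(accuracy(cm))
```

Off-diagonal entries show which classes are confused with each other, which a single accuracy number cannot reveal.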

[1] Audebert et al.: Multimodal deep networks for text and image-based document classification, arXiv 2019

[2] Ferrando et al.: Improving Accuracy and Speeding Up Document Image Classification Through Parallel Systems, ICCS 2020
