Simple classification engine for government/municipality documents, built with TensorFlow. Documents are tagged based on the occurrence of certain words and on other characteristics of the document.
This project is a prototype for
Built during Prague Hacks 2016
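A minimal sketch of word-occurrence features, assuming a document is represented by how often each word from a fixed list appears in it. The helper `word_occurrence_vector`, the tokenization rule and the example words are hypothetical; the project's real feature extraction, including the "other characteristics", may differ.

```python
import re
from collections import Counter


def word_occurrence_vector(text, feature_words):
    """Count how often each feature word occurs in `text`.

    Hypothetical helper: lowercases the text and splits it into runs of
    word characters, which is an assumption about tokenization.
    """
    tokens = re.findall(r"\w+", text.lower(), flags=re.UNICODE)
    counts = Counter(tokens)
    # One entry per feature word, in the order given; missing words count as 0.
    return [counts[word] for word in feature_words]


# Hypothetical feature words for municipal documents.
features = ["budget", "tender", "zoning"]
print(word_occurrence_vector("Budget approved; tender for zoning study. Budget final.", features))
# [2, 1, 1]
```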
Requirements:
- bash
- python 2.7
- numpy
- tensorflow 0.10.0 (does not work with 0.11.0rc0 due to tensorflow/tensorflow#4715)
sudo pip install numpy
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.10.0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL
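Since only TensorFlow 0.10.0 is known to work, a script could fail fast on any other version. A hypothetical guard; the function names are illustrative and not part of this project:

```python
REQUIRED_TF_VERSION = "0.10.0"  # the version listed above as working


def check_tensorflow_version(version):
    """Raise RuntimeError unless `version` is the tested TensorFlow release."""
    if version != REQUIRED_TF_VERSION:
        raise RuntimeError(
            "TensorFlow %s is required, found %s "
            "(0.11.0rc0 is known not to work, see tensorflow/tensorflow#4715)"
            % (REQUIRED_TF_VERSION, version))


def check_installed_tensorflow():
    """Import TensorFlow and verify its version (hypothetical start-up check)."""
    import tensorflow as tf  # imported lazily so the check above stays testable
    check_tensorflow_version(tf.__version__)
```

Calling `check_installed_tensorflow()` at the top of a script turns a confusing failure deep inside TensorFlow into a clear version message; `tf.__version__` holds the installed version string.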
Prepare data:
- copy the tagged content files to ./input
- copy the feature vector to features.csv
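A sketch of reading these inputs, assuming features.csv holds the feature words separated by commas and/or newlines and that every file in ./input is UTF-8 text. Both the format and the helper names are assumptions; generate-all.sh may expect something else.

```python
import io
import os


def load_feature_words(path):
    """Read feature words from a CSV file.

    Assumed (hypothetical) format: words separated by commas and/or
    newlines, e.g. "budget,tender,zoning". Empty entries are skipped.
    """
    with io.open(path, encoding="utf-8") as f:
        content = f.read()
    words = []
    for line in content.splitlines():
        words.extend(w.strip() for w in line.split(",") if w.strip())
    return words


def iter_input_documents(directory="./input"):
    """Yield (filename, text) for every regular file in `directory`, sorted by name."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with io.open(path, encoding="utf-8") as f:
                yield name, f.read()
```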
export CATS=`cat cats.txt`
bash generate-all.sh features.csv $CATS
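Because $CATS is unquoted, the shell splits the contents of cats.txt on whitespace and passes each category as a separate argument (a Python script would see them in `sys.argv`). A sketch of turning a document's tags into a training target over those categories, assuming a document may carry several tags; `encode_tags` is hypothetical.

```python
def encode_tags(doc_tags, categories):
    """Return a multi-hot target: 1.0 for each category the document carries."""
    tag_set = set(doc_tags)
    unknown = tag_set.difference(categories)
    if unknown:
        # Reject tags not listed in cats.txt instead of silently dropping them.
        raise ValueError("unknown tags: %s" % ", ".join(sorted(unknown)))
    return [1.0 if cat in tag_set else 0.0 for cat in categories]


# Same whitespace splitting the shell applies to $CATS (hypothetical cats.txt contents).
categories = "budget\ntransport\nzoning\n".split()
print(encode_tags(["zoning", "budget"], categories))  # [1.0, 0.0, 1.0]
```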
Train the DNN:
python train.py $CATS
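The project trains its network with TensorFlow 0.10.0. As a rough, framework-free illustration of the idea, here is a one-hidden-layer network in NumPy with one sigmoid output per category, trained by full-batch gradient descent on binary cross-entropy. Layer size, initialization, learning rate and epoch count are arbitrary assumptions; this is not the model train.py builds.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, Y, hidden=16, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer network with one sigmoid output per category.

    X: (n_docs, n_features) feature vectors.
    Y: (n_docs, n_categories) 0/1 targets.
    Returns the parameters and the loss recorded at each epoch.
    """
    rng = np.random.RandomState(seed)
    n_docs, n_features = X.shape
    n_cats = Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_features, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, n_cats))
    b2 = np.zeros(n_cats)
    eps = 1e-7  # keeps log() finite
    losses = []
    for _ in range(epochs):
        # Forward pass: ReLU hidden layer, then an independent sigmoid per category.
        H = np.maximum(0.0, X.dot(W1) + b1)
        P = sigmoid(H.dot(W2) + b2)
        loss = -np.mean(Y * np.log(P + eps) + (1 - Y) * np.log(1 - P + eps))
        losses.append(float(loss))
        # Backward pass: gradient of the mean binary cross-entropy w.r.t. the output logits.
        dZ2 = (P - Y) / float(n_docs * n_cats)
        dW2 = H.T.dot(dZ2)
        db2 = dZ2.sum(axis=0)
        dZ1 = dZ2.dot(W2.T) * (H > 0)  # ReLU passes gradient only where it was active
        dW1 = X.T.dot(dZ1)
        db1 = dZ1.sum(axis=0)
        # Plain gradient-descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict_proba(params, X):
    """Per-category probabilities for new feature vectors."""
    W1, b1, W2, b2 = params
    H = np.maximum(0.0, X.dot(W1) + b1)
    return sigmoid(H.dot(W2) + b2)
```

Independent sigmoids (rather than a softmax) let a document receive several tags at once; a softmax would instead suit the case where each document has exactly one category.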
Run classification on new data:
python predict.py features.csv $CATS output.csv
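A sketch of the last step, assuming the model yields one score per category and a document gets every category scoring at least 0.5. The threshold, the helper names and the CSV layout are assumptions; output.csv as written by predict.py may look different.

```python
import csv


def assign_tags(probabilities, categories, threshold=0.5):
    """Return, per document, the categories whose score reaches `threshold`."""
    return [[cat for cat, p in zip(categories, row) if p >= threshold]
            for row in probabilities]


def write_predictions(fileobj, doc_names, tags_per_doc):
    """Write one CSV row per document: its name, then its tags joined by ';'.

    Hypothetical layout, written to an already opened file object.
    """
    writer = csv.writer(fileobj)
    writer.writerow(["document", "tags"])
    for name, tags in zip(doc_names, tags_per_doc):
        writer.writerow([name, ";".join(tags)])
```

As the csv module documentation recommends, pass a file opened with `open("output.csv", "w", newline="")` on Python 3 or `open("output.csv", "wb")` on Python 2.7.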