Categorizer (a PragueHacks 2016 project)

Simple classification engine for government/municipality documents, built with TensorFlow. Documents are tagged based on the occurrence of certain words and other characteristics of each document.
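As a rough illustration of tagging by word occurrence, the sketch below turns a document into a vector of word counts. The function name, the vocabulary, and the sample text are all hypothetical and are not taken from this repository; the actual feature extraction may differ.

```python
import re
from collections import Counter


def word_occurrence_features(text, vocabulary):
    """Count how often each vocabulary word occurs in text (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower(), flags=re.UNICODE)
    counts = Counter(tokens)
    # One entry per vocabulary word, in vocabulary order; absent words count as 0.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and document, for illustration only.
vocabulary = ["budget", "tender", "zoning"]
document = "The budget committee approved the tender. Budget details follow."
features = word_occurrence_features(document, vocabulary)
```

A vector like this, one row per document, is the kind of input a classifier can be trained on.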

This project is a prototype for

Built during Prague Hacks 2016

Setup

Requirements:

bash
python 2.7
numpy
tensorflow 0.10.0 (does not work with 0.11.0rc0 due to tensorflow/tensorflow#4715)

sudo pip install numpy
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.10.0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL
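To confirm that the installed TensorFlow matches the version listed above, a small check along these lines may help. This is a sketch: `is_supported_version` is a hypothetical helper, and it conservatively accepts only 0.10.0, since that is the only version the requirements confirm as working.

```python
import re


def is_supported_version(version_string):
    """Return True only for TensorFlow 0.10.0, the version listed in the requirements."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(.*)$", version_string)
    if match is None:
        return False
    major, minor, patch = (int(part) for part in match.groups()[:3])
    suffix = match.group(4)
    # Assumption: any pre-release suffix (such as "rc0") is treated as unsupported.
    return (major, minor, patch) == (0, 10, 0) and suffix == ""


# Usage with an installed TensorFlow (not executed here):
#   import tensorflow as tf
#   print(is_supported_version(tf.__version__))
```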

Run

Prepare data:

copy tagged content files to ./input
copy feature vector to features.csv
export CATS=`cat cats.txt`
bash generate-all.sh features.csv $CATS

Train DNN

python train.py $CATS

Run classification on new data

python predict.py features.csv $CATS output.csv
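Because $CATS expands to several words, the command above passes the feature file first, the output file last, and every category name in between. The sketch below shows one way such an argument list could be split. It is an assumption about the layout implied by the command line, not the actual code in predict.py, and `split_predict_args` is a hypothetical name.

```python
def split_predict_args(args):
    """Split [features, cat1, ..., catN, output] into its three parts."""
    if len(args) < 3:
        raise ValueError("expected: FEATURES CATEGORY [CATEGORY ...] OUTPUT")
    features_path = args[0]
    categories = list(args[1:-1])
    output_path = args[-1]
    return features_path, categories, output_path


# Hypothetical category names, for illustration only.
example = split_predict_args(["features.csv", "transport", "finance", "output.csv"])
```

In a script, the same function could be called with `sys.argv[1:]`.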
