cd
into the WikiClassify2.0 base directory and run python main.py
to start the automated workflow in the following order:
- Download latest Wikipedia data dump bz2 archive
- Extract archive into .xml format
- Compile C++ parser files
- Parse .xml data, sending bursts to remove server at 1000 article increments
- Train word2vec and LDA models
After a model is present in the working directory, a subsequent call to python main.py
will open the interface created to interact with the models (including A* path search and A* convene functions). A call to python main.py
with a -c
launch parameter will clean the working directory of models and downloaded data.
Run python main.py
with a -g
launch parameter to open the user interface main menu.
Configure parser launch parameters.
Python 2.7
numpy
(pip install numpy)g++
gensim
(pip install gensim)sklearn
(pip install sklearn))PyQt4
(apt-get install python-qt4)psycopg2
(pip install psycopg2)
Both of the following packages can be install via the command line using package manager such as apt-get
on Ubuntu.
libpq-dev
(apt-get install libpq-dev)libpqxx-dev
(apt-get install libpqxx-dev)