quokka

  1. Download Data

    Download the files from the links below and copy them to the data directory.

    Energy Hub

    Energy Hub Training set - 
    
    Energy Hub Validation set - 
    
    Energy Hub Test set - 
    

    Reuters

    Reuters Training set - 
    
    Reuters Validation set - 
    
    Reuters Test set - 
    
  2. Download Necessary Packages

    • Download the NLTK stopwords:

      ```
      import nltk
      nltk.download('stopwords')
      ```
      
    • Download Mallet from here. Unzip it and copy it to the working directory. A sketch of training Mallet LDA from Python follows this list.

      If you use Google Colab:

      ```
      !wget http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip
      !unzip mallet-2.0.8.zip
      ```
      
    • Download the GloVe embeddings from here. Unzip them and copy them to the working directory. A loading sketch also follows this list.

      If you use Google Colab:

      ```
      !wget https://nlp.stanford.edu/data/wordvecs/glove.6B.zip
      !unzip glove*.zip
      ```
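
    Once Mallet is unzipped, one common way to train LDA through it from Python is gensim's wrapper. This is a minimal sketch, assuming gensim < 4.0 (the wrapper was removed in 4.0) and a toy placeholder corpus; it is not the repository's exact code.

    ```
    # Minimal sketch: Mallet LDA via gensim's wrapper (gensim < 4.0).
    # Assumes mallet-2.0.8/ sits in the working directory, as unzipped above.
    from gensim.corpora import Dictionary
    from gensim.models.wrappers import LdaMallet

    # Toy placeholder corpus; the real pipeline tokenizes the dataset documents.
    docs = [["energy", "grid", "storage"], ["market", "oil", "price"]]
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    mallet_path = "mallet-2.0.8/bin/mallet"  # path from the unzip step above
    lda = LdaMallet(mallet_path, corpus=corpus, num_topics=2, id2word=dictionary)
    print(lda.show_topics(num_topics=2))
    ```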
      
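    After unzipping glove.6B.zip, the vectors can be read into a plain dictionary. A minimal sketch; the 100-dimensional file is an assumption, and any of the bundled dimensions loads the same way.

    ```
    # Minimal sketch: load GloVe vectors into a {word: vector} dict.
    # Assumes glove.6B.100d.txt was unzipped into the working directory.
    import numpy as np

    embeddings = {}
    with open("glove.6B.100d.txt", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")

    print(embeddings["energy"].shape)  # (100,)
    ```
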
  3. Build Topic-Entity Triples

    This step involves:

    • Training a topic modeler over the corpus
    • Extracting named entities using spaCy
    • Building triples using a dependency parser and POS tagger
    • Applying a topic-entity filter over these triples

    Run the following Python script:

    ```
    python data_preprocess.py <dataset>
    ```

    Change <dataset> to "energy hub" or "reuters" to select the corpus. A minimal sketch of the spaCy extraction steps follows.
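
    data_preprocess.py implements the full pipeline; the sketch below only illustrates how named entities and subject-verb-object triples can be pulled from text with spaCy's NER, dependency parser, and POS tagger. The SVO heuristic and the example sentence are illustrative assumptions, not the repository's exact filter.

    ```
    # Minimal sketch: spaCy NER plus a naive dependency-based SVO triple
    # extractor. Requires: python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Chevron acquired a solar farm in Texas.")

    # Named entities (candidates for the topic-entity filter)
    print([(ent.text, ent.label_) for ent in doc.ents])

    # Naive subject-verb-object triples from the dependency parse
    triples = []
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ == "dobj"]
            for s in subjects:
                for o in objects:
                    triples.append((s.text, token.lemma_, o.text))
    print(triples)  # e.g. [('Chevron', 'acquire', 'farm')]
    ```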

  4. Train Models

    Run the following Python script:

    ```
    python train.py <dataset> <model>
    ```

    Change <dataset> to "energy hub" or "reuters" to select the corpus.

    Change <model> to one of the following options:

    • text - GloVe-based text model
    • topics - topic distributions
    • entities - GloVe-enriched named entities
    • triples - GloVe-enriched triples
    • text_topics - text and topic distributions combined
    • text_triples - text (GloVe) and triples (GloVe) combined

    The combined options pair two representations of the same document; an illustrative fusion sketch follows.
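
    As a hedged illustration of how a combined option can fuse two views of a document, the sketch below concatenates an averaged GloVe vector with a topic distribution. The function name, the stand-in vectors, and the fusion itself are assumptions for illustration, not the repository's architecture.

    ```
    # Illustrative sketch: fuse text and topic features by concatenating
    # the mean GloVe vector of a document's tokens with its topic
    # distribution (as produced by the trained topic model).
    import numpy as np

    def document_features(tokens, topic_dist, embeddings, dim=100):
        vectors = [embeddings[t] for t in tokens if t in embeddings]
        text_vec = np.mean(vectors, axis=0) if vectors else np.zeros(dim)
        return np.concatenate([text_vec, topic_dist])

    tokens = ["energy", "grid", "storage"]
    topic_dist = np.array([0.7, 0.3])                      # placeholder 2-topic mix
    embeddings = {t: np.random.rand(100) for t in tokens}  # stand-in GloVe vectors
    print(document_features(tokens, topic_dist, embeddings).shape)  # (102,)
    ```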
