# PyTorch implementation of Chaudhary et al. 2020's TopicBERT
Install conda if you have not already done so, then run:

```sh
conda env create -f environment.yml
```

This will create a Python environment that strictly adheres to the versioning indicated in the project proposal. It is intended to closely mirror Google Colab.
Then train the model via `main.py`. There are many options that can be set; run `python main.py -h` to see them all. One particularly helpful option is `-s PATH` or `--save PATH`, which saves the given options as a JSON file that can easily be used again with `--load PATH`.
Sample `config.json` (the `//` comments below are annotations only; remove them in an actual JSON file):

```jsonc
{
    "dataset": "reuters8",
    "label_path": ".../labels.txt",
    "train_dataset_path": ".../training.tsv",
    "val_dataset_path": ".../validation.tsv",
    "test_dataset_path": ".../test.tsv",
    "num_workers": 8,
    "batch_size": 16,
    "warmup_steps": 10,
    "lr": 2e-05,
    "alpha": 0.9,
    "num_epochs": 2,
    "clip": 1.0,
    "seed": 42,
    "device": "cuda",
    "val_freq": 0.0,
    "test_freq": 0.0,
    "disable_tensorboard": false,
    "tensorboard_dir": "runs/topicbert-512",
    // directory where checkpoints should be stored
    "resume": ".../checkpoints/",
    // whether to look for a checkpoint in the above directory or just save a new one there
    "save_checkpoint_only": true,
    "verbose": true,
    "silent": false,
    "load": null,
    "save": "config.json"
}
```
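As a quick sketch of the save/load round trip (`--save` and `--load` are documented above; the other long-form flags are assumed to mirror the JSON keys):

```sh
# Save the options used for a run, then reproduce the run later from the file.
# (Flag names other than --save/--load are assumed from the config keys above.)
python main.py --dataset reuters8 --batch_size 16 --num_epochs 2 --save config.json
python main.py --load config.json
```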
Alternatively, open `experiment.ipynb` in Google Colab.

Milestones:
- Have working BERT on some dataset (SST-2)
  - Completed 4/8/21, Liam
- Reuters8 Dataset & DataLoader set up
  - Dataset & DataLoader done 4/9/21, Liam
- BERT doing standalone prediction on Reuters8
  - Done: achieves 99.5% train, 98.0% val accuracy (run on Google Colab), 4/10/21, Liam
- Set up NVDM topic model on some dataset
- NVDM working on Reuters8
  - Done: error behaves as expected during training but needs further analysis, 4/18/21, Liam
- Create joint model (TopicBERT)
  - Coding complete, 4/19/21, Liam
- Achieve near baselines with TopicBERT
  - We achieve a 0.96 F1 score on Reuters8 with TopicBERT-512, marginally outperforming the original paper. See the differences section for potential factors.
  - Done, 4/19/21, Liam
- Move from Jupyter to Python modules
  - All "modules" converted, 4/25/21, Liam. `training` package and `main.py` complete, 4/26/21, Liam.
- Measure performance baselines
  - All baselines finalized, 5/3/21, Liam. Happy to report that the model has the expected performance (runtime & accuracy) characteristics!
Non-modification Extensions Pursued:

- Pre-train VAE.
  - Implemented HR-VAE as a compatible model with TopicBERT. The TopicBERT main script can now pre-train an HR-VAE model on a dataset. 5/8/21, Liam.
More Extension Ideas:
- Test new datasets in topic classification
- Test datasets in a different domain (e.g. NLI, GLUE)
Differences:

This section maintains a (non-definitive) list of differences between the original implementation and this repository's code.
- `F_MIN` set to `10` on the Reuters8 dataset yields a vocab size of `K = 4832` rather than the `K = 4813` reported in the original paper, despite following the same text-cleaning guidelines. We assume this will not significantly affect results.
- `F_MIN` set to `100` on the IMDB dataset yields a vocab size of `K = 7358` rather than the `K = 6823` reported in the original paper, despite following the same text-cleaning guidelines. We assume this will not significantly affect results.
- We use a size-1k validation set for IMDB (24k train), whereas the original authors used a 5k validation set.
- The original authors use `bert-base-cased`. As all data is lowercased across datasets in the original experiments, we change this to `bert-base-uncased`.
- Labels are one-hot encoded. We use `torch.max(...)[1]` to extract prediction & label indices; these indices can be converted back and forth with label strings via `dataset.label_mapping[index]` and `dataset.label_mapping[label_str]` (see the first sketch after this list).
- The NVDM in the original paper uses `tanh` activation for its multilayer perceptron, while the authors' TensorFlow implementation uses `sigmoid`. We use `GELU`, as the NVDM paper (Miao et al. 2016) does as well.
- TopicBERT as described in the paper has a projection layer consisting of a single matrix $\mathbf{P} \in \mathbb{R}^{\hat{H} \times H_B}$. We add a `GELU` activation after $\mathbf{P}$. The original authors' TensorFlow implementation uses a `tf.keras.layers.Dense` layer, which adds a bias vector and `GELU` activation after $\mathbf{P}$ (see the second sketch after this list).
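A minimal sketch of the index extraction described above, with hypothetical tensors (only `torch.max(...)[1]` reflects the repo's actual usage):

```python
import torch

# Hypothetical logits and one-hot labels for a batch of 2 examples, 3 classes.
logits = torch.tensor([[0.1, 2.3, -1.0],
                       [1.5, 0.2, 0.4]])
labels_onehot = torch.tensor([[0.0, 1.0, 0.0],
                              [1.0, 0.0, 0.0]])

# torch.max(x, 1) returns (values, indices) along dim 1; [1] keeps the indices,
# which can then be mapped to label strings via dataset.label_mapping.
pred_idx = torch.max(logits, 1)[1]          # tensor([1, 0])
label_idx = torch.max(labels_onehot, 1)[1]  # tensor([1, 0])
```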
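And a minimal sketch of the projection-layer difference, assuming illustrative dimensions (`H_hat` and `H_B` stand in for $\hat{H}$ and $H_B$; the true sizes depend on the topic and BERT hidden dimensions):

```python
import torch.nn as nn

H_hat, H_B = 968, 768  # illustrative sizes only

# Our reading of the paper's bare matrix P, plus the GELU we add
# (a bias-free Linear is the closest PyTorch analogue of a lone matrix):
proj_ours = nn.Sequential(nn.Linear(H_hat, H_B, bias=False), nn.GELU())

# The authors' tf.keras.layers.Dense equivalent, which also adds a bias vector:
proj_tf_style = nn.Sequential(nn.Linear(H_hat, H_B, bias=True), nn.GELU())
```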