# Neural language models with latent syntax

Code for my thesis, written in Python 3.6 using DyNet.

The thesis can be found here.

## Setup

Use `make` to obtain the data and install EVALB:

```shell
make data    # download ptb and unlabeled data
make evalb   # install EVALB
```

## Usage

Use `make` to train a number of standard models:

```shell
make disc             # train discriminative rnng
make gen              # train generative rnng
make crf              # train crf
make fully-unsup-crf  # train rnng + crf (vi) fully unsupervised
```

You can list all the options with:

```shell
make list
```

Alternatively, you can use command-line arguments directly:

```shell
python src/main.py train --model-type=disc-rnng --model-path-base=models/disc-rnng
```

For all available options, use:

```shell
python src/main.py --help
```

To set the environment variables used when evaluating trained models, e.g. `CRF_PATH=models/crf_dev=90.01`, use:

```shell
source scripts/best-models.sh
```
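As a minimal sketch of how such a variable can be picked up on the Python side (assuming the evaluation code reads it with the standard `os.environ`; the fallback value below is purely illustrative, not part of the repository):

```python
import os

# Fallback for illustration only; normally this variable is exported
# by `source scripts/best-models.sh` before running evaluation.
os.environ.setdefault("CRF_PATH", "models/crf_dev=90.01")

# Hypothetical sketch: evaluation code reads the path of the best
# trained model from the environment.
crf_path = os.environ["CRF_PATH"]
print(crf_path)
```

Using `setdefault` means an already-exported `CRF_PATH` takes precedence over the illustrative fallback.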

## Models

Models are saved to the `models` folder under a name that includes their development score. We have included our best models (by development score) as zip archives; to use them, run `unzip zipped/file.zip` from the `models` directory.

## Acknowledgements

I have relied on some excellent implementations for inspiration and for help with my own:

Make sure to check them out!