utahnlp/supersenses

This repository reproduces the results of the feature-rich classifier from the paper:

Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Jakob Prange, Austin Blodgett, Sarah R. Moeller, Aviram Stern, Adi Bitan, and Omri Abend. Comprehensive supersense disambiguation of English prepositions and possessives. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, July 15–20, 2018. http://people.cs.georgetown.edu/nschneid/p/pssdisambig.pdf

(Other resources from the paper available at: https://github.com/nert-gu/streusle/blob/master/ACL2018.md)

Getting started

To run this code, you will need the following installed on your computer:

  1. Scala and SBT
  2. Liblinear

If you are on a Mac, you can use Homebrew to install these dependencies: brew install scala sbt liblinear.

Next, clone this repository. Included as a git submodule is the supersense dataset that was released with the paper. In addition to the train/dev/test data splits, that repository also contains the official evaluation scripts for the data.
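For example, assuming the repository URL matches the utahnlp/supersenses name above, the following fetches both the code and the dataset submodule:

    git clone --recursive https://github.com/utahnlp/supersenses.git
    # If you have already cloned without submodules, fetch the dataset with:
    git submodule update --init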

After cloning the repository, compile everything by running sbt compile in the terminal. Now you can train the classifiers.

Instructions for training classifiers

The pipeline for training scene role and function classifiers is:

  1. Preprocess the data
  2. Extract features into liblinear-compatible files
  3. Train the scene role and function classifiers
  4. Construct JSON output for evaluation with the official script

Other than step 3, these steps use the Scala code in this repository and can be invoked via scripts/run.sh. This script provides an entry point to three sub-commands, preprocess, extract and write-json, which correspond to steps 1, 2 and 4 respectively.

  1. run.sh preprocess reads the STREUSLE data and runs preprocessing tools on it.
  2. run.sh extract performs feature extraction so that we can train the classifiers. It also saves enough metadata to reconstruct JSON files for evaluation.
  3. run.sh write-json converts the liblinear prediction files into JSON-formatted output for evaluation.

Each of these sub-commands takes arguments specifying the location of the data files. Running a sub-command without any arguments will print a description of its required arguments.
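For example, invoking each sub-command with no arguments prints its usage:

    scripts/run.sh preprocess
    scripts/run.sh extract
    scripts/run.sh write-json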

Running experiments

scripts/experiment.sh is a single bash script that performs all the steps described above and keeps track of the intermediate files that are generated along the way. This script will write the data files, models and predicted supersenses to experiments/output.

The paper compares the impact of gold versus automatically identified prepositions and of gold versus automatically predicted parse trees. To train classifiers in the four resulting settings, use:

  1. scripts/experiment.sh --goldid --goldsyn
  2. scripts/experiment.sh --goldid --autosyn
  3. scripts/experiment.sh --autoid --goldsyn
  4. scripts/experiment.sh --autoid --autosyn

After running these four commands, if all goes well, you will find four JSON files in experiments/output, one containing the predictions for each setting.
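A minimal sketch that runs all four settings in sequence, assuming it is invoked from the repository root:

    # Train and predict in all four gold/auto settings:
    for id in goldid autoid; do
        for syn in goldsyn autosyn; do
            scripts/experiment.sh --$id --$syn
        done
    done
    # The four JSON prediction files should now be here:
    ls experiments/output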

If you want to modify the training data, e.g. to produce learning curves, you will have to edit the TRAIN_JSON variable in the experiment script.
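For instance, a hypothetical edit inside scripts/experiment.sh might point the variable at a subsampled training file; the file name below is illustrative only:

    # Illustrative only: this path is not shipped with the repository.
    TRAIN_JSON=path/to/train.subsample.json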
