A suite of tools for doing NLP with distributed representations, with a specific focus on natural language inference tasks such as question answering and recognizing textual entailment. These tools range from implementations of existing algorithms and techniques to novel algorithms for generating sentences that follow from a given sentence.
- Text Corpora: Wikipedia article and sentence streaming, with options for preprocessing and caching. Tools for using the Stanford Natural Language Inference corpus for recognizing textual entailment, along with the Sentences Involving Compositional Knowledge corpus.
- Word Embeddings: a random indexing implementation with options for encoding information concerning word order and dependency syntax. Fully parallelized with Python's multiprocessing module.
- Standard ML Tools: logistic regression, multilayer perceptron.
- Neural Networks: standard RNNs and TreeRNNs (i.e. recursive neural networks), along with a TreeRNN variant that learns to encode a "Holographic Reduced Representation" (Plate 2003) of an input sentence (see the sketch after this list). LSTM extensions of these models are a work in progress.
- Generative Modelling: experimental models that (a) learn the weights in a "decoding" TreeRNN to generate an embedding for each node in the tree, and (b) learn weights for a TreeRNN that generates both structure and content simultaneously.
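To give a rough sense of the representations involved, here is a minimal, self-contained numpy sketch (not pysem's actual API) of two of the ideas above: sparse random index vectors as used in random indexing, and HRR binding via circular convolution as described in Plate (2003). All names and dimensions are illustrative.

import numpy as np

rng = np.random.RandomState(0)
dim = 2048  # illustrative dimensionality

def index_vector(nnz=20):
    # Sparse ternary vector of the kind used as a fixed "index" in random indexing.
    vec = np.zeros(dim)
    idx = rng.choice(dim, size=nnz, replace=False)
    vec[idx] = rng.choice([-1.0, 1.0], size=nnz)
    return vec

def cconv(a, b):
    # Circular convolution (HRR binding), computed with FFTs.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def inverse(a):
    # Approximate inverse used for HRR unbinding (Plate 2003).
    return np.concatenate(([a[0]], a[:0:-1]))

# Fixed index vectors for a tiny vocabulary.
vocab = {w: index_vector() for w in ['dogs', 'chase', 'cats']}

# Role vectors for a toy sentence structure.
roles = {r: rng.randn(dim) / np.sqrt(dim) for r in ['subj', 'verb', 'obj']}

# Encode "dogs chase cats" as a single HRR: bind fillers to roles and superpose.
sent = (cconv(roles['subj'], vocab['dogs']) +
        cconv(roles['verb'], vocab['chase']) +
        cconv(roles['obj'], vocab['cats']))

# Unbinding the subject role yields a noisy copy of the 'dogs' vector.
probe = cconv(inverse(roles['subj']), sent)
sims = {w: np.dot(probe, v) / (np.linalg.norm(probe) * np.linalg.norm(v))
        for w, v in vocab.items()}
print(max(sims, key=sims.get))  # expected: 'dogs'

Unbinding with the approximate inverse of a role vector recovers a noisy version of the bound filler, which is the basic property HRRs provide for encoding structured sentences in fixed-width vectors.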
In the examples directory, there are Jupyter notebooks illustrating the creation of:
- unsupervised models that learn word embeddings from Wikipedia text
- classification models for predicting inferential relations between sentences
- generative models for generating sentences that are entailed by a given sentence.
Note that you will need to download a dump of Wikipedia articles and preprocess them using this tool. You will also need to change the path in the notebook to point to where you have saved this dump locally. For the generative modelling example, the pretrained model parameters used in the notebook can be found here.
Pysem requires Python 3.5, mostly to support effective multiprocessing. For installation, it is easiest to use the Anaconda Python distribution to create a conda environment as follows. Run these commands from inside the cloned repository:
conda env create
source activate pysem
python -m spacy.en.download
python -m nltk.downloader stopwords punkt
python setup.py develop
The first command creates an environment called pysem that includes all of the needed dependencies, while the second command activates it. The next two commands download the data required for parsing and tokenization. The final command installs this library into the environment. You can verify that the installation was successful with the following command:
py.test
To leave the environment and (optionally) delete it, do the following:
source deactivate
conda env remove -n pysem
All of the machine learning and neural network models are tested with comprehensive gradient checks to ensure that they are implemented correctly. The library is currently tested only on Python 3.5.
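For reference, a gradient check compares a model's analytic gradients against central-difference estimates of the same quantities. The sketch below illustrates the idea on a simple squared-error loss; it is not the test suite's actual code.

import numpy as np

def numerical_gradient(f, w, eps=1e-6):
    # Central-difference estimate of the gradient of a scalar-valued function f.
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus.flat[i] += eps
        w_minus.flat[i] -= eps
        grad.flat[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

# Check an analytic gradient for a squared-error loss on a single example.
rng = np.random.RandomState(0)
x, y = rng.randn(5), 1.0
w = rng.randn(5)

loss = lambda v: 0.5 * (np.dot(v, x) - y) ** 2
analytic = (np.dot(w, x) - y) * x

assert np.allclose(numerical_gradient(loss, w), analytic, atol=1e-5)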