SPO extractor using Stanza

Example sentences and universal dependencies

The encapsulation of rifampicin leads to a reduction of the Mycobacterium smegmatis inside macrophages.
nsubj (nominal subject) <= VERB => obl (oblique nominal)

The Norwalk virus is the prototype virus that causes epidemic gastroenteritis infecting predominantly older children and adults.
acl:relcl (relative clause modifier) => VERB => obj (object)

It is widely agreed that the exposure to ambient air pollution may cause serious respiratory illnesses and that weather conditions may also contribute to the seriousness.
nsubj <= VERB => obj

In this report, ribavirin was shown to inhibit SARS coronavirus replication in five different cell types of animal or human origin at therapeutically achievable concentrations.
nsubj:pass <= xcomp => VERB => obj

Chronic hepatitis virus infection is a major cause of chronic hepatitis, cirrhosis, and hepatocellular carcinoma worldwide.
nsubj <= NOUN => nmod => conj
conj follows the coordinated items (chronic hepatitis, cirrhosis, and hepatocellular carcinoma)

What is Stanza?

Stanza is a Python NLP library that provides a neural pipeline built on PyTorch and a client for Stanford CoreNLP.
Tregex (via CoreNLP) for chunking
Neural pipeline for dependency parsing
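
As a rough illustration of the dependency-parsing side, the sketch below lists each word with its relation and head word. `build_pipeline` and `dependency_rows` are hypothetical helper names, not part of this project; the sketch assumes the English models have already been downloaded, and it loads the default pipeline rather than whatever processor set the project actually configures.

```python
from typing import Any, List, Tuple


def build_pipeline() -> Any:
    """Create a default English Stanza pipeline (assumes the "en" models are downloaded)."""
    import stanza  # imported lazily so dependency_rows works without Stanza installed

    return stanza.Pipeline(lang="en")


def dependency_rows(doc: Any) -> List[Tuple[str, str, str]]:
    """Return (word, relation, head word) for every word in a parsed document."""
    rows = []
    for sentence in doc.sentences:
        for word in sentence.words:
            # word.head is a 1-based index into sentence.words; 0 marks the root.
            head_text = sentence.words[word.head - 1].text if word.head > 0 else "ROOT"
            rows.append((word.text, word.deprel, head_text))
    return rows
```

With Stanza installed, `dependency_rows(build_pipeline()("Ribavirin inhibits replication."))` would return one tuple per word; the exact relations depend on the model version.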

Algorithm

Given a sentence and a list of triggers,
Check if a trigger is fired.
If a trigger is fired, run a dependency parser and a chunker on the sentence.
Using the dependency relations of the trigger, identify head words.
Extract noun phrases by merging dependency relations and chunks based on the head words.
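
The steps above can be sketched roughly as follows. This is a simplified illustration, not the project's implementation: it replaces the chunker with the dependency subtree under each head word, and the names `Word`, `find_trigger`, `phrase`, and `extract_spo` are hypothetical. The parse in `EXAMPLE` is hand-written for the first example sentence and is not parser output.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Word(NamedTuple):
    id: int      # 1-based token index
    text: str
    lemma: str
    head: int    # id of the governing word, 0 for the root
    deprel: str  # universal dependency relation to the head


def find_trigger(words: Sequence[Word], triggers: Sequence[str]) -> Optional[Word]:
    """Return the first word whose lemma is in the trigger list, or None."""
    wanted = {t.lower() for t in triggers}
    return next((w for w in words if w.lemma.lower() in wanted), None)


def child_with(words: Sequence[Word], head_id: int, rels: Sequence[str]) -> Optional[Word]:
    """Return the first direct dependent of head_id with one of the given relations."""
    return next((w for w in words if w.head == head_id and w.deprel in rels), None)


def phrase(words: Sequence[Word], head: Word) -> str:
    """Join the subtree under head, skipping the head's own case marker."""
    keep = {head.id}
    changed = True
    while changed:  # repeat until no new descendants are found
        changed = False
        for w in words:
            if w.id in keep or w.head not in keep:
                continue
            if w.head == head.id and w.deprel == "case":
                continue
            keep.add(w.id)
            changed = True
    return " ".join(w.text for w in words if w.id in keep)


def extract_spo(words: Sequence[Word], triggers: Sequence[str]) -> Optional[Tuple[str, str, str]]:
    """Apply the pattern nsubj <= VERB => obj/obl around the first fired trigger."""
    trigger = find_trigger(words, triggers)
    if trigger is None:
        return None  # no trigger fired, so nothing is parsed or extracted
    subj = child_with(words, trigger.id, ("nsubj",))
    obj = child_with(words, trigger.id, ("obj", "obl"))
    if subj is None or obj is None:
        return None
    return phrase(words, subj), trigger.text, phrase(words, obj)


# Hand-written parse (an assumption for illustration only).
EXAMPLE = [
    Word(1, "The", "the", 2, "det"), Word(2, "encapsulation", "encapsulation", 5, "nsubj"),
    Word(3, "of", "of", 4, "case"), Word(4, "rifampicin", "rifampicin", 2, "nmod"),
    Word(5, "leads", "lead", 0, "root"), Word(6, "to", "to", 8, "case"),
    Word(7, "a", "a", 8, "det"), Word(8, "reduction", "reduction", 5, "obl"),
    Word(9, "of", "of", 12, "case"), Word(10, "the", "the", 12, "det"),
    Word(11, "Mycobacterium", "Mycobacterium", 12, "compound"),
    Word(12, "smegmatis", "smegmatis", 8, "nmod"), Word(13, "inside", "inside", 14, "case"),
    Word(14, "macrophages", "macrophage", 12, "nmod"), Word(15, ".", ".", 5, "punct"),
]

print(extract_spo(EXAMPLE, ["lead", "cause", "inhibit"]))
```

Using the whole subtree as the noun phrase tends to over-extend the span, which is one reason to merge with chunker output as the algorithm describes.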

Usage

Download Stanford CoreNLP and the English models
Put the model jars in the distribution folder
Set up environment variables

export CORENLP_HOME=/path/to/stanford-corenlp-full-2020-04-20
export DATA_DIR=/path/to/data/
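
A script that depends on these two variables could check them up front, as in the minimal sketch below. `resolve_paths` is a hypothetical helper, not a function from this project; the only assumption taken from the setup above is that `CORENLP_HOME` and `DATA_DIR` must both be set.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_paths(env: Optional[Mapping[str, str]] = None) -> Tuple[Path, Path]:
    """Return (CORENLP_HOME, DATA_DIR) as paths, failing early if either is unset."""
    env = os.environ if env is None else env
    missing = [name for name in ("CORENLP_HOME", "DATA_DIR") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Path(env["CORENLP_HOME"]), Path(env["DATA_DIR"])
```

Failing early gives a clearer message than an error raised later, when the CoreNLP server is started or the data is read.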

How to install dependencies?

pip install -r requirements.txt

Download a model for a neural dependency parser

python -c 'import stanza; stanza.download("en")'

How to run tests?

cd /path/to/project-directory
pytest tests/test_SPOs.py

export DATA_DIR="$(pwd)/data/tests"
pytest tests/test_data_reader.py

How to get SPOs for example sentences?

PYTHONPATH=. python tests/test_SPOs.py

How to extract SPOs?

PYTHONPATH=. python bin/run_spo.py -i input_directory -o output_file
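
For readers unfamiliar with the two options, the sketch below shows how a script taking `-i` and `-o` could parse them. It is an illustration only and does not reproduce `bin/run_spo.py`; the function name `parse_args` and the destination names `input_dir` and `output_file` are assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the -i (input directory) and -o (output file) options."""
    parser = argparse.ArgumentParser(description="Extract SPO triples (illustrative sketch).")
    parser.add_argument("-i", dest="input_dir", required=True, help="directory of input files")
    parser.add_argument("-o", dest="output_file", required=True, help="file to write results to")
    return parser.parse_args(argv)
```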

What is missing?

Biomedical named entity recognisers can be used to improve NP chunking and to identify the roles of NPs.
