NewsReader

NewsReader is a natural language processing pipeline. Among other things, it tags parts of speech, recognizes named entities, and annotates entities with predicates.

There are a number of implementations of the NewsReader pipeline:

POAS: pipeline-on-a-stick.
cltl/nlpp: contains a script that constructs the pipeline (EN+NL) from components.
vmc-from-scratch: creates a virtual machine (VM) with the Dutch version of NewsReader.
newsreader-docker: a Docker image for setting up a NewsReader server.

At the moment, none of these implementations successfully builds the whole pipeline for Dutch (see the issue tracker). We have therefore decided to build the pipeline from individual modules.

Modules

We have imported all modules from NewsReader under the heading "Dutch modules":

tokenization: Splits text into tokens (words / punctuation symbols) (wiki).
part-of-speech-tagging: tags words with grammatical categories such as 'noun' and 'verb' (wiki).
named-entity-recognition: recognizes words as named entities such as 'Holland' (wiki).
named-entity-disambiguation: when a name can refer to multiple entities, selects the most likely one (wiki).
word-sense-disambiguation: selects the most likely meaning of individual words (wiki).
time-expression-recognition: recognizes temporal expressions, such as "last week" (wiki, Heideltime).
ontological-tagger: tags words with predicates, recognizes equivalent semantic frames and identifies events.
semantic-role-labeling: assigns roles to agents, such as 'murderer' and 'murdered' (wiki, additional-roles).
event-coreference: determines whether two recognized events refer to the same event (wiki).
opinion-miner: detects whether a statement contains an opinion.

These modules depend on the following software packages:

KafNafParserPy: a parser for KAF/NAF files in Python.
vua-resources: a package with utility functions of the Computational Lexicology & Terminology Lab.
Alpino: a dependency parser for Dutch text.
dbpedia-spotlight: tool for annotating mentions of DBpedia resources (more info).
libsvm: library of support vector machines.
svmlight: library of support vector machines.
timbl: Tilburg Memory-Based Learner, containing classifiers for symbolic feature spaces.

Build

The goal is to construct a lightweight, portable pipeline, which we achieve through a Docker image. This image is available from Docker Hub and can be obtained by pulling:

docker pull evidence/newsreaderdutch

If you would like to make changes and build the image yourself, run:

docker image build -t newsreaderdutch NewsReaderDutch/

from within the root of the repository.

Usage

The Docker container can be run directly on your text files by calling:

docker run -v /workspace/:/work/ newsreaderdutch /work/file.txt

where /workspace/ is the local directory containing the files to be processed and file.txt is the document that you would like to have annotated. If you pulled the image from Docker Hub instead of building it locally, use evidence/newsreaderdutch as the image name. The output will have the same filename, but with a .naf extension. Currently, the pipeline also writes the output of each module separately.
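
To annotate several documents, the command above can be repeated for each file. The following is a minimal sketch, not part of the repository: the function name annotate_dir, the DOCKER and IMAGE variables, and the --rm flag are assumptions added for illustration, and it assumes that starting one container per file is acceptable.

```shell
#!/usr/bin/env bash
# Sketch: run the NewsReader container once for every .txt file in a directory.
# Hypothetical helpers, not from the repository: DOCKER, IMAGE, annotate_dir.

DOCKER="${DOCKER:-docker}"          # set DOCKER=echo to print the commands only
IMAGE="${IMAGE:-newsreaderdutch}"   # evidence/newsreaderdutch for the pulled image

annotate_dir() {
    local dir file
    # Resolve to an absolute path, since the -v bind mount expects one.
    dir="$(cd "$1" && pwd)" || return 1
    for file in "$dir"/*.txt; do
        # Skip the unexpanded pattern when the directory has no .txt files.
        [ -e "$file" ] || continue
        # Same invocation as in the Usage section; --rm (an addition here)
        # removes the container once it exits.
        "$DOCKER" run --rm -v "$dir":/work/ "$IMAGE" "/work/$(basename "$file")"
    done
}
```

After sourcing the script, annotate_dir /workspace/ processes every .txt file directly inside /workspace/. Running it with DOCKER=echo first prints the commands without starting any container, which is a cheap way to check the paths.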

Contact

Questions, comments, and bug reports can be submitted to the issue tracker.


Repository: ADAH-EviDENce/NewsReader