This folder contains scripts to reproduce the benchmark results reported in our documentation for the different tasks we cover.
The benchmark scripts evaluate models implemented in the danlp package, but also models implemented in other frameworks. Additional packages therefore have to be installed to run some of the scripts. You can either look in the specific script to see which packages it needs, or install all required packages with `pip install -r requirements_benchmark.txt`.
To run `sentiment_benchmarks_twitter.py` you need a Twitter developer account and must set the API keys as environment variables.
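A minimal sketch of the credential setup is shown below, assuming hypothetical environment variable names; check `sentiment_benchmarks_twitter.py` (and the danlp Twitter dataset loader) for the exact names it reads.

```python
import os

# Hypothetical variable names for illustration only -- the benchmark script
# defines which names it actually reads from the environment.
os.environ["TWITTER_CONSUMER_KEY"] = "<your-consumer-key>"
os.environ["TWITTER_CONSUMER_SECRET"] = "<your-consumer-secret>"
os.environ["TWITTER_ACCESS_TOKEN"] = "<your-access-token>"
os.environ["TWITTER_ACCESS_SECRET"] = "<your-access-secret>"
```

In practice you would typically export these variables in your shell before launching the script rather than setting them from Python.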
- Benchmark script for word embeddings: `wordembeddings_benchmarks.py`
- Benchmark script for Part-of-Speech tagging on the Danish Dependency Treebank; the SpaCy, DaCy, flair, polyglot and Stanza models are benchmarked: `pos_benchmarks.py`
- Benchmark script for Dependency Parsing on the Danish Dependency Treebank; the SpaCy, DaCy and Stanza models are benchmarked: `dependency_benchmarks.py`
- Benchmark script for Noun-phrase Chunking (which depends on the Dependency Parsing model) on the Danish Dependency Treebank; the spaCy model (i.e. the conversion of the dependencies it produces) is benchmarked: `chunking_benchmarks.py`
- Benchmark script on the DaNE NER dataset: `ner_benchmarks.py`
- Benchmark script for sentiment classification on LCC Sentiment and Europarl Sentiment using the tools AFINN and Sentida, where the scores are converted to a three-class problem (see the sketch after this list); it also includes a benchmark of BERT Tone (polarity): `sentiment_benchmarks.py`
- `sentiment_benchmarks_twitter.py` shows evaluation on a small Twitter dataset for both polarity and subjective/objective classification
- Benchmark script for Hate Speech Detection on DKHate; a BERT and an ELECTRA model for identification of offensive language are benchmarked: `hatespeech_benchmarks.py`
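To illustrate the three-class conversion mentioned for `sentiment_benchmarks.py`, here is a minimal sketch using the `afinn` package. The zero cut-off and the label names are assumptions for illustration; the actual conversion lives in the benchmark script.

```python
from afinn import Afinn

afinn = Afinn(language="da")  # Danish AFINN word list


def to_three_class(text: str) -> str:
    """Map a continuous AFINN score to negative/neutral/positive.

    The zero cut-off is an assumption for illustration; the benchmark
    script may use different thresholds.
    """
    score = afinn.score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(to_three_class("Det er en fantastisk dag"))  # expected: 'positive'
```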