Benchmark scripts

This folder contains scripts to reproduce the benchmark results reported in our documentation for the different tasks we cover.

The benchmark scripts evaluate models implemented in the danlp package as well as models implemented in other frameworks. Running some of the scripts therefore requires additional packages. You can either check the individual scripts to see which packages they need, or install all of them with `pip install -r requirements_benchmark.txt`.

To run sentiment_benchmarks_twitter.py you need a Twitter developer account and the API keys set as environment variables.
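
The script reads the credentials from the environment; as a minimal sketch (the variable names below are assumptions, not necessarily the ones the script expects), you can verify that they are set before running the benchmark:

```python
import os

# Hypothetical variable names for illustration -- check
# sentiment_benchmarks_twitter.py for the exact names it reads.
TWITTER_KEYS = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
]

missing = [name for name in TWITTER_KEYS if name not in os.environ]
if missing:
    print("Missing Twitter credentials:", ", ".join(missing))
```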

List of current benchmark scripts

  • Benchmark script for word embeddings in wordembeddings_benchmarks.py

  • Benchmark script for Part-of-Speech tagging on the Danish Dependency Treebank. SpaCy, DaCy, flair, polyglot and Stanza models are benchmarked in pos_benchmarks.py

  • Benchmark script for Dependency Parsing on the Danish Dependency Treebank. SpaCy, DaCy and Stanza models are benchmarked in dependency_benchmarks.py

  • Benchmark script for Noun-phrase Chunking, which depends on the Dependency Parsing model, on the Danish Dependency Treebank. The spaCy model (via conversion of the dependencies it predicts) is benchmarked in chunking_benchmarks.py

  • Benchmark script for Named Entity Recognition on the DaNE dataset in ner_benchmarks.py

  • Benchmark script for sentiment classification on LCC Sentiment and Europarl Sentiment using the tools AFINN and Sentida, where the continuous scores are converted to a three-class problem (see the sketch after this list). It also includes a benchmark of BERT Tone (polarity) in sentiment_benchmarks.py

  • sentiment_benchmarks_twitter.py shows evaluation on a small Twitter dataset for both polarity and subjective/objective classification

  • Benchmark script for Hate Speech Detection on DKHate. A BERT and an ELECTRA model for identification of offensive language are benchmarked in hatespeech_benchmarks.py
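
Converting a continuous lexicon score to the three classes (negative, neutral, positive) typically amounts to thresholding around zero. The sketch below illustrates the idea; the threshold and label names are assumptions, so check sentiment_benchmarks.py for the exact conversion used in the benchmark:

```python
def score_to_class(score: float, threshold: float = 0.0) -> str:
    """Map a continuous sentiment score (e.g. from AFINN or Sentida)
    to a three-class label.

    The threshold of 0.0 is an illustrative assumption -- see
    sentiment_benchmarks.py for the exact conversion used there.
    """
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Example: AFINN-style integer scores mapped to three classes
print([score_to_class(s) for s in [3, 0, -2]])  # ['positive', 'neutral', 'negative']
```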