
Releases: explosion/spacy-models

el_core_news_md-2.1.0

17 Mar 11:01
8113e54

Details: https://spacy.io/models/el#el_core_news_md

File checksum: 7ae074564e262a954ae5c0be18949058eb7e10f421b5c917855589b4284cfee5
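The checksum is 64 hexadecimal characters, which suggests a SHA-256 digest of the release archive. Assuming that, a downloaded archive could be checked with a short Python sketch like the one that follows. The local filename `el_core_news_md-2.1.0.tar.gz` is a hypothetical name, not one this release specifies.

```python
import hashlib
from pathlib import Path

# Assumption: the published "File checksum" is the SHA-256 digest of the archive.
EXPECTED = "7ae074564e262a954ae5c0be18949058eb7e10f421b5c917855589b4284cfee5"

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_checksum(path, expected):
    """Compare against the published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected.strip().lower()

if __name__ == "__main__":
    archive = Path("el_core_news_md-2.1.0.tar.gz")  # hypothetical local filename
    if archive.exists():
        print("checksum OK" if verify_checksum(archive, EXPECTED) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Streaming the file in chunks keeps memory use flat even for the larger vector packages.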

Greek pipeline with word vectors, POS tags, dependencies and named entities. Word vectors use Facebook's FastText Common Crawl vectors, pruned to 20,000 unique vectors. Words outside the 20,000 most frequent were mapped to the nearest neighbouring vector among the 20,000 rows retained. Syntax (dependencies and POS tags) was trained on the Universal Dependencies conversion of the Greek Dependency Treebank (v2.2). Named entity annotations were created by Giannis Daras with Prodigy, following the OntoNotes 5 annotation schema.
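The pruning step explains why the table below lists far more keys than unique vectors. Many keys share one row. Here is a minimal pure-Python sketch of that idea, assuming cosine similarity as the "nearest neighbour" measure. spaCy 2.x documents a `Vocab.prune_vectors` method for this operation, but the function names and toy data below are hypothetical illustrations, not spaCy's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def prune_vectors(vectors, n_rows):
    """Keep the first n_rows vectors and remap every other word to its nearest kept row.

    vectors: list of (word, vector) pairs sorted from most to least frequent.
    Returns (table, key2row): the retained rows and a word -> row-index map.
    """
    kept = vectors[:n_rows]
    table = [vec for _, vec in kept]
    key2row = {word: row for row, (word, _) in enumerate(kept)}
    for word, vec in vectors[n_rows:]:
        # The discarded word shares the row of its most similar retained vector.
        key2row[word] = max(range(len(table)), key=lambda row: cosine(vec, table[row]))
    return table, key2row

# Toy example: 4 keys but only 2 unique vectors after pruning.
words = [("the", [1.0, 0.0]), ("cat", [0.0, 1.0]), ("a", [0.9, 0.1]), ("dog", [0.1, 0.9])]
table, key2row = prune_vectors(words, n_rows=2)
print(len(key2row), "keys,", len(table), "unique vectors")  # 4 keys, 2 unique vectors
print(key2row)  # {'the': 0, 'cat': 1, 'a': 0, 'dog': 1}
```

The same relationship, scaled up, gives the "1999938 keys, 20000 unique vectors" figure for this package. The sketch compares every discarded word against every kept row, which is fine for illustration but slow at real vocabulary sizes.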

Feature Description
Name el_core_news_md
Version 2.1.0
spaCy >=2.1.0
Model size 126 MB
Pipeline  tagger, parser, ner
Vectors 1999938 keys, 20000 unique vectors (300 dimensions)
Sources Common Crawl, Greek Dependency Treebank, Daras GSOC 2018
License CC BY-NC 4.0
Author Giannis Daras

Accuracy

Type Score
ENTS_F  81.06
ENTS_P  78.66
ENTS_R  83.61
LAS  85.00
TAGS_ACC  96.56
TOKEN_ACC  100.00
UAS  88.29
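ENTS_P, ENTS_R and ENTS_F are entity-level precision, recall and F-score. Here is a minimal sketch of how such scores can be computed, assuming an entity counts as correct only when its start, end and label all match a gold entity. The helper names and example spans are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (works for fractions or percentages)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def entity_prf(gold, pred):
    """Precision, recall and F-score for exact-match entity spans.

    gold, pred: iterables of (start, end, label) tuples.
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # entities whose boundaries and label both match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)

gold = {(0, 2, "PERSON"), (5, 6, "GPE"), (8, 10, "ORG")}
pred = {(0, 2, "PERSON"), (5, 6, "LOC")}  # one right, one wrong label, one missed
p, r, f = entity_prf(gold, pred)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.50 R=0.33 F=0.40

# Harmonic mean of the published ENTS_P and ENTS_R for el_core_news_md:
print(round(f_score(78.66, 83.61), 2))  # 81.06
```

The published F-score is presumably computed from raw counts. Recomputing it from the rounded precision and recall is only an approximation, though here it reproduces the listed 81.06.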

Installation

pip install spacy
spacy download el_core_news_md

de_core_news_sm-2.1.0

17 Mar 11:01
8113e54

Details: https://spacy.io/models/de#de_core_news_sm

File checksum: 25f1140eef95dccb124ca74b28db83f27787c210bcf6a97db4269690146e443a

German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name de_core_news_sm
Version 2.1.0
spaCy >=2.1.0
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources TIGER Corpus, Wikipedia
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  83.10
ENTS_P  84.09
ENTS_R  82.13
LAS  88.58
TAGS_ACC  96.27
TOKEN_ACC  95.88
UAS  90.66
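UAS (unlabeled attachment score) counts tokens whose predicted syntactic head is correct. LAS (labeled attachment score) additionally requires the correct dependency label, so LAS can never exceed UAS. Here is a minimal sketch, assuming every token is scored. Evaluation conventions differ on details such as whether punctuation is included. The function name and the example analysis are hypothetical.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) as percentages.

    gold, pred: lists of (head_index, dep_label) pairs, one per token,
    aligned so that position i refers to the same token in both lists.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")
    if not gold:
        raise ValueError("need at least one token to score")
    n = len(gold)
    # UAS: only the predicted head has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: head and dependency label both have to match.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n

# Hypothetical analysis of "Sie liest gerne Bücher"; heads are token indices
# and the ROOT token points at itself, as in spaCy.
gold = [(1, "sb"), (1, "ROOT"), (1, "mo"), (1, "oa")]
pred = [(1, "sb"), (1, "ROOT"), (3, "mo"), (1, "da")]
uas, las = attachment_scores(gold, pred)
print(f"UAS {uas:.2f}, LAS {las:.2f}")  # UAS 75.00, LAS 50.00
```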

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy
spacy download de_core_news_sm

de_core_news_md-2.1.0

17 Mar 11:01
8113e54

Details: https://spacy.io/models/de#de_core_news_md

File checksum: c1ab2f9e95b084dd895ac8f76b12c880d20766e886d3364350125971b3a72e60

German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name de_core_news_md
Version 2.1.0
spaCy >=2.1.0
Model size 210 MB
Pipeline  tagger, parser, ner
Vectors 276087 keys, 20000 unique vectors (300 dimensions)
Sources TIGER Corpus, Wikipedia
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  83.78
ENTS_P  84.25
ENTS_R  83.31
LAS  89.41
TAGS_ACC  96.55
TOKEN_ACC  95.88
UAS  91.23

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy
spacy download de_core_news_md

en_vectors_web_lg-2.1.0a0

13 Feb 14:54
e465331
Pre-release

Details: https://spacy.io/models/en#en_vectors_web_lg

File checksum: 832304e25f0a35cdaed9638eca1d09a265ea1fb0c63ca90d74232f48c817d5ec

1.2m 300-dimensional word vectors trained on Common Crawl with GloVe.

Feature Description
Name en_vectors_web_lg
Version 2.1.0a0
spaCy >=2.1.0a7,<3.0.0
Model size 631 MB
Vectors 1070971 keys, 1070971 unique vectors (300 dimensions)
Sources Common Crawl
License CC BY-SA 3.0
Author Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Repackaged by Explosion AI
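Pruning makes a large difference to the size of the vector table. Here is a back-of-the-envelope estimate of the dense in-memory table for two of the "Vectors" lines on this page, assuming one 32-bit float per component. The estimate ignores the key-to-row map and need not match the listed Model size.

```python
BYTES_PER_FLOAT32 = 4  # assumption: one 32-bit float per vector component

def vector_table_bytes(n_rows, n_dims, bytes_per_value=BYTES_PER_FLOAT32):
    """Size of a dense n_rows x n_dims vector table, ignoring the key->row map."""
    return n_rows * n_dims * bytes_per_value

# Row counts and dimensions taken from the "Vectors" lines of these releases.
for name, rows in [("en_vectors_web_lg", 1_070_971), ("de_core_news_md", 20_000)]:
    size = vector_table_bytes(rows, 300)
    print(f"{name}: {size:,} bytes (~{size / 1e6:,.0f} MB)")
# en_vectors_web_lg: 1,285,165,200 bytes (~1,285 MB)
# de_core_news_md: 24,000,000 bytes (~24 MB)
```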

Installation

pip install spacy-nightly
spacy download en_vectors_web_lg

xx_ent_wiki_sm-2.1.0a7

09 Feb 11:49
Pre-release

Details: https://spacy.io/models/xx#xx_ent_wiki_sm

File checksum: 1d07e7a7acbc1750c680c687725eabc836f094b8eeaf1a1b7c7901d5e4b67fdc

Multi-lingual CNN trained on the Wikipedia corpus of Nothman et al. (2010). Assigns named entities. Supports identification of PER, LOC, ORG and MISC entities for English, German, Spanish, French, Italian, Portuguese and Russian.

Feature Description
Name xx_ent_wiki_sm
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 3 MB
Pipeline  ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Wikipedia
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  81.64
ENTS_P  82.23
ENTS_R  81.06

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text.

Installation

pip install spacy-nightly
spacy download xx_ent_wiki_sm

pt_core_news_sm-2.1.0a7

09 Feb 11:49
Pre-release

Details: https://spacy.io/models/pt#pt_core_news_sm

File checksum: 75e9d7ec03b5be522124a8b4e504cf636860c0466fc24a59c0b00928b29e0df1

Portuguese multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name pt_core_news_sm
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 12 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Universal Dependencies, Wikipedia
License CC BY-SA 4.0
Author Explosion AI

Accuracy

Type Score
ENTS_F  88.98
ENTS_P  89.10
ENTS_R  88.86
LAS  86.20
TAGS_ACC  80.65
TOKEN_ACC  100.00
UAS  89.45

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download pt_core_news_sm

nl_core_news_sm-2.1.0a7

09 Feb 11:49
Pre-release

Details: https://spacy.io/models/nl#nl_core_news_sm

File checksum: 0ef11af5cac084f5d5110bf163e1f6ddb3575fd1e6ff64aee4e09140dbf7e1ea

Dutch multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name nl_core_news_sm
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Universal Dependencies, Wikipedia
License CC BY-SA 4.0
Author Explosion AI

Accuracy

Type Score
ENTS_F  86.98
ENTS_P  86.46
ENTS_R  87.50
LAS  77.62
TAGS_ACC  91.53
TOKEN_ACC  100.00
UAS  83.85

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download nl_core_news_sm

it_core_news_sm-2.1.0a7

09 Feb 11:50
Pre-release

Details: https://spacy.io/models/it#it_core_news_sm

File checksum: 87c9839a78977388d0d2503779c1cbda1373237e9b434f72623eb19b5c3cc32c

Italian multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name it_core_news_sm
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Universal Dependencies, Wikipedia
License CC BY-NC-SA 3.0
Author Explosion AI

Accuracy

Type Score
ENTS_F  86.31
ENTS_P  86.48
ENTS_R  86.13
LAS  87.20
TAGS_ACC  95.97
TOKEN_ACC  100.00
UAS  91.09

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download it_core_news_sm

fr_core_news_sm-2.1.0a7

09 Feb 11:50
Pre-release

Details: https://spacy.io/models/fr#fr_core_news_sm

File checksum: 5ac388c58ed1b2f9b306a24ae770a292c7919f8a5298eb2ab14f351a3dca8207

French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name fr_core_news_sm
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 12 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Sequoia Corpus (UD), Wikipedia
License LGPL
Author Explosion AI

Accuracy

Type Score
ENTS_F  83.02
ENTS_P  83.10
ENTS_R  82.95
LAS  84.41
TAGS_ACC  94.66
TOKEN_ACC  100.00
UAS  87.28

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download fr_core_news_sm

fr_core_news_md-2.1.0a7

09 Feb 11:50
Pre-release

Details: https://spacy.io/models/fr#fr_core_news_md

File checksum: b18a388fae63d0dac45b0bb2aef6ce6f6833af4b29e11030325c3c58972afd21

French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parses and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name fr_core_news_md
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 81 MB
Pipeline  tagger, parser, ner
Vectors 579447 keys, 20000 unique vectors (300 dimensions)
Sources Sequoia Corpus (UD), Wikipedia
License LGPL
Author Explosion AI

Accuracy

Type Score
ENTS_F  83.34
ENTS_P  83.43
ENTS_R  83.25
LAS  86.20
TAGS_ACC  95.28
TOKEN_ACC  100.00
UAS  89.10

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download fr_core_news_md