Releases: explosion/spacy-models
el_core_news_md-2.1.0
Details: https://spacy.io/models/el#el_core_news_md
File checksum:
7ae074564e262a954ae5c0be18949058eb7e10f421b5c917855589b4284cfee5
Greek pipeline with word vectors, POS tags, dependencies and named entities. Word vectors use Facebook's FastText Common Crawl vectors, pruned to a vocabulary of 20,000 items: words outside the 20,000 most frequent were mapped to the vector of their nearest neighbour among the rows retained. The POS tags and dependency parse were trained on the Universal Dependencies conversion of the Greek Dependency Treebank (v2.2). Named entity annotations were created by Giannis Daras with Prodigy, following the OntoNotes 5 annotation scheme.
Feature | Description |
---|---|
Name | el_core_news_md |
Version | 2.1.0 |
spaCy | >=2.1.0 |
Model size | 126 MB |
Pipeline | tagger, parser, ner |
Vectors | 1999938 keys, 20000 unique vectors (300 dimensions) |
Sources | Common Crawl, Greek Dependency Treebank, Daras GSOC 2018 |
License | CC BY-NC 4.0 |
Author | Giannis Daras |
Accuracy
Type | Score |
---|---|
ENTS_F | 81.06 |
ENTS_P | 78.66 |
ENTS_R | 83.61 |
LAS | 85.00 |
TAGS_ACC | 96.56 |
TOKEN_ACC | 100.00 |
UAS | 88.29 |
Installation
```bash
pip install spacy
spacy download el_core_news_md
```
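Once installed, the package loads like any other spaCy pipeline. The following is a minimal usage sketch, assuming the download above succeeded; the Greek example sentence and the printed attributes are illustrative only.

```python
import spacy

# Load the Greek md pipeline (tagger, parser, ner) with its pruned FastText vectors.
nlp = spacy.load("el_core_news_md")

doc = nlp("Η Αθήνα είναι η πρωτεύουσα της Ελλάδας.")

# POS tags and dependency labels from the UD-trained tagger and parser.
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

# Named entities predicted by the OntoNotes-style NER component.
for ent in doc.ents:
    print(ent.text, ent.label_)

# The md package ships word vectors, so tokens expose 300-dimensional vectors.
print(doc[1].text, doc[1].has_vector, doc[1].vector.shape)
```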
de_core_news_sm-2.1.0
Details: https://spacy.io/models/de#de_core_news_sm
File checksum:
25f1140eef95dccb124ca74b28db83f27787c210bcf6a97db4269690146e443a
German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | de_core_news_sm |
Version | 2.1.0 |
spaCy | >=2.1.0 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | TIGER Corpus, Wikipedia |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 83.10 |
ENTS_P | 84.09 |
ENTS_R | 82.13 |
LAS | 88.58 |
TAGS_ACC | 96.27 |
TOKEN_ACC | 95.88 |
UAS | 90.66 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than it would be against gold-standard human annotations.
Installation
```bash
pip install spacy
spacy download de_core_news_sm
```
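As with the other packages, the model loads by name. A minimal sketch, assuming spaCy >=2.1.0 and the download above; the German example sentence is illustrative only.

```python
import spacy

# Load the small German pipeline; per the table above it ships no word vectors.
nlp = spacy.load("de_core_news_sm")

doc = nlp("Angela Merkel besuchte im Mai die Siemens AG in München.")

# The NER component predicts PER, LOC, ORG and MISC labels.
for ent in doc.ents:
    print(ent.text, ent.label_)

# Fine-grained (TIGER) tags, coarse UD tags and the dependency parse per token.
for token in doc:
    print(token.text, token.tag_, token.pos_, token.dep_)
```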
de_core_news_md-2.1.0
Details: https://spacy.io/models/de#de_core_news_md
File checksum:
c1ab2f9e95b084dd895ac8f76b12c880d20766e886d3364350125971b3a72e60
German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | de_core_news_md |
Version | 2.1.0 |
spaCy | >=2.1.0 |
Model size | 210 MB |
Pipeline | tagger, parser, ner |
Vectors | 276087 keys, 20000 unique vectors (300 dimensions) |
Sources | TIGER Corpus, Wikipedia |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 83.78 |
ENTS_P | 84.25 |
ENTS_R | 83.31 |
LAS | 89.41 |
TAGS_ACC | 96.55 |
TOKEN_ACC | 95.88 |
UAS | 91.23 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than it would be against gold-standard human annotations.
Installation
```bash
pip install spacy
spacy download de_core_news_md
```
en_vectors_web_lg-2.1.0a0
Details: https://spacy.io/models/en#en_vectors_web_lg
File checksum:
832304e25f0a35cdaed9638eca1d09a265ea1fb0c63ca90d74232f48c817d5ec
1.2 million 300-dimensional word vectors trained on Common Crawl with GloVe.
Feature | Description |
---|---|
Name | en_vectors_web_lg |
Version | 2.1.0a0 |
spaCy | >=2.1.0a7,<3.0.0 |
Model size | 631 MB |
Vectors | 1070971 keys, 1070971 unique vectors (300 dimensions) |
Sources | Common Crawl |
License | CC BY-SA 3.0 |
Author | Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Repackaged by Explosion AI |
Installation
```bash
pip install spacy-nightly
spacy download en_vectors_web_lg
```
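Because this package contains only vectors, loading it yields a pipeline with no components. A minimal sketch, assuming spaCy v2.1 (e.g. via spacy-nightly) and the download above; the example words are illustrative.

```python
import spacy

# en_vectors_web_lg ships only the GloVe vector table: no tagger, parser or NER.
nlp = spacy.load("en_vectors_web_lg")
print(nlp.pipe_names)  # expected to be empty

doc = nlp("apple banana spaceship")

# In-vocabulary tokens expose 300-dimensional GloVe vectors.
for token in doc:
    print(token.text, token.has_vector, token.vector_norm)

# Similarity is computed directly from the word vectors.
print(doc[0].similarity(doc[1]))
```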
xx_ent_wiki_sm-2.1.0a7
Details: https://spacy.io/models/xx#xx_ent_wiki_sm
File checksum:
1d07e7a7acbc1750c680c687725eabc836f094b8eeaf1a1b7c7901d5e4b67fdc
Multi-lingual CNN trained on the Nothman et al. (2010) Wikipedia corpus. Assigns named entities. Supports identification of PER, LOC, ORG and MISC entities for English, German, Spanish, French, Italian, Portuguese and Russian.
Feature | Description |
---|---|
Name | xx_ent_wiki_sm |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 3 MB |
Pipeline | ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Wikipedia |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 81.64 |
ENTS_P | 82.23 |
ENTS_R | 81.06 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text.
Installation
```bash
pip install spacy-nightly
spacy download xx_ent_wiki_sm
```
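The multilingual package contains only an NER component, so it can be applied to text in any of the supported languages without a tagger or parser. A minimal sketch, assuming the download above; the example sentences are illustrative.

```python
import spacy

# xx_ent_wiki_sm: a single NER component shared across languages.
nlp = spacy.load("xx_ent_wiki_sm")
print(nlp.pipe_names)  # expected: ['ner']

texts = [
    "Apple is looking at buying a U.K. startup.",          # English
    "La France a signé un accord avec la NASA à Paris.",   # French
]

for doc in nlp.pipe(texts):
    print([(ent.text, ent.label_) for ent in doc.ents])
```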
pt_core_news_sm-2.1.0a7
Details: https://spacy.io/models/pt#pt_core_news_sm
File checksum:
75e9d7ec03b5be522124a8b4e504cf636860c0466fc24a59c0b00928b29e0df1
Portuguese multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | pt_core_news_sm |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 12 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Universal Dependencies, Wikipedia |
License | CC BY-SA 4.0 |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 88.98 |
ENTS_P | 89.10 |
ENTS_R | 88.86 |
LAS | 86.20 |
TAGS_ACC | 80.65 |
TOKEN_ACC | 100.00 |
UAS | 89.45 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than it would be against gold-standard human annotations.
Installation
```bash
pip install spacy-nightly
spacy download pt_core_news_sm
```
nl_core_news_sm-2.1.0a7
Details: https://spacy.io/models/nl#nl_core_news_sm
File checksum:
0ef11af5cac084f5d5110bf163e1f6ddb3575fd1e6ff64aee4e09140dbf7e1ea
Dutch multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | nl_core_news_sm |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Universal Dependencies, Wikipedia |
License | CC BY-SA 4.0 |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 86.98 |
ENTS_P | 86.46 |
ENTS_R | 87.50 |
LAS | 77.62 |
TAGS_ACC | 91.53 |
TOKEN_ACC | 100.00 |
UAS | 83.85 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than it would be against gold-standard human annotations.
Installation
```bash
pip install spacy-nightly
spacy download nl_core_news_sm
```
it_core_news_sm-2.1.0a7
Details: https://spacy.io/models/it#it_core_news_sm
File checksum:
87c9839a78977388d0d2503779c1cbda1373237e9b434f72623eb19b5c3cc32c
Italian multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | it_core_news_sm |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Universal Dependencies, Wikipedia |
License | CC BY-NC-SA 3.0 |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 86.31 |
ENTS_P | 86.48 |
ENTS_R | 86.13 |
LAS | 87.20 |
TAGS_ACC | 95.97 |
TOKEN_ACC | 100.00 |
UAS | 91.09 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than it would be against gold-standard human annotations.
Installation
```bash
pip install spacy-nightly
spacy download it_core_news_sm
```
fr_core_news_sm-2.1.0a7
Details: https://spacy.io/models/fr#fr_core_news_sm
File checksum:
5ac388c58ed1b2f9b306a24ae770a292c7919f8a5298eb2ab14f351a3dca8207
French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | fr_core_news_sm |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 12 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Sequoia Corpus (UD), Wikipedia |
License | LGPL |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 83.02 |
ENTS_P | 83.10 |
ENTS_R | 82.95 |
LAS | 84.41 |
TAGS_ACC | 94.66 |
TOKEN_ACC | 100.00 |
UAS | 87.28 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than it would be against gold-standard human annotations.
Installation
```bash
pip install spacy-nightly
spacy download fr_core_news_sm
```
fr_core_news_md-2.1.0a7
Details: https://spacy.io/models/fr#fr_core_news_md
File checksum:
b18a388fae63d0dac45b0bb2aef6ce6f6833af4b29e11030325c3c58972afd21
French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | fr_core_news_md |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 81 MB |
Pipeline | tagger, parser, ner |
Vectors | 579447 keys, 20000 unique vectors (300 dimensions) |
Sources | Sequoia Corpus (UD), Wikipedia |
License | LGPL |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 83.34 |
ENTS_P | 83.43 |
ENTS_R | 83.25 |
LAS | 86.20 |
TAGS_ACC | 95.28 |
TOKEN_ACC | 100.00 |
UAS | 89.10 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than it would be against gold-standard human annotations.
Installation
```bash
pip install spacy-nightly
spacy download fr_core_news_md
```
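Like the other md packages, the French md model ships a pruned vector table: many keys share a much smaller set of unique rows. A short inspection sketch, assuming the download above; the exact numbers printed depend on the installed package.

```python
import spacy

nlp = spacy.load("fr_core_news_md")
vectors = nlp.vocab.vectors

# Many keys map onto far fewer unique rows, because less frequent words
# were assigned the vector of a frequent neighbour during pruning.
print(vectors.n_keys)   # number of keys, roughly 579,447 per the table above
print(vectors.shape)    # (rows, dims), roughly (20000, 300)

# Token vectors are looked up through that mapping transparently.
doc = nlp("Le président a rencontré la presse à Paris.")
print(doc[1].text, doc[1].has_vector, doc[1].vector.shape)
```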