Releases: explosion/spacy-models
es_core_news_sm-2.1.0a7
Details: https://spacy.io/models/es#es_core_news_sm
File checksum:
635634edbb74e07e41be0c7308e7cc3735aa3c11d4796dd3a2573b74d98116cd
Spanish multi-task CNN trained on the AnCora and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | es_core_news_sm |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | AnCora, Wikipedia |
License | GPL |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 89.06 |
ENTS_P | 89.14 |
ENTS_R | 88.97 |
LAS | 87.12 |
TAGS_ACC | 96.95 |
TOKEN_ACC | 100.00 |
UAS | 90.22 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against gold-standard human annotations.
Installation
pip install spacy-nightly
spacy download es_core_news_sm
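Once installed, the package can be loaded by name. A minimal usage sketch (the example sentence is illustrative, not taken from the training data):

```python
import spacy

# Load the small Spanish pipeline (tagger, parser, ner).
nlp = spacy.load("es_core_news_sm")

# Illustrative example sentence.
doc = nlp("La Comisión Europea tiene su sede en Bruselas.")

# Context-specific POS tags and dependency labels.
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities (PER, LOC, ORG, MISC).
for ent in doc.ents:
    print(ent.text, ent.label_)
```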
es_core_news_md-2.1.0a7
Details: https://spacy.io/models/es#es_core_news_md
File checksum:
06d827f4822d06308b2a8d66d5ac526dec521d041826405cd5ade0f4d587b656
Spanish multi-task CNN trained on the AnCora and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | es_core_news_md |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 69 MB |
Pipeline | tagger, parser, ner |
Vectors | 533736 keys, 20000 unique vectors (50 dimensions) |
Sources | AnCora, Wikipedia |
License | GPL |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 89.38 |
ENTS_P | 89.50 |
ENTS_R | 89.27 |
LAS | 88.35 |
TAGS_ACC | 97.21 |
TOKEN_ACC | 100.00 |
UAS | 91.17 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against gold-standard human annotations.
Installation
pip install spacy-nightly
spacy download es_core_news_md
en_core_web_sm-2.1.0a7
Details: https://spacy.io/models/en#en_core_web_sm
File checksum:
0b6264abb56aa6163dc431c98f625791945f225738c0906c09abfd763f655ec2
English multi-task CNN trained on OntoNotes. Assigns context-specific token vectors, POS tags, dependency parse and named entities.
Feature | Description |
---|---|
Name | en_core_web_sm |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | OntoNotes 5 |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 85.51 |
ENTS_P | 85.47 |
ENTS_R | 85.55 |
LAS | 89.68 |
TAGS_ACC | 96.83 |
TOKEN_ACC | 99.06 |
UAS | 91.57 |
Installation
pip install spacy-nightly
spacy download en_core_web_sm
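If you fetch the release archive directly instead of using `spacy download`, the file can be verified against the checksum listed above. A minimal sketch, assuming the archive was saved locally as `en_core_web_sm-2.1.0a7.tar.gz` (the local filename is an assumption):

```python
import hashlib

# Assumed local filename of the downloaded release archive.
ARCHIVE = "en_core_web_sm-2.1.0a7.tar.gz"

# SHA-256 checksum listed above for this release.
EXPECTED = "0b6264abb56aa6163dc431c98f625791945f225738c0906c09abfd763f655ec2"

digest = hashlib.sha256()
with open(ARCHIVE, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

print("OK" if digest.hexdigest() == EXPECTED else "Checksum mismatch")
```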
en_core_web_md-2.1.0a7
Details: https://spacy.io/models/en#en_core_web_md
File checksum:
f54a6e6a2ff34c1adb1a2eabeb67b170933453ed878125c76813dc2e31c8cf8a
English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.
Feature | Description |
---|---|
Name | en_core_web_md |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 91 MB |
Pipeline | tagger, parser, ner |
Vectors | 684830 keys, 20000 unique vectors (300 dimensions) |
Sources | OntoNotes 5, Common Crawl |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 86.31 |
ENTS_P | 86.29 |
ENTS_R | 86.33 |
LAS | 90.00 |
TAGS_ACC | 96.93 |
TOKEN_ACC | 99.06 |
UAS | 91.81 |
Installation
pip install spacy-nightly
spacy download en_core_web_md
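Because this package ships word vectors, similarity scores on `Doc` and `Token` objects are backed by real vectors rather than the tagger/parser context tensors. A short sketch (the example texts are illustrative):

```python
import spacy

# Medium English pipeline with 20k unique GloVe vectors (300 dimensions).
nlp = spacy.load("en_core_web_md")

doc1 = nlp("The cat sat on the mat.")
doc2 = nlp("A dog slept on the rug.")

# Document similarity: cosine similarity of averaged word vectors.
print(doc1.similarity(doc2))

# Token-level vectors are available as well.
cat = doc1[1]
print(cat.text, cat.has_vector, cat.vector.shape)
```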
en_core_web_lg-2.1.0a7
Details: https://spacy.io/models/en#en_core_web_lg
File checksum:
10b9ad440f66acc406013d7878f5dc73791849ca126d135e0c28c6c49abeedf5
English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.
Feature | Description |
---|---|
Name | en_core_web_lg |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 788 MB |
Pipeline | tagger, parser, ner |
Vectors | 684830 keys, 684831 unique vectors (300 dimensions) |
Sources | OntoNotes 5, Common Crawl |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 86.59 |
ENTS_P | 86.63 |
ENTS_R | 86.54 |
LAS | 90.13 |
TAGS_ACC | 96.99 |
TOKEN_ACC | 99.06 |
UAS | 91.90 |
Installation
pip install spacy-nightly
spacy download en_core_web_lg
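Unlike the `md` package, which prunes the table to 20,000 unique vectors and maps the remaining keys to their nearest neighbour, this package keeps an individual vector per key (see the Vectors row above). A brief sketch inspecting the vector table (the example sentence is illustrative):

```python
import spacy

nlp = spacy.load("en_core_web_lg")

# Shape of the vector table: (unique vectors, dimensions).
print(nlp.vocab.vectors.shape)

doc = nlp("Autonomous cars shift insurance liability toward manufacturers.")
for token in doc:
    # Each in-vocabulary token has its own 300-dimensional vector.
    print(token.text, token.has_vector, round(token.vector_norm, 2))
```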
el_core_news_sm-2.1.0a7
Details: https://spacy.io/models/el#el_core_news_sm
File checksum:
4c321bbfb499fe5ffd31ccd57edab230c999c5a4a3f8fce8d5e14b729bfcd79c
Greek pipeline with word vectors, POS tags, dependencies and named entities. Word vectors use Facebook's FastText Common Crawl vectors, pruned to a vocabulary of 20,000 entries; words outside the 20,000 most frequent were mapped to the nearest neighbouring vector among the rows retained. Syntax (dependencies and POS tags) was trained on the Universal Dependencies conversion of the Greek Dependency Treebank (v2.2). Named entity annotations were created by Giannis Daras with Prodigy, following the OntoNotes 5 annotation scheme.
Feature | Description |
---|---|
Name | el_core_news_sm |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Greek Dependency Treebank, Daras GSOC 2018 |
License | CC BY-NC 4.0 |
Author | Giannis Daras |
Accuracy
Type | Score |
---|---|
ENTS_F | 73.26 |
ENTS_P | 73.42 |
ENTS_R | 73.11 |
LAS | 81.53 |
TAGS_ACC | 94.47 |
TOKEN_ACC | 100.00 |
UAS | 85.12 |
Installation
pip install spacy-nightly
spacy download el_core_news_sm
el_core_news_md-2.1.0a7
Details: https://spacy.io/models/el#el_core_news_md
File checksum:
df30f3bb24e7e38d7aa473cf4bebada78c8eb358dfb65f98bea46fed2fe35c0d
Greek pipeline with word vectors, POS tags, dependencies and named entities. Word vectors use Facebook's FastText Common Crawl vectors, pruned to a vocabulary of 20,000 entries; words outside the 20,000 most frequent were mapped to the nearest neighbouring vector among the rows retained. Syntax (dependencies and POS tags) was trained on the Universal Dependencies conversion of the Greek Dependency Treebank (v2.2). Named entity annotations were created by Giannis Daras with Prodigy, following the OntoNotes 5 annotation scheme.
Feature | Description |
---|---|
Name | el_core_news_md |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 126 MB |
Pipeline | tagger, parser, ner |
Vectors | 1999938 keys, 20000 unique vectors (300 dimensions) |
Sources | Common Crawl, Greek Dependency Treebank, Daras GSOC 2018 |
License | CC BY-NC 4.0 |
Author | Giannis Daras |
Accuracy
Type | Score |
---|---|
ENTS_F | 78.05 |
ENTS_P | 75.59 |
ENTS_R | 80.67 |
LAS | 85.10 |
TAGS_ACC | 96.66 |
TOKEN_ACC | 100.00 |
UAS | 88.21 |
Installation
pip install spacy-nightly
spacy download el_core_news_md
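The metadata shipped with the package can be inspected at runtime to confirm the version, pipeline and vectors reported above. A short sketch (the Greek example sentence is illustrative):

```python
import spacy

nlp = spacy.load("el_core_news_md")

# meta.json fields bundled with the package.
print(nlp.meta["name"], nlp.meta["version"])  # el_core_news_md 2.1.0a7
print(nlp.meta["pipeline"])                   # ['tagger', 'parser', 'ner']

# "Athens is the capital of Greece."
doc = nlp("Η Αθήνα είναι η πρωτεύουσα της Ελλάδας.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```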
de_core_news_sm-2.1.0a7
Details: https://spacy.io/models/de#de_core_news_sm
File checksum:
b123755ebe4b59c55afd86c074d5f37e15fd3331dca3f2d403cc544cde1d0b25
German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | de_core_news_sm |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | TIGER Corpus, Wikipedia |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 83.42 |
ENTS_P | 84.38 |
ENTS_R | 82.48 |
LAS | 89.52 |
TAGS_ACC | 97.26 |
TOKEN_ACC | 99.48 |
UAS | 91.65 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against gold-standard human annotations.
Installation
pip install spacy-nightly
spacy download de_core_news_sm
de_core_news_md-2.1.0a7
Details: https://spacy.io/models/de#de_core_news_md
File checksum:
75f9b5843021e0ff6e12e280777fa6a51c51a77f68d5b75abb1e6c68d59fdb0c
German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | de_core_news_md |
Version | 2.1.0a7 |
spaCy | >=2.1.0a4 |
Model size | 210 MB |
Pipeline | tagger, parser, ner |
Vectors | 276087 keys, 20000 unique vectors (300 dimensions) |
Sources | TIGER Corpus, Wikipedia |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 83.77 |
ENTS_P | 84.64 |
ENTS_R | 82.93 |
LAS | 90.36 |
TAGS_ACC | 97.42 |
TOKEN_ACC | 99.48 |
UAS | 92.26 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against gold-standard human annotations.
Installation
pip install spacy-nightly
spacy download de_core_news_md
xx_ent_wiki_sm-2.1.0a6
Details: https://spacy.io/models/xx#xx_ent_wiki_sm
File checksum:
9445f433985304b717b72a86f544bcaef8e571ed03cd30aa8b82d1ec8127c91a
Multi-lingual CNN trained on the Nothman et al. (2010) Wikipedia corpus. Assigns named entities. Supports identification of PER, LOC, ORG and MISC entities for English, German, Spanish, French, Italian, Portuguese and Russian.
Feature | Description |
---|---|
Name | xx_ent_wiki_sm |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 3 MB |
Pipeline | ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Wikipedia |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 81.58 |
ENTS_P | 82.17 |
ENTS_R | 81.01 |
Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text.
Installation
pip install spacy-nightly
spacy download xx_ent_wiki_sm
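Since the pipeline contains only an `ner` component, the same loaded model can be applied to text in any of the supported languages without language-specific tagging or parsing. A minimal sketch (the sentences are illustrative):

```python
import spacy

# Multi-language, NER-only pipeline.
nlp = spacy.load("xx_ent_wiki_sm")

texts = [
    "Die Bundeskanzlerin besuchte Paris im März.",   # German
    "Apple abrió una nueva oficina en Madrid.",      # Spanish
    "The European Commission met in Brussels.",      # English
]

for text in texts:
    doc = nlp(text)
    print([(ent.text, ent.label_) for ent in doc.ents])
```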