Releases: explosion/spacy-models

es_core_news_sm-2.1.0a7

09 Feb 11:50
Pre-release

Details: https://spacy.io/models/es#es_core_news_sm

File checksum: 635634edbb74e07e41be0c7308e7cc3735aa3c11d4796dd3a2573b74d98116cd
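The checksum above can be used to confirm that a downloaded archive is intact. The following is a minimal sketch, assuming the value is a SHA-256 digest (its 64 hexadecimal characters are consistent with that, but the release notes do not name the algorithm). The archive filename in the usage comment is hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with the published value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (hypothetical local filename; the algorithm is an assumption):
# EXPECTED = "635634edbb74e07e41be0c7308e7cc3735aa3c11d4796dd3a2573b74d98116cd"
# print(matches_checksum("es_core_news_sm-2.1.0a7.tar.gz", EXPECTED))
```

Reading in chunks keeps memory use flat, which matters for the larger archives listed further down.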

Spanish multi-task CNN trained on the AnCora and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name es_core_news_sm
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources AnCora, Wikipedia
License GPL
Author Explosion AI
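The release tags on this page pair a model name with a version, and the names appear to follow spaCy's documented `lang_type_genre_size` convention. As a rough sketch under that assumption, a tag can be split into its parts as follows. `ModelTag` and `parse_release_tag` are hypothetical helper names, not part of spaCy.

```python
from typing import NamedTuple


class ModelTag(NamedTuple):
    lang: str     # language code, e.g. "es", or "xx" for multi-language
    type: str     # e.g. "core" or "ent"
    genre: str    # e.g. "news", "web", "wiki"
    size: str     # "sm", "md" or "lg"
    version: str  # model version, e.g. "2.1.0a7"


def parse_release_tag(tag):
    """Split a tag such as 'es_core_news_sm-2.1.0a7' into its components."""
    # The version follows the last hyphen; the name has four underscore-separated parts.
    name, _, version = tag.rpartition("-")
    lang, type_, genre, size = name.split("_")
    return ModelTag(lang, type_, genre, size, version)


print(parse_release_tag("es_core_news_sm-2.1.0a7"))
```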

Accuracy

Type Score
ENTS_F  89.06
ENTS_P  89.14
ENTS_R  88.97
LAS  87.12
TAGS_ACC  96.95
TOKEN_ACC  100.00
UAS  90.22

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download es_core_news_sm
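Once the package is installed, the pipeline can be loaded by name from Python. The sketch below shows one way to inspect the annotations the model assigns (POS tags, dependency labels and named entities). The sample sentence and the helper names `summarize` and `demo` are illustrative assumptions. No expected output is shown because predictions depend on the model version.

```python
def summarize(doc):
    """Collect per-token and per-entity annotations from a processed document."""
    tokens = [(tok.text, tok.pos_, tok.dep_, tok.head.text) for tok in doc]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


def demo(model_name="es_core_news_sm"):
    """Load the installed model and print its annotations for a sample sentence."""
    import spacy  # imported here so summarize() stays usable without spaCy installed

    nlp = spacy.load(model_name)
    doc = nlp("Ana García vive en Madrid y trabaja para Telefónica.")
    tokens, entities = summarize(doc)
    for row in tokens:
        print(*row, sep="\t")  # text, POS tag, dependency label, head
    for text, label in entities:
        print(text, label)     # entity text and label (PER, LOC, ORG or MISC)
```

Calling `demo()` requires both spaCy and the model package from the commands above. The same call should work for the other models on this page by passing their names.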

es_core_news_md-2.1.0a7

09 Feb 11:50
Pre-release

Details: https://spacy.io/models/es#es_core_news_md

File checksum: 06d827f4822d06308b2a8d66d5ac526dec521d041826405cd5ade0f4d587b656

Spanish multi-task CNN trained on the AnCora and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name es_core_news_md
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 69 MB
Pipeline  tagger, parser, ner
Vectors 533736 keys, 20000 unique vectors (50 dimensions)
Sources AnCora, Wikipedia
License GPL
Author Explosion AI

Accuracy

Type Score
ENTS_F  89.38
ENTS_P  89.50
ENTS_R  89.27
LAS  88.35
TAGS_ACC  97.21
TOKEN_ACC  100.00
UAS  91.17

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download es_core_news_md

en_core_web_sm-2.1.0a7

09 Feb 11:51
Pre-release

Details: https://spacy.io/models/en#en_core_web_sm

File checksum: 0b6264abb56aa6163dc431c98f625791945f225738c0906c09abfd763f655ec2

English multi-task CNN trained on OntoNotes. Assigns context-specific token vectors, POS tags, dependency parse and named entities.

Feature Description
Name en_core_web_sm
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources OntoNotes 5
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  85.51
ENTS_P  85.47
ENTS_R  85.55
LAS  89.68
TAGS_ACC  96.83
TOKEN_ACC  99.06
UAS  91.57

Installation

pip install spacy-nightly
spacy download en_core_web_sm

en_core_web_md-2.1.0a7

09 Feb 11:51
Pre-release

Details: https://spacy.io/models/en#en_core_web_md

File checksum: f54a6e6a2ff34c1adb1a2eabeb67b170933453ed878125c76813dc2e31c8cf8a

English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.

Feature Description
Name en_core_web_md
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 91 MB
Pipeline  tagger, parser, ner
Vectors 684830 keys, 20000 unique vectors (300 dimensions)
Sources OntoNotes 5, Common Crawl
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  86.31
ENTS_P  86.29
ENTS_R  86.33
LAS  90.00
TAGS_ACC  96.93
TOKEN_ACC  99.06
UAS  91.81

Installation

pip install spacy-nightly
spacy download en_core_web_md

en_core_web_lg-2.1.0a7

09 Feb 11:51
Pre-release

Details: https://spacy.io/models/en#en_core_web_lg

File checksum: 10b9ad440f66acc406013d7878f5dc73791849ca126d135e0c28c6c49abeedf5

English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.

Feature Description
Name en_core_web_lg
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 788 MB
Pipeline  tagger, parser, ner
Vectors 684830 keys, 684831 unique vectors (300 dimensions)
Sources OntoNotes 5, Common Crawl
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  86.59
ENTS_P  86.63
ENTS_R  86.54
LAS  90.13
TAGS_ACC  96.99
TOKEN_ACC  99.06
UAS  91.90

Installation

pip install spacy-nightly
spacy download en_core_web_lg

el_core_news_sm-2.1.0a7

09 Feb 11:51
Pre-release

Details: https://spacy.io/models/el#el_core_news_sm

File checksum: 4c321bbfb499fe5ffd31ccd57edab230c999c5a4a3f8fce8d5e14b729bfcd79c

Greek pipeline with word vectors, POS tags, dependencies and named entities. Word vectors use Facebook's FastText Common Crawl vectors, pruned to a vocabulary of 20,000 items. Words outside the 20,000 most frequent were mapped to the nearest neighbouring vector within the 20,000 rows retained. Syntax (dependencies and POS tags) was trained on the Universal Dependencies conversion of the Greek Dependency Treebank (v2.2). Named entity annotations were created by Giannis Daras with Prodigy, following the OntoNotes 5 annotation schema.

Feature Description
Name el_core_news_sm
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Greek Dependency Treebank, Daras GSOC 2018
License CC BY-NC 4.0
Author Giannis Daras

Accuracy

Type Score
ENTS_F  73.26
ENTS_P  73.42
ENTS_R  73.11
LAS  81.53
TAGS_ACC  94.47
TOKEN_ACC  100.00
UAS  85.12

Installation

pip install spacy-nightly
spacy download el_core_news_sm

el_core_news_md-2.1.0a7

09 Feb 11:51
Pre-release

Details: https://spacy.io/models/el#el_core_news_md

File checksum: df30f3bb24e7e38d7aa473cf4bebada78c8eb358dfb65f98bea46fed2fe35c0d

Greek pipeline with word vectors, POS tags, dependencies and named entities. Word vectors use Facebook's FastText Common Crawl vectors, pruned to a vocabulary of 20,000 items. Words outside the 20,000 most frequent were mapped to the nearest neighbouring vector within the 20,000 rows retained. Syntax (dependencies and POS tags) was trained on the Universal Dependencies conversion of the Greek Dependency Treebank (v2.2). Named entity annotations were created by Giannis Daras with Prodigy, following the OntoNotes 5 annotation schema.
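The pruning described above explains how a vectors table can have far more keys than unique vectors. The following toy sketch illustrates the idea in pure Python. It assumes cosine similarity as the nearest-neighbour measure and uses made-up two-dimensional vectors. It is not the code used to build this model, and `prune_vectors` here is a hypothetical standalone function (spaCy has its own `Vocab.prune_vectors` method, which may differ in detail).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune_vectors(vectors, n_keep):
    """Keep the first n_keep rows and remap every other key to its nearest kept row.

    `vectors` maps word -> vector and is assumed to be ordered by descending frequency.
    Returns (rows, key_to_row).
    """
    words = list(vectors)
    kept = words[:n_keep]
    rows = [vectors[w] for w in kept]
    key_to_row = {w: i for i, w in enumerate(kept)}
    for w in words[n_keep:]:
        sims = [cosine(vectors[w], row) for row in rows]
        key_to_row[w] = max(range(len(rows)), key=sims.__getitem__)
    return rows, key_to_row


# Toy data: four keys, pruned down to two unique vectors.
toy = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "kitten": [0.9, 0.1],
    "puppy": [0.2, 0.8],
}
rows, key_to_row = prune_vectors(toy, n_keep=2)
print(len(key_to_row), "keys,", len(rows), "unique vectors")
print(key_to_row)
```

In the toy run, "kitten" shares the row kept for "cat" and "puppy" shares the row kept for "dog", so every key still resolves to a vector while only two rows are stored.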

Feature Description
Name el_core_news_md
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 126 MB
Pipeline  tagger, parser, ner
Vectors 1999938 keys, 20000 unique vectors (300 dimensions)
Sources Common Crawl, Greek Dependency Treebank, Daras GSOC 2018
License CC BY-NC 4.0
Author Giannis Daras

Accuracy

Type Score
ENTS_F  78.05
ENTS_P  75.59
ENTS_R  80.67
LAS  85.10
TAGS_ACC  96.66
TOKEN_ACC  100.00
UAS  88.21
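The three entity scores in these tables are related: assuming ENTS_F is the usual F1 score, it is the harmonic mean of precision (ENTS_P) and recall (ENTS_R). A short sketch that recomputes it from the two values reported for this model:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ENTS_P and ENTS_R as listed for el_core_news_md
print(round(f_score(75.59, 80.67), 2))  # 78.05
```

For other models on this page the recomputed value can differ from the listed ENTS_F in the last digit, since the published precision and recall are themselves rounded.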

Installation

pip install spacy-nightly
spacy download el_core_news_md

de_core_news_sm-2.1.0a7

09 Feb 11:52
Pre-release

Details: https://spacy.io/models/de#de_core_news_sm

File checksum: b123755ebe4b59c55afd86c074d5f37e15fd3331dca3f2d403cc544cde1d0b25

German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name de_core_news_sm
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources TIGER Corpus, Wikipedia
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  83.42
ENTS_P  84.38
ENTS_R  82.48
LAS  89.52
TAGS_ACC  97.26
TOKEN_ACC  99.48
UAS  91.65

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download de_core_news_sm

de_core_news_md-2.1.0a7

09 Feb 11:52
Pre-release

Details: https://spacy.io/models/de#de_core_news_md

File checksum: 75f9b5843021e0ff6e12e280777fa6a51c51a77f68d5b75abb1e6c68d59fdb0c

German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name de_core_news_md
Version 2.1.0a7
spaCy >=2.1.0a4
Model size 210 MB
Pipeline  tagger, parser, ner
Vectors 276087 keys, 20000 unique vectors (300 dimensions)
Sources TIGER Corpus, Wikipedia
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  83.77
ENTS_P  84.64
ENTS_R  82.93
LAS  90.36
TAGS_ACC  97.42
TOKEN_ACC  99.48
UAS  92.26

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download de_core_news_md

xx_ent_wiki_sm-2.1.0a6

21 Jan 17:48
9d44cac
Pre-release

Details: https://spacy.io/models/xx#xx_ent_wiki_sm

File checksum: 9445f433985304b717b72a86f544bcaef8e571ed03cd30aa8b82d1ec8127c91a

Multi-lingual CNN trained on the Nothman et al. (2010) Wikipedia corpus. Assigns named entities. Supports identification of PER, LOC, ORG and MISC entities for English, German, Spanish, French, Italian, Portuguese and Russian.

Feature Description
Name xx_ent_wiki_sm
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 3 MB
Pipeline  ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Wikipedia
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  81.58
ENTS_P  82.17
ENTS_R  81.01

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text.

Installation

pip install spacy-nightly
spacy download xx_ent_wiki_sm