Releases: explosion/spacy-models

pt_core_news_sm-2.1.0a6

21 Jan 17:49
9d44cac
Pre-release

Details: https://spacy.io/models/pt#pt_core_news_sm

File checksum: 306516b5b761ce7a20c6adb719a570b0b3d432e35f188fca06be9f1a7f42406d
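The checksum above can be used to check a downloaded archive before installing it. The following is a minimal sketch, assuming the 64-character hex string is a SHA-256 digest (the release notes do not name the algorithm) and that the archive has been saved locally. The function names are illustrative and not part of spaCy.

```python
import hashlib

# Checksum listed for pt_core_news_sm-2.1.0a6 (assumed to be SHA-256).
EXPECTED = "306516b5b761ce7a20c6adb719a570b0b3d432e35f188fca06be9f1a7f42406d"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED):
    """Return True if the file's digest equals the expected checksum."""
    return sha256_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use flat, which matters for the larger packages listed further down.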

Portuguese multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name pt_core_news_sm
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 12 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Universal Dependencies, Wikipedia
License CC BY-SA 4.0
Author Explosion AI
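The release tags on this page combine the Name and Version fields from the table above, and the names themselves appear to follow a language, type, genre, size pattern. A small sketch of splitting a tag into those parts, assuming exactly that layout; `ReleaseTag` and `parse_release_tag` are hypothetical helpers, not spaCy functions.

```python
from typing import NamedTuple


class ReleaseTag(NamedTuple):
    lang: str     # e.g. "pt"
    kind: str     # e.g. "core"
    genre: str    # e.g. "news"
    size: str     # e.g. "sm"
    version: str  # e.g. "2.1.0a6"


def parse_release_tag(tag: str) -> ReleaseTag:
    """Split a tag like 'pt_core_news_sm-2.1.0a6' into its components."""
    name, _, version = tag.partition("-")
    parts = name.split("_")
    if len(parts) != 4 or not version:
        raise ValueError(f"unexpected release tag format: {tag!r}")
    return ReleaseTag(*parts, version)


tag = parse_release_tag("pt_core_news_sm-2.1.0a6")
print(tag.lang, tag.size, tag.version)  # pt sm 2.1.0a6
```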

Accuracy

Type Score
ENTS_F  89.14
ENTS_P  89.23
ENTS_R  89.04
LAS  86.02
TAGS_ACC  80.44
TOKEN_ACC  100.00
UAS  89.36

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download pt_core_news_sm
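After the two commands above, the model is installed as a regular Python package named after the model, so its presence can be checked without loading it. A minimal sketch using only the standard library; `model_is_installed` is a hypothetical helper name.

```python
import importlib.util


def model_is_installed(name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(name) is not None


if model_is_installed("pt_core_news_sm"):
    print("pt_core_news_sm is available")
else:
    print("pt_core_news_sm is not installed")
```

If the package is present, `spacy.load("pt_core_news_sm")` should return the pipeline with the tagger, parser and ner components listed above.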

nl_core_news_sm-2.1.0a6

21 Jan 17:50
9d44cac
Pre-release

Details: https://spacy.io/models/nl#nl_core_news_sm

File checksum: 8eb8bf0133694bfa28a6f27dcc44c178b15ed9c08f38f54e7a2cb351e7618c7d

Dutch multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name nl_core_news_sm
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Universal Dependencies, Wikipedia
License CC BY-SA 4.0
Author Explosion AI

Accuracy

Type Score
ENTS_F  87.05
ENTS_P  86.56
ENTS_R  87.54
LAS  77.56
TAGS_ACC  91.47
TOKEN_ACC  100.00
UAS  83.72

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.
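The three NER scores in the table above are related. Assuming ENTS_F is the balanced F-score, that is, the harmonic mean of precision (ENTS_P) and recall (ENTS_R), it can be recomputed from the other two values. The sketch below does this for the Dutch model; `f_score` is an illustrative helper, not a spaCy function.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# nl_core_news_sm: ENTS_P 86.56, ENTS_R 87.54, reported ENTS_F 87.05
nl_f = f_score(86.56, 87.54)
print(round(nl_f, 2))  # 87.05
```

Because the published precision and recall are already rounded, the recomputed value can differ from the reported one in the last digit for some of the other models.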

Installation

pip install spacy-nightly
spacy download nl_core_news_sm

it_core_news_sm-2.1.0a6

21 Jan 17:50
9d44cac
Pre-release

Details: https://spacy.io/models/it#it_core_news_sm

File checksum: 537c0c85d112a8f5d9c2e1d11049b273839767182eb2e085baebebb73843fe32

Italian multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name it_core_news_sm
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Universal Dependencies, Wikipedia
License CC BY-NC-SA 3.0
Author Explosion AI

Accuracy

Type Score
ENTS_F  86.41
ENTS_P  86.63
ENTS_R  86.18
LAS  87.18
TAGS_ACC  95.91
TOKEN_ACC  100.00
UAS  90.93

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download it_core_news_sm

fr_core_news_sm-2.1.0a6

21 Jan 17:50
9d44cac
Pre-release

Details: https://spacy.io/models/fr#fr_core_news_sm

File checksum: db806c0e640d4ac9c11471461e2884e0a8c2fa91ddbacf0dea078c018afb79a9

French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name fr_core_news_sm
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 12 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Sequoia Corpus (UD), Wikipedia
License LGPL
Author Explosion AI

Accuracy

Type Score
ENTS_F  82.87
ENTS_P  82.97
ENTS_R  82.77
LAS  84.76
TAGS_ACC  94.54
TOKEN_ACC  100.00
UAS  87.67

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download fr_core_news_sm

fr_core_news_md-2.1.0a6

21 Jan 17:50
9d44cac
Pre-release

Details: https://spacy.io/models/fr#fr_core_news_md

File checksum: b2b76fc3f3313b7492f15b6339191419095379289ccbbbc14c67f7991efcf13e

French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name fr_core_news_md
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 81 MB
Pipeline  tagger, parser, ner
Vectors 579447 keys, 20000 unique vectors (300 dimensions)
Sources Sequoia Corpus (UD), Wikipedia
License LGPL
Author Explosion AI

Accuracy

Type Score
ENTS_F  83.36
ENTS_P  83.48
ENTS_R  83.25
LAS  86.48
TAGS_ACC  95.12
TOKEN_ACC  100.00
UAS  89.14

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download fr_core_news_md

es_core_news_sm-2.1.0a6

21 Jan 17:50
9d44cac
Pre-release

Details: https://spacy.io/models/es#es_core_news_sm

File checksum: 56473ffbdb1bd125881681a161c5a3bbbd13f5e76b3fbc5bf4e2b09adf541615

Spanish multi-task CNN trained on the AnCora and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name es_core_news_sm
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources AnCora, Wikipedia
License GPL
Author Explosion AI

Accuracy

Type Score
ENTS_F  88.98
ENTS_P  89.06
ENTS_R  88.90
LAS  87.28
TAGS_ACC  97.03
TOKEN_ACC  100.00
UAS  90.33

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download es_core_news_sm

es_core_news_md-2.1.0a6

21 Jan 17:50
9d44cac
Pre-release

Details: https://spacy.io/models/es#es_core_news_md

File checksum: 77c0a1b9ebd2cf32644bf1c592c0b1d14041a0202f7a7ef8d0be3b68afa44519

Spanish multi-task CNN trained on the AnCora and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name es_core_news_md
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 69 MB
Pipeline  tagger, parser, ner
Vectors 533736 keys, 20000 unique vectors (50 dimensions)
Sources AnCora, Wikipedia
License GPL
Author Explosion AI

Accuracy

Type Score
ENTS_F  89.30
ENTS_P  89.42
ENTS_R  89.19
LAS  88.06
TAGS_ACC  97.18
TOKEN_ACC  100.00
UAS  90.87

Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" annotations in the WikiNER corpus. Accuracy on these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download es_core_news_md

en_core_web_sm-2.1.0a6

21 Jan 17:50
9d44cac
Pre-release

Details: https://spacy.io/models/en#en_core_web_sm

File checksum: 927785b2aabb43d888437295a11b071798570dbd8c67cf80c611bc1c6927898c

English multi-task CNN trained on OntoNotes. Assigns context-specific token vectors, POS tags, dependency parse and named entities.

Feature Description
Name en_core_web_sm
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources OntoNotes 5
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  85.49
ENTS_P  85.66
ENTS_R  85.33
LAS  89.64
TAGS_ACC  96.80
TOKEN_ACC  99.06
UAS  91.53

Installation

pip install spacy-nightly
spacy download en_core_web_sm

en_core_web_md-2.1.0a6

21 Jan 17:50
9d44cac
Pre-release

Details: https://spacy.io/models/en#en_core_web_md

File checksum: ea971369a13056cee2bddaaf1c5b342b16bc0a0f45228abde4b4c4635f469f1f

English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.

Feature Description
Name en_core_web_md
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 91 MB
Pipeline  tagger, parser, ner
Vectors 684830 keys, 20000 unique vectors (300 dimensions)
Sources OntoNotes 5, Common Crawl
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  86.40
ENTS_P  86.50
ENTS_R  86.30
LAS  90.16
TAGS_ACC  96.96
TOKEN_ACC  99.06
UAS  91.94

Installation

pip install spacy-nightly
spacy download en_core_web_md

en_core_web_lg-2.1.0a6

21 Jan 17:50
9d44cac
Pre-release

Details: https://spacy.io/models/en#en_core_web_lg

File checksum: 6ee2325f253b8f74693c07311071eab99e504acfc37f8da7a6a88a53fb0496f9

English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.

Feature Description
Name en_core_web_lg
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 788 MB
Pipeline  tagger, parser, ner
Vectors 684830 keys, 684831 unique vectors (300 dimensions)
Sources OntoNotes 5, Common Crawl
License MIT
Author Explosion AI
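The Vectors row accounts for most of the size gap between the md and lg English packages. The rough estimate below multiplies unique vectors by dimensions by bytes per value, assuming the vectors are stored as 32-bit floats (an assumption, not stated in these release notes) and ignoring the key table and pipeline weights. `vector_table_bytes` is an illustrative helper.

```python
BYTES_PER_FLOAT32 = 4  # assumed storage type for each vector component


def vector_table_bytes(unique_vectors: int, dimensions: int) -> int:
    """Approximate size in bytes of a dense vectors table."""
    return unique_vectors * dimensions * BYTES_PER_FLOAT32


md_bytes = vector_table_bytes(20000, 300)    # en_core_web_md
lg_bytes = vector_table_bytes(684831, 300)   # en_core_web_lg

print(md_bytes)  # 24000000
print(lg_bytes)  # 821797200
```

Under these assumptions the lg vectors table alone is on the order of 800 MB, in line with the listed 788 MB model size, while the md table is about 24 MB because its 684830 keys share 20000 unique vectors.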

Accuracy

Type Score
ENTS_F  86.62
ENTS_P  86.68
ENTS_R  86.57
LAS  90.20
TAGS_ACC  97.02
TOKEN_ACC  99.06
UAS  91.97

Installation

pip install spacy-nightly
spacy download en_core_web_lg