v2.1.4: Training improvements and bug fixes
✨ New features and improvements
- NEW: `util.filter_spans` helper to filter duplicates and overlaps from a list of `Span` objects.
- Improve language data for Thai, Japanese, Indonesian and Dutch.
- Add `--n-save-every` to `spacy pretrain` and rename `--nr-iter` to `--n-iter` for consistency.
- Add `--return-scores` flag to `spacy evaluate` to return a dict.
- Add `--n-early-stopping` option to `spacy train` to define maximum number of iterations without dev accuracy improvements.
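
The idea behind filtering duplicate and overlapping spans can be sketched in plain Python. This is only an illustration, not spaCy's implementation: spans are modelled as hypothetical `(start, end)` token-offset tuples rather than `Span` objects, the function name `filter_overlaps` is made up for this example, and the tie-breaking policy (longest span wins, then the earliest start) is an assumption.

```python
from typing import List, Tuple

# Hypothetical stand-in for a Span: (start, end) token offsets, end exclusive.
Offsets = Tuple[int, int]


def filter_overlaps(spans: List[Offsets]) -> List[Offsets]:
    """Drop duplicates and overlaps, preferring longer spans (assumed policy)."""
    # Consider the longest spans first; break ties by the earlier start.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Offsets] = []
    seen = set()  # token indices already covered by a kept span
    for start, end in ordered:
        tokens = range(start, end)
        # Keep the span only if none of its tokens is already covered.
        if not any(i in seen for i in tokens):
            kept.append((start, end))
            seen.update(tokens)
    # Return the surviving spans in document order.
    return sorted(kept)


print(filter_overlaps([(0, 2), (0, 2), (1, 4), (5, 6)]))
```

With spaCy installed, the helper itself is called on a list of `Span` objects as `spacy.util.filter_spans(spans)`.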
🔴 Bug fixes
- Fix issue #3307: Fix symlink creation to show error on Windows.
- Fix issue #3473: Fix GPU training for text classification.
- Fix issue #3475: Change favicon.
- Fix issue #3482: Add Estonian base support to documentation.
- Fix issue #3484: Ensure lemmatization is always consistent between sessions.
- Fix issue #3521: Add variations of contractions to English stop words.
- Fix issue #3523: Make `spacy convert` correctly default to `json`.
- Fix issue #3525, #3551, #3572: Fix problem that'd cause lemmas to not be lowercase.
- Fix issue #3531: Don't make `"settings"` or `"title"` required in displaCy data.
- Fix issue #3533: Remove non-existent example from docs.
- Fix issue #3546: Make sure path in `GoldParse.__del__` is a string.
- Fix issue #3549: Ensure match pattern error isn't raised on empty errors list.
- Fix issue #3561: Fix `DependencyParser.predict` docs.
- Fix issue #3598: Allow `jupyter=False` to override Jupyter mode in `displacy`.
- Fix issue #3620: Fix bug in `.iob` converter.
- Fix issue #3628: Relax `jsonschema` pin.
- Fix issue #3667: Fix offset bug in loading pre-trained word2vec.
- Fix issue #3679: Update glossary to include missing labels in `spacy.explain`.
- Fix issue #3680: Re-add missing universe README.
- Fix issue #3681: Rewrite information extraction example to use `Doc.retokenize`.
- Fix issue #3692: Fix return value in `Language.update` docs.
- Fix issue #3694: Make `"text"` in `spacy pretrain` optional when `"tokens"` is provided.
- Fix issue #3701: Improve `Token.prob` and `Lexeme.prob` docs.
- Fix issue #3708: Fix error in regex matcher examples.
- Fix issue #3713: Call `rmtree` and `copytree` with strings in `spacy train`.
- Fix issue #3720: Add version tag to `--base-model` argument in `spacy train` docs.
📖 Documentation and examples
- Add free interactive spaCy course.
- Fix various typos and inconsistencies.
- Add new projects to the spaCy universe.
👥 Contributors
Thanks to @svlandeg, @wannaphongcom, @Bharat123rox, @DuyguA, @SamuelLKane, @graus, @HiromuHota, @jeannefukumaru, @ivigamberdiev, @socool, @yvespeirsman, @lemontheme, @Dobita21, @w4nderlust, @pierremonico, @bryant1410, @celikomer, @xssChauhan, @kowaalczyk, @BreakBB, @fizban99, @tokestermw, @bjascob, @pickfire, @yaph, @amitness, @henry860916, @d5555, @BramVanroy, @F0rge1cE, @richardpaulhudson, @ldorigo, @aaronkub and @devforfu for the pull requests and contributions.