
v2.1.4: Training improvements and bug fixes

@ines released this on 11 May, 22:08

✨ New features and improvements

  • NEW: util.filter_spans helper to filter duplicates and overlaps from a list of Span objects.
  • Improve language data for Thai, Japanese, Indonesian and Dutch.
  • Add --n-save-every to spacy pretrain and rename --nr-iter to --n-iter for consistency.
  • Add --return-scores flag to spacy evaluate to return the scores as a dict.
  • Add --n-early-stopping option to spacy train to define the maximum number of iterations without an improvement in dev accuracy.
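util.filter_spans removes duplicates and overlaps from a list of spans. When spans overlap, the longer span is preferred. The pure-Python sketch below illustrates that behaviour on (start, end) token offsets instead of real Span objects. It is not spaCy's implementation. The name filter_spans_sketch is hypothetical, and the tie-break (earlier start wins among equal lengths) is an assumption.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a Span: (start, end) token offsets, end exclusive.
Offsets = Tuple[int, int]


def filter_spans_sketch(spans: Iterable[Offsets]) -> List[Offsets]:
    """Drop duplicate and overlapping spans, keeping longer spans first.

    Sketch only: spaCy's util.filter_spans works on Span objects.
    """
    # Deduplicate, then order by length (longest first), then by start.
    ordered = sorted(set(spans), key=lambda s: (-(s[1] - s[0]), s[0]))
    taken = set()  # token indices already covered by a kept span
    kept = []
    for start, end in ordered:
        tokens = set(range(start, end))
        if tokens & taken:
            continue  # overlaps a longer (or earlier) span we already kept
        kept.append((start, end))
        taken |= tokens
    # Return the surviving spans in document order.
    return sorted(kept)


print(filter_spans_sketch([(0, 2), (1, 3), (0, 2), (4, 5), (3, 6)]))
```

In this example (3, 6) wins over the shorter (4, 5), and (0, 2) wins over (1, 3), which starts later. The duplicate (0, 2) is dropped.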
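The --n-early-stopping option caps how many iterations spacy train keeps running without an improvement in dev accuracy. The following minimal sketch shows that stopping rule on a plain list of per-iteration dev scores. The function name is hypothetical. Treating a tie with the best score as "no improvement" is an assumption, and spaCy's own training loop may handle ties differently.

```python
from typing import Iterable, List


def run_with_early_stopping(dev_scores: Iterable[float], n_early_stopping: int) -> List[float]:
    """Consume per-iteration dev scores and stop once `n_early_stopping`
    consecutive iterations fail to beat the best score seen so far.

    Hypothetical sketch of the rule, not spaCy's training loop.
    """
    best = float("-inf")
    since_best = 0  # iterations since the last improvement
    completed = []
    for score in dev_scores:
        completed.append(score)
        if score > best:
            best = score
            since_best = 0
        else:
            since_best += 1  # assumption: a tie does not count as improvement
            if since_best >= n_early_stopping:
                break
    return completed


print(run_with_early_stopping([0.5, 0.6, 0.59, 0.6, 0.58, 0.7], n_early_stopping=3))
```

In this sketch, training stops after the fifth iteration, so the later 0.7 score is never reached.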

🔴 Bug fixes

  • Fix issue #3307: Fix symlink creation to show error on Windows.
  • Fix issue #3473: Fix GPU training for text classification.
  • Fix issue #3475: Change favicon.
  • Fix issue #3482: Add Estonian base support to documentation.
  • Fix issue #3484: Ensure lemmatization is always consistent between sessions.
  • Fix issue #3521: Add variations of contractions to English stop words.
  • Fix issue #3523: Make spacy convert correctly default to JSON output.
  • Fix issue #3525, #3551, #3572: Fix a problem that caused lemmas not to be lowercased.
  • Fix issue #3531: Don't make "settings" or "title" required in displaCy data.
  • Fix issue #3533: Remove non-existent example from docs.
  • Fix issue #3546: Make sure path in GoldParse.__del__ is a string.
  • Fix issue #3549: Ensure match pattern error isn't raised on empty errors list.
  • Fix issue #3561: Fix DependencyParser.predict docs.
  • Fix issue #3598: Allow jupyter=False to override Jupyter mode in displacy.
  • Fix issue #3620: Fix bug in .iob converter.
  • Fix issue #3628: Relax jsonschema pin.
  • Fix issue #3667: Fix offset bug in loading pre-trained word2vec vectors.
  • Fix issue #3679: Update glossary to include missing labels in spacy.explain.
  • Fix issue #3680: Re-add missing universe README.
  • Fix issue #3681: Rewrite information extraction example to use Doc.retokenize.
  • Fix issue #3692: Fix return value in Language.update docs.
  • Fix issue #3694: Make "text" in spacy pretrain optional when "tokens" is provided.
  • Fix issue #3701: Improve Token.prob and Lexeme.prob docs.
  • Fix issue #3708: Fix error in regex matcher examples.
  • Fix issue #3713: Call rmtree and copytree with strings in spacy train.
  • Fix issue #3720: Add version tag to --base-model argument in spacy train docs.
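Issue #3694 concerns the JSONL input to spacy pretrain. A record can carry pre-tokenized "tokens" instead of raw "text". The sketch below shows how one such record could be turned into tokens. The function name is hypothetical, and whitespace splitting is only a stand-in for spaCy's real tokenizer.

```python
import json
from typing import List


def tokens_from_record(line: str) -> List[str]:
    """Return the tokens for one JSONL pretraining record.

    Assumption: a record has a "tokens" list or a "text" string.
    str.split() stands in for spaCy's tokenizer in this sketch.
    """
    record = json.loads(line)
    if "tokens" in record:
        # Pre-tokenized input: no "text" key is required.
        return list(record["tokens"])
    if "text" in record:
        return record["text"].split()
    raise ValueError('record needs a "text" or "tokens" key')


print(tokens_from_record('{"tokens": ["Hello", "world"]}'))
print(tokens_from_record('{"text": "Hello big world"}'))
```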

📖 Documentation and examples

👥 Contributors

Thanks to @svlandeg, @wannaphongcom, @Bharat123rox, @DuyguA, @SamuelLKane, @graus, @HiromuHota, @jeannefukumaru, @ivigamberdiev, @socool, @yvespeirsman, @lemontheme, @Dobita21, @w4nderlust, @pierremonico, @bryant1410, @celikomer, @xssChauhan, @kowaalczyk, @BreakBB, @fizban99, @tokestermw, @bjascob, @pickfire, @yaph, @amitness, @henry860916, @d5555, @BramVanroy, @F0rge1cE, @richardpaulhudson, @ldorigo, @aaronkub and @devforfu for the pull requests and contributions.