v2.2.4: Alpha support for Yoruba and Basque, language data improvements and lots of bug fixes

@ines ines released this 12 Mar 13:40

✨ New features and improvements

  • NEW: Add Span.char_span method.
  • NEW: Base language support for Yoruba and Basque.
  • NEW: Add --tag-map-path argument to debug-data and train commands.
  • NEW: Add add_lemma option to displacy dependency visualizer.
  • Add IDX as an attribute available via Doc.to_array.
  • Improve speed of adding a large number of patterns to EntityRuler.
  • Replace mecab-python3 with fugashi for Japanese.
  • Improve language data for Norwegian, Luxembourgish, Finnish, Slovak, Romanian, Greek and German.
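
As a rough illustration of two of the additions above, the sketch below calls Span.char_span and exports IDX through Doc.to_array. It assumes a blank English pipeline (tokenizer only) and an example sentence chosen for this note; the character offsets passed to char_span are relative to the start of the span, not the document.

```python
import spacy
from spacy.attrs import IDX

# A blank pipeline only tokenizes, so no trained model is required.
nlp = spacy.blank("en")
doc = nlp("I like New York in autumn")

# Span.char_span: offsets are relative to the span's first character.
span = doc[2:6]                  # "New York in autumn"
sub_span = span.char_span(0, 8)  # characters 0..8 of the span
print(sub_span.text)             # New York

# IDX via Doc.to_array: the character offset of each token in the doc.
offsets = [int(i) for i in doc.to_array([IDX]).ravel()]
print(offsets == [token.idx for token in doc])  # True
```

If the requested character range does not line up with token boundaries, char_span returns None, so callers may want to check the result before using it.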

🔴 Bug fixes

  • Fix issue #3979, #4819, #4871: Add tok2vec parameters to train command.
  • Fix issue #4009: Fix use of pretrained vectors in text classifier.
  • Fix issue #4342: Improve CLI training with base model.
  • Fix issue #4432: Add destructors for states in TransitionSystem.
  • Fix issue #4440: Require HEAD for is_parsed in Doc.from_array.
  • Fix issue #4615: Update SHAPE docs and examples.
  • Fix issue #4665: Allow HEAD field in CoNLL-U format to be an underscore.
  • Fix issue #4673: Ensure correct array module is used when returning a vector via Vocab.
  • Fix issue #4674: Make set_entities in the KnowledgeBase more robust.
  • Fix issue #4677: Add missing tags to tag maps for el, es and pt.
  • Fix issue #4688: Iterate over lr_edges until Doc.sents are correct.
  • Fix issue #4703, #4823: Facilitate large training files.
  • Fix issue #4707: Auto-exclude disabled pipeline components when calling from_disk during load.
  • Fix issue #4717: Fix int value handling in Matcher.
  • Fix issue #4719: Add message when the CLI train script throws an exception.
  • Fix issue #4723: Update EntityLinker example.
  • Fix issue #4725: Take care of global vectors in multiprocessing.
  • Fix issue #4770: Include Doc.cats in serialization of Doc and DocBin.
  • Fix issue #4772: Fix bug in EntityLinker.predict.
  • Fix issue #4777: Fix link to user hooks in documentation.
  • Fix issue #4829: Update build dependencies in pyproject.toml.
  • Fix issue #4830: Warn for punctuation in entities when training with noise.
  • Fix issue #4833: Make example scripts work with transformer starter models.
  • Fix issue #4849: Fix serialization of ENT_ID.
  • Fix issue #4862: Fix and improve URL pattern.
  • Fix issue #4868: Include .pyx and .pxd files in the distribution.
  • Fix issue #4876: Add friendlier error to entity linking example script.
  • Fix issue #4903: Fix handling of custom underscore attributes during multiprocessing.
  • Fix issue #4924: Fix handling of empty docs or golds in Language.evaluate.
  • Fix issue #4934: Prevent updating component config if the Model was already defined.
  • Fix issue #4935: Fix Sentencizer.pipe for empty Doc.
  • Fix issue #4961: Remove old docs section links.
  • Fix issue #4965: Sync Span.__eq__ and Span.__hash__.
  • Fix issue #4975: Adjust srsly pin.
  • Fix issue #5048: Fix behavior of get_doc test utility.
  • Fix issue #5073: Normalize IS_SENT_START to SENT_START for Matcher.
  • Fix issue #5075: Make it impossible to create invalid heads with Doc.from_array.
  • Fix issue #5082: Correctly set vector of merged span in merge_entities.
  • Fix issue #5115: Ensure paths in Tokenizer.to_disk and Tokenizer.from_disk.
  • Fix issue #5117: Clarify behavior of Doc.is_ flags for empty Docs.
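
Two of the fixes above are easy to check by hand. The sketch below is not taken from the project's test suite; it assumes a blank English pipeline and a made-up category label, and it shows the behavior expected after #4770 (Doc.cats surviving a DocBin round trip) and #4965 (equal spans hashing equally).

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")
doc = nlp("This release fixes many bugs")
doc.cats = {"POSITIVE": 1.0}  # hypothetical label, for illustration only

# #4770: text categories should be kept when serializing with DocBin.
doc_bin = DocBin()
doc_bin.add(doc)
data = doc_bin.to_bytes()
restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))[0]
print(restored.cats)

# #4965: two spans covering the same tokens compare equal and share a hash,
# so they collapse into a single set or dict entry.
first, second = doc[0:2], doc[0:2]
print(first == second, hash(first) == hash(second))
unique_spans = {first, second}
```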

📖 Documentation and examples

  • Fix various typos and inconsistencies.
  • Add new projects to the spaCy Universe.

👥 Contributors

Thanks to @polm, @mmaybeno, @jarib, @questoph, @aajanki, @mr-bjerre, @Tclack88, @thiagola92, @tamuhey, @Olamyy, @AlJohri, @iechevarria, @iurshina, @lineality, @pbadeer, @BramVanroy, @kabirkhan, @ceteri, @omri374, @maknotavailable, @onlyanegg, @drndos, @ju-sh, @nlptechbook, @chkoar, @Jan-711, @MisterKeefe, @bryant1410, @mirfan899, @dhpollack and @mabraham for the pull requests and contributions!