Skip to content

v3.0.7: Bug fixes and base support for Azerbaijani

Compare
Choose a tag to compare
@adrianeboyd adrianeboyd released this 23 Jul 08:37
· 1674 commits to master since this release
034ac0a

✨ New features and improvements

  • Alpha tokenization support for Azerbaijani.
  • Updates for French stop words.

🔴 Bug fixes

  • Fix issue #7629: Fix scoring normalization.
  • Fix issue #7886: Fix unknown tokens percentage in debug data.
  • Fix issue #7907: Update load_lookups return type and docstring.
  • Fix issue #7930: Make EntityLinker robust for nO=None.
  • Fix issue #7925: Skip vector ngram backoff if minn is not set.
  • Fix issue #7973: Fix debug model for transformers.
  • Fix issue #7988: Preserve existing ENT_KB_ID in ner annotation.
  • Fix issue #7992: Fix span offsets for Matcher(as_spans) on spans.
  • Fix issue #8004: Handle errors while multiprocessing.
  • Fix issue #8009: Fix Doc.from_docs() for all empty docs.
  • Fix issue #8012: Fix ensemble textcat with listener.
  • Fix issue #8054: Add ENT_ID and NORM to DocBin strings.
  • Fix issue #8055: Handle partial entities in Span.as_doc.
  • Fix issue #8062: Make all Span attrs writable.
  • Fix issue #8066: Update debug data for textcat.
  • Fix issue #8069: Custom warning if DocBin is too large.
  • Fix issue #8113: Support to/from_bytes for KnowledgeBase and EntityLinker.
  • Fix issue #8116: Fix offsets in Span.get_lca_matrix.
  • Fix issue #8132: Remove unsupported attrs from attrs.IDS.
  • Fix issue #8158: Ensure tolerance is passed on in spacy.batch_by_words.v1.
  • Fix issue #8169: Fix bug from EntityRuler: ent_ids returns None for phrases.
  • Fix issue #8208: Address missing config overrides post load of models.
  • Fix issue #8212: Add all symbols in Unicode Currency Symbols to currency characters.
  • Fix issue #8216: Don't add duplicate patterns in EntityRuler.
  • Fix issue #8244: Use context manager when reading model file.
  • Fix issue #8245: Fix other open calls without context managers.
  • Fix issue #8265: Address mypy errors.
  • Fix issue #8299: Restrict pymorphy2 requirement to pymorphy2 mode in Russian and Ukrainian lemmatizers.
  • Fix issue #8335: Raise error if deps not provided with heads in Doc.
  • Fix issue #8368: Preserve whitespace in Span.lemma_.
  • Fix issue #8396: Make JsonlReader path optional.
  • Fix issue #8421: Fix non-deterministic deduplication in Greek lemmatizer.
  • Fix issue #8423: Update validate CLI to fix compat and ignore warnings.
  • Fix issue #8426: Fix setting empty entities in Example.from_dict.
  • Fix issue #8487: Fix span offsets and keys in Doc.from_docs.
  • Fix issue #8584: Raise an error for textcat with <2 labels.
  • Fix issue #8551: Fix duplicate spacy package CLI opts.

👥 Contributors

@adrianeboyd, @bodak, @bryant1410, @dhruvrnaik, @fhopp, @frascuchon, @graue70, @ines, @jenojp, @jhroy, @jklaise, @juliensalinas, @meghanabhange, @michael-k, @narayanacharya6, @polm, @sevdimali, @svlandeg, @ZeeD