Python v0.6.0

n1t0 released this 02 Mar 20:03

Changes:

  • Significant speed improvements for BPE, in both training and tokenization (#165)

Fixes:

  • Some default tokens were missing from BertWordPieceTokenizer (cf #160)
  • Offsets from the ByteLevel PreTokenizer were wrong when a character was split into
    multiple bytes (cf #156)
  • The longest_first truncation strategy had a bug (#174)
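
To illustrate the kind of situation behind the ByteLevel offsets fix, here is a minimal sketch in plain Python. It does not use the tokenizers API and is not the library's implementation; the helper names `char_byte_spans` and `byte_to_char_index` are hypothetical. It only shows why byte positions and character positions diverge once a character is encoded as more than one UTF-8 byte, which is why offsets have to be mapped back to characters.

```python
# Illustration only: plain Python, not the tokenizers library API.
# Both helper names below are hypothetical.

def char_byte_spans(text):
    """Return the (start, end) UTF-8 byte span of each character in text."""
    spans = []
    pos = 0
    for ch in text:
        n = len(ch.encode("utf-8"))  # 1 to 4 bytes per character
        spans.append((pos, pos + n))
        pos += n
    return spans


def byte_to_char_index(text):
    """Map every UTF-8 byte position back to the index of its character."""
    mapping = []
    for char_index, (start, end) in enumerate(char_byte_spans(text)):
        mapping.extend([char_index] * (end - start))
    return mapping


text = "h\u00e9llo"  # "é" (U+00E9) takes two bytes in UTF-8
print(len(text), len(text.encode("utf-8")))
print(char_byte_spans(text))
print(byte_to_char_index(text))
```

Any byte-level component that reports byte positions as if they were character positions will drift by one for every extra byte, so offsets after the "é" in this example would be off by one unless they are mapped back.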