| SPDX-FileCopyrightText | SPDX-License-Identifier |
|---|---|
| 2024 PyThaiNLP Project | Apache-2.0 |

Notable changes between versions.
- For full release notes, see: https://github.com/PyThaiNLP/pythainlp/releases
- For detailed commit changes, see: https://github.com/PyThaiNLP/pythainlp/compare/v5.0.4...dev (select tags to compare)
- Add Thai Discourse Treebank postag #910
- Add Thai Universal Dependency Treebank postag #916
- Add Thai G2P v2 Grapheme-to-Phoneme model #923
- Add support for list of strings as input to sent_tokenize() #927
- Add pythainlp.tools.safe_print to handle UnicodeEncodeError on console #969
- Fix collate() to consider tonemark in ordering #926
- Add clause_tokenize warnings #1026
- Fix maiyamok() expanding the wrong word #962
- Fix: pythainlp.util.maiyamok does not duplicate words when more than one Maiyamok is used #917
- Fix: empty string ('') added when using word_tokenize with join_broken_num=True #912
- Fix: crfcut: Ensure splitting of sentences using terminal punctuation #905
- Fix: delay calling syllable_tokenize to avoid pycrfsuite ImportError #901
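
The `safe_print` entry above (#969) concerns `UnicodeEncodeError` raised when Thai text is printed to a console whose encoding cannot represent it. The following is a minimal, standard-library-only sketch of that kind of fallback. It is not PyThaiNLP's implementation: the helper name `print_with_fallback` and the simulated ASCII-only console are assumptions made for illustration.

```python
import io
import sys


def print_with_fallback(text: str, stream=None) -> None:
    """Hypothetical helper (not part of PyThaiNLP): print text, replacing
    characters that the target stream's encoding cannot represent."""
    stream = stream if stream is not None else sys.stdout
    try:
        print(text, file=stream)
    except UnicodeEncodeError:
        encoding = getattr(stream, "encoding", None) or "ascii"
        # Round-trip through the stream's encoding with errors="replace",
        # so unencodable characters become "?" instead of raising.
        safe_text = text.encode(encoding, errors="replace").decode(encoding)
        print(safe_text, file=stream)


# Simulate a console that can only encode ASCII (an assumption for the demo).
ascii_console = io.TextIOWrapper(io.BytesIO(), encoding="ascii", errors="strict")
print_with_fallback("Thai: สวัสดี", stream=ascii_console)
ascii_console.flush()
captured = ascii_console.buffer.getvalue().decode("ascii")
```

In this sketch the Thai characters are replaced with `?` rather than dropped, so the output keeps a visible trace of the text that could not be shown. Whether to replace or ignore unencodable characters is a design choice.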