Releases: jawah/charset_normalizer

Charset Normalizer

23 Sep 13:02
5abfb83

Changes:

  • Bugfix: the steps and chunk_size parameters of from_bytes were not adjusted to the sequence length when the provided values did not fit the content. This could lead to misdetection on small content.
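
As an illustration of the kind of adjustment this fix describes, here is a minimal sketch that shrinks steps and chunk_size so they fit a short sequence. The helper name fit_probe_parameters and its clamping rules are assumptions made for this example; they are not taken from the library's source.

```python
from typing import Tuple


def fit_probe_parameters(sequence_length: int, steps: int, chunk_size: int) -> Tuple[int, int]:
    """Hypothetical helper: shrink steps and chunk_size so they fit the content."""
    if sequence_length <= 0:
        # Nothing to probe on empty content.
        return 0, 0
    # A chunk can never be longer than the content itself.
    chunk_size = max(1, min(chunk_size, sequence_length))
    # Never ask for more steps than there are non-overlapping chunks.
    max_steps = max(1, sequence_length // chunk_size)
    steps = max(1, min(steps, max_steps))
    return steps, chunk_size


if __name__ == "__main__":
    # A 20-byte payload cannot be probed with ten 512-byte chunks.
    print(fit_probe_parameters(20, steps=10, chunk_size=512))
```

With values clamped this way, a very small payload is still examined once in full instead of being skipped or sampled out of range.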

Charset Normalizer

21 Sep 16:17

Changes:

  • Bugfix: sequences shorter than 10 characters were not checked by ProbeChaos at all. (#14)
  • Bugfix: the legacy detect method, inspired by chardet, did not return the intended result when no match was found. (#14)

Charset Normalizer

17 Sep 17:17
d3996ce
Release 1.0.0 (#11)

* Adjustment in frequencies.json for Chinese

Removed Latin-based characters from it

* Added the possibility to list encoding aliases for a match

An encoding is often known by several names; listing aliases helps when searching for IBM855 while it is listed as CP855.
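
To show why alias listing helps, the following sketch collects the aliases Python's standard library knows for a given encoding. It relies only on the stdlib codecs and encodings.aliases modules; the function name list_aliases is an assumption for this example and is not the library's API.

```python
import codecs
from encodings.aliases import aliases


def list_aliases(encoding_name: str) -> list:
    """Hypothetical helper: stdlib aliases that resolve to the same codec."""
    canonical = codecs.lookup(encoding_name).name
    found = set()
    for alias in aliases:
        try:
            if codecs.lookup(alias).name == canonical:
                found.add(alias)
        except LookupError:
            # Some aliases point to codecs unavailable on this platform.
            continue
    return sorted(found)


if __name__ == "__main__":
    # IBM855 and CP855 resolve to the same codec.
    print(codecs.lookup("IBM855").name)
    print(list_aliases("CP855"))
```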

* Added submatch in match

A list of submatches that produce the EXACT same output as a match
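
One way to picture submatches is to group candidate encodings by the text they decode to: every encoding in a group yields exactly the same output. This is only a sketch under that assumption; group_identical_decodings is a hypothetical name and the library's own bookkeeping may differ.

```python
from typing import Dict, Iterable, List


def group_identical_decodings(payload: bytes, encodings: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each decoded text to the encodings producing it."""
    groups: Dict[str, List[str]] = {}
    for name in encodings:
        try:
            text = payload.decode(name)
        except (UnicodeDecodeError, LookupError):
            # This encoding cannot decode the payload, or is unknown.
            continue
        groups.setdefault(text, []).append(name)
    return groups


if __name__ == "__main__":
    # latin_1 and cp1252 decode this byte to the same character.
    print(group_identical_decodings(b"\xe9", ["ascii", "utf_8", "latin_1", "cp1252"]))
```

In each group, the first encoding could be reported as the match and the remaining ones as its submatches.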

* Changes in docs

+ comment unused code.

* Document the ProbeChaos giveup_threshold parameter

* Doc improvement in unicode.py

* Add static method list_by_range in unicode.py

Sorts letters by Unicode range into a dict
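
A minimal sketch of grouping letters by Unicode range follows. The small RANGES table and the function name sort_letters_by_range are assumptions for illustration; the actual method in unicode.py may use a different table and return shape.

```python
from typing import Dict, List

# Assumed subset of Unicode blocks, for illustration only.
RANGES = {
    "Basic Latin": range(0x0000, 0x0080),
    "Latin-1 Supplement": range(0x0080, 0x0100),
    "Greek and Coptic": range(0x0370, 0x0400),
    "Cyrillic": range(0x0400, 0x0500),
    "CJK Unified Ideographs": range(0x4E00, 0xA000),
}


def sort_letters_by_range(letters: str) -> Dict[str, List[str]]:
    """Hypothetical helper: group letters by the Unicode range they belong to."""
    by_range: Dict[str, List[str]] = {}
    for letter in letters:
        for name, code_points in RANGES.items():
            if ord(letter) in code_points:
                by_range.setdefault(name, []).append(letter)
                break  # A code point belongs to a single range.
    return by_range


if __name__ == "__main__":
    print(sort_letters_by_range("a\u00e9\u4f60\u0434"))
```

Letters outside the table are silently ignored in this sketch.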

* ProbeCoherence reliability improved

Can now probe & sort by alphabet used or Unicode range.

* Added coherence_non_latin method in NormalizerMatch

Checks whether a non-Latin-based language was confirmed by ProbeCoherence

* CLI is now more verbose

* More tests, yay!

* bump 1.0.0

* README update

Charset Normalizer

12 Sep 17:09
6009bf8
  • Improved detection
  • Some performance loss is to be expected
  • Added --threshold option to CLI

Charset Normalizer

06 Sep 20:23
  • Bugfix in UTF-7 support
  • Added legacy detect(byte_str) method

Charset Normalizer

04 Sep 17:14

RC 5

  • BOM support (Unicode mostly)
  • Chaos prober improved on small text
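
To illustrate what BOM support involves, this sketch checks a payload against the byte order marks exposed by Python's codecs module. The function name detect_bom and the returned tuple are assumptions for this example, not the library's implementation.

```python
import codecs
from typing import Optional, Tuple

# UTF-32 marks are listed before UTF-16 because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
KNOWN_BOMS = (
    ("utf_8", codecs.BOM_UTF8),
    ("utf_32_le", codecs.BOM_UTF32_LE),
    ("utf_32_be", codecs.BOM_UTF32_BE),
    ("utf_16_le", codecs.BOM_UTF16_LE),
    ("utf_16_be", codecs.BOM_UTF16_BE),
)


def detect_bom(payload: bytes) -> Tuple[Optional[str], int]:
    """Hypothetical helper: return (encoding, BOM length), or (None, 0) if absent."""
    for encoding, mark in KNOWN_BOMS:
        if payload.startswith(mark):
            return encoding, len(mark)
    return None, 0


if __name__ == "__main__":
    data = codecs.BOM_UTF8 + "caf\u00e9".encode("utf_8")
    encoding, offset = detect_bom(data)
    if encoding is not None:
        # Skip the mark before decoding the rest of the payload.
        print(encoding, data[offset:].decode(encoding))
```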

Charset Normalizer

03 Sep 20:12

RC 4

  • Probe Chaos: Code cleanup, performance review and accuracy improved

Charset Normalizer

31 Aug 16:12

RC 3

  • Language detection has been reviewed to give better results
  • Bugfix in Japanese detection: every Japanese text was considered chaotic

Charset Normalizer

28 Aug 10:38

RC 2

  • Fixes #3 🎉 First PR
  • Close files after reading them in CLI mode

Charset Normalizer RC

27 Aug 17:45

🍰 First RC!