Releases: jawah/charset_normalizer
Charset Normalizer
Changes:
- Bugfix: the from_bytes parameters steps and chunk_size were not adapted to the sequence length when the provided values did not fit the content, which could lead to misdetection on small content.
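The fix above concerns how sampling parameters relate to payload length. As a minimal sketch, assuming detection reads steps evenly spaced chunks of chunk_size bytes, the adaptation to short content might look like this. The helpers fit_chunking and chunk_offsets are hypothetical and are not the library's API.

```python
from typing import List, Tuple


def fit_chunking(length: int, steps: int, chunk_size: int) -> Tuple[int, int]:
    """Hypothetical helper: adapt steps/chunk_size to a payload of `length` bytes."""
    if length <= 0:
        return 0, 0
    if length <= steps * chunk_size:
        # The requested samples would cover (or overrun) the whole payload,
        # so read it once, in full, instead of sampling it.
        return 1, length
    return steps, chunk_size


def chunk_offsets(length: int, steps: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: (start, end) ranges of evenly spaced, non-overlapping chunks."""
    steps, chunk_size = fit_chunking(length, steps, chunk_size)
    if steps == 0:
        return []
    stride = length // steps  # >= chunk_size whenever sampling is kept
    return [(i * stride, i * stride + chunk_size) for i in range(steps)]


print(chunk_offsets(10, steps=5, chunk_size=512))      # [(0, 10)]
print(chunk_offsets(10_000, steps=5, chunk_size=512))  # [(0, 512), (2000, 2512), (4000, 4512), (6000, 6512), (8000, 8512)]
```

In this sketch, falling back to one full read keeps every range inside the payload and avoids sampling the same few bytes repeatedly.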
Charset Normalizer
Release 1.0.0 (#11)
- Adjustment in frequencies.json for Chinese: removed Latin-based characters from it
- Added the possibility to list encoding aliases for a match. An encoding is known by many names; aliases help when searching for IBM855 while it is listed as CP855.
- Added submatch to match: a list of submatches that produce the EXACT same output as the match
- Documentation changes; unused code commented out
- Documented the ProbeChaos parameter giveup_threshold
- Documentation improvements in unicode.py
- Added static method list_by_range in unicode.py, which sorts letters by Unicode range into a dict
- ProbeCoherence reliability improved: it can now probe and sort by alphabet used or by Unicode range
- Added method coherence_non_latin in NormalizerMatch, which checks whether a non-Latin-based language was verified by ProbeCoherence
- CLI is now more verbose
- More tests, yay!
- Bumped version to 1.0.0
- README update
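Two of the 1.0.0 items, encoding aliases and submatches, can be illustrated with the Python standard library alone. This is not charset_normalizer's implementation: it assumes Python's encodings.aliases table as the alias source, and treats a submatch as any other candidate encoding whose decoded output is identical to the main match. The function names encoding_aliases and find_submatches are hypothetical.

```python
import codecs
from encodings.aliases import aliases
from typing import List


def encoding_aliases(name: str) -> List[str]:
    """Hypothetical helper: stdlib aliases that resolve to the same codec as `name`."""
    canonical = codecs.lookup(name).name  # "IBM855" resolves to "cp855"
    found = []
    for alias in aliases:
        try:
            if codecs.lookup(alias).name == canonical:
                found.append(alias)
        except LookupError:
            # Some aliases point to codecs unavailable on this platform (e.g. mbcs off Windows).
            continue
    return sorted(found)


def find_submatches(payload: bytes, match: str, candidates: List[str]) -> List[str]:
    """Hypothetical helper: candidates that decode `payload` to exactly the match's text."""
    reference = payload.decode(match)
    match_codec = codecs.lookup(match).name
    submatches = []
    for candidate in candidates:
        if codecs.lookup(candidate).name == match_codec:
            continue  # same codec under another name: an alias, not a submatch
        try:
            if payload.decode(candidate) == reference:
                submatches.append(candidate)
        except UnicodeDecodeError:
            continue
    return submatches


print("ibm855" in encoding_aliases("cp855"))                                   # True
print(find_submatches(b"hello", "utf_8", ["ascii", "latin_1", "utf_16"]))      # ['ascii', 'latin_1']
print(find_submatches("café".encode("utf_8"), "utf_8", ["ascii", "latin_1"]))  # []
```

Pure-ASCII input is the typical case where several encodings yield the same text; once non-ASCII bytes appear, most candidates diverge.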
Charset Normalizer
- Improved detection
- Expect some performance loss
- Added --threshold option to the CLI
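A minimal sketch of what a --threshold option can do, assuming it is an upper bound on how noisy ("chaotic") a decoding may be before it is rejected. Only the option name comes from the release note; the noise measure toy_chaos_ratio, the helper filter_by_threshold and the default of 0.2 are hypothetical stand-ins, not the library's chaos probe.

```python
import argparse
from typing import Dict, List


def toy_chaos_ratio(text: str) -> float:
    """Hypothetical noise measure: share of U+FFFD or unprintable, non-whitespace characters."""
    if not text:
        return 0.0
    noisy = sum(
        1 for ch in text if ch == "\ufffd" or not (ch.isprintable() or ch.isspace())
    )
    return noisy / len(text)


def filter_by_threshold(payload: bytes, candidates: List[str], threshold: float) -> Dict[str, float]:
    """Hypothetical helper: keep encodings whose decoded text is at most `threshold` noisy."""
    kept = {}
    for encoding in candidates:
        # errors="replace" turns undecodable bytes into U+FFFD so they count as noise.
        ratio = toy_chaos_ratio(payload.decode(encoding, errors="replace"))
        if ratio <= threshold:
            kept[encoding] = ratio
    return kept


parser = argparse.ArgumentParser(prog="threshold-sketch")
parser.add_argument(
    "--threshold",
    type=float,
    default=0.2,  # illustrative default, not taken from the release notes
    help="maximum tolerated noise ratio, from 0.0 to 1.0",
)
args = parser.parse_args(["--threshold", "0.1"])  # explicit argv for the demo

payload = "naïve café".encode("utf_8")
print(filter_by_threshold(payload, ["utf_8", "ascii"], args.threshold))  # {'utf_8': 0.0}
```

Raising the threshold admits noisier decodings; lowering it toward 0 keeps only clean ones, at the risk of rejecting every candidate.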
Charset Normalizer
- Bugfix in UTF-7 support
- Added legacy detect(byte_str) method
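The legacy detect(byte_str) method targets callers used to chardet's detect(), which returns a dict with encoding, confidence and language keys. The sketch below only mimics that return shape; the trial-decoding order (ASCII, UTF-8, then Latin-1 as a catch-all) and the printable-character confidence score are assumptions for illustration, not how charset_normalizer decides, and detect_sketch is a hypothetical name.

```python
from typing import Dict, Optional, Union


def detect_sketch(byte_str: Union[bytes, bytearray]) -> Dict[str, Optional[Union[str, float]]]:
    """Hypothetical chardet-shaped detector: returns encoding, confidence and language."""
    if not isinstance(byte_str, (bytes, bytearray)):
        raise TypeError(f"expected bytes or bytearray, got {type(byte_str).__name__}")
    if not byte_str:
        return {"encoding": None, "confidence": 0.0, "language": None}
    # Assumed candidate order; latin_1 accepts every byte, so it is the last resort.
    for encoding in ("ascii", "utf_8", "latin_1"):
        try:
            text = bytes(byte_str).decode(encoding)
        except UnicodeDecodeError:
            continue
        readable = sum(1 for ch in text if ch.isprintable() or ch.isspace())
        return {
            "encoding": encoding,
            "confidence": readable / len(text),  # toy score, not the library's
            "language": None,  # language guessing is out of scope for this sketch
        }
    return {"encoding": None, "confidence": 0.0, "language": None}  # not reached


print(detect_sketch(b"hello"))  # {'encoding': 'ascii', 'confidence': 1.0, 'language': None}
print(detect_sketch("café".encode("utf_8"))["encoding"])  # utf_8
```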
Charset Normalizer
RC 5
- BOM support (mostly for Unicode encodings)
- Chaos prober improved on small texts
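BOM support means a leading byte-order mark can settle the encoding before any statistical probing. A minimal sketch using the BOM constants from Python's codecs module; sniff_bom is a hypothetical name and this is not the library's code. The UTF-32 little-endian mark must be tested before the UTF-16 one, because FF FE 00 00 starts with FF FE.

```python
import codecs
from typing import Optional, Tuple

# Longest marks first: BOM_UTF32_LE (FF FE 00 00) begins with BOM_UTF16_LE (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf_32_le"),
    (codecs.BOM_UTF32_BE, "utf_32_be"),
    (codecs.BOM_UTF8, "utf_8"),
    (codecs.BOM_UTF16_LE, "utf_16_le"),
    (codecs.BOM_UTF16_BE, "utf_16_be"),
)


def sniff_bom(payload: bytes) -> Tuple[Optional[str], int]:
    """Hypothetical helper: (encoding, BOM length) if payload starts with a known BOM, else (None, 0)."""
    for bom, encoding in _BOMS:
        if payload.startswith(bom):
            return encoding, len(bom)
    return None, 0


data = codecs.BOM_UTF16_LE + "hi".encode("utf_16_le")
encoding, skip = sniff_bom(data)
print(encoding, data[skip:].decode(encoding))  # utf_16_le hi
```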
Charset Normalizer
RC 4
- ProbeChaos: code cleanup, performance review and improved accuracy
Charset Normalizer
RC 3
- Language detection has been reviewed to give better results
- Bugfix in Japanese detection: every Japanese text was considered chaotic
Charset Normalizer
RC 2
- Fixes #3 🎉 First PR
- Close files after reading them in CLI mode
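Closing files after reading is the usual Python context-manager pattern. A minimal sketch of how a CLI might read each path it is given; read_payloads is a hypothetical name, not the library's CLI code.

```python
import os
import tempfile
from typing import Dict, Iterable


def read_payloads(paths: Iterable[str]) -> Dict[str, bytes]:
    """Hypothetical helper: read each file as raw bytes; `with` closes the handle even if read() fails."""
    payloads = {}
    for path in paths:
        with open(path, "rb") as fp:
            payloads[path] = fp.read()
    return payloads


# Demo on a throwaway file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fp:
    fp.write(b"hello")
demo = read_payloads([path])
print(demo[path])  # b'hello'
os.remove(path)
```

pathlib.Path.read_bytes() gives the same guarantee in one call, since it also closes the file before returning.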
Charset Normalizer RC
🍰 First RC !