zougloub/chardet (forked from chardet/chardet)

Python 2/3 compatible character encoding detector.

Chardet: The Universal Character Encoding Detector

Detects
  • ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
  • Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
  • EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
  • EUC-KR, ISO-2022-KR (Korean)
  • KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
  • ISO-8859-2, windows-1250 (Hungarian)
  • ISO-8859-5, windows-1251 (Bulgarian)
  • windows-1252 (English)
  • ISO-8859-7, windows-1253 (Greek)
  • ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
  • TIS-620 (Thai)
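Unicode input that starts with a byte order mark (BOM) can be recognized from its first few bytes, before any statistical analysis. The sketch below shows that idea with the standard library only. It is an illustration, not chardet's own code: the function name `sniff_bom` is hypothetical, and the label strings are assumptions. This includes the X-ISO-10646-UCS-4 names used here for the two unusual UCS-4 octet orders.

```python
import codecs

# Ordered (BOM bytes, label) pairs. Labels are illustrative assumptions.
# 4-byte UTF-32/UCS-4 marks come before the 2-byte UTF-16 marks because
# FF FE 00 00 (UTF-32LE) begins with FF FE (UTF-16LE), and
# FE FF 00 00 (UCS-4, 3412 order) begins with FE FF (UTF-16BE).
BOMS = [
    (codecs.BOM_UTF8, "UTF-8-SIG"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (b"\xfe\xff\x00\x00", "X-ISO-10646-UCS-4-3412"),
    (b"\x00\x00\xff\xfe", "X-ISO-10646-UCS-4-2143"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data):
    """Return (encoding, confidence) if data starts with a known BOM, else None."""
    for bom, label in BOMS:
        if data.startswith(bom):
            # A BOM is an explicit marker, so the guess is treated as certain.
            return label, 1.0
    # No BOM: a real detector would fall back to statistical probers here.
    return None


print(sniff_bom(codecs.BOM_UTF16_LE + u"hi".encode("utf-16-le")))
```

Input without a BOM returns `None`. The legacy encodings in the list above never carry a BOM, so they need the byte-frequency and sequence analysis that the rest of a detector performs.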

Requires Python 2.6 or later

Installation

Install from PyPI:

pip install chardet

Command-line Tool

chardet comes with a command-line script which reports on the encodings of one or more files:

% chardetect somefile someotherfile
somefile: windows-1252 with confidence 0.5
someotherfile: ascii with confidence 1.0
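A file made only of 7-bit bytes can be classified without any statistical model, which is consistent with the `ascii with confidence 1.0` line above. The sketch below is a standard-library illustration of such a pre-check, not chardet's implementation. The function name `classify_7bit` and its three return labels are hypothetical. Treating ESC and `~{` as signs of a 7-bit escape-based encoding (ISO-2022-*, HZ-GB-2312) is an assumption of the sketch.

```python
import re

# Hypothetical pre-check. ESC (0x1B) starts ISO-2022 escape sequences and
# "~{" opens a GB2312 run in HZ-GB-2312. Both encodings are 7-bit, so an
# "all bytes < 0x80" test alone would wrongly report them as plain ASCII.
ESCAPE_RE = re.compile(b"(\x1b|~{)")


def classify_7bit(data):
    """Return 'ascii', 'escaped', or 'high-byte' for a bytes object."""
    # bytearray iteration yields ints on both Python 2 and Python 3.
    if any(b >= 0x80 for b in bytearray(data)):
        return "high-byte"  # needs the multi-byte / single-byte probers
    if ESCAPE_RE.search(data):
        return "escaped"    # candidate for ISO-2022-* / HZ-GB-2312 detection
    return "ascii"          # pure ASCII: no further analysis needed


print(classify_7bit(b"hello world\n"))
```

The high-byte test runs first because one byte >= 0x80 rules out every 7-bit encoding, whether or not escape-like sequences also appear.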

About

This is a continuation of Mark Pilgrim's excellent chardet. Previously, two versions had to be maintained: one for Python 2.x and one for Python 3.x. We've recently merged with Ian Cordasco's charade fork, so there is now a single codebase that works on Python 2.6+.

Maintainer: Dan Blanchard
