Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
🐛 Legacy detect should return UTF-8-SIG if sig is detected (#38)
In order to be as close as possible to chardet behavior regarding the detect function, this PR adjusts the return value when the actual given content contains the SIG (utf-8). This minor fix is related to the possible integration of this lib to requests. see psf/requests#5797 Why does Charset-Normalizer return 'utf-8' instead of 'utf-8-sig'? Here are the main reasons : The SIG is actually not very useful, and not widely recognized even by the Unicode consortium. The detection/normalization process does strip the sig but takes it into account.
- Loading branch information