Commit

Unicode: add elaboration into comments for valid_utf8()
AlekseyCherepanov committed Sep 21, 2024
1 parent 31413a7 commit 22aae83
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions src/unicode.h
@@ -232,8 +232,9 @@ extern void truncate_utf8(UTF8 *string, int len);
  * returns > 1 if data is valid and in fact contains UTF-8 sequences
  *
  * Actually in the last case, the return is the number of proper UTF-8
- * sequences, so it can be used as a quality measure. A low number might be
- * a false positive, a high number most probably isn't.
+ * sequences (plus one and not counting ASCII), so it can be used as a
+ * quality measure. A low number might be a false positive, a high
+ * number most probably isn't.
  *
  *
  * Related info about UTF-8
@@ -275,6 +276,10 @@ extern void truncate_utf8(UTF8 *string, int len);
  * invalid for purposes of valid_utf8()).
  *
  * See also https://en.wikipedia.org/wiki/UTF-8#Codepage_layout
+ *
+ * Sequences for unallocated, unassigned, reserved (including
+ * noncharacters) code points are considered valid. See here:
+ * https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Noncharacters
  */
 extern int valid_utf8(const UTF8 *source);
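
The return-value convention described in the comment above can be illustrated with a minimal sketch. This is not the implementation in src/unicode.c: `utf8_quality` is a hypothetical name, and the sketch takes `const char *` rather than `const UTF8 *` for convenience. It assumes that 0 means invalid and 1 means valid pure ASCII (including the empty string), neither of which is visible in this excerpt, and that overlong forms, surrogates, and values above U+10FFFF are rejected. Noncharacters and unassigned code points are accepted, as the added comment states.

```c
/*
 * Hypothetical stand-in for valid_utf8(), for illustration only.
 * Returns 0 if the string is not valid UTF-8 (assumed convention),
 * 1 if it is valid and pure ASCII (assumed convention),
 * and (number of multi-byte sequences + 1) otherwise.
 */
static int utf8_quality(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    int score = 1; /* the "plus one": a valid string never scores 0 */

    while (*s) {
        unsigned char c = *s;
        unsigned char lo = 0x80, hi = 0xBF; /* range of the first continuation byte */
        int n; /* number of continuation bytes expected */

        if (c < 0x80) {          /* ASCII: valid, but not counted */
            s++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2;
            if (c == 0xE0)
                lo = 0xA0;       /* assumption: reject overlong forms */
            else if (c == 0xED)
                hi = 0x9F;       /* assumption: reject surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3;
            if (c == 0xF0)
                lo = 0x90;       /* assumption: reject overlong forms */
            else if (c == 0xF4)
                hi = 0x8F;       /* assumption: reject values above U+10FFFF */
        } else {
            return 0;            /* stray continuation byte, C0/C1, or F5..FF */
        }

        s++;
        for (int i = 0; i < n; i++, s++) {
            /* A terminating NUL falls below lo, so this never reads past the end. */
            if (*s < lo || *s > hi)
                return 0;
            lo = 0x80;           /* later continuation bytes use the full range */
            hi = 0xBF;
        }
        score++;                 /* one more proper multi-byte sequence */
    }
    return score;
}
```

Under these assumptions, "hello" yields 1, a string containing a single two-byte sequence yields 2, and the noncharacter U+FFFE (bytes EF BF BE) also yields 2. A caller that uses the result as a quality measure could require a value well above 2 before treating arbitrary input as UTF-8.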
