missing documentation for several files #99
I don't know what this is, but it seems to be related to this: http://en.glyphwiki.org/wiki/Group:%E6%B1%8E%E7%94%A8%E9%9B%BB%E5%AD%90-40
It seems to give the origin of the characters, or something similar. I'm not sure why it is in that order.
Should be
I guess with enough puzzling you could work out what they all do. It looks more as if some of the programs that reformat these files are missing, rather than the documentation.
Hanyo = Hanyo-Denshi. From the auto-translation below (from http://kanji-database.sourceforge.net/ids/ids.html), these appear to be characters and variants of non-Chinese (i.e. Japanese) origin:
The first column contains Hanyo codes, which correspond to the Unicode code points in the first column of the second link below. When the characters are looked up by their Unicode code points, they do appear to be less common or Japanese variants. See also: https://www.unicode.org/ivd/hanyo-denshi/
CDP = the Chinese Document Processing (CDP) database, developed by C.C. Hsieh and his team at Academia Sinica in Taipei, Taiwan. CHISE-IDS and CDP appear to be similar projects, though CDP covers Chinese characters only, while CHISE also covers other scripts, e.g. Japanese. When representing the breakdown of a character, the CDP file appears to show where partial characters (components) from the CDP project can be used instead. See: https://www.freedesktop.org/wiki/Software/CJKUnifonts/Resources/Tutorial/ You can look up CDP characters at https://glyphwiki.org, e.g. See also:
ids-ext-cdef: As the Unicode character database has expanded, less common characters have been added in stages, as extension blocks. The ids file has breakdowns of all (?) the characters in the database; ids-ext-cdef has only the additional characters found in Extensions C, D, E and F. See also:
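To check which extension block a given character belongs to (and so whether you would expect it to be in ids-ext-cdef), something like the Python sketch below would do. The block ranges are copied from the Unicode Character Database file Blocks.txt; the names `CJK_BLOCKS`, `cjk_block` and `in_ext_cdef` are hypothetical helpers, not part of this project. Note that a code point falling inside a block is not necessarily an assigned character.

```python
from typing import Optional

# Block ranges from the Unicode Character Database file Blocks.txt.
# A code point inside a block is not necessarily an assigned character.
CJK_BLOCKS = [
    ("URO", 0x4E00, 0x9FFF),   # CJK Unified Ideographs (the original block)
    ("A", 0x3400, 0x4DBF),     # Extension A
    ("B", 0x20000, 0x2A6DF),   # Extension B
    ("C", 0x2A700, 0x2B73F),   # Extension C
    ("D", 0x2B740, 0x2B81F),   # Extension D
    ("E", 0x2B820, 0x2CEAF),   # Extension E
    ("F", 0x2CEB0, 0x2EBEF),   # Extension F
    ("G", 0x30000, 0x3134F),   # Extension G
]


def cjk_block(char: str) -> Optional[str]:
    """Return the short block name for a single character, or None."""
    cp = ord(char)
    for name, start, end in CJK_BLOCKS:
        if start <= cp <= end:
            return name
    return None


def in_ext_cdef(char: str) -> bool:
    """True if the character falls in Extension C, D, E or F."""
    return cjk_block(char) in {"C", "D", "E", "F"}
```

For example, `cjk_block("林")` returns `"URO"`, since U+6797 is in the original block, while `in_ext_cdef(chr(0x2B740))` is `True`.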
Ws2015-ids.txt: The Ideographic Description Sequences (IDS) for Wasei kango, which are "Japanese-made Chinese words". GHZR codes correspond to dictionary codes from the GHZR dictionary (Google translation):
ids.txt = Ideographic Description Sequences
IDS were originally intended as part of the Unicode Standard, as a way to describe characters (of which there are very many) that had not yet been encoded. As the official documentation notes, they are also useful for learning.
They are subjective: the same character can often be described in more than one way. The 12 characters below (U+2FF0 to U+2FFB) are called Ideographic Description Characters: ⿰ ⿱ ⿲ ⿳ ⿴ ⿵ ⿶ ⿷ ⿸ ⿹ ⿺ ⿻
From: See also:
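Because each Ideographic Description Character is a prefix operator with a fixed number of operands (two for most, three for ⿲ and ⿳), an IDS can be read back into a tree with a short recursive parser. This is a hedged sketch: `tokenize`, `parse_ids` and `components` are hypothetical names, and it assumes, for illustration only, that CDP components (mentioned above) appear as entity references of the form `&CDP-XXXX;`. It does not try to handle any other annotations the files may contain.

```python
from __future__ import annotations

from typing import List, Tuple, Union

# Number of operands taken by each of the 12 Ideographic Description Characters.
IDC_ARITY = {
    "\u2ff0": 2,  # ⿰ left to right
    "\u2ff1": 2,  # ⿱ above to below
    "\u2ff2": 3,  # ⿲ left to middle and right
    "\u2ff3": 3,  # ⿳ above to middle and below
    "\u2ff4": 2,  # ⿴ full surround
    "\u2ff5": 2,  # ⿵ surround from above
    "\u2ff6": 2,  # ⿶ surround from below
    "\u2ff7": 2,  # ⿷ surround from left
    "\u2ff8": 2,  # ⿸ surround from upper left
    "\u2ff9": 2,  # ⿹ surround from upper right
    "\u2ffa": 2,  # ⿺ surround from lower left
    "\u2ffb": 2,  # ⿻ overlaid
}

# A parsed IDS: either a single component, or (IDC, child, child[, child]).
Tree = Union[str, Tuple]


def tokenize(ids: str) -> List[str]:
    """Split an IDS into IDCs and components.

    Assumption: an entity such as "&CDP-8BF1;" counts as one component.
    """
    tokens: List[str] = []
    i = 0
    while i < len(ids):
        if ids[i] == "&":
            end = ids.index(";", i)  # raises ValueError if the entity is unterminated
            tokens.append(ids[i : end + 1])
            i = end + 1
        else:
            tokens.append(ids[i])
            i += 1
    return tokens


def parse_ids(ids: str) -> Tree:
    """Parse a prefix-notation IDS into a nested tuple tree."""
    tokens = tokenize(ids)
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"incomplete IDS: {ids!r}")
        tok = tokens[pos]
        pos += 1
        if tok in IDC_ARITY:
            # An IDC consumes exactly as many sub-sequences as its arity.
            children = [parse() for _ in range(IDC_ARITY[tok])]
            return (tok, *children)
        return tok  # a plain component (character or entity)

    tree = parse()
    if pos != len(tokens):
        raise ValueError(f"trailing components in IDS: {ids!r}")
    return tree


def components(tree: Tree) -> List[str]:
    """List the leaf components of a parsed IDS, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in components(child)]
```

For example, `parse_ids("⿰木木")` gives `("⿰", "木", "木")`, and a malformed sequence such as `"⿰木"` raises `ValueError` instead of silently producing a partial tree.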
IDS-analysis.txt: The first column is the Unicode code point and the second column is the character. The third column gives the ideographic description (the same as in ids.txt) or else a semantic variant of the character. Some characters have both, on separate lines, e.g.:
The fourth column gives the 六書 ('six writings') etymology (phonosemantic compound, ideograph, etc.) or other information such as part of speech (e.g. noun, adverb), written in Japanese. The numbers in the fifth column correspond to the Shuowen Jiezi, an ancient dictionary, as indexed in the "Daxu edition by Zhonghua Book Company". These numbers correspond to characters in this file: See: (I recommend using Google Translate or a similar service.)
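Going by the column description above, a line of IDS-analysis.txt could be read into a record like the one below. This is only a sketch: it assumes the columns are tab-separated and that the first column is written as `U+XXXX`, neither of which I have verified against the file, and the names `AnalysisRow`, `parse_row` and `group_by_char` are made up. Since some characters have a description and a variant on separate lines, rows are grouped by character.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class AnalysisRow:
    codepoint: int               # column 1, assumed to be written as "U+XXXX"
    char: str                    # column 2: the character itself
    description: Optional[str]   # column 3: IDS or a semantic variant
    liushu: Optional[str]        # column 4: 六書 category or other notes (Japanese)
    shuowen: Optional[str]       # column 5: Shuowen Jiezi (Daxu edition) number


def parse_row(line: str) -> AnalysisRow:
    """Parse one line, assuming tab-separated columns (an unverified assumption)."""
    cols = line.rstrip("\r\n").split("\t")
    cols += [""] * (5 - len(cols))  # pad missing trailing columns with ""
    code, char = cols[0], cols[1]
    if not code.startswith("U+"):
        raise ValueError(f"unexpected code column: {code!r}")
    codepoint = int(code[2:], 16)
    if chr(codepoint) != char:
        raise ValueError(f"{code} does not match character {char!r}")
    # Empty columns become None so callers can test for missing data.
    return AnalysisRow(codepoint, char, cols[2] or None, cols[3] or None, cols[4] or None)


def group_by_char(lines: Iterable[str]) -> Dict[str, List[AnalysisRow]]:
    """Collect all rows for each character (some characters span several lines)."""
    groups: Dict[str, List[AnalysisRow]] = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            row = parse_row(line)
            groups[row.char].append(row)
    return dict(groups)
```

For example, with an invented line `U+6797<TAB>林<TAB>⿰木木<TAB>会意<TAB>123` (the last two values are made up), `parse_row` returns a row whose `codepoint` is `0x6797` and whose `description` is `⿰木木`.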
Wasei-kanji-ids.txt = "Japanese-made kanji dictionary / IDS data". The numbers in this file correspond to Japanese kanji characters that have not yet been encoded. You can look them up by altering the link below, in particular the number at the end. See:
General documentation for this project: The Japanese version of the page (linked in the top right-hand corner) is more complete than the English version, so I recommend reading it through an auto-translation website.
Note for the maintainer of this project: Documentation should describe all the parts of the project in simple language.
Hi, thanks for maintaining this project! It's quite useful.
The readme does not document what the following files contain: