DBF encodings in GeoDa
When saving a DBF file, GeoDa may also create a .CPG file if a specific encoding (e.g. GB2312 for Chinese characters) is used in the GeoDa Table. The encoding information can be loaded from the original dataset or specified manually by the user through the Table->Encode menu.
A .CPG file is an optional file that specifies the code page, which identifies the character set to use. From OGR's Shapefile/DBF driver page: https://www.gdal.org/drv_shapefile.html
An attempt is made to read the code page setting in the .cpg file, or as a fallback in the LDID/codepage setting from the .dbf file, and use it to translate string fields to UTF-8 on read, and back when writing.
Valid Language Driver ID (LDID) values are defined in: http://www.autopark.ru/ASBProgrammerGuide/DBFSTRUC.HTM, which also shows which LDID value corresponds to which code page.
Please note that a Shapefile's DBF table contains a valid Language Driver ID (LDID) value in its header. However, the .CPG file, when present, has the highest priority.
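The priority rule above can be sketched roughly as follows. This is a minimal illustration, not GeoDa's or OGR's actual code: `detect_dbf_codepage` and `LDID_TO_CODEPAGE` are hypothetical names, the mapping table holds only a few well-known LDID values (the reference linked above has the full list), and the sketch assumes the LDID is stored in byte 29 of the DBF header.

```python
import os

# Partial LDID -> code page table. Only a handful of well-known entries
# are listed here; this is an illustrative subset, not the full mapping.
LDID_TO_CODEPAGE = {
    0x01: "CP437",
    0x02: "CP850",
    0x03: "CP1252",
    0xC8: "CP1250",
    0xC9: "CP1251",
}


def detect_dbf_codepage(dbf_path):
    """Return a code page name for dbf_path, or None if it cannot be determined.

    Hypothetical helper: the .cpg file is checked first (highest priority),
    and the LDID byte in the DBF header is used only as a fallback.
    """
    base, _ = os.path.splitext(dbf_path)

    # 1. A .cpg file next to the .dbf wins if it exists and is not empty.
    for ext in (".cpg", ".CPG"):
        cpg_path = base + ext
        if os.path.isfile(cpg_path):
            with open(cpg_path, "r", encoding="ascii", errors="ignore") as f:
                value = f.read().strip()
            if value:
                return value

    # 2. Fallback: read the LDID from byte 29 of the DBF header.
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 30:
        return None
    return LDID_TO_CODEPAGE.get(header[29])
```

For a DBF whose header carries LDID `0x03` and that has no .cpg file, the helper returns `"CP1252"`; once a .cpg file containing `GB2312` is placed next to it, the helper returns `"GB2312"` instead.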
For the GDAL/OGR library, it seems that code page values can be written directly in a .CPG file. OGR provides code to translate an LDID to a code page; if it is not able to translate the value, it will try to use the value in the .CPG file as the code page.
Valid code page values:
- Windows code page: CPxxxx
- ISO code page: ISO-8859-x
- Others: e.g. UTF-8, Big5, etc.
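As an illustration of the three forms listed above, the following sketch classifies a code page string and checks whether Python's own codec registry recognizes it. Both function names are hypothetical, and Python's codec registry is only a stand-in here: OGR does its own recoding, so a name accepted by Python is not guaranteed to be accepted by OGR, or the reverse.

```python
import codecs
import re


def classify_code_page(value):
    """Classify a code page string into one of the three forms listed above."""
    name = value.strip().upper()
    if re.fullmatch(r"CP\d+", name):
        return "cp"      # e.g. CP1252, CP936
    if re.fullmatch(r"ISO-8859-\d{1,2}", name):
        return "iso"     # e.g. ISO-8859-1
    return "other"       # e.g. UTF-8, Big5


def python_codec_for(value):
    """Return Python's canonical codec name for value, or None if unknown."""
    try:
        return codecs.lookup(value.strip()).name
    except LookupError:
        return None
```

A value such as `ISO-88859-1` (with the extra `8`) is classified as `"other"` and is not recognized by the codec lookup, which is one way to catch a mistyped .cpg value before it is written.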
For development, the logic for handling additional files when saving a DBF file is:
- don't create a .prj file
- don't create a .cpg file if no specific encoding is used in GeoDa
- only create a .cpg file when a specific encoding is used in GeoDa
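The rules above could be sketched as below. This is an assumption-laden illustration rather than GeoDa's implementation (GeoDa itself is written in C++): `write_dbf_sidecars` is a hypothetical helper, and `encoding` stands for whatever encoding name is set in the GeoDa Table, with `None` meaning no specific encoding.

```python
import os


def write_dbf_sidecars(dbf_path, encoding=None):
    """Write the sidecar files for a saved DBF and return the paths created.

    Hypothetical helper: a .prj file is never written, and a .cpg file is
    written only when a specific encoding was given.
    """
    created = []

    # No .prj file is created when saving a plain DBF table.

    if encoding:
        cpg_path = os.path.splitext(dbf_path)[0] + ".cpg"
        # The .cpg file holds just the code page name, e.g. "GB2312".
        with open(cpg_path, "w", encoding="ascii") as f:
            f.write(encoding)
        created.append(cpg_path)

    return created
```

Calling `write_dbf_sidecars("table.dbf")` creates nothing, while `write_dbf_sidecars("table.dbf", "GB2312")` writes `table.cpg` containing `GB2312`.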