DBF encodings in GeoDa
When saving a DBF file, GeoDa may also create a .CPG file if a specific encoding (e.g. GB2312 for Chinese characters) is used in the GeoDa Table. The encoding information can be loaded from the original dataset or specified manually by the user through the Table->Encode menu.
A .CPG file is an optional file that specifies the code page, which identifies the character set to be used. From OGR's Shapefile/DBF driver page: https://www.gdal.org/drv_shapefile.html
An attempt is made to read the code page setting in the .cpg file, or as a fallback in the LDID/codepage setting from the .dbf file, and use it to translate string fields to UTF-8 on read, and back when writing.
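The fallback order described in the quote can be sketched in Python as follows. This is a minimal illustration, not GDAL/OGR's or GeoDa's actual code: the function names are hypothetical, and it assumes the commonly documented DBF header layout in which byte 29 holds the language driver ID (0 meaning "not set").

```python
from pathlib import Path
from typing import Optional, Tuple, Union

LDID_OFFSET = 29  # assumed position of the language driver byte in the DBF header


def read_cpg(dbf_path: Path) -> Optional[str]:
    """Return the stripped contents of the sidecar .cpg file, or None if absent or empty."""
    for suffix in (".cpg", ".CPG"):
        cpg_path = dbf_path.with_suffix(suffix)
        if cpg_path.is_file():
            text = cpg_path.read_text(encoding="ascii", errors="ignore").strip()
            return text or None
    return None


def read_ldid(dbf_path: Path) -> Optional[int]:
    """Return the LDID byte from the DBF header, or None if it is 0 or the file is too short."""
    with dbf_path.open("rb") as f:
        header = f.read(32)
    if len(header) < 32:
        return None
    return header[LDID_OFFSET] or None


def resolve_encoding_hint(dbf_path: Path) -> Optional[Tuple[str, Union[str, int]]]:
    """Prefer the .cpg value; fall back to the LDID in the .dbf header."""
    cpg = read_cpg(dbf_path)
    if cpg is not None:
        return ("cpg", cpg)
    ldid = read_ldid(dbf_path)
    if ldid is not None:
        return ("ldid", ldid)
    return None
```

The function returns which source supplied the hint, so a caller can tell a code page name read from the .cpg file apart from a raw LDID that still has to be translated.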
LDID (a valid Language Driver ID) is defined in http://www.autopark.ru/ASBProgrammerGuide/DBFSTRUC.HTM, and this page also shows which LDID value corresponds to which code page.
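As a rough illustration of such a lookup, the sketch below maps a handful of LDID values to code page names. The table is a small, illustrative subset based on commonly published LDID tables and should be checked against the page referenced above before use. The `CP<number>` output format and the function name are assumptions made for this example.

```python
from typing import Optional

# Illustrative subset only; the referenced page has the full table.
LDID_TO_CODE_PAGE = {
    0x01: 437,   # US MS-DOS
    0x02: 850,   # International MS-DOS
    0x03: 1252,  # Windows ANSI
    0x78: 950,   # Traditional Chinese, Windows
    0x7A: 936,   # Simplified Chinese, Windows
    0x7B: 932,   # Japanese, Windows
    0xC8: 1250,  # Eastern European, Windows
    0xC9: 1251,  # Russian, Windows
}


def ldid_to_code_page_name(ldid: int) -> Optional[str]:
    """Return a 'CP<number>' name for a known LDID, or None if it is not in the table."""
    code_page = LDID_TO_CODE_PAGE.get(ldid)
    return f"CP{code_page}" if code_page is not None else None
```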
Please note: a Shapefile's DBF table contains a Language Driver ID (LDID) value in its header. However, when a .CPG file is present, it takes priority over the LDID.
Judging from the GDAL/OGR source code, OGR translates the LDID to a code page internally. A code page value (instead of an LDID value) can also be written directly in a .CPG file (e.g. "Big5" for Traditional Chinese). See the code here:
Valid code page values include:
- Windows code pages: CPxxxx
- ISO code pages: ISO-8859-x
- Others: e.g. UTF-8, Big5, etc.
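To sanity-check a code page name before writing it to a .CPG file, one option is to look it up in Python's codec registry, as in the sketch below. This is an assumption-laden shortcut: Python's registry is not the same as the recoding tables GDAL/OGR uses, so a name accepted here is not guaranteed to be accepted by OGR, and vice versa. The helper names are hypothetical.

```python
import codecs
from typing import Optional


def python_codec_for(code_page: str) -> Optional[str]:
    """Return Python's canonical codec name for a code page string, or None if unknown."""
    try:
        return codecs.lookup(code_page.strip()).name
    except LookupError:
        return None


def decode_field(raw: bytes, code_page: str) -> str:
    """Decode raw DBF field bytes to text using the given code page name."""
    codec = python_codec_for(code_page)
    if codec is None:
        raise ValueError(f"unknown code page: {code_page!r}")
    return raw.decode(codec)
```

For example, `decode_field("中文".encode("big5"), "Big5")` round-trips the two characters, while `python_codec_for("not-a-code-page")` returns `None`.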
For development, the logic for handling additional files when saving a DBF file is:
- don't create a .prj file
- don't create a .cpg file if no specific encoding is used in GeoDa
- only create a .cpg file when a specific encoding is used in GeoDa
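The rules above can be sketched as a small Python helper. This is not GeoDa's implementation. `write_dbf_sidecars` and its parameters are hypothetical names, and it assumes the .cpg file simply contains the encoding name as plain text.

```python
from pathlib import Path
from typing import List, Optional


def write_dbf_sidecars(dbf_path: Path, encoding: Optional[str]) -> List[Path]:
    """Write the sidecar files for a saved DBF and return the paths created."""
    written: List[Path] = []
    # A .prj file is never created for a DBF-only save.
    if encoding:
        # A specific encoding was set in the table: record it in a .cpg file.
        cpg_path = dbf_path.with_suffix(".cpg")
        cpg_path.write_text(encoding, encoding="ascii")
        written.append(cpg_path)
    # No encoding set: no .cpg file is written.
    return written
```

Calling `write_dbf_sidecars(Path("out.dbf"), "GB2312")` would create `out.cpg` containing `GB2312`; passing `None` creates nothing.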