DBF encodings in GeoDa
When saving a DBF file, GeoDa may also create a .CPG file if a specific encoding (e.g. GB2312 for Chinese characters) is used in the GeoDa Table. The encoding information can be loaded from the original dataset (see the following discussion of LDID and code page in a .CPG file) or specified manually by the user via the Table->Encode menu.
A .CPG file is an optional file that specifies the code page, i.e. the character set used for the DBF's string fields. From OGR's Shapefile/DBF driver page (https://www.gdal.org/drv_shapefile.html):
An attempt is made to read the code page setting in the .cpg file, or as a fallback in the LDID/codepage setting from the .dbf file, and use it to translate string fields to UTF-8 on read, and back when writing.
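The lookup order in the quote above (the .cpg file first, the LDID in the DBF header only as a fallback) can be sketched in Python as follows. This is a minimal illustration, not GDAL's or GeoDa's code. The helper names `read_ldid` and `find_code_page` are hypothetical, and the sketch assumes a dBASE III/IV-style header in which byte 29 holds the LDID, with 0 meaning "not set".

```python
from pathlib import Path
from typing import Optional, Tuple

LDID_OFFSET = 29  # assumed: byte 29 of the 32-byte DBF header holds the LDID


def read_ldid(dbf_path: Path) -> int:
    """Return the LDID byte from the DBF header (0 means no LDID is set)."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError(f"{dbf_path} is too short to be a DBF file")
    return header[LDID_OFFSET]


def find_code_page(dbf_path) -> Tuple[str, Optional[str]]:
    """Return (source, value): a non-empty .cpg file wins; the LDID is a fallback."""
    dbf_path = Path(dbf_path)
    for suffix in (".cpg", ".CPG"):  # try both cases on case-sensitive filesystems
        cpg_path = dbf_path.with_suffix(suffix)
        if cpg_path.exists():
            value = cpg_path.read_text(encoding="latin-1").strip()
            if value:
                return ("cpg", value)
    ldid = read_ldid(dbf_path)
    if ldid != 0:
        return ("ldid", str(ldid))
    return ("none", None)
```

Returning the source together with the value keeps the caller aware of whether it still has to translate an LDID into a code page.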
LDID
Valid Language Driver ID (LDID) codes are listed at http://www.autopark.ru/ASBProgrammerGuide/DBFSTRUC.HTM, which also shows the code page that each LDID value corresponds to.
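To illustrate the LDID-to-code-page translation, a small lookup table might look like the sketch below. The entries are a partial subset chosen for this example; check them against the page above or the GDAL source before relying on them. `LDID_TO_CODE_PAGE` and `ldid_to_code_page` are hypothetical names.

```python
from typing import Optional

# Partial LDID -> code page table, for illustration only.
LDID_TO_CODE_PAGE = {
    0x01: "CP437",   # US MS-DOS
    0x02: "CP850",   # International MS-DOS
    0x03: "CP1252",  # Windows ANSI
    0x4D: "CP936",   # Chinese GBK (PRC)
    0x4E: "CP949",   # Korean
    0x4F: "CP950",   # Chinese Big5 (Taiwan)
    0xC9: "CP1251",  # Russian Windows
}


def ldid_to_code_page(ldid: int) -> Optional[str]:
    """Map an LDID byte to a code page name, or None if it is unknown or unset."""
    return LDID_TO_CODE_PAGE.get(ldid)
```

Because the values use the same `CPxxxx` form that can appear in a .CPG file, the result can be handled exactly like a value read from a .CPG file.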
Note that a shapefile's DBF table contains a Language Driver ID (LDID) value in its header. However, when a .CPG file is present, it takes priority over the LDID.
Judging from the GDAL/OGR source code, OGR translates an LDID value into a code page internally. A code page value (rather than an LDID value) can also be written directly in a .CPG file (e.g. "Big5" for Traditional Chinese). See the code here:
Valid code page values include:
- Windows code pages: CPxxxx
- ISO code pages: ISO-8859-xxx
- Others, e.g. UTF-8 and Big5
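Python's built-in codecs accept all three forms above as encoding names (for example `CP936`, `ISO-8859-1`, `UTF-8` and `Big5`). The sketch below classifies a .CPG value into those categories and uses it to decode a space-padded DBF character field and to encode it back, mirroring the "translate to UTF-8 on read, and back when writing" behavior in the OGR quote. The helper names are hypothetical, and the sketch assumes character fields are right-padded with spaces.

```python
import re


def classify_code_page(value: str) -> str:
    """Return 'windows', 'iso' or 'other' for a .cpg code page value."""
    v = value.strip().upper()
    if re.fullmatch(r"CP\d+", v):
        return "windows"
    if re.fullmatch(r"ISO-8859-\d+", v):
        return "iso"
    return "other"


def decode_field(raw: bytes, cpg_value: str) -> str:
    """Decode a raw character field; raises LookupError for an unknown code page."""
    return raw.decode(cpg_value.strip()).rstrip(" ")


def encode_field(text: str, cpg_value: str, width: int) -> bytes:
    """Encode text for a character field of `width` bytes, padding with spaces."""
    data = text.encode(cpg_value.strip())
    if len(data) > width:
        raise ValueError(f"{len(data)} bytes do not fit in a {width}-byte field")
    return data.ljust(width, b" ")
```

The width check is done on the encoded bytes, not on the characters: a two-character Chinese string takes 4 bytes in CP936 and 6 bytes in UTF-8.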
For development in GeoDa, auxiliary files are handled as follows when saving a DBF file:
- do not create a .prj file
- do not create a .cpg file if no specific encoding is used in GeoDa
- create a .cpg file only when a specific encoding is used in GeoDa
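The rules above reduce to a small decision function. The sketch below is a hypothetical helper, not GeoDa's actual code: it returns the auxiliary files to write next to a saved DBF as a mapping from path to content, and it assumes that "no specific encoding" is represented by `None` or an empty string.

```python
from pathlib import Path
from typing import Dict, Optional


def sidecar_files(dbf_path, encoding: Optional[str]) -> Dict[Path, str]:
    """Return {path: content} for auxiliary files to write beside a saved DBF."""
    dbf_path = Path(dbf_path)
    files: Dict[Path, str] = {}
    # A .prj file is never produced when saving a DBF table.
    if encoding:  # a specific encoding is set, so record it in a .cpg file
        files[dbf_path.with_suffix(".cpg")] = encoding
    return files
```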
Test cases:
- Open a dbf file in GeoDa (no .cpg file). When exporting to a new dbf file, no .cpg file should be created.
- Open a dbf file in GeoDa (with a .cpg file). When exporting to a new dbf file, the same .cpg file should be created alongside the new dbf file.
- Open a dbf file in GeoDa (no .cpg file) and manually specify an encoding (e.g. Chinese Simplified). When exporting to a new dbf file, a .cpg file containing CP936 should be created alongside the new dbf file.
- Open a dataset with geometries (e.g. Guerry). When exporting to a new dbf file (table only), no .shp file should be created.
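The test cases can be automated roughly as in the sketch below. `export_table` is a hypothetical stand-in for GeoDa's table-only export: it copies the DBF bytes instead of rewriting the table. `ENCODING_LABELS` is an assumed menu-label mapping, and only its "Chinese Simplified" to CP936 pair comes from the test cases. The source here has no geometry, so the last case is reduced to checking that no .shp (or .prj) file appears.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical Table->Encode label -> code page mapping (one entry, from the test cases).
ENCODING_LABELS = {"Chinese Simplified": "CP936"}


def export_table(src_dbf: Path, dst_dbf: Path, manual_label: Optional[str] = None) -> None:
    """Stand-in for a table-only DBF export following the .cpg rules."""
    shutil.copyfile(src_dbf, dst_dbf)  # placeholder for the real DBF writer
    src_cpg = src_dbf.with_suffix(".cpg")
    if manual_label is not None:
        encoding = ENCODING_LABELS[manual_label]  # user-chosen encoding wins
    elif src_cpg.exists():
        encoding = src_cpg.read_text(encoding="latin-1").strip()  # carry over source .cpg
    else:
        encoding = None
    if encoding:
        dst_dbf.with_suffix(".cpg").write_text(encoding, encoding="latin-1")
    # No .prj and no .shp/.shx are written for a table-only export.


def check_scenarios() -> None:
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        src = root / "src.dbf"
        src.write_bytes(bytes(32))  # dummy header; content is irrelevant here

        # Case 1: source without .cpg -> no .cpg on export.
        export_table(src, root / "out1.dbf")
        assert not (root / "out1.cpg").exists()

        # Case 3: source without .cpg, manual "Chinese Simplified" -> .cpg with CP936.
        export_table(src, root / "out3.dbf", manual_label="Chinese Simplified")
        assert (root / "out3.cpg").read_text(encoding="latin-1") == "CP936"

        # Case 2: source with .cpg -> same .cpg content on export.
        (root / "src.cpg").write_text("Big5", encoding="latin-1")
        export_table(src, root / "out2.dbf")
        assert (root / "out2.cpg").read_text(encoding="latin-1") == "Big5"

        # Case 4 (reduced): a table-only export never writes .shp or .prj.
        for name in ("out1", "out2", "out3"):
            assert not (root / f"{name}.shp").exists()
            assert not (root / f"{name}.prj").exists()


if __name__ == "__main__":
    check_scenarios()
```

Cases 1 and 3 run before the source .cpg is created, so both exercise the "no .cpg" starting state.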