A character set conversion library based on GNU LIBICONV. Supports C and C++20.
If your C++ standard is lower than C++20. A few code needs to be modified.
This library requires build-aux
and cppp-platform
for building.
See docs/ for full requirements list.
We are planning to use Rubisco to manage submodules. But it is not ready yet. So we don not use git submodule for now. Please clone them manually.
Use the following command to fetch source code, or download source package:
git clone https://github.com/cppp-project/cppp-reiconv
cd cppp-reiconv
git clone https://github.com/cppp-project/build-aux --depth 1
git clone https://github.com/cppp-project/cppp-platform --depth 1
Use the following command to build and install.
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=[[BUILD_TYPE]] -DCMAKE_INSTALL_PREFIX=[[PREFIX]]
cmake --build . --config=[[BUILD_TYPE]]
cmake --install . --config=[[BUILD_TYPE]]
#include <cppp/reiconv.hpp>
#include <cstdlib>
#include <iostream>
int main()
{
const std::string_view src = "\xb8\xfc\xcf\xb2\xe1\xba\xc9\xbd\xc7\xa7\xc0\xef\xd1\xa9\xa3\xac\xc8\xfd\xbe\xfc\xb9\xfd\xba\xf3\xbe\xa1\xbf\xaa\xd1\xd5\xa3\xa1";
std::string result = reiconv::convert("GB18030", "UTF-8", src);
std::cout << result << std::endl;
const std::string_view correct_result = "\u66f4\u559c\u5cb7\u5c71\u5343\u91cc\u96ea\uff0c\u4e09\u519b\u8fc7\u540e\u5c3d\u5f00\u989c\uff01";
if (result == correct_result)
{
std::cout << "Correct!" << std::endl;
}
else
{
// If it happens, please report a bug.
std::cout << "Wrong!" << std::endl;
}
return EXIT_SUCCESS;
}
The full documentation can be found in docs/.
Same as GNU LIBICONV, It provides support for the encodings:
- European languages
- ASCII, ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16},
- KOI8-R, KOI8-U, KOI8-RU,
- CP{1250,1251,1252,1253,1254,1257}, CP{850,866,1131},
- Mac{Roman,CentralEurope,Iceland,Croatian,Romania},
- Mac{Cyrillic,Ukraine,Greek,Turkish},
- Macintosh
- Semitic languages
- ISO-8859-{6,8}, CP{1255,1256}, CP862, Mac{Hebrew,Arabic}
- Japanese
- EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-JP-1,
- ISO-2022-JP-MS
- Chinese
- EUC-CN, HZ, GBK, CP936, GB18030, GB18030:2022, EUC-TW, BIG5, CP950,
- BIG5-HKSCS, BIG5-HKSCS:2004, BIG5-HKSCS:2001, BIG5-HKSCS:1999,
- ISO-2022-CN, ISO-2022-CN-EXT
- Korean
- EUC-KR, CP949, ISO-2022-KR, JOHAB
- Armenian
- ARMSCII-8
- Georgian
- Georgian-Academy, Georgian-PS
- Tajik
- KOI8-T
- Kazakh
- PT154, RK1048
- Thai
- ISO-8859-11, TIS-620, CP874, MacThai
- Laotian
- MuleLao-1, CP1133
- Vietnamese
- VISCII, TCVN, CP1258
- Platform specifics
- HP-ROMAN8, NEXTSTEP
- Full Unicode
- UTF-8
- UCS-2, UCS-2BE, UCS-2LE
- UCS-4, UCS-4BE, UCS-4LE
- UTF-16, UTF-16BE, UTF-16LE
- UTF-32, UTF-32BE, UTF-32LE
- UTF-7
- C99, JAVA
- Full Unicode, in terms of 'uint16_t' or 'uint32_t'
- UCS-2-INTERNAL, UCS-4-INTERNAL (with machine dependent endianness and alignment)
And some extra encodings. These encodings are GNU LIBICONV's extra encodings.
- European languages
- CP{437,737,775,852,853,855,857,858,860,861,863,865,869,1125}
- Semitic languages
- CP864
- Japanese
- EUC-JISX0213, Shift_JISX0213, ISO-2022-JP-3
- Chinese
- BIG5-2003 (experimental)
- Turkmen
- TDS565
- Platform specifics
- ATARIST, RISCOS-LATIN1
- EBCDIC compatible (not ASCII compatible, very rarely used)
- European languages
- IBM-{037,273,277,278,280,282,284,285,297,423,500,870,871,875,880},
- IBM-{905,924,1025,1026,1047,1112,1122,1123,1140,1141,1142,1143},
- IBM-{1144,1145,1146,1147,1148,1149,1153,1154,1155,1156,1157,1158},
- IBM-{1165,1166,4971}
- Semitic languages
- IBM-{424,425,12712,16804}
- Persian
- IBM-1097
- Thai
- IBM-{838,1160}
- Laotian
- IBM-1132
- Vietnamese
- IBM-{1130,1164}
- Indic languages
- IBM-1137
- European languages
It can convert from any of these encodings to any other, through Unicode conversion.
We use CMake for build
BUILD_TESTING
: Build tests. Default is ON.
cmake .. -DBUILD_TESTING=ON
cmake --build . --config=RelWithDebInfo
ctest -C RelWithDebInfo --output-on-failure
ICONV_COMPAT
: Enable compatibility with iconv. Default is OFF.
If ICONV_COMPAT
is ON, it will install iconv.h
with basic functions.
But we do not support all features of POSIX:2024. See TODO.
This library installs:
- A shared library
libcppp-reiconv
. - A static library
libcppp-reiconv.static
. - Header files.
include
├── cppp
│ ├── cppp-platform.h
│ ├── encodings
│ │ ├── reiconv.h
│ │ └── reiconv.hpp
│ ├── reiconv.h
│ └── reiconv.hpp
└── iconv.h # Only if ICONV_COMPAT is ON
The cppp-reiconv is under LGPLv3, see file LICENSE.
See https://github.com/cppp-project/cppp-reiconv/releases
https://github.com/cppp-project/cppp-reiconv
- Create a issue on GitHub Report now