Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concerning wstring-to-string conversion in WIN32 implementation #151

Open
whatevermarch opened this issue Jul 13, 2019 · 1 comment
Open

Comments

@whatevermarch
Copy link

Hi, I have tried compiling on Windows 10 with VS 2019. There is a warning exposed during the compilation in Anvil project ( in VS solution ), and this is the details

8>$(VS_PATH)\include\xstring(2308,23): warning C4244: 'argument': conversion from 'wchar_t' to 'const _Elem', possible loss of data
8>$(VS_PATH)\include\xstring(2308,23): warning C4244: with
8>$(VS_PATH)\include\xstring(2308,23): warning C4244: [
8>$(VS_PATH)\include\xstring(2308,23): warning C4244: _Elem=char
8>$(VS_PATH)\include\xstring(2308,23): warning C4244: ] (compiling source file $(ANVIL_ROOT)\src\misc\io.cpp)

I have investigated it and found that the problem line is 168 ( in master branch at this time I post ), which is

std::string file_name (file_name_with_path_wide.begin(), file_name_with_path_wide.end() );

This produces an error since the compiler is set to threat warning as an error, but it is fine because I can disable it in the project settings. However, this becomes my concern whether the file name data will be interpreted wrong or not according to this weird conversion. wchat_t stores 4 bytes character, while normal char only stores one byte. Using this constructor is like copying all the data to the new one and reinterpreting one wchat_t element as 4 char elements.

Maybe I misunderstood the conversion, or it might be the safe conversion as expected, but at least please fix the warning message if you plan to still enable -D_CRT_SECURE_NO_WARNINGS to MSVC compiler.

Thank you in advance.

@quetzalcoatl
Copy link

Paths certainly CAN contain non-ASCII characters.

On Windows that's pretty common. Even copying a directory via Explorer often adds a " - copy" suffix and the actual ' - ' is a non-ascii long-dash.

On unix/linux, wchar_t is usually a 4 byte UTF32 character, and char is UTF8 character. Casting it like this has a huge chance of destroying the string contents for any non-ASCII characters in the path and producing some trash in the std::string.

On windows, wchar_t is a 2 byte UTF16LE character, and char is your-windows-version-specific-codepage ASCII-like non-UTF8 character see here and dig further. Casting it like this has a huge chance of destroying the string contents and producing some trash in the std::string.

IMHO, that needs fixing.

Of course, you may decide to not support Unicode in paths and limit paths to the always-safe lower 0..127 part of ASCII, but that should be very explicitely stated in docs and tutorials. It may be so. I didn't check. If it is, then fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants