Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several strings in win\pty.cc are converting wide strings to narrow strings #104

Open
Tyriar opened this issue Jul 9, 2017 · 3 comments
Labels

Comments

@Tyriar
Copy link
Member

Tyriar commented Jul 9, 2017

From @rprichard in #103 (comment)

I'm not 100% sure, but I think the initializer of msg_ converts the wide string to a narrow string by truncating each wchar_t value to a char. It will only be correct for ASCII characters. I'm guessing node.js and/or v8 want UTF-8 strings? Standard C++ has a way to do that conversion (as of C++11, I think -- I'm thinking of std::codecvt), and maybe node.js / v8 also does? Windows has C-style APIs to do the conversion.

I think the pattern already existed in the code, so it's probably not necessary to fix it in this PR? e.g. see shellpath_'s initializer.

@jerch
Copy link
Collaborator

jerch commented Jul 11, 2017

@Tyriar Imho the conversion is not needed at all since Windows uses internally a type of UTF-16 for wchar_t and so does Javascript (not sure though if they agree on all parts under all circumstances as there are several flavors of UTF-16).

Untested:

wchar_t *some_wstring;
...
Nan::MaybeLocal<v8::String> error = Nan::New<v8::String>(
    (uint16_t *) some_wstring,
    wcslen(some_wstring));
Nan::ThrowError(error.ToLocalChecked());

@Tyriar
Copy link
Member Author

Tyriar commented Jul 11, 2017

@jerch I did something like that but I wasn't sure what the best way to test it was.

@jerch
Copy link
Collaborator

jerch commented Jul 12, 2017

@Tyriar The official docs state the following:

V8 developer doc mentions UTF-16 for method v8::String::NewFromTwoByte (http://bespin.cz/~ondras/html/classv8_1_1String.html#a876615eb027092a6a71a4e7d69b82d00) - seems to be safe to just cast the wchar_t* to uint16_t*.

And for testing - hmm, to see if all wchar_ts get correctly mapped and understood in JS maybe you can use the codePointAt() method in JS and compare the values with the codepoint number from C++.

Btw the new for (xy in s) and the Array.from(s) in JS act on codepoints and not on charcodes anymore, which is helpful to get an iteration on those running. String.length still maps to the charcode length though.

@Tyriar Tyriar removed their assignment Oct 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants