utf16le + readStringNT compatibility #23
Comments
Hmm, this one is interesting. I'll have to check the other possible encodings and see if any others do this.
So I looked into this a bit more. UTF-16 is variable length: a single character is represented by either 2 bytes or 4 bytes, so even the fix above will only work for certain characters. I think the solution here is to just throw an error when attempting to write or read a null-terminated string using utf16 or ucs2.
https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings
https://en.wikipedia.org/wiki/Null-terminated_string#Character_encodings
Technically it looks like this isn't possible even with utf8, but it works for most characters.
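To make the 2-byte versus 4-byte point concrete, here is a minimal sketch that encodes a string as UTF-16LE by hand. The helper name `utf16leBytes` is hypothetical and not part of smart-buffer; it relies only on the fact that JavaScript strings are sequences of UTF-16 code units.

```typescript
// Hypothetical helper (not part of smart-buffer): encode a string as
// UTF-16LE, i.e. each 16-bit code unit is emitted low byte first.
function utf16leBytes(text: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // one UTF-16 code unit
    bytes.push(unit & 0xff, unit >> 8);
  }
  return bytes;
}

// "A" (U+0041) is one code unit, so 2 bytes.
const asciiBytes = utf16leBytes("A"); // [0x41, 0x00]

// U+1F600 is a surrogate pair (0xD83D, 0xDE00), so 4 bytes.
const emojiBytes = utf16leBytes("\uD83D\uDE00"); // [0x3d, 0xd8, 0x00, 0xde]
```

Both results contain a 0x00 byte that is not a terminator, which is why a byte-by-byte scan for a single null cannot work for this encoding.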
I might be misremembering my UTF studies, but I'm pretty sure that continuation bytes (when code points extend beyond a single byte for utf8, or two bytes for utf16) cannot be
There is one question to ask here: is the null terminator to be interpreted as a character that is part of the string's encoding? If yes, then the null terminator would be as it is in the string: 2 bytes, meaning you'd be checking for two consecutive null bytes at an even offset from the starting read offset. If not, there is no way to safely detect a null terminator in UTF-16, as either byte of a code point may be null, so there are no guarantees when checking individual bytes. Thus, I would say it's the logical decision to interpret the null terminator as a character in the string's encoding.
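The "two consecutive null bytes at an even offset" rule can be sketched as follows. This is only an illustration under the assumption that the terminator is a full U+0000 code unit; `findUtf16leTerminator` and `readUtf16leNT` are hypothetical names, not smart-buffer APIs.

```typescript
// Hypothetical helper: index of the first 0x00 0x00 pair that sits on a
// code-unit boundary (even distance from `start`), or -1 if there is none.
function findUtf16leTerminator(bytes: Uint8Array, start: number): number {
  for (let i = start; i + 1 < bytes.length; i += 2) {
    if (bytes[i] === 0x00 && bytes[i + 1] === 0x00) {
      return i;
    }
  }
  return -1;
}

// Hypothetical helper: decode code units from `start` up to the terminator,
// or up to the last complete code unit when no terminator is present.
function readUtf16leNT(bytes: Uint8Array, start: number): string {
  const found = findUtf16leTerminator(bytes, start);
  const end = found === -1 ? bytes.length - ((bytes.length - start) % 2) : found;
  let text = "";
  for (let i = start; i < end; i += 2) {
    text += String.fromCharCode(bytes[i] | (bytes[i + 1] << 8));
  }
  return text;
}

// "hi" + U+0100 + terminator. Bytes 3 and 4 are both 0x00, but they straddle
// two code units (odd offset), so they must not be taken as the terminator.
const sample = new Uint8Array([0x68, 0x00, 0x69, 0x00, 0x00, 0x01, 0x00, 0x00]);
const terminatorAt = findUtf16leTerminator(sample, 0); // 6
const decoded = readUtf16leNT(sample, 0); // "hi\u0100"
```

Stepping by whole code units is what keeps the zero bytes inside ordinary characters from being mistaken for the end of the string.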
Consider:
We'd expect output to be "hello", but it's currently '', due to:
smart-buffer/src/smartbuffer.ts, lines 685 to 690 in d35c0ce
The buffer (after the write) looks like this:
I'm not sure if any encodings other than utf16le suffer from this, but to fix it the i++ should be changed to i += 2 for utf16le.