Buffer.concat and Buffer.copy silently produce invalid results when the operation involves indices equal or greater than 2^32
#55422
My current workaround (tested to produce correct results with sizes greater than 4 GiB):

export function concatBuffers(buffers: Buffer[]) {
  let totalLength = 0
  for (const buffer of buffers) {
    totalLength += buffer.length
  }
  const resultBuffer = Buffer.alloc(totalLength)
  if (totalLength === 0) {
    return resultBuffer
  }
  let writeOffset = 0
  for (const buffer of buffers) {
    resultBuffer.set(buffer, writeOffset)
    writeOffset += buffer.length
  }
  return resultBuffer
}
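
For readers on plain JavaScript, the workaround above can be sanity-checked against `Buffer.concat` on small inputs, where both are expected to agree. This is only a sketch: `concatBuffersJs` is a hypothetical JavaScript port of the TypeScript function above, and the small sizes used here do not exercise the 4 GiB boundary.

```javascript
// Hypothetical plain-JavaScript port of the concatBuffers workaround above.
function concatBuffersJs(buffers) {
  let totalLength = 0;
  for (const buffer of buffers) {
    totalLength += buffer.length;
  }
  const resultBuffer = Buffer.alloc(totalLength);
  let writeOffset = 0;
  for (const buffer of buffers) {
    // TypedArray.prototype.set copies `buffer` into `resultBuffer` at `writeOffset`.
    resultBuffer.set(buffer, writeOffset);
    writeOffset += buffer.length;
  }
  return resultBuffer;
}

// Small-scale comparison with the built-in implementation.
const parts = [Buffer.from('abc'), Buffer.alloc(0), Buffer.from([1, 2, 3])];
const viaWorkaround = concatBuffersJs(parts);
const viaBuiltin = Buffer.concat(parts);
console.log(viaWorkaround.equals(viaBuiltin)); // true
```

The same comparison on inputs above 4 GiB is where the two are reported to diverge on affected versions.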
The issue started in v22.7.0. I'll start bisecting. Maybe #54087?

I've finished bisecting. This was indeed caused by #54087 cc @ronag.

Anyone care to open a PR? I think this could be a simple case of just switching to

I reproduced this on macOS. @ronag I'd like to try and tackle this one.

good luck.
This call to

function _copyActual(source, target, targetStart, sourceStart, sourceEnd) {
  if (sourceEnd - sourceStart > target.byteLength - targetStart)
    sourceEnd = sourceStart + target.byteLength - targetStart;
  let nb = sourceEnd - sourceStart;
  const sourceLen = source.byteLength - sourceStart;
  if (nb > sourceLen)
    nb = sourceLen;
  if (nb <= 0)
    return 0;
  _copy(source, target, targetStart, sourceStart, nb); // <--
  return nb;
}

const {
  byteLengthUtf8,
  compare: _compare,
  compareOffset,
  copy: _copy, // <--
  fill: bindingFill,
  isAscii: bindingIsAscii,
  isUtf8: bindingIsUtf8,
  indexOfBuffer,
  indexOfNumber,
  indexOfString,
  swap16: _swap16,
  swap32: _swap32,
  swap64: _swap64,
  kMaxLength,
  kStringMaxLength,
  atob: _atob,
  btoa: _btoa,
} = internalBinding('buffer');

A thorough solution is to ensure this method correctly handles large array sizes, or fails.

Just working around it by falling back to
So the root cause of this problem is 32-bit integer overflow in

const auto target_start = args[2]->Uint32Value(env->context()).ToChecked();
const auto source_start = args[3]->Uint32Value(env->context()).ToChecked();
const auto to_copy = args[4]->Uint32Value(env->context()).ToChecked();

Apparently

const largeBuffer = Buffer.alloc(2 ** 32 + 5)
largeBuffer.fill(111)
const result = Buffer.concat([largeBuffer])
console.log(result); // 6f 6f 6f 6f 6f 00 00 00 ...
                     //  1  2  3  4  5

Simply replacing
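
The reported output (five correct bytes followed by zeros) is consistent with the copy length being reduced modulo 2^32. The arithmetic can be sketched as follows, assuming `Uint32Value` follows the ECMAScript ToUint32 conversion, which the `>>> 0` operator also applies. `toUint32` is a helper name introduced here for illustration only.

```javascript
// ToUint32 as applied by `>>> 0`: keeps only the low 32 bits of an integer-valued number.
function toUint32(n) {
  return n >>> 0;
}

const requested = 2 ** 32 + 5;          // byte count computed on the JavaScript side
const truncated = toUint32(requested);  // value the binding would see under the assumption above

console.log(requested);         // 4294967301
console.log(truncated);         // 5
console.log(toUint32(2 ** 32)); // 0
```

Under that assumption, a copy of exactly 2^32 bytes would be reduced to a zero-length copy, and offsets at or above 2^32 would wrap in the same way.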
I'm not sure what exactly the binding refers to, but I found a candidate method in the C++ code (at

// Assume caller has properly validated args.
void SlowCopy(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);
  ArrayBufferViewContents<char> source(args[0]);
  SPREAD_BUFFER_ARG(args[1].As<Object>(), target);
  const auto target_start = args[2]->Uint32Value(env->context()).ToChecked();
  const auto source_start = args[3]->Uint32Value(env->context()).ToChecked();
  const auto to_copy = args[4]->Uint32Value(env->context()).ToChecked();
  memmove(target_data + target_start, source.data() + source_start, to_copy);
  args.GetReturnValue().Set(to_copy);
}

Regardless of whether it's the method used in the binding, using

This method follows, also taking in

uint32_t FastCopy(Local<Value> receiver,
                  const v8::FastApiTypedArray<uint8_t>& source,
                  const v8::FastApiTypedArray<uint8_t>& target,
                  uint32_t target_start,
                  uint32_t source_start,
                  uint32_t to_copy) {
  uint8_t* source_data;
  CHECK(source.getStorageIfAligned(&source_data));
  uint8_t* target_data;
  CHECK(target.getStorageIfAligned(&target_data));
  memmove(target_data + target_start, source_data + source_start, to_copy);
  return to_copy;
}
@rotemdan this is correct

If you simply search for the string

I think the fast methods won't get called with anything that doesn't fit into uint32.

It's the slow methods that need fixing I guess. Should we even support 4G+ Buffers? @jasnell

It already supports large typed arrays ( Fixing the methods in As an intermediate solution, you could allow large
The fix should be really simple (I couldn't test it because I don't really know how to compile Node.js at the moment):

In SlowCopy: change

// Assume caller has properly validated args.
void SlowCopy(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);
  ArrayBufferViewContents<char> source(args[0]);
  SPREAD_BUFFER_ARG(args[1].As<Object>(), target);
  const auto target_start = args[2]->IntegerValue(env->context()).ToChecked();
  const auto source_start = args[3]->IntegerValue(env->context()).ToChecked();
  const auto to_copy = args[4]->IntegerValue(env->context()).ToChecked();
  memmove(target_data + target_start, source.data() + source_start, to_copy);
  args.GetReturnValue().Set(to_copy);
}

The signature of

_VCRTIMP void* __cdecl memmove(
    _Out_writes_bytes_all_opt_(_Size) void* _Dst,
    _In_reads_bytes_opt_(_Size) void const* _Src,
    _In_ size_t _Size
);

This means there's an implicit cast here from

For extra safety for 32-bit platforms, we could ensure they are all in the range of

It's also easy to fix

// Assume caller has properly validated args.
size_t FastCopy(Local<Value> receiver,
                const v8::FastApiTypedArray<uint8_t>& source,
                const v8::FastApiTypedArray<uint8_t>& target,
                size_t target_start,
                size_t source_start,
                size_t to_copy) {
  uint8_t* source_data;
  CHECK(source.getStorageIfAligned(&source_data));
  uint8_t* target_data;
  CHECK(target.getStorageIfAligned(&target_data));
  memmove(target_data + target_start, source_data + source_start, to_copy);
  return to_copy;
}

These kinds of changes are really simple to do. I definitely think they are worth it.

Anyway, 4 GiB+ contiguous ArrayBuffers should be important (essential?) for WASM64, I believe (and for so many other great applications, of course, like memory mapping, databases, machine learning, large vectors/matrices, etc.), and based on my observations of the Node.js code, the amount of effort that would be required to try to artificially restrict

Deprecating
I've verified that changing:

const auto target_start = args[2]->Uint32Value(env->context()).ToChecked();
const auto source_start = args[3]->Uint32Value(env->context()).ToChecked();
const auto to_copy = args[4]->Uint32Value(env->context()).ToChecked();

To:

const auto target_start = args[2]->IntegerValue(env->context()).ToChecked();
const auto source_start = args[3]->IntegerValue(env->context()).ToChecked();
const auto to_copy = args[4]->IntegerValue(env->context()).ToChecked();

Seems to fix the issue (tested on Windows 11 x64).

I'll try to do some more testing before I open a pull request.

I also made a fix for

I've had trouble with fixing other methods that required changing the signature, like

There are also two other minor fixes I looked at:

In

May be changed to:

Since those assignments may be casting from

I'll try to work on each fix separately for now, not all at once.
Based on observations of the code, I realized the same problem should also occur in

Buffer.prototype.copy =
  function copy(target, targetStart, sourceStart, sourceEnd) {
    return copyImpl(this, target, targetStart, sourceStart, sourceEnd);
  };

function _copyActual(source, target, targetStart, sourceStart, sourceEnd) {
  if (sourceEnd - sourceStart > target.byteLength - targetStart)
    sourceEnd = sourceStart + target.byteLength - targetStart;
  let nb = sourceEnd - sourceStart;
  const sourceLen = source.byteLength - sourceStart;
  if (nb > sourceLen)
    nb = sourceLen;
  if (nb <= 0)
    return 0;
  _copy(source, target, targetStart, sourceStart, nb); // <------- Binds to SlowCopy
  return nb;
}

Fixing the C++ method (
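
Until the binding is fixed, the idea behind the `concatBuffers` workaround could in principle be applied to copying as well. The sketch below is an illustration under assumptions, not Node.js code: `copyViaSet` is a hypothetical helper that mirrors the clamping in `_copyActual` and then delegates to `TypedArray.prototype.set`, which the workaround earlier in this thread reports to behave correctly above 4 GiB. It assumes `Buffer` or `Uint8Array` arguments and non-negative integer offsets, and performs no argument validation.

```javascript
// Hypothetical helper: copy bytes from `source` into `target` without going
// through the `_copy` binding. Assumes Buffer/Uint8Array inputs.
function copyViaSet(source, target, targetStart = 0, sourceStart = 0,
                    sourceEnd = source.byteLength) {
  // Same clamping as _copyActual: never read past `source` or write past `target`.
  const nb = Math.min(
    sourceEnd - sourceStart,
    target.byteLength - targetStart,
    source.byteLength - sourceStart,
  );
  if (nb <= 0) return 0;
  // subarray() creates a view (no copy); set() performs the actual byte copy.
  target.set(source.subarray(sourceStart, sourceStart + nb), targetStart);
  return nb;
}

const src = Buffer.from('hello world');
const dst = Buffer.alloc(5);
const copied = copyViaSet(src, dst, 0, 6);
console.log(copied, dst.toString()); // 5 world
```

Like `Buffer.prototype.copy`, the helper returns the number of bytes actually copied, so callers can detect a clamped copy.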
Version
v22.9.0, v23.0.0
Platform
Subsystem
Buffer
What steps will reproduce the bug?
How often does it reproduce? Is there a required condition?
Consistent in v22.9.0 and v23.0.0
What is the expected behavior? Why is that the expected behavior?
All bytes of the return buffer produced by Buffer.concat([largeBuffer]) should be identical to the source:
In this example:
What do you see instead?
In the returned buffer, the first 5 bytes are 111, and all following ones are 0.
The console.log(result) output looks like:
Additional information
No response
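
The expected behavior described in this report amounts to a byte-for-byte comparison between the source and the result. A minimal sketch of such a check follows; `firstMismatchIndex` is a hypothetical helper, written as a plain loop so that it does not depend on the copy path under investigation. It is demonstrated here on an 8-byte stand-in for the corrupted output rather than on a real 4 GiB buffer.

```javascript
// Hypothetical helper: index of the first differing byte, or -1 if identical.
function firstMismatchIndex(a, b) {
  const length = Math.min(a.length, b.length);
  for (let i = 0; i < length; i++) {
    if (a[i] !== b[i]) return i;
  }
  // Equal up to the shorter length: a length difference is itself a mismatch.
  return a.length === b.length ? -1 : length;
}

// Small stand-in for the reported output: five bytes of 111, then zeros.
const expected = Buffer.alloc(8, 111);
const corrupted = Buffer.from(expected);
corrupted.fill(0, 5);

console.log(firstMismatchIndex(expected, expected));  // -1
console.log(firstMismatchIndex(expected, corrupted)); // 5
```

Applied to the `2 ** 32 + 5` reproduction, the description above suggests the helper would report index 5 on an affected version.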