From 414bc875fc0d682c6f6810d9e2dc2e0d08c040ab Mon Sep 17 00:00:00 2001
From: bmcq-0 <66541602+bmcq-0@users.noreply.github.com>
Date: Fri, 1 Dec 2023 14:31:58 -0500
Subject: [PATCH 1/7] Create function-buffer-varints.md

---
 docs/function-buffer-varints.md | 70 +++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 docs/function-buffer-varints.md

diff --git a/docs/function-buffer-varints.md b/docs/function-buffer-varints.md
new file mode 100644
index 00000000..b6d0c8b4
--- /dev/null
+++ b/docs/function-buffer-varints.md
@@ -0,0 +1,70 @@
+# Varint Functions for Buffers
+
+## Summary
+
+This RFC proposes the addition of new functions to the buffer API:
+
+- `buffer.readleb128`
+- `buffer.readleb128`
+- `buffer.readuleb128`
+- `buffer.writeuleb128`
+
+These functions will write and read (U)LEB-128 variable-width integers (varints) from a buffer.
+
+## Motivation
+
+One of the top use cases for buffers is compressing and serializing data as small as possible. Numbers consume 8 bytes in their raw form but can be serialized in smaller sizes using buffers to reduce size. However, writing code to serialize as the smallest size can be tedious and confusing for less-experienced programmers.
+
+The following are implementations of ULEB-128 reading/writing in Luau:
+
+```lua
+local function writeuleb128(stream, value)
+    while value >= 0x80 do
+        buffer.writeu8(bit32.bor(bit32.band(value, 0x7f), 0x80))
+        value = bit32.rshift(value, 7)
+    end
+    buffer.writeu8(value)
+end
+```
+
+```lua
+local function readuleb128(stream)
+    local result = 0
+    local bits = 0
+    while true do
+        local byte = buffer.readu8(stream)
+        result = bit32.bor(result, bit32.lshift(bit32.band(byte, 0x7f), bits))
+        if byte < 0x80 then
+            break
+        end
+        bits = bits + 7
+    end
+    return result
+end
+```
+
+The functions above are inefficient and difficult to understand compared to a native implementation. Implementations will also be needed for the corresponding signed functions.
+
+## Design
+
+The `buffer` library will receive 4 new functions:
+
+```
+buffer.readleb128(b: buffer, offset: number): number
+buffer.readuleb128(b: buffer, offset: number): number
+
+buffer.readleb128(b: buffer, offset: number, value: number): ()
+buffer.writeuleb128(b: buffer, offset: number, value: number): ()
+```
+
+Since other numbers in the buffer library have unsigned and signed implementations, it also makes sense to include both options for varints.
+
+## Drawbacks
+
+The only drawback known is a marginal increase in built-in complexity. However, the performance benefit from having native implementations of these functions outweighs the negligible change in complexity and is not a serious concern.
+
+## Alternatives
+
+Serialization and deserialization for varints can be recreated directly in Luau. However, the algorithm for doing this may be complicated for less-experienced programmers as it involves bitwise operations. Additionally, the algorithm requires repeated buffer reads and calls to bitwise functions to function correctly, which is far less performant than it could be in native code.
+
+It is also possible to have a function that serializes a number in the smallest amount of bits it can fit into. However, to read it, the amount of bits it was serialized in would also have to be included. This count of how many bytes to read would also have to be stored in the buffer, which adds an extra byte of unnecessary data.

From f108ef4881f3ab9704ee4d9b6e58f1654c3f613d Mon Sep 17 00:00:00 2001
From: bmcq-0 <66541602+bmcq-0@users.noreply.github.com>
Date: Fri, 1 Dec 2023 14:44:56 -0500
Subject: [PATCH 2/7] Update function-buffer-varints.md

---
 docs/function-buffer-varints.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/function-buffer-varints.md b/docs/function-buffer-varints.md
index b6d0c8b4..f63719d3 100644
--- a/docs/function-buffer-varints.md
+++ b/docs/function-buffer-varints.md
@@ -53,7 +53,7 @@ The `buffer` library will receive 4 new functions:
 buffer.readleb128(b: buffer, offset: number): number
 buffer.readuleb128(b: buffer, offset: number): number
 
-buffer.readleb128(b: buffer, offset: number, value: number): ()
+buffer.writeleb128(b: buffer, offset: number, value: number): ()
 buffer.writeuleb128(b: buffer, offset: number, value: number): ()
 ```
 

From 07aa7e092816512eba76c0ea561d43e530e1ee38 Mon Sep 17 00:00:00 2001
From: bmcq-0 <66541602+bmcq-0@users.noreply.github.com>
Date: Fri, 1 Dec 2023 15:43:03 -0500
Subject: [PATCH 3/7] Update function-buffer-varints.md

---
 docs/function-buffer-varints.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/function-buffer-varints.md b/docs/function-buffer-varints.md
index f63719d3..1bd49323 100644
--- a/docs/function-buffer-varints.md
+++ b/docs/function-buffer-varints.md
@@ -1,3 +1,4 @@
+
 # Varint Functions for Buffers
 
 ## Summary
@@ -42,8 +43,7 @@ local function readuleb128(stream)
     return result
 end
 ```
-
-The functions above are inefficient and difficult to understand compared to a native implementation. Implementations will also be needed for the corresponding signed functions.
+The functions above are inefficient and difficult to understand compared to a library implementation. In some very common examples such as network event compression or data decompression, these functions can be called hundreds or even thousands of times per second. Library implementations would solve all readability/complexity, performance, and compression efficiency problems.
 
 ## Design
 
@@ -61,10 +61,10 @@ Since other numbers in the buffer library have unsigned and signed implementatio
 
 ## Drawbacks
 
-The only drawback known is a marginal increase in built-in complexity. However, the performance benefit from having native implementations of these functions outweighs the negligible change in complexity and is not a serious concern.
+The only drawback known is a marginal increase in library complexity. However, the performance benefit from having library implementations of these functions outweighs the negligible change in complexity and is not a serious concern.
 
 ## Alternatives
 
-Serialization and deserialization for varints can be recreated directly in Luau. However, the algorithm for doing this may be complicated for less-experienced programmers as it involves bitwise operations. Additionally, the algorithm requires repeated buffer reads and calls to bitwise functions to function correctly, which is far less performant than it could be in native code.
+Serialization and deserialization for varints can be recreated directly in Luau. However, the algorithm for doing this may be complicated for less-experienced programmers as it involves bitwise operations. Additionally, the algorithm requires repeated buffer reads and calls to bitwise functions to function correctly, which is far less performant than a library implementation.
 
-It is also possible to have a function that serializes a number in the smallest amount of bits it can fit into. However, to read it, the amount of bits it was serialized in would also have to be included. This count of how many bytes to read would also have to be stored in the buffer, which adds an extra byte of unnecessary data.
+It is also possible to have a function that serializes a number in the smallest amount of bytes it can fit into. However, to read it, the amount of bytes would also have to be included. This size to read would also have to be stored in the buffer, which adds an extra byte of unnecessary data.

From e1644b671dd5ead46cc64b4a03db056efcbdd49e Mon Sep 17 00:00:00 2001
From: bmcq-0 <66541602+bmcq-0@users.noreply.github.com>
Date: Thu, 7 Dec 2023 20:48:51 -0500
Subject: [PATCH 4/7] Update function-buffer-varints.md

---
 docs/function-buffer-varints.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/function-buffer-varints.md b/docs/function-buffer-varints.md
index 1bd49323..20c24ad5 100644
--- a/docs/function-buffer-varints.md
+++ b/docs/function-buffer-varints.md
@@ -50,15 +50,17 @@ The functions above are inefficient and difficult to understand compared to a li
 The `buffer` library will receive 4 new functions:
 
 ```
-buffer.readleb128(b: buffer, offset: number): number
-buffer.readuleb128(b: buffer, offset: number): number
+buffer.readleb128(b: buffer, offset: number): (number, number)
+buffer.readuleb128(b: buffer, offset: number): (number, number)
 
-buffer.writeleb128(b: buffer, offset: number, value: number): ()
-buffer.writeuleb128(b: buffer, offset: number, value: number): ()
+buffer.writeleb128(b: buffer, offset: number, value: number): number
+buffer.writeuleb128(b: buffer, offset: number, value: number): number
 ```
 
 Since other numbers in the buffer library have unsigned and signed implementations, it also makes sense to include both options for varints.
 
+The functions take arguments similar to other read/write functions in the buffer library. However, they differ in that they return the amount of bytes that were read/written. In readleb128/readuleb128, this is the second return value. Having these functions return the count is necessary as it must be known in order for users to keep track of the buffer offset.
+
 ## Drawbacks
 
 The only drawback known is a marginal increase in library complexity. However, the performance benefit from having library implementations of these functions outweighs the negligible change in complexity and is not a serious concern.

From f2d485d80a117d4c5b4c31663f098fba1da52ecc Mon Sep 17 00:00:00 2001
From: bmcq-0 <66541602+bmcq-0@users.noreply.github.com>
Date: Tue, 23 Jan 2024 14:57:49 -0500
Subject: [PATCH 5/7] Update function-buffer-varints.md

---
 docs/function-buffer-varints.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/function-buffer-varints.md b/docs/function-buffer-varints.md
index 20c24ad5..6bb35236 100644
--- a/docs/function-buffer-varints.md
+++ b/docs/function-buffer-varints.md
@@ -6,7 +6,7 @@
 This RFC proposes the addition of new functions to the buffer API:
 
 - `buffer.readleb128`
-- `buffer.readleb128`
+- `buffer.writeleb128`
 - `buffer.readuleb128`
 - `buffer.writeuleb128`
 

From 86c4f094ac05ea4ef3a06e92c9a7ccee006f2b77 Mon Sep 17 00:00:00 2001
From: bmcq-0 <66541602+bmcq-0@users.noreply.github.com>
Date: Tue, 30 Jan 2024 16:41:57 -0500
Subject: [PATCH 6/7] Update function-buffer-varints.md

---
 docs/function-buffer-varints.md | 36 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/docs/function-buffer-varints.md b/docs/function-buffer-varints.md
index 6bb35236..851f0243 100644
--- a/docs/function-buffer-varints.md
+++ b/docs/function-buffer-varints.md
@@ -19,30 +19,36 @@ One of the top use cases for buffers is compressing and serializing data as smal
 The following are implementations of ULEB-128 reading/writing in Luau:
 
 ```lua
-local function writeuleb128(stream, value)
+local function writeuleb128(stream, offset, value)
+    local start = offset
+
     while value >= 0x80 do
-        buffer.writeu8(bit32.bor(bit32.band(value, 0x7f), 0x80))
+        buffer.writeu8(stream, offset, bit32.bor(bit32.band(value, 0x7f), 0x80))
         value = bit32.rshift(value, 7)
+        offset = offset + 1
     end
-    buffer.writeu8(value)
+    buffer.writeu8(stream, offset, value)
+
+    return (offset - start) + 1
 end
 ```
 
 ```lua
-local function readuleb128(stream)
-    local result = 0
-    local bits = 0
-    while true do
-        local byte = buffer.readu8(stream)
-        result = bit32.bor(result, bit32.lshift(bit32.band(byte, 0x7f), bits))
-        if byte < 0x80 then
-            break
-        end
-        bits = bits + 7
-    end
-    return result
+local function readuleb128(stream, offset)
+    local result, shift = 0, 0
+    local length = buffer.len(stream)
+    local start = offset
+
+    repeat
+        local byte = buffer.readu8(stream, offset)
+        result = bit32.bor(result, bit32.lshift(bit32.band(byte, 0x7f), shift))
+        shift, offset = shift + 7, offset + 1
+    until byte < 0x80 or length <= offset
+
+    return result, offset - start
 end
 ```
+
 The functions above are inefficient and difficult to understand compared to a library implementation. In some very common examples such as network event compression or data decompression, these functions can be called hundreds or even thousands of times per second. Library implementations would solve all readability/complexity, performance, and compression efficiency problems.
 
 ## Design

From fd814cea666e55d824dfb178eb6cddec0d907890 Mon Sep 17 00:00:00 2001
From: bmcq-0 <66541602+bmcq-0@users.noreply.github.com>
Date: Tue, 30 Jan 2024 16:54:58 -0500
Subject: [PATCH 7/7] Update function-buffer-varints.md

---
 docs/function-buffer-varints.md | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/docs/function-buffer-varints.md b/docs/function-buffer-varints.md
index 851f0243..65955b3d 100644
--- a/docs/function-buffer-varints.md
+++ b/docs/function-buffer-varints.md
@@ -21,13 +21,21 @@ The following are implementations of ULEB-128 reading/writing in Luau:
 ```lua
 local function writeuleb128(stream, offset, value)
     local start = offset
+    local length = buffer.len(stream)
 
-    while value >= 0x80 do
-        buffer.writeu8(stream, offset, bit32.bor(bit32.band(value, 0x7f), 0x80))
-        value = bit32.rshift(value, 7)
-        offset = offset + 1
+    while true do
+        if offset >= length then
+            error("buffer access out of bounds")
+        end
+        if value >= 0x80 then
+            buffer.writeu8(stream, offset, bit32.bor(bit32.band(value, 0x7f), 0x80))
+            value = bit32.rshift(value, 7)
+            offset = offset + 1
+        else
+            buffer.writeu8(stream, offset, value)
+            break
+        end
     end
-    buffer.writeu8(stream, offset, value)
 
     return (offset - start) + 1
 end
@@ -40,9 +48,9 @@ local function readuleb128(stream, offset)
     local start = offset
 
     repeat
-        local byte = buffer.readu8(stream, offset)
-        result = bit32.bor(result, bit32.lshift(bit32.band(byte, 0x7f), shift))
-        shift, offset = shift + 7, offset + 1
+        local byte = buffer.readu8(stream, offset)
+        result = bit32.bor(result, bit32.lshift(bit32.band(byte, 0x7f), shift))
+        shift, offset = shift + 7, offset + 1
     until byte < 0x80 or length <= offset
 
     return result, offset - start