-
-
Notifications
You must be signed in to change notification settings - Fork 402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize VarInt decoding #1117
base: main
Are you sure you want to change the base?
Optimize VarInt decoding #1117
Conversation
Upfront: This isn't the most elegant change. But given VarInts sit in the hot path it seems worth it. The current VarInt decoding performs some duplicate bounds check. `VarInt::decode` checks for available bytes in order to return an error. And `bytes::Buf` performs additional bounds checks in various helper methods. By using `Buf::chunk` and `Buf::advance` directly we can save some of those. We could improve even more by only support decoding from contiguous buffers, but that would change the API contract. In addition it seemed like the `u16` be->le conversion wasn't as free as it should be. A 2byte LE buffer improves things here. Overall this reduces CPU time spent in VarInt decoding from about 2% to around 1%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a microbenchmark for this path?
// check. We already performed this one earlier on. | ||
buf[0] = r.chunk()[0]; | ||
r.advance(1); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this specific change actually help? I wouldn't expect this to have any difference in bounds checking semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an assert!
in the get_u8
, not just a rust array bounds check. I don't think those are removed. And since even without the "le" change this proved to be faster, I guess practical experience shows that it helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, left this sitting around for a long time. @Matthias247 what do you think?
@@ -159,21 +163,35 @@ impl Codec for VarInt { | |||
if r.remaining() < 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do this without unsafe if we use <&[u8; 1]>::try_from()
to extract an array out of the bounds check?
Along these lines:
diff --git a/quinn-proto/src/varint.rs b/quinn-proto/src/varint.rs
index 7f57bc3c..8f4b9b7c 100644
--- a/quinn-proto/src/varint.rs
+++ b/quinn-proto/src/varint.rs
@@ -1,4 +1,4 @@
-use std::{convert::TryInto, fmt};
+use std::{convert::{TryInto, TryFrom}, fmt};
use bytes::{Buf, BufMut};
use thiserror::Error;
@@ -146,38 +146,37 @@ pub struct VarIntBoundsExceeded;
impl Codec for VarInt {
fn decode<B: Buf>(r: &mut B) -> coding::Result<Self> {
- if !r.has_remaining() {
- return Err(UnexpectedEnd);
- }
let mut buf = [0; 8];
- buf[0] = r.get_u8();
+ let input = r.chunk();
+ buf[0] = match input.get(0) {
+ Some(&b) => b,
+ None => return Err(UnexpectedEnd),
+ };
+
let tag = buf[0] >> 6;
buf[0] &= 0b0011_1111;
- let x = match tag {
- 0b00 => u64::from(buf[0]),
+ let (x, len) = match tag {
+ 0b00 => (u64::from(buf[0]), 1),
0b01 => {
- if r.remaining() < 1 {
- return Err(UnexpectedEnd);
- }
- r.copy_to_slice(&mut buf[1..2]);
- u64::from(u16::from_be_bytes(buf[..2].try_into().unwrap()))
+ let arr = [0, buf[0]];
+ let read = <&[u8; 1]>::try_from(&input[1..]).map_err(|_| UnexpectedEnd)?;
+ let arr = [read[0], buf[0]];
+ (u64::from(u16::from_le_bytes(arr)), 2)
}
0b10 => {
- if r.remaining() < 3 {
- return Err(UnexpectedEnd);
- }
- r.copy_to_slice(&mut buf[1..4]);
- u64::from(u32::from_be_bytes(buf[..4].try_into().unwrap()))
+ let read = <&[u8; 3]>::try_from(&input[1..]).map_err(|_| UnexpectedEnd)?;
+ let arr = [read[2], read[1], read[0], buf[0]];
+ (u64::from(u32::from_le_bytes(arr)), 4)
}
0b11 => {
- if r.remaining() < 7 {
- return Err(UnexpectedEnd);
- }
- r.copy_to_slice(&mut buf[1..8]);
- u64::from_be_bytes(buf)
+ let read = <&[u8; 7]>::try_from(&input[1..]).map_err(|_| UnexpectedEnd)?;
+ let arr = [read[6], read[5], read[4], read[3], read[2], read[1], read[0], buf[0]];
+ (u64::from(u64::from_le_bytes(arr)), 8)
}
_ => unreachable!(),
};
+
+ r.advance(len);
Ok(VarInt(x))
}
Upfront: This isn't the most elegant change. But given VarInts sit in the
hot path it seems worth it.
The current VarInt decoding performs some duplicate bounds check.
VarInt::decode
checks for available bytes in order to return an error.And
bytes::Buf
performs additional bounds checks in various helper methods.By using
Buf::chunk
andBuf::advance
directly we can save some of those.We could improve even more by only support decoding from contiguous buffers,
but that would change the API contract.
In addition it seemed like the
u16
be->le conversion wasn't as freeas it should be. A 2byte LE buffer improves things here.
Overall this reduces CPU time spent in VarInt decoding from about 2% to around 1%.