-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make Range headers strongly typed (again) #93
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my comments of points I'm unsure of. Other than that, the only questions I still have is whether we should clean up the ranges a bit before returning them, like coalescing overlapping ranges and sorting them.
ee933be
to
178fe1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, way too much time later, I've rewritten most of the implementation, while also discovering an older implementation, that achieved very similar goals as my new implementation. I've merged the older implementation and my new one, resulting in something that's relatively far from where we are right now, so writing documentation to help people migrate would probably be a good idea.
There's still some open questions around error handling, and it might make sense to go through the documentation again, but this should be mostly good right now.
As a future task, I'd say that someone should probably go through RFC7233 again and give all areas here the same treatment, as other headers in here have similar problems. Content-Range
does not support non-bytes units for example, same with Accept-Ranges
. There's no obvious problems with the If-Range
header though :D
PS: If any of the comments sound ranty or annoyed, I'm sorry, I'm really not trying to offend anyone here. I've put quite some time into this, and in the end realised that my implementation was nearly equivalent to something that had already been done ~5 years ago, which is why I'm a little frustrated.
src/common/range.rs
Outdated
.filter_map(|spec: &str| -> Option<ByteRangeSpec> { | ||
let spec = spec.parse(); | ||
if spec.is_err() { | ||
invalid = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we mark a range header as invalid if we encounter a byte-range-spec we can't parse. Silently dropping specs we can't parse feels wrong, and section 4.4 of RFC7233 does sound like an invalid byte-range-spec should cause a rejected byte-range-set. This is not quite clear from the wording though, so I'd like a second set of eyes and opinions on this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it's a little hard to judge whether this is saying "you should reject on invalid ranges" or "this is up to the implementer". I feel like it's less surprising if you fail in this case, especially if the alternative is silently dropping ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main problem here is that we don't have any way to communicate which bit we stumbled over back to a dev that might be debugging this, without them actually connecting a debugger and stepping down this far. If we were using log
, we'd be able to put out debug logs here, which I'd be happier with. Would a log
feature that adds a dependency to log
be acceptable here, so that we can log this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logs are nice, but not super discoverable or obvious (even if documented) compared to actual errors. Really the best thing would be to have a useful error type that the caller can inspect.
AllFrom(u64), | ||
/// For a suffix, only the length of the suffix is given, and the server is prompted to return | ||
/// the last bytes only. | ||
Last(NonZeroU64), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC 7233 states that suffix ranges with length zero are not satisfiable, but doesn't say anything about them being inherently invalid. I would argue that they are though, which is why I'd say that we should make them impossible using the type system.
src/common/range.rs
Outdated
@@ -141,278 +146,454 @@ impl ByteRangeSpec { | |||
/// length of the selected representation). | |||
/// | |||
/// [1]: https://tools.ietf.org/html/rfc7233 | |||
pub fn to_satisfiable_range(&self, full_length: u64) -> Option<(u64, u64)> { | |||
pub fn to_range_bounds(&self, len: u64) -> Option<impl RangeBounds<u64>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a comment for potential reviewers: This part down here is all commented out in the current tree, so most of the changes here don't break stuff for users. Some of the changes (which are necessary for RFC compliant operation) will break stuff, but there's not really a way around this right now.
let mut parts = s.splitn(2, '-'); | ||
fn err() -> InvalidRange { | ||
InvalidRange { _inner: () } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very, very unhappy with error handling in this module, but I couldn't really find a better way that has precedent in this crate? I'm used to using thiserror, but adding dependencies is probably out of the question, right? As an alternative, writing better error types and implementing the stuff thiserror would provide is also an option, but I couldn't really find other cases where that was done in this crate either.
Aside of error types, for some of the choices that this crate makes, as an application developer, I'd be happy to get trace logs if I needed them. The old implementation just silently dropped invalid ranges, while the new one just refuses parsing. Neither is great, especially considering that the error given when parsing is refused isn't very helpful either.
Feedback in this area would be immensely helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could take a look at how hyper does error handling, for consistency? Definitely think a more descriptive error type is the way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oki, I will give it a look. For now I've just used ::Error
src/common/range.rs
Outdated
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any special place where I should put these 3 helper methods? Or are they good where they are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :) Super glad to see someone cleaning this up, I'd wanted to do that a while ago but didn't get the time.
src/common/range.rs
Outdated
Self::Bytes(ranges) => Some( | ||
ranges | ||
.iter() | ||
.filter_map(|spec| spec.to_range_bounds(len)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filter_map
here feels like not the most intuitive (or desirable) behaviour IMO. I think if some of the subranges fail to convert to RangeBounds
, this whole thing should probably fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, this is consistent with RFC 7233. The wording there implies that it's only an error if none of the ranges are satisfiable.
I still think that the behaviour is surprising, is there a more explicit name (to_satisfiable_range_bounds
or something hopefully less verbose) that would help? I think the doc comment should at least note this behaviour :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, that method name should probably be a bit more verbose, and looking at it again, it's not quite correct either, because it still returns some if the Vec is empty. Will fix :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mhm, thought about this a bit more (after I ran into trouble with returning None on empty vec): The option is to check whether it's a range header that makes sense to be represented by range bounds, so maybe Some
together with an empty Vec does make sense. Again, more thoughts on this would be appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell from RFC 7233, there's no explicit behaviour specified for unrecognized range units. With that in mind, it's probably best to keep this method as simple as possible and let the caller decide how they want to handle it. Think you're right about returning Some
of empty Vec
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the return type of both Range::to_satisfiable_range_bounds
and ByteRangeSpec::to_satisfiable_range_bounds
from Option
to Result
. I'm using ::Error
as the error type here although I would prefer a custom error type here indicating that the error happened during conversion to end inclusive range bounds. As I haven't seen any custom error types apart from ::Error
and I'm not sure what the right convention would be for custom error types, I'm just using ::Error
here.
src/common/range.rs
Outdated
.filter_map(|spec: &str| -> Option<ByteRangeSpec> { | ||
let spec = spec.parse(); | ||
if spec.is_err() { | ||
invalid = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it's a little hard to judge whether this is saying "you should reject on invalid ranges" or "this is up to the implementer". I feel like it's less surprising if you fail in this case, especially if the alternative is silently dropping ranges.
fe28be4
to
5e0fecb
Compare
src/common/range.rs
Outdated
fn from_str(s: &str) -> Result<Self, Self::Err> { | ||
let mut iter = s.trim().splitn(2, '-'); | ||
Ok( | ||
match (parse_bound(iter.next())?, parse_bound(iter.next())?) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to pass an Option
to parse_bound
here. Here are the situations where iter.next()
can return None
here:
- for the first call, it's impossible. Even for empty string,
splitn
will give at least one element. Playground, also splitn docs show this - for the second call to
iter.next()
, you'll only getNone
if the string doesn't contain-
. Even in thefoo-
case, you'll get empty string as the second component. Playground
If the string doesn't contain a -
, that's an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oki, I've changed this function back to the old implementation that didn't have these issues
2554a56
to
00f87b9
Compare
From what I can see, the old, good implementation was commented out back in 2018 during some large refactoring of the whole crate, and 4b42e4f reenabled the range module, while replacing the old, strongly typed implementation with an implmentation that moves parsing of ranges out of the decoding, into an iterator function. While the implementation that 4b42e4f brought was certainly better than no implementation at all, the old implementation should've been fixed instead, considering that it was mostly spec compliant and supported more variants than the back then new implementation, and it even had tests, which the other one had not. Sadly, the I had only found the old implementation after writing a mostly feature complete new implementation, so this wasted a lot of effort right here. With the old implementation found, and the new and improved implementation at hand, a consolidated implementation did arise, now with more documentation, stronger type guarantees and more test cases then ever. Signed-off-by: Jan Christian Grünhage <[email protected]>
00f87b9
to
89f5904
Compare
Apologies for taking so long, part of the problem was the API breakage in this change. I've opened #155 as an alternative now that a breaking change is possible, but went that direction because it's more conservative. It just fixes what was broken, without added much new surface area to express everything in the spec. I'm hoping to start with the immediate fix, and if there's desire to construct or parse much more complicated |
fixes #87
I'm very much not happy with it as it is right now, but this is what I've been able to hack together so far. I'm still posting it as a starting point for further work on this, and to be able to point out the points I'm unsure of.Current state of the PR: #93 (review)