Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make Range headers strongly typed (again) #93

Closed
wants to merge 1 commit into from

Conversation

jcgruenhage
Copy link

@jcgruenhage jcgruenhage commented Oct 14, 2021

fixes #87

I'm very much not happy with it as it is right now, but this is what I've been able to hack together so far. I'm still posting it as a starting point for further work on this, and to be able to point out the points I'm unsure of.

Current state of the PR: #93 (review)

Copy link
Author

@jcgruenhage jcgruenhage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my comments of points I'm unsure of. Other than that, the only questions I still have is whether we should clean up the ranges a bit before returning them, like coalescing overlapping ranges and sorting them.

src/common/range.rs Outdated Show resolved Hide resolved
src/common/range.rs Outdated Show resolved Hide resolved
src/common/range.rs Outdated Show resolved Hide resolved
src/common/range.rs Outdated Show resolved Hide resolved
src/common/range.rs Outdated Show resolved Hide resolved
@jcgruenhage jcgruenhage force-pushed the fix-suffix-headers branch 5 times, most recently from ee933be to 178fe1e Compare October 18, 2021 15:44
Copy link
Author

@jcgruenhage jcgruenhage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, way too much time later, I've rewritten most of the implementation, while also discovering an older implementation, that achieved very similar goals as my new implementation. I've merged the older implementation and my new one, resulting in something that's relatively far from where we are right now, so writing documentation to help people migrate would probably be a good idea.

There's still some open questions around error handling, and it might make sense to go through the documentation again, but this should be mostly good right now.

As a future task, I'd say that someone should probably go through RFC7233 again and give all areas here the same treatment, as other headers in here have similar problems. Content-Range does not support non-bytes units for example, same with Accept-Ranges. There's no obvious problems with the If-Range header though :D

PS: If any of the comments sound ranty or annoyed, I'm sorry, I'm really not trying to offend anyone here. I've put quite some time into this, and in the end realised that my implementation was nearly equivalent to something that had already been done ~5 years ago, which is why I'm a little frustrated.

.filter_map(|spec: &str| -> Option<ByteRangeSpec> {
let spec = spec.parse();
if spec.is_err() {
invalid = true;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we mark a range header as invalid if we encounter a byte-range-spec we can't parse. Silently dropping specs we can't parse feels wrong, and section 4.4 of RFC7233 does sound like an invalid byte-range-spec should cause a rejected byte-range-set. This is not quite clear from the wording though, so I'd like a second set of eyes and opinions on this here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's a little hard to judge whether this is saying "you should reject on invalid ranges" or "this is up to the implementer". I feel like it's less surprising if you fail in this case, especially if the alternative is silently dropping ranges.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main problem here is that we don't have any way to communicate which bit we stumbled over back to a dev that might be debugging this, without them actually connecting a debugger and stepping down this far. If we were using log, we'd be able to put out debug logs here, which I'd be happier with. Would a log feature that adds a dependency to log be acceptable here, so that we can log this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs are nice, but not super discoverable or obvious (even if documented) compared to actual errors. Really the best thing would be to have a useful error type that the caller can inspect.

AllFrom(u64),
/// For a suffix, only the length of the suffix is given, and the server is prompted to return
/// the last bytes only.
Last(NonZeroU64),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 7233 states that suffix ranges with length zero are not satisfiable, but doesn't say anything about them being inherently invalid. I would argue that they are though, which is why I'd say that we should make them impossible using the type system.

@@ -141,278 +146,454 @@ impl ByteRangeSpec {
/// length of the selected representation).
///
/// [1]: https://tools.ietf.org/html/rfc7233
pub fn to_satisfiable_range(&self, full_length: u64) -> Option<(u64, u64)> {
pub fn to_range_bounds(&self, len: u64) -> Option<impl RangeBounds<u64>> {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a comment for potential reviewers: This part down here is all commented out in the current tree, so most of the changes here don't break stuff for users. Some of the changes (which are necessary for RFC compliant operation) will break stuff, but there's not really a way around this right now.

let mut parts = s.splitn(2, '-');
fn err() -> InvalidRange {
InvalidRange { _inner: () }
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very, very unhappy with error handling in this module, but I couldn't really find a better way that has precedent in this crate? I'm used to using thiserror, but adding dependencies is probably out of the question, right? As an alternative, writing better error types and implementing the stuff thiserror would provide is also an option, but I couldn't really find other cases where that was done in this crate either.

Aside of error types, for some of the choices that this crate makes, as an application developer, I'd be happy to get trace logs if I needed them. The old implementation just silently dropped invalid ranges, while the new one just refuses parsing. Neither is great, especially considering that the error given when parsing is refused isn't very helpful either.

Feedback in this area would be immensely helpful.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you could take a look at how hyper does error handling, for consistency? Definitely think a more descriptive error type is the way to go.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oki, I will give it a look. For now I've just used ::Error

}
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any special place where I should put these 3 helper methods? Or are they good where they are?

@jcgruenhage jcgruenhage marked this pull request as ready for review October 18, 2021 16:17
@jcgruenhage jcgruenhage changed the title fix: support http suffix ranges make Range headers strongly typed (again) Oct 18, 2021
Copy link

@nsunderland1 nsunderland1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :) Super glad to see someone cleaning this up, I'd wanted to do that a while ago but didn't get the time.

Self::Bytes(ranges) => Some(
ranges
.iter()
.filter_map(|spec| spec.to_range_bounds(len))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filter_map here feels like not the most intuitive (or desirable) behaviour IMO. I think if some of the subranges fail to convert to RangeBounds, this whole thing should probably fail?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, this is consistent with RFC 7233. The wording there implies that it's only an error if none of the ranges are satisfiable.

I still think that the behaviour is surprising, is there a more explicit name (to_satisfiable_range_bounds or something hopefully less verbose) that would help? I think the doc comment should at least note this behaviour :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, that method name should probably be a bit more verbose, and looking at it again, it's not quite correct either, because it still returns some if the Vec is empty. Will fix :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mhm, thought about this a bit more (after I ran into trouble with returning None on empty vec): The option is to check whether it's a range header that makes sense to be represented by range bounds, so maybe Some together with an empty Vec does make sense. Again, more thoughts on this would be appreciated.

Copy link

@nsunderland1 nsunderland1 Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell from RFC 7233, there's no explicit behaviour specified for unrecognized range units. With that in mind, it's probably best to keep this method as simple as possible and let the caller decide how they want to handle it. Think you're right about returning Some of empty Vec.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the return type of both Range::to_satisfiable_range_bounds and ByteRangeSpec::to_satisfiable_range_bounds from Option to Result. I'm using ::Error as the error type here although I would prefer a custom error type here indicating that the error happened during conversion to end inclusive range bounds. As I haven't seen any custom error types apart from ::Error and I'm not sure what the right convention would be for custom error types, I'm just using ::Error here.

src/common/range.rs Outdated Show resolved Hide resolved
src/common/range.rs Outdated Show resolved Hide resolved
.filter_map(|spec: &str| -> Option<ByteRangeSpec> {
let spec = spec.parse();
if spec.is_err() {
invalid = true;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's a little hard to judge whether this is saying "you should reject on invalid ranges" or "this is up to the implementer". I feel like it's less surprising if you fail in this case, especially if the alternative is silently dropping ranges.

src/common/range.rs Outdated Show resolved Hide resolved
src/common/range.rs Outdated Show resolved Hide resolved
src/common/range.rs Outdated Show resolved Hide resolved
@jcgruenhage jcgruenhage force-pushed the fix-suffix-headers branch 5 times, most recently from fe28be4 to 5e0fecb Compare November 2, 2021 14:30
fn from_str(s: &str) -> Result<Self, Self::Err> {
let mut iter = s.trim().splitn(2, '-');
Ok(
match (parse_bound(iter.next())?, parse_bound(iter.next())?) {
Copy link

@nsunderland1 nsunderland1 Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to pass an Option to parse_bound here. Here are the situations where iter.next() can return None here:

  • for the first call, it's impossible. Even for empty string, splitn will give at least one element. Playground, also splitn docs show this
  • for the second call to iter.next(), you'll only get None if the string doesn't contain -. Even in the foo- case, you'll get empty string as the second component. Playground

If the string doesn't contain a -, that's an error.

Copy link

@herkhinah herkhinah Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oki, I've changed this function back to the old implementation that didn't have these issues

@herkhinah herkhinah force-pushed the fix-suffix-headers branch 2 times, most recently from 2554a56 to 00f87b9 Compare February 1, 2022 13:26
From what I can see, the old, good implementation was commented out back
in 2018 during some large refactoring of the whole crate, and 4b42e4f
reenabled the range module,  while replacing the old, strongly typed
implementation with an implmentation that moves parsing of ranges out of
the decoding, into an iterator function. While the implementation that
4b42e4f brought was certainly better than no implementation at all, the
old implementation should've been fixed instead, considering that it was
mostly spec compliant and supported more variants than the back then new
implementation, and it even had tests, which the other one had not.

Sadly, the I had only found the old implementation after writing a
mostly feature complete new implementation, so this wasted a lot of
effort right here. With the old implementation found, and the new and
improved implementation at hand, a consolidated implementation did
arise, now with more documentation, stronger type guarantees and more
test cases then ever.

Signed-off-by: Jan Christian Grünhage <[email protected]>
@seanmonstar
Copy link
Member

Apologies for taking so long, part of the problem was the API breakage in this change. I've opened #155 as an alternative now that a breaking change is possible, but went that direction because it's more conservative. It just fixes what was broken, without added much new surface area to express everything in the spec.

I'm hoping to start with the immediate fix, and if there's desire to construct or parse much more complicated Range headers, we can file an issue with a proposal that doesn't use as many public enums (I know they were there originally, this crate learned they are API compatibility hazards).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mapping of HTTP ranges to std::ops::Bound based ranges is faulty
4 participants