-
Notifications
You must be signed in to change notification settings - Fork 827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
object_store: suffix requests #5206
Conversation
Ranges requested beyond the length of the resource can no longer cause panics in the `coalesce_ranges` function.
With default implementation using a HEAD and then GET request. See apache#4611.
Thank you for this, I'll try to take some time over the next week or so to see if we can't make this slightly less of a "bodge", to borrow your phrasing 😅 |
Apologies, I do understand the pressures of maintaining a stable API, just don't want to add tech debt which people may be lumbered with in future. The build failure seems to be from tokio-macro requiring rust 1.63, where the MSRV is 1.62.1. |
I wonder how urgently you require this functionality, if you're happy to wait until the new year we can probably look to make a breaking release then? |
Not urgent at all! My zarr library isn't (yet) based on object_store, so nothing's being held up - I just didn't want to start the work of rebasing it if I didn't know whether the suffix feature would land. |
What do you think then of updating this PR so that GetOptions holds an HttpRange, and we just look to release this as a breaking change in the new year? |
Sure thing, I have a different branch which has most of that put together already. With the possible advent of multipart ranges #4612 , what do you think about storing a multi-range type (which still implements Eh actually I guess that makes |
I am personally rather lukewarm on multipart ranges as they're not currently supported by any stores, and as you've articulated on the ticket, have rather opaque semantics. I say we cross that bridge when we come to it - it may be we support them outside the main ObjectStore trait. |
And before I go through all the implementations again, for the sake of object safety would you prefer I have a preference for reading the |
I would leave get_range and get_ranges unchanged, and only change GetOptions, especially given get_ranges, in particular colaesce_ranges, won't work with partial ranges. This will also avoid breaking existing code that uses get_range/get_ranges.
We cannot break object safety, it is critical to being able to abstract across stores based on runtime configuration. |
pub(crate) enum HttpRange { | ||
/// A range with a given first and last bytes. | ||
/// If `last > first`, the response should be empty. | ||
Range { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could avoid ambiguity and make it easier to port existing code by using an actual Range here, i.e. Range(Range<usize>)
, I appreciate this changes this to an exclusive bound but I think people are more familiar with these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the impl From<Range>
makes porting existing code easy enough, could do a impl TryInto<Range>
going the other way to complete the loop. By renaming it from HttpRange -> GetRange
we aren't tied so closely to the HTTP spec so an exclusive bound would be fine.
Tuple variants all round does make construction easier, but with the different constructors, people shouldn't need to construct variants explicitly. I've been going back and forth about whether to have separate variants for ::Range
and ::Offset
or whether ::Range.last
should just be optional; having tuple structs everywhere is a good argument in favour of all separate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A slight annoyance in using a Range
object here is that it isn't Copy
, although presumably the .clone()
is very cheap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah IIRC this is just because it created a confusing situation when using Range as an iterator
/// | ||
/// At present, this is only used internally. | ||
#[derive(Debug, PartialEq, Eq, Clone, Copy)] | ||
pub(crate) enum HttpRange { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub(crate) enum HttpRange { | |
pub(crate) enum GetRange { |
Perhaps, as it isn't just for HTTP
} | ||
} | ||
|
||
pub(crate) fn concrete_range(range: Option<HttpRange>, size: usize) -> Range<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably be a member function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted it to cover the None
case too because turning an Option<HttpRange>
into a concrete Range
is something we do quite a lot, saves seeing opt_range.map_or(0..len, |r| r.concrete_range(len))
in several places. But it's not too much of an issue.
/// A range defined only by the first byte requested (requests all remaining bytes). | ||
Offset { | ||
/// Offset of the first byte requested. | ||
first: usize, | ||
}, | ||
/// A range defined as the number of bytes at the end of the resource. | ||
Suffix { | ||
/// Number of bytes requested. | ||
nbytes: usize, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of making these tuple variants instead, this makes them easier to construct and match on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already extra constructors for each variant, but you're right that matching would be easier. The structs are a little more explicit but with good docs it should be clear.
I've marked this PR as a draft to reflect the above discussion |
And before I go through all the implementations again, for the sake of object safety would you prefer I have a preference for reading the |
As commented above we should only change GetOptions as this avoids breaking most existing workloads and object safety. With regards to inspecting the response Range, returning an error if we get something else back makes sense to me |
I have an implementation of coalesce_ranges which uses the new struct. There's a helper function which returns
Should be fine, I can see the |
Thanks for all your feedback! I have raised a new PR with the suggested (breaking) changes; this can be closed at your convenience. |
Closing in favor of #5222 |
Which issue does this PR close?
Relates to #4611, but should not close it.
Rationale for this change
Certain use cases require suffix requests (see linked issue). Most stores support such requests directly; a workaround for Azure is trivial although not fast.
What changes are included in this PR?
pub trait GetSuffixClient
, with an implementation for all clients except Azure. Additionally,ObjectStore::get_suffix
, with a default implementation with the workaround (HEAD then GET), which is then overridden in all stores except Azure.Also includes some of the infrastructure around full support for HTTP ranges, which is the direction this functionality should go in when we next make a breaking API change. That infra is probably overcomplicated for its use here, but gives a good foundation for full support later, e.g. 80cdf66
Are there any user-facing changes?
Downstream implementors of
ObjectStore
should override the defaultget_suffix
method, if their clients support a direct form, preferably by implementingGetSuffixClient
on their client.Remaining questions
GetSuffixClient
probably doesn't need to be a trait as it doesn't depend on any behaviour and isn't used in generics;get_suffix
could just be implemented directly on the client structs. However, it will make it much easier to delete suffix-specific behaviour when a breaking API change is allowed.nbytes
are longer than the resource. However, AFAICT that isn't currently done with current range requests outside of the local and memory stores.Better solution
Per the linked issue, GetOptions should contain a full representation of an HTTP range (implementing
From<Range<usize>>
) and this functionality should be built intoget_opts
.