object-store: http shouldn't perform range requests unless accept-ranges: bytes header is present #4839

Closed
universalmind303 opened this issue Sep 19, 2023 · 4 comments · Fixed by #4841
Labels
bug object-store Object Store Interface

Comments

@universalmind303
Contributor

Describe the bug
If a remote file such as https://somewhere.com/file.parquet is hosted on a server that does not advertise range support via the `Accept-Ranges: bytes` header, the object store should not attempt range requests against it.
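As a rough illustration of the proposed check, here is a minimal, std-only sketch that decides from an `Accept-Ranges` header value whether a server advertises byte ranges. The helper name `advertises_byte_ranges` is hypothetical and not part of `object_store`. Treating an absent header as "no support" reflects this issue's proposal, not a requirement of the HTTP spec.

```rust
/// Hypothetical helper (not part of object_store): does an `Accept-Ranges`
/// header value advertise byte-range support?
fn advertises_byte_ranges(accept_ranges: Option<&str>) -> bool {
    match accept_ranges {
        // Header absent: treated as "no support", as this issue proposes.
        None => false,
        // The value is a comma-separated list of range units. Unit names are
        // case-insensitive, and "none" simply fails the comparison below.
        Some(value) => value
            .split(',')
            .map(str::trim)
            .any(|unit| unit.eq_ignore_ascii_case("bytes")),
    }
}

fn main() {
    for header in [Some("bytes"), Some("Bytes, other"), Some("none"), None] {
        println!("{:?} -> {}", header, advertises_byte_ranges(header));
    }
}
```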
To Reproduce

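```rust
use object_store::http::HttpBuilder;
use object_store::path::Path;
use object_store::{GetOptions, ObjectStore};

#[tokio::main]
async fn main() {
    // The server at this URL does not send `Accept-Ranges: bytes`
    let http = HttpBuilder::new().with_url("https://not.arealwebsite.com/file.parquet");
    let store = http.build().unwrap();
    // Empty path: request the configured URL itself
    let path = Path::default();
    let mut options = GetOptions::default();
    let some_range = 100..2000;
    options.range = Some(some_range);
    let data = store.get_opts(&path, options).await.unwrap();
}
```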

Expected behavior
The range should be ignored, since the server does not advertise that it can handle range requests.

Additional context

@tustvold
Contributor

tustvold commented Sep 19, 2023

The RFC suggests this is not a requirement (a client MAY generate range requests regardless of having received an `Accept-Ranges` field): https://www.rfc-editor.org/rfc/rfc9110#section-14.3

Perhaps we should instead check for a 206 (Partial Content) response and return an error if we get back a 200 🤔.

I definitely think we should return an error if a range is requested but the server ignores it, as silently falling back to fetching the entire file is almost never desirable.
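A minimal, std-only sketch of that idea, assuming the caller already has the response status and the `Content-Range` header as plain values: a 206 whose `Content-Range` matches the requested bytes is accepted, a 200 is reported as an ignored range, and anything else is an unexpected status. The names `RangeError` and `check_range_response` are hypothetical, not the `object_store` API, and this does not attempt the edge case of a 200 for a range that happens to cover the whole object.

```rust
use std::ops::Range;

/// Hypothetical error type for a ranged GET.
#[derive(Debug, PartialEq)]
enum RangeError {
    /// 200 OK: the server ignored the Range header and is sending the whole object.
    RangeIgnored,
    /// 206, but Content-Range does not match the requested bytes.
    RangeMismatch { requested: Range<usize>, content_range: String },
    /// Any other status, e.g. 416 Range Not Satisfiable.
    UnexpectedStatus(u16),
}

/// Validate the response to `GET` with `Range: bytes={start}-{end - 1}`.
fn check_range_response(
    requested: Range<usize>,
    status: u16,
    content_range: Option<&str>,
) -> Result<(), RangeError> {
    // HTTP byte ranges are inclusive, so an empty Rust range cannot be expressed.
    assert!(!requested.is_empty(), "range must be non-empty");
    match status {
        206 => {
            // Expect e.g. "bytes 100-1999/5000" or "bytes 100-1999/*".
            let expected = format!("bytes {}-{}/", requested.start, requested.end - 1);
            match content_range {
                Some(cr) if cr.starts_with(&expected) => Ok(()),
                other => Err(RangeError::RangeMismatch {
                    requested,
                    content_range: other.unwrap_or("").to_string(),
                }),
            }
        }
        200 => Err(RangeError::RangeIgnored),
        other => Err(RangeError::UnexpectedStatus(other)),
    }
}

fn main() {
    let requested = 100..2000;
    // Server honoured the range.
    println!("{:?}", check_range_response(requested.clone(), 206, Some("bytes 100-1999/5000")));
    // Server ignored the Range header and sent the whole file.
    println!("{:?}", check_range_response(requested, 200, None));
}
```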

@universalmind303
Contributor Author

Hmm. If we error on a 200, wouldn't we already have performed the full request by then?

...But I guess there's no reliable way to know whether a server supports range requests without actually trying one.

@tustvold
Contributor

tustvold commented Sep 19, 2023

> performed the full request at that point though

We will have read the response headers but not streamed the majority of the response body. I'm not sure what hyper does if you simply discard the response body.
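To make the "headers first, body later" point concrete, here is a std-only sketch. `LazyResponse` is a hypothetical stand-in for a streamed HTTP response: its status is available immediately, but body chunks are only produced when pulled. Checking the status before reading shows that rejecting a 200 pulls none of the body. It deliberately does not model what hyper does with the unread bytes on the connection, which is the open question above.

```rust
const CHUNK_SIZE: usize = 8192;

/// Hypothetical stand-in for a streamed HTTP response: the status is known once
/// headers arrive, but body chunks are produced only when the caller pulls them.
struct LazyResponse {
    status: u16,
    chunks_total: usize,
    chunks_pulled: usize,
}

impl LazyResponse {
    /// Pull the next body chunk; nothing is produced until this is called.
    fn next_chunk(&mut self) -> Option<Vec<u8>> {
        if self.chunks_pulled < self.chunks_total {
            self.chunks_pulled += 1;
            Some(vec![0u8; CHUNK_SIZE])
        } else {
            None
        }
    }
}

/// Check the status before reading any of the body.
fn read_ranged_body(resp: &mut LazyResponse) -> Result<Vec<u8>, String> {
    if resp.status != 206 {
        return Err(format!("range request ignored: server replied {}", resp.status));
    }
    let mut body = Vec::new();
    while let Some(chunk) = resp.next_chunk() {
        body.extend_from_slice(&chunk);
    }
    Ok(body)
}

fn main() {
    // Server ignored the range and would send a large full body.
    let mut full = LazyResponse { status: 200, chunks_total: 1000, chunks_pulled: 0 };
    let result = read_ranged_body(&mut full);
    println!("{:?}, chunks pulled: {}", result, full.chunks_pulled);
}
```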

Regardless, I view this as an exceptional case not worth optimising for, as I anticipate most workloads will simply treat it as a fatal error and abort the query/task.

@tustvold
Contributor

label_issue.py automatically added labels {'object-store'} from #4841

2 participants