-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Enable HTTP range header support for data requests #61
Conversation
Hey, @pescennius. Thanks for the PR! This is definitely a feature we'd like to integrate. I think there are a couple things we'll want to change in terms of where the range application to the stream happens in the code, but this is a great start. I'll get some notes on that written up in the next day or two about proposed, and then we can discuss. You, and anyone else interested in this functionality, should be feel comfortable building on it though prior to integration. The API for this is already standardized by itself HTTP, so that won't change. |
Side note - I love this motivation:
Shared data sets on Arweave is one of the applications I'm personally most excited about. |
@djwhitt awesome thanks for the quick reply. Def open to notes on implementation. Thanks for the opportunity to contribute 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I think we should work on this this as follows:
- Address the inline comments.
- Add some tests. We don't currently have any for handlers, but they're structured to support testing without too much hassle.
- Push some of the range handling down into the ContiguousDataSource implementations so that we can avoid reading all the data from caches (FS and S3 in the future).
Let's focus on 1 and 2 and not worry too much about 3 yet. For 2 (tests) don't worry about getting them complete. I'd just like something minimal in place to help prevent regressions. If you get stuck on any part of that, please reach out. We (ar.io team) will help fill in the gaps as needed.
@@ -64,6 +65,70 @@ const setDataHeaders = ({ | |||
res.header('Content-Length', data.size.toString()); | |||
}; | |||
|
|||
const handlePartialDataResponse = (log: Logger, rangeHeader: string, res: Response, data: ContiguousData, dataAttributes: ContiguousDataAttributes | undefined ) => { | |||
const totalSize = data.size; | |||
const parts = rangeHeader.replace(/bytes=/, "").split("-"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably use a range request parsing library here. We've used https://github.com/jshttp/range-parser elsewhere internally and it's worked well.
const chunkSize = end - start + 1; | ||
|
||
// Check if the range is valid | ||
if (start >= 0 && end < totalSize && start <= end) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to handle multiple ranges too since that's part of the spec. https://www.npmjs.com/package/byte-range-stream should work for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the library but my testing with curl doesn't seem to be making the multi range work. I still just get back the content of the first part of the range
@djwhitt I'll handle the smaller requests you've noted in terms of code structure this evening. I agree #3 is a good optimization but a bit out of my depth right now. Do you have any pointers on where to start with a test? Go/Python is my normal backend toolchain so some of this is a bit new for me. |
@pescennius Sounds good. It's fine to hold off on tests. I'll try to get some basic test scaffolding in place for HTTP handlers this week that you can build off of. |
Also, re #3 - no worries, I figured that would end up landing on us, but wanted to offer just in case. I think we can merge without it too. It's really just a performance and memory optimization. It doesn't need to hold up the core functionality. |
@pescennius Finally found some time to add an example test here: 1538561 I'll try to spend some more time on this next week too, adding more tests and possibly getting into the lower level refactoring. |
@djwhitt let me know how I can best be helpful from here, including handing things off to you to wrap up. |
@pescennius No worries. I'll play around with it this week and see if I can make some progress. |
@pescennius Give this branch a try when you find time and let me know that it works okay for your use case: https://github.com/ar-io/ar-io-node/tree/PE-4908-range-request-integration It includes a bit of refactoring and improved error handling, but ignores multiple ranges for now (handles them with 200 + full data). I started to add multiple ranges too but realized it would be terribly inefficient till I got range handling integrated into the lower layers of the data stack. I didn't want to hold up integration for that though, so I'll do that in a future branch/PR. |
@djwhitt took a look at a branch and the changes look good 👍 . My only thought is that the multi range should raise a |
@pescennius Thanks for the feedback. I think you're right. I switched it to Closing this PR since the integration work happened in #64. I'll try to get support for multiple ranges added soon too, but realistically it might not land till the new year. Thanks again for the contribution! |
Pull Request: HTTP Range Header Support
Change Summary
This pull request introduces support for the HTTP Range header in our project, offering enhanced capabilities for node runners. The primary motivation behind this change is to enable developers to leverage SQLite and DuckDB more seamlessly with Arweave. Both of these technologies rely on range requests, enabling the browser to function as an edge compute platform for file-based databases.
Motivation
The introduction of HTTP Range header support brings several benefits to users:
Empowering Developers: This change opens up exciting opportunities for developers to take full advantage of technologies such as sql.js-httpvfs and DuckDB Wasm. It allows the browser to efficiently manage data retrieval, optimizing the user experience.
Efficient Resource Utilization: With support for the HTTP Range header, users can intelligently manage client applications, reducing egress and conserving bandwidth. This enhancement promotes efficient resource utilization. In the long run, pushing down the data slicing optimizations further could potentially increase some efficiency of the node itself.
How to Verify
To confirm that HTTP Range header support is functioning as expected, you can easily perform a test. Here's an example using the
curl
command:Open your terminal.
Execute the following commands with the ar-io folder as the current directory:
docker build . -t ar-io-core:http_range docker run -p 4000:4000 -v ar-io-data:/app/data ar-io-core:http-range curl -r 100-199 http://localhost:4000/IjdF4EVfdpTGvLymQjFqbMYOIaBn0Q9oUSz3tMTiQkw/
Resources
For more information about the technologies I'm enabling, please refer to the following resources: