[MATLAB] Add ability to construct RecordBatchStreamReader from uint8 array #45263

Closed
kevingurney opened this issue Jan 14, 2025 · 1 comment

Comments

@kevingurney
Member

Describe the enhancement requested

To enable more workflows using the IPC Stream format in the MATLAB interface, we should add the ability to construct a RecordBatchStreamReader from a MATLAB uint8 array. This will be helpful, for example, to enable Arrow-over-HTTP workflows in conjunction with the MATLAB webread function (which can return a uint8 array from an HTTP request).

This is a follow-up issue to #44923.

Component(s)

MATLAB

@kevingurney kevingurney self-assigned this Jan 14, 2025
@github-project-automation github-project-automation bot moved this to Backlog in Arrow MATLAB Jan 14, 2025
kevingurney added a commit that referenced this issue Jan 17, 2025
… from `uint8` array (#45274)

### Rationale for this change

To enable more workflows using the IPC Stream format in the MATLAB interface, this pull request adds the ability to construct a `RecordBatchStreamReader` from a MATLAB `uint8` array.

This is helpful, for example, to enable Arrow-over-HTTP workflows in conjunction with the [MATLAB `webread` function](https://www.mathworks.com/help/matlab/ref/webread.html) (which can return a `uint8` array from an HTTP request).

This is a follow-up to #44923.

### What changes are included in this PR?

1. Added a new `static` "construction function" `arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes)`.
2. Added a new `static` "construction function" `arrow.io.ipc.RecordBatchStreamReader.fromFile(filename)`.
3. Changed the signature of the `arrow.io.ipc.RecordBatchStreamReader` constructor so that it no longer accepts a `filename` directly. Instead, an `arrow.io.ipc.RecordBatchStreamReader` can now only be directly constructed from a `libmexclass.proxy.Proxy` instance. This mirrors the design of other MATLAB classes which wrap `Proxy` instances in the MATLAB interface. To construct `RecordBatchStreamReader` objects from an Arrow IPC Stream file on disk, users can instead use the new `static` "construction function" `arrow.io.ipc.RecordBatchStreamReader.fromFile(filename)`.

### Are these changes tested?

Yes.

1. Updated tests in `arrow/matlab/test/arrow/io/ipc/tRecordBatchStreamReader.m` to be parameterized over the `fromFile` and `fromBytes` "construction functions".
2. Added a new test to verify that an appropriate error is thrown if the `RecordBatchStreamReader` constructor is called directly with an input that is not a `libmexclass.proxy.Proxy` instance.

### Are there any user-facing changes?

Yes.

1. Users can now create `arrow.io.ipc.RecordBatchStreamReader` objects from an Arrow IPC Stream file on disk using the new `static` "construction function" `arrow.io.ipc.RecordBatchStreamReader.fromFile(filename)`.
2. Users can now create `arrow.io.ipc.RecordBatchStreamReader` objects from an in-memory MATLAB `uint8` "bytes" array using the new `static` "construction function" `arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes)`.
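
A minimal sketch of the `fromBytes` workflow listed above, assuming an endpoint that serves an Arrow IPC Stream. The URL is a hypothetical placeholder and is not part of this change. Setting the `weboptions` `ContentType` to `"binary"` asks `webread` to return the response body as a `uint8` array:

```matlab
% Hypothetical endpoint; replace with a server that returns an Arrow IPC Stream.
url = "http://localhost:8000";

% Request the raw response body as a uint8 array rather than decoded text.
options = weboptions("ContentType", "binary");
bytes = webread(url, options);

% Construct a reader over the in-memory bytes.
reader = arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes);
```

The sketch stops at construction; reading record batches from `reader` works the same way regardless of which "construction function" created it.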

**This PR includes breaking changes to public APIs.**

This PR changes the signature of the public `arrow.io.ipc.RecordBatchStreamReader` constructor so that it no longer accepts a `filename` directly. Instead, an `arrow.io.ipc.RecordBatchStreamReader` can now only be directly constructed from a `libmexclass.proxy.Proxy` instance. This mirrors the design of other MATLAB classes which wrap `Proxy` instances in the MATLAB interface. To construct `RecordBatchStreamReader` objects from an Arrow IPC Stream file on disk, users can instead use the new `static` "construction function" `arrow.io.ipc.RecordBatchStreamReader.fromFile(filename)`.
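
For existing code, the migration might look like the following sketch. The file name is a hypothetical placeholder, and the commented-out line shows the earlier constructor call that accepted a `filename`:

```matlab
% Hypothetical path to an Arrow IPC Stream file on disk.
filename = "data.arrows";

% Before this change (no longer supported):
% reader = arrow.io.ipc.RecordBatchStreamReader(filename);

% After this change:
reader = arrow.io.ipc.RecordBatchStreamReader.fromFile(filename);
```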

### Future Directions

1. Use the new `static` "construction function" `arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes)` in an example to demonstrate how to read an Arrow IPC Stream from an HTTP endpoint as part of [apache/arrow-experiments](https://github.com/apache/arrow-experiments/tree/main/http/get_simple).

### Notes

1. Thank you @sgilmore10 for your help with this pull request!
* GitHub Issue: #45263

Authored-by: Kevin Gurney <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
@kevingurney
Member Author

Issue resolved by pull request 45274
#45274

@kevingurney kevingurney added this to the 20.0.0 milestone Jan 17, 2025
@github-project-automation github-project-automation bot moved this from Backlog to Done in Arrow MATLAB Jan 17, 2025