GH-45263: [MATLAB] Add ability to construct RecordBatchStreamReader from uint8 array #45274

Merged on Jan 17, 2025 (10 commits)

@kevingurney (Member) commented on Jan 15, 2025

Rationale for this change

To enable more workflows using the IPC Stream format in the MATLAB interface, this pull request adds the ability to construct a RecordBatchStreamReader from a MATLAB uint8 array.

This is helpful, for example, to enable Arrow-over-HTTP workflows in conjunction with the MATLAB webread function (which can return a uint8 array from an HTTP request).
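As a rough illustration of that workflow, the sketch below fetches a response body with webread and hands it to the new construction function. The URL is a placeholder, and the sketch assumes the server replies with an Arrow IPC Stream. The hasnext, read, and toMATLAB calls are not described in this pull request, so treat them as assumptions about the rest of the MATLAB interface.

```matlab
% Hypothetical endpoint that serves an Arrow IPC Stream (placeholder URL).
url = "http://localhost:8008/data.arrows";

% With ContentType="binary", webread returns the raw response body
% as a uint8 array instead of trying to decode it.
options = weboptions(ContentType="binary");
bytes = webread(url, options);

% Construct the reader directly from the in-memory bytes.
reader = arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes);

% Assumed reader API: iterate over the record batches in the stream
% and convert each one to a MATLAB table.
while reader.hasnext()
    recordBatch = reader.read();
    disp(toMATLAB(recordBatch));
end
```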

This is a follow-up to #44923.

What changes are included in this PR?

  1. Added a new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes).
  2. Added a new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromFile(filename).
  3. Changed the signature of the arrow.io.ipc.RecordBatchStreamReader constructor so that it no longer directly accepts a filename as an input. Instead, an arrow.io.ipc.RecordBatchStreamReader can now only be directly constructed from a libmexclass.proxy.Proxy instance. This mirrors the design of other MATLAB classes which wrap Proxy instances in the MATLAB interface. To construct RecordBatchStreamReader objects from an Arrow IPC Stream file on disk, users can instead use the new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromFile(filename).
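A minimal sketch of the two construction functions used side by side follows. It is not taken from the PR: the temporary file, the sample table, and the use of arrow.io.ipc.RecordBatchStreamWriter, arrow.recordBatch, readTable, and toMATLAB are assumptions about the surrounding MATLAB interface, included only to make the example self-contained.

```matlab
% Write a small IPC Stream file to read back (assumed writer API).
filename = [tempname, '.arrows'];
T = table([1; 2; 3], ["a"; "b"; "c"], VariableNames=["Number", "Letter"]);
writer = arrow.io.ipc.RecordBatchStreamWriter(filename);
writer.writeRecordBatch(arrow.recordBatch(T));
writer.close();

% 1. Construct a reader from the file on disk.
fileReader = arrow.io.ipc.RecordBatchStreamReader.fromFile(filename);

% 2. Construct a reader from the same data held in memory as uint8.
fid = fopen(filename, "r");
assert(fid ~= -1, "Could not open %s", filename);
bytes = fread(fid, Inf, "*uint8");
fclose(fid);
bytesReader = arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes);

% Both readers should yield the same data (readTable is assumed here).
fileTable = toMATLAB(fileReader.readTable());
bytesTable = toMATLAB(bytesReader.readTable());
assert(isequaln(fileTable, bytesTable));
```

Reading the file into a uint8 array with fread stands in for any source of in-memory bytes, such as an HTTP response.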

Are these changes tested?

Yes.

  1. Updated tests in arrow/matlab/test/arrow/io/ipc/tRecordBatchStreamReader.m to be parameterized over the fromFile and fromBytes "construction functions".
  2. Added a new test to verify that an appropriate error is thrown if the RecordBatchStreamReader constructor is called directly with an input that is not a libmexclass.proxy.Proxy instance.
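The shape of such a parameterized test might look like the sketch below. It is a hypothetical, trimmed-down class, not the contents of tRecordBatchStreamReader.m: the class name, fixture data, helper method, and the writer, readTable, and toMATLAB calls are all assumptions. Because the error identifier is not stated here, the second test only checks that some MException is thrown.

```matlab
classdef tReaderSketch < matlab.unittest.TestCase
    % Hypothetical sketch; not the test file from this pull request.

    properties (TestParameter)
        % Each parameterized test runs once per construction function.
        ConstructionFcn = {"fromFile", "fromBytes"}
    end

    properties
        Filename  % path to a temporary Arrow IPC Stream file
        Expected  % MATLAB table written to that file
    end

    methods (TestMethodSetup)
        function writeStreamFile(testCase)
            import matlab.unittest.fixtures.TemporaryFolderFixture
            folder = testCase.applyFixture(TemporaryFolderFixture);
            testCase.Filename = fullfile(folder.Folder, "data.arrows");
            testCase.Expected = table([1; 2; 3], ["a"; "b"; "c"], ...
                VariableNames=["Number", "Letter"]);
            % Assumed writer API, used only to create the fixture file.
            writer = arrow.io.ipc.RecordBatchStreamWriter(testCase.Filename);
            writer.writeRecordBatch(arrow.recordBatch(testCase.Expected));
            writer.close();
        end
    end

    methods (Test)
        function roundTrip(testCase, ConstructionFcn)
            reader = testCase.createReader(ConstructionFcn);
            actual = toMATLAB(reader.readTable());
            testCase.verifyEqual(actual, testCase.Expected);
        end

        function constructorRejectsNonProxy(testCase)
            % A filename is not a libmexclass.proxy.Proxy instance.
            fcn = @() arrow.io.ipc.RecordBatchStreamReader(testCase.Filename);
            testCase.verifyError(fcn, ?MException);
        end
    end

    methods
        function reader = createReader(testCase, name)
            % Dispatch on the parameter value to build the reader.
            switch name
                case "fromFile"
                    reader = arrow.io.ipc.RecordBatchStreamReader.fromFile(testCase.Filename);
                case "fromBytes"
                    fid = fopen(testCase.Filename, "r");
                    bytes = fread(fid, Inf, "*uint8");
                    fclose(fid);
                    reader = arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes);
            end
        end
    end
end
```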

Are there any user-facing changes?

Yes.

  1. Users can now create arrow.io.ipc.RecordBatchStreamReader objects from an Arrow IPC Stream file on disk using the new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromFile(filename).
  2. Users can now create arrow.io.ipc.RecordBatchStreamReader objects from an in-memory MATLAB uint8 "bytes" array using the new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes).

This PR includes breaking changes to public APIs.

This PR changes the signature of the public arrow.io.ipc.RecordBatchStreamReader constructor so that it no longer directly accepts a filename as an input. Instead, an arrow.io.ipc.RecordBatchStreamReader can now only be directly constructed from a libmexclass.proxy.Proxy instance. This mirrors the design of other MATLAB classes which wrap Proxy instances in the MATLAB interface. To construct RecordBatchStreamReader objects from an Arrow IPC Stream file on disk, users can instead use the new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromFile(filename).
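For existing code, the migration is a one-line change, sketched below. The file path is a placeholder for an existing Arrow IPC Stream file, and the exact error raised by the old call is not specified here, so the sketch only displays whatever message comes back.

```matlab
% Hypothetical path to an existing Arrow IPC Stream file.
filename = "data.arrows";

% Old usage: passing a filename to the constructor. After this change the
% constructor expects a libmexclass.proxy.Proxy, so this call should error.
try
    reader = arrow.io.ipc.RecordBatchStreamReader(filename);
catch err
    disp(err.message);
end

% New usage: the static construction function.
reader = arrow.io.ipc.RecordBatchStreamReader.fromFile(filename);
```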

Future Directions

  1. Use the new static "construction function" arrow.io.ipc.RecordBatchStreamReader.fromBytes(bytes) in an example to demonstrate how to read an Arrow IPC Stream from an HTTP endpoint as part of apache/arrow-experiments.

Notes

  1. Thank you @sgilmore10 for your help with this pull request!

@kevingurney marked this pull request as ready for review on January 15, 2025, 21:33
@github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label on Jan 16, 2025

@kou (Member) left a comment:

+1

@github-actions bot added the "awaiting merge" label and removed the "awaiting changes" label on Jan 17, 2025

@kevingurney (Member, Author) commented:

+1

@github-actions bot added the "awaiting changes" label and removed the "awaiting merge" label on Jan 17, 2025
@kevingurney merged commit 1fe27fe into apache:main on Jan 17, 2025
14 checks passed
@kevingurney removed the "awaiting changes" label on Jan 17, 2025
@kevingurney deleted the GH-45263 branch on January 17, 2025, 21:54