Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-41110: [C#] Handle empty stream in ArrowStreamReaderImplementation #43939

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

voidstar69
Copy link
Contributor

@voidstar69 voidstar69 commented Sep 3, 2024

Rationale for this change

Implementing this under the assumption that fixing #41110 is a good idea. Please do let me know if there are any contrary opinions on this subject.

What changes are included in this PR?

Handle empty stream in ArrowStreamReaderImplementation. I have not made similar changes to ArrowMemoryReaderImplementation or ArrowFileReaderImplementation.

Are these changes tested?

I have created two basic unit tests covering this new behaviour. This might not be sufficient to cover all cases where an empty stream should be handled without an exception occurring.

Copy link

github-actions bot commented Sep 3, 2024

⚠️ GitHub issue #41110 has been automatically assigned in GitHub to PR creator.

@@ -55,6 +55,9 @@ public override RecordBatch ReadNextRecordBatch()

protected async ValueTask<RecordBatch> ReadRecordBatchAsync(CancellationToken cancellationToken = default)
{
if (BaseStream.Length == 0)
return null;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any downside to checking the length of the stream? This might require all data to be read from e.g. a network socket, when all we really want to know here is whether there will be any data at all. Is there a better way to determine this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's even possible to check the length of a network stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the tests show the same.
I could instead try to e.g. read a single byte from the stream, to see if it produces any data. But then the processing of the schema would need that byte, and I cannot "push" that byte back onto the stream, unless I wrap it into some other type of stream.
Maybe this feature is not worth the cost?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would have to work by remembering whether or not we're at the beginning of a possible message and then swallowing the end-of-stream exception if and only if that was the position.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code now avoids throwing an exception if zero bytes are read when reading the schema, instead it simply returns without reading the message.
Is this sufficient? Will a schema always be read before each message is read? Does something similar need to be implemented for reading from files and from memory buffers?

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Sep 3, 2024
@voidstar69 voidstar69 marked this pull request as draft September 3, 2024 21:37
@voidstar69 voidstar69 marked this pull request as ready for review September 19, 2024 21:37
@voidstar69
Copy link
Contributor Author

@CurtHagenlocher are you happy with the current state of this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants