-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41110: [C#] Handle empty stream in ArrowStreamReaderImplementation #43939
base: main
Are you sure you want to change the base?
Conversation
|
@@ -55,6 +55,9 @@ public override RecordBatch ReadNextRecordBatch() | |||
|
|||
protected async ValueTask<RecordBatch> ReadRecordBatchAsync(CancellationToken cancellationToken = default) | |||
{ | |||
if (BaseStream.Length == 0) | |||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any downside to checking the length of the stream? This might require all data to be read from e.g. a network socket, when all we really want to know here is whether there will be any data at all. Is there a better way to determine this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's even possible to check the length of a network stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the tests show the same.
I could instead try to e.g. read a single byte from the stream, to see if it produces any data. But then the processing of the schema would need that byte, and I cannot "push" that byte back onto the stream, unless I wrap it into some other type of stream.
Maybe this feature is not worth the cost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would have to work by remembering whether or not we're at the beginning of a possible message and then swallowing the end-of-stream exception if and only if that was the position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code now avoids throwing an exception if zero bytes are read when reading the schema, instead it simply returns without reading the message.
Is this sufficient? Will a schema always be read before each message is read? Does something similar need to be implemented for reading from files and from memory buffers?
@CurtHagenlocher are you happy with the current state of this PR? |
Rationale for this change
Implementing this under the assumption that fixing #41110 is a good idea. Please do let me know if there are any contrary opinions on this subject.
What changes are included in this PR?
Handle empty stream in
ArrowStreamReaderImplementation
. I have not made similar changes toArrowMemoryReaderImplementation
orArrowFileReaderImplementation
.Are these changes tested?
I have created two basic unit tests covering this new behaviour. This might not be sufficient to cover all cases where an empty stream should be handled without an exception occurring.