GH-43408: [C++] IO: Make Advance to virtual and naive implement it #43409
This is just a naive implementation; I'll test and refactor once the interface is stable. I currently think we may want to change Status Advance(int64_t nbytes); to Result<int64_t> Advance(int64_t nbytes);
// TODO(mwish): Considering using raw_->Advance if available,
// currently we don't have a way to know if the underlying stream supports fast
// skipping. So we just read and discard the data.
Maybe something like the existing flag below would help:
/// \brief Return true if InputStream is capable of zero copy Buffer reads
///
/// Zero copy reads imply the use of Buffer-returning Read() overloads.
virtual bool supports_zero_copy() const;
A supports_fast_advance flag, like https://github.com/ClickHouse/ClickHouse/blob/1b2fd51e090214deb340a76833bab7b4985eecfc/src/Disks/IO/ReadBufferFromRemoteFSGather.h#L19, might work.
// TODO(mwish): Considering using raw_->Advance if available,
// currently we don't have a way to know if the underlying stream supports fast
// skipping. So we just read and discard the data.
auto result = Read(remain_skip_bytes);
Why not simply call Advance? The default implementation calls Read anyway.
Why not simply call Advance? The default implementation calls Read anyway.
If the underlying stream doesn't have supports_fast_advance, it would call Read without buffering, which might be an inefficient direct read and less efficient than reading through a large buffer.
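To make the trade-off in this thread concrete, here is a rough sketch of how a buffered wrapper could pick between the two paths. It assumes a capability flag named supports_fast_advance as floated above; ToyRawStream, SkipBytes, chunk_size and read_calls are hypothetical names for illustration only and are not part of the Arrow API or of this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a raw input stream; this is NOT the Arrow interface.
class ToyRawStream {
 public:
  ToyRawStream(std::string data, bool fast_advance)
      : data_(std::move(data)), fast_advance_(fast_advance) {}

  // Hypothetical capability flag, modelled on supports_zero_copy().
  bool supports_fast_advance() const { return fast_advance_; }

  // Copies up to nbytes into out and returns the number of bytes copied.
  int64_t Read(int64_t nbytes, char* out) {
    ++read_calls;
    const int64_t n = Clamp(nbytes);
    std::copy_n(data_.data() + pos_, n, out);
    pos_ += n;
    return n;
  }

  // Moves the position without touching the data (effectively a seek).
  int64_t Advance(int64_t nbytes) {
    const int64_t n = Clamp(nbytes);
    pos_ += n;
    return n;
  }

  int64_t read_calls = 0;  // instrumentation for this example only

 private:
  // Limits a request to the bytes that are actually left.
  int64_t Clamp(int64_t nbytes) const {
    const int64_t remaining = static_cast<int64_t>(data_.size()) - pos_;
    return std::max<int64_t>(0, std::min(nbytes, remaining));
  }

  std::string data_;
  bool fast_advance_;
  int64_t pos_ = 0;
};

// Skips nbytes on behalf of a buffered wrapper and returns the bytes skipped.
// Assumes chunk_size > 0.
int64_t SkipBytes(ToyRawStream& raw, int64_t nbytes, int64_t chunk_size) {
  if (raw.supports_fast_advance()) {
    // Cheap path: the raw stream can skip without reading.
    return raw.Advance(nbytes);
  }
  // Fallback: read and discard in large chunks to limit the number of raw reads.
  std::vector<char> scratch(static_cast<std::size_t>(chunk_size));
  int64_t skipped = 0;
  while (skipped < nbytes) {
    const int64_t want = std::min(nbytes - skipped, chunk_size);
    const int64_t got = raw.Read(want, scratch.data());
    if (got == 0) break;  // end of stream
    skipped += got;
  }
  return skipped;
}
```

Returning the number of bytes skipped, rather than only a status, also lets the caller notice a short skip at end of stream, which is roughly what the proposed Result<int64_t> return type would allow.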
return Status::OK();
}

if (nbytes < bytes_buffered_) {
Suggested change:
- if (nbytes < bytes_buffered_) {
+ if (nbytes <= bytes_buffered_) {
Yes, that sounds reasonable.
I would change this, thanks!
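For reference, a small sketch of why the boundary matters. It models only the bookkeeping of a buffered reader; bytes_buffered and remain_skip_bytes mirror names from the diff, while ToyBufferedState, buffer_pos, raw_skips and raw_bytes_skipped are hypothetical and do not reflect the actual BufferedInputStream internals.

```cpp
#include <cstdint>

// Toy model of a buffered reader's bookkeeping; NOT the Arrow implementation.
struct ToyBufferedState {
  int64_t buffer_pos = 0;         // offset of the next unread byte in the buffer
  int64_t bytes_buffered = 0;     // unread bytes remaining in the buffer
  int64_t raw_skips = 0;          // how often we fell through to the raw stream
  int64_t raw_bytes_skipped = 0;  // bytes the raw stream was asked to skip

  void Advance(int64_t nbytes) {
    // With `<=`, a skip that exactly drains the buffer stays on the cheap path.
    // With `<`, it would presumably fall through and issue a zero-byte raw skip.
    if (nbytes <= bytes_buffered) {
      buffer_pos += nbytes;
      bytes_buffered -= nbytes;
      return;
    }
    // Slow path: drop what is buffered and skip the rest on the raw stream.
    const int64_t remain_skip_bytes = nbytes - bytes_buffered;
    buffer_pos = 0;
    bytes_buffered = 0;
    ++raw_skips;
    raw_bytes_skipped += remain_skip_bytes;
  }
};
```

With 8 bytes buffered, Advance(8) touches only the buffer counters; a following Advance(1) is the first call that reaches the raw stream.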
By the way, why do you need this?
I'm using a stream to read a local file. When I'm trying to skip ahead, I can avoid calling Read by re-constructing the stream; however, enhancing Advance may be more intuitive.
Hmm, I see we're using Advance in Parquet to skip data pages according to statistics, so perhaps this would be useful after all. arrow/cpp/src/parquet/column_reader.cc Lines 477 to 481 in e54ad41
Sigh, I don't like the SkipPage API here. I found it might help with decoding in the arrow-ipc format.