GH-43694: [C++] Add an ExecContext Option to arrow::dataset::ScanOptions #43698
base: main
This generally looks OK, but I may need to check whether we previously had a similar API for this.
cpp/src/arrow/util/thread_pool.h
@@ -20,6 +20,7 @@
#include <cstdint>
#include <memory>
#include <queue>
#include <thread>
Just curious where this is used?
Oh that was just my mistake, I removed it.
I remember https://github.com/apache/arrow/pull/35464/files from a long time ago; maybe I should do a more careful round of review.
options->exec_context =
    ::arrow::ExecContext(::arrow::default_memory_pool(), pools.back().get());
How can we ensure this is being called?
We don't need to ensure that options->exec_context is set, because ScanOptions initially constructs it with the default ExecContext constructor, which uses the default MemoryPool and nullptr for the executor. Thus, everywhere an executor is needed, we have to check whether exec_context.executor() is null and, if so, use the default CPU pool.
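A minimal sketch of the fallback rule described above, written against hypothetical stand-in types (Executor, ExecContextSketch, DefaultCpuPool, ResolveExecutor) rather than Arrow's real classes. In Arrow itself the default pool would come from arrow::internal::GetCpuThreadPool(); here a static local object plays that role. The sketch only models "a null executor means use the default CPU pool", not how the PR wires it into the scanner.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for an executor (NOT arrow::internal::Executor).
struct Executor {
  explicit Executor(std::string name) : name(std::move(name)) {}
  std::string name;
};

// Hypothetical stand-in for the process-wide default CPU pool
// (the role arrow::internal::GetCpuThreadPool() plays in Arrow).
Executor* DefaultCpuPool() {
  static Executor pool("default-cpu-pool");
  return &pool;
}

// Hypothetical stand-in for ExecContext: like a default-constructed
// ExecContext, the executor starts out as nullptr.
struct ExecContextSketch {
  Executor* executor = nullptr;
};

// Applies the fallback: use the configured executor if there is one,
// otherwise fall back to the default CPU pool.
Executor* ResolveExecutor(const ExecContextSketch& ctx) {
  return ctx.executor != nullptr ? ctx.executor : DefaultCpuPool();
}
```

Routing every lookup through one helper like this, instead of repeating the null check at each call site, is one way to keep the fallback consistent; whether the PR structures it this way is not shown in this thread.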
@westonpace I'm not familiar with the design here. Generally, an io_context-style option for the CPU executor is fine with me, but is there a better place for this?
@mapleFU @westonpace Sorry for the delay; I've addressed your comments.
It seems like the macOS 13 C++ AMD64 build failed while downloading the ORC C++ library. Should I retrigger the CI or wait until after a review?
Just leave it as is, or rebase first. Sorry for the delay.
@mapleFU No worries, I'll just leave it as is then. I was just hoping to get this into Arrow 19.
Rationale for this change
(See #43694)
What changes are included in this PR?
Added the option ExecContext exec_context to arrow::dataset::ScanOptions and modified the scanner and its sub-functions to use either the thread pool specified there or, when none is set, the default internal CPU pool.
Are these changes tested?
Added a Parquet scanner test that uses the new ExecContext option with a separate thread pool for each fragment.
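The per-fragment setup in that test (one pool per fragment, handed over as pools.back().get(), as in the exec_context snippet earlier in the review) can be modeled roughly as follows. CountingExecutor, DefaultPool, ScanOptionsSketch, ScanFragment and ScanWithPoolPerFragment are all hypothetical names, not Arrow APIs. CountingExecutor runs tasks inline and only counts them, so the sketch shows which executor each fragment is routed to, not real parallelism or Parquet I/O.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical executor: runs each task inline and counts how many it ran.
// It is not Arrow's ThreadPool; it only records routing.
class CountingExecutor {
 public:
  void Run(const std::function<void()>& task) {
    task();
    ++tasks_run_;
  }
  int tasks_run() const { return tasks_run_; }

 private:
  int tasks_run_ = 0;
};

// Hypothetical default pool used when no executor is configured.
CountingExecutor* DefaultPool() {
  static CountingExecutor pool;
  return &pool;
}

// Hypothetical stand-in for the relevant part of ScanOptions.
struct ScanOptionsSketch {
  CountingExecutor* executor = nullptr;  // nullptr -> use DefaultPool()
};

// "Reads" one fragment (here: just returns its row count) on the executor
// selected by the options, falling back to the default pool.
int ScanFragment(const ScanOptionsSketch& options, int fragment_rows) {
  CountingExecutor* exec =
      options.executor != nullptr ? options.executor : DefaultPool();
  int rows_read = 0;
  exec->Run([&] { rows_read = fragment_rows; });
  return rows_read;
}

// Mirrors the test's pattern: create a fresh pool per fragment and pass it
// to that fragment's scan via pools.back().get().
int ScanWithPoolPerFragment(
    const std::vector<int>& fragment_rows,
    std::vector<std::unique_ptr<CountingExecutor>>& pools) {
  int total = 0;
  for (int rows : fragment_rows) {
    pools.push_back(std::make_unique<CountingExecutor>());
    ScanOptionsSketch options;
    options.executor = pools.back().get();
    total += ScanFragment(options, rows);
  }
  return total;
}
```

Keeping the pools in a vector of unique_ptr owned by the caller ensures each executor outlives the scan that borrows its raw pointer, which is the same lifetime concern the real test has to respect.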
Are there any user-facing changes?
Yes, this adds a new option. I'm not sure how to update the documentation, though.