Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-43694: [C++] Add an ExecContext Option to arrow::dataset::ScanOptions #43698

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

srilman
Copy link
Contributor

@srilman srilman commented Aug 14, 2024

Rationale for this change

(See #43694)

What changes are included in this PR?

Added the option ExecContext exec_context to arrow::dataset::ScanOptions and modified the scanner and sub-functions to either use the internally specified thread pool or the default internal pool when necessary.

Are these changes tested?

Added a Parquet scanner test that uses the new ExecContext using a separate thread pool for each fragment.

Are there any user-facing changes?

Yes, adds a new option. I'm not sure how to update the documentation though

Copy link

⚠️ GitHub issue #43694 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This general looks ok but maybe I need check whether we've previously similiar api for this

@@ -20,6 +20,7 @@
#include <cstdint>
#include <memory>
#include <queue>
#include <thread>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious where this is used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that was just my mistake, I removed it.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Aug 14, 2024
@kou kou changed the title GH-43694: Add an ExecContext Option to arrow::dataset::ScanOptions GH-43694: [C++] Add an ExecContext Option to arrow::dataset::ScanOptions Aug 14, 2024
@mapleFU mapleFU self-requested a review August 15, 2024 13:10
@mapleFU
Copy link
Member

mapleFU commented Aug 16, 2024

I remember https://github.com/apache/arrow/pull/35464/files a long time ago, maybe I should take a careful round

Comment on lines +934 to +935
options->exec_context =
::arrow::ExecContext(::arrow::default_memory_pool(), pools.back().get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we ensure this is being called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to ensure that options->exec_context is set because initially the ScanOptions is constructed with the default ExecContext constructor that uses the default MemoryPool and nullptr for the executor. Thus, we need to check everywhere if the exec_context->executor is null and use the default CPU pool if true.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonpace I didn't familiar with the design here, generally the io_context for cpu is ok for me, but is here exists better place for this?

@srilman srilman requested a review from mapleFU December 10, 2024 20:00
@srilman
Copy link
Contributor Author

srilman commented Dec 10, 2024

@mapleFU @westonpace Sorry for the delay, I addressed your comments.

@srilman
Copy link
Contributor Author

srilman commented Dec 11, 2024

It seems like the MacOS 13 C++ AMD64 build failed while downloading an ORC C++ library. Should I retrigger the CI or wait after a review?

@mapleFU
Copy link
Member

mapleFU commented Dec 11, 2024

Just leave it here or rebase firstly, sorry for delaying

@srilman
Copy link
Contributor Author

srilman commented Dec 17, 2024

@mapleFU No worries, I'll just leave it here then. Just was hoping to get this in for Arrow 19.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants