Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add classes for Task and SharedTask in threadpool #5391

Merged
merged 10 commits into from
Dec 9, 2024

Conversation

ypatia
Copy link
Member

@ypatia ypatia commented Nov 29, 2024

Currently ThreadPool::Task is a typedef to std:: future<Status>. This PR:

  1. Replaces this with a full class that imposes things like wait for our async processes to go through our ThreadPool's wait method. In that way, we never use a std::future's wait method directly and we avoid possible deadlocks because only the ThreadPool's wait will yield.
  2. Adjusts the structure and relationship of tasks and the threadpool so that the caller can rely on the task's internal functions and doesn't need to keep track of the relationship between the ThreadPool and each task manually throughout the codebase.
  3. Adds a new first class citizen of our ThreadPool: SharedTask, that is encapsulating std::shared_future which allows multiple threads to wait on an async operation result to become available. This is needed for the upcoming work on parallelizing IO and compute operations in the codebase even further.

This is heavily influenced from similar work by @Shelnutt2.


TYPE: IMPROVEMENT
DESC: Add classes for Task and SharedTask in threadpool

@ypatia ypatia requested a review from Shelnutt2 November 29, 2024 13:54
@ypatia ypatia force-pushed the yt/sc-59606/threadpool_with_tasks branch from 411e836 to e3aff67 Compare November 29, 2024 13:57
@ypatia ypatia force-pushed the yt/sc-59606/threadpool_with_tasks branch from 0b11b03 to 2d0a155 Compare November 29, 2024 14:10
@ypatia ypatia force-pushed the yt/sc-59606/threadpool_with_tasks branch 3 times, most recently from 49dfadf to 7c9acb3 Compare December 2, 2024 09:02
@ypatia
Copy link
Member Author

ypatia commented Dec 2, 2024

Passes CI, the MANYLINUX failure is a known issue.

@ypatia ypatia force-pushed the yt/sc-59606/threadpool_with_tasks branch from 7c9acb3 to bdd4b48 Compare December 2, 2024 11:01
tiledb/common/thread_pool/thread_pool.h Outdated Show resolved Hide resolved
tiledb/common/thread_pool/thread_pool.h Outdated Show resolved Hide resolved
tiledb/common/thread_pool/thread_pool.h Outdated Show resolved Hide resolved
tiledb/common/thread_pool/thread_pool.h Show resolved Hide resolved
tiledb/common/thread_pool/thread_pool.h Outdated Show resolved Hide resolved
@@ -432,7 +432,7 @@ Status DenseReader::dense_read() {
// prevent using too much memory when the budget is small and doesn't allow
// to process more than one batch at a time.
if (wait_compute_task_before_read && compute_task.valid()) {
throw_if_not_ok(resources_.compute_tp().wait(compute_task));
throw_if_not_ok(resources_.compute_tp().wait(&compute_task));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to replace all calls to tp.wait(task) with task.wait()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I can do it already if you think it should be in this PR, or in the following work as I was planning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, on the one hand I prefer the existing APIs because the action primarily happens on the thread pool instead of the task, but on the other hand there is already a wait() method on std::future that we must be careful not to call because of deadlocks. 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, and also having to pass around a reference to the threadpool object everywhere is quite cumbersome.

tiledb/common/thread_pool/thread_pool.h Outdated Show resolved Hide resolved
tiledb/common/thread_pool/thread_pool.h Outdated Show resolved Hide resolved
@ypatia ypatia force-pushed the yt/sc-59606/threadpool_with_tasks branch from e30927b to c92a83c Compare December 3, 2024 14:56
@ypatia ypatia merged commit 2637682 into dev Dec 9, 2024
63 checks passed
@ypatia ypatia deleted the yt/sc-59606/threadpool_with_tasks branch December 9, 2024 08:36
@ihnorton ihnorton requested a review from rroelke December 12, 2024 15:31
}

/**
* Is this task valid. Wait can only be called on vaid tasks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: valid

}

/**
* The encapsulated std::shared_future
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions that it is not safe to use the wait method of this but that doesn't live in any of these comments. I think it probably should. Either that or std::future isn't really the right thing to use (in the hypothetical future state, for now it makes sense transiently)

*/
std::future_status wait_for(
const std::chrono::milliseconds timeout_duration) const {
return f_.wait_for(timeout_duration);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blocks, doesn't it?
So it will prevent other tasks in the thread pool from running on this thread until the duration has elapsed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But from what I've read our thread pool doesn't really have much support for event-based task wake-ups right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it blocks. The current thread pool is pretty basic. Ideally we should evolve it (maybe transitioning to coroutines?) to be able to put back in the queue a task that is blocked waiting. This will also solve the deadlocks that are very much possible today in our system with that stacking of blocked tasks. Let's discuss more in a call if you'd like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll follow up in 2025 about coroutines. They're definitely great; but hopefully we can avoid inventing a new runtime

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, any more incremental a change would be welcome. I thought about multiple task queues and work stealing as well, or a way to suspend a thread when it's blocked and put it back to the single task queue that we have today, but I am not sure if/how the latter could be done without coroutines. Let's discuss further and we could also put together some document with ideas.

@@ -368,7 +368,7 @@ Status DenseReader::dense_read() {
// This is as far as we should go before implementing this properly in a task
// graph, where the start and end of every piece of work can clearly be
// identified.
ThreadPool::Task compute_task;
ThreadPool::SharedTask compute_task;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easy to see that this is waited upon multiple times - but if it worked fine before, why is it necessary for it to be shared now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants