ReusableGoroutinesPool: Add optional action when Go is called after Close #606

Closed
wants to merge 1 commit

Conversation

@julienduchesne (Member) commented Oct 10, 2024

Default behavior remains the same

There is a data race in Mimir where the pool can be closed while `Go` is still being called on it:
https://github.com/grafana/mimir/blob/0c6070552517bda1ccb97b8fc84ca50c591a71f7/pkg/distributor/distributor.go#L528-L532
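
A minimal sketch of the hazard, assuming the pool feeds jobs through a channel that `Close` closes (illustrative only, not the actual dskit implementation):

```go
package main

import "sync"

// pool is an illustrative stand-in for ReusableGoroutinesPool.
type pool struct {
	jobs chan func()
	wg   sync.WaitGroup
}

func newPool(workers int) *pool {
	p := &pool{jobs: make(chan func())}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for f := range p.jobs {
				f()
			}
		}()
	}
	return p
}

// Go hands f to an idle worker, or runs it in a new goroutine if none is free.
func (p *pool) Go(f func()) {
	select {
	case p.jobs <- f:
	default:
		go f()
	}
}

// Close closes the jobs channel and waits for the workers to exit.
func (p *pool) Close() {
	close(p.jobs)
	p.wg.Wait()
}

func main() {
	p := newPool(2)
	p.Go(func() {})
	p.Close()
	// If another goroutine calls p.Go concurrently with (or after) p.Close,
	// the send inside Go can hit a closed channel and panic.
}
```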

Rather than handling this in Mimir, it can be handled in the util directly.

In Mimir, we'll be able to define this pool as `wp := concurrency.NewReusableGoroutinesPool(cfg.ReusableIngesterPushWorkers).WithClosedAction(concurrency.ErrorWhenClosed)` and we can check for errors wherever we call `Go` on it.
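
A sketch of what that call site could look like; the error return from `Go` is an assumption based on the description above, not a finalized signature:

```go
// Hypothetical usage of the proposed option; the exact signatures may differ.
wp := concurrency.NewReusableGoroutinesPool(cfg.ReusableIngesterPushWorkers).
	WithClosedAction(concurrency.ErrorWhenClosed)

// With ErrorWhenClosed, Go would return an error once the pool is closed,
// so callers can handle shutdown gracefully instead of racing with Close.
if err := wp.Go(func() { /* push to ingesters */ }); err != nil {
	return err // the pool was already closed
}
```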

Note: by expanding this utility a bit more, I can also reuse it for the memberlist TCP transport (#525). It's a very similar worker pool.

@julienduchesne julienduchesne marked this pull request as ready for review October 10, 2024 16:13
@julienduchesne julienduchesne changed the title ReusableGoroutinesPool: Add protection to Close ReusableGoroutinesPool: Add optional action when Go is called after Close Oct 10, 2024
@julienduchesne julienduchesne requested a review from a team October 10, 2024 16:20
@colega (Contributor) commented Oct 10, 2024

I feel it's too much logic just to handle the fact that our service shutdown process doesn't have proper dependencies. Can we just fix it in Mimir and, instead of this being a subservice, close the pool when the distributor's service is closed?

It feels wrong to me that this component now has almost more logic for handling the fact that callers can't properly manage its lifecycle than for its main business logic.
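
A rough sketch of that alternative on the Mimir side (field and method names here are placeholders, not the actual Mimir code): close the pool from the distributor's own stopping function, after everything that can call `Go` on it has stopped, instead of registering it as a subservice.

```go
// Hypothetical sketch; names are placeholders, not the real Mimir fields.
func (d *Distributor) stopping(_ error) error {
	// First stop the subservices that may still push via the worker pool...

	// ...then close the pool, once nothing can call Go on it anymore.
	d.ingesterPushWorkers.Close()
	return nil
}
```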

@colega (Contributor) commented Oct 10, 2024

OTOH, can you explain what the data race here is? I would assume this component would just spawn goroutines instead of using workers once closed. How does that cause a data race?

@julienduchesne (Member Author) replied:

> OTOH, can you explain what the data race here is? I would assume this component would just spawn goroutines instead of using workers once closed. How does that cause a data race?

Seems to be in the close sequence: the channel is closed before the distributors are done pushing. It looks like an issue with closing the services in the correct order.

> I feel it's too much logic just to handle the fact that our service shutdown process doesn't have proper dependencies. Can we just fix it in Mimir and, instead of this being a subservice, close the pool when the distributor's service is closed?

Yeah, definitely possible. I thought that making this component more robust would be nicer than playing with dependency ordering in Mimir (making sure this service is closed after the distributor), but either is possible.

@julienduchesne (Member Author) commented Oct 10, 2024

I think this just needs a mutex, not all the new logic (like this: grafana/mimir#9585). I redid this PR here: #607
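
For reference, a rough sketch of that mutex-based approach (an approximation; see #607 for the actual change): `Go` takes the same lock as `Close` and falls back to spawning a plain goroutine once the pool is closed, so it never sends on a closed channel.

```go
package main

import "sync"

// pool is an illustrative stand-in; see #607 for the real ReusableGoroutinesPool change.
type pool struct {
	mu     sync.Mutex
	closed bool
	jobs   chan func()
}

// Go queues f for a worker while the pool is open, otherwise runs it in a
// new goroutine. Holding the mutex means it never sends on a closed channel.
func (p *pool) Go(f func()) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.closed {
		select {
		case p.jobs <- f:
			return
		default:
		}
	}
	go f()
}

// Close marks the pool closed and closes the jobs channel under the same lock.
func (p *pool) Close() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.closed = true
	close(p.jobs)
}

func main() {
	p := &pool{jobs: make(chan func(), 1)}
	p.Go(func() {})
	p.Close()
	p.Go(func() {}) // safe after Close: falls back to a plain goroutine
}
```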
