Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix panic when updating the envoy-gateway-config configMap #5066

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zhaohuabing
Copy link
Member

Fixes #5065

Release Notes: Yes

@zhaohuabing zhaohuabing requested a review from a team as a code owner January 15, 2025 03:58
@zhaohuabing zhaohuabing marked this pull request as draft January 15, 2025 04:05
@zhaohuabing zhaohuabing force-pushed the fix-5065 branch 5 times, most recently from 75c7eed to 72475f4 Compare January 15, 2025 05:43
Copy link

codecov bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 42.30769% with 15 lines in your changes missing coverage. Please review.

Project coverage is 66.89%. Comparing base (18207c1) to head (acdfb45).

Files with missing lines Patch % Lines
internal/infrastructure/runner/runner.go 0.00% 5 Missing ⚠️
internal/cmd/server.go 0.00% 4 Missing ⚠️
internal/provider/runner/runner.go 0.00% 3 Missing ⚠️
internal/provider/kubernetes/controller.go 60.00% 2 Missing ⚠️
internal/provider/kubernetes/kubernetes.go 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5066      +/-   ##
==========================================
+ Coverage   66.79%   66.89%   +0.09%     
==========================================
  Files         211      211              
  Lines       32928    32932       +4     
==========================================
+ Hits        21995    22029      +34     
+ Misses       9601     9580      -21     
+ Partials     1332     1323       -9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zhaohuabing zhaohuabing force-pushed the fix-5065 branch 2 times, most recently from e1c233c to 7dcfe0f Compare January 15, 2025 07:16
@zhaohuabing zhaohuabing changed the title fix panic after updating the envoy-gateway-config configMap fix panic when updating the envoy-gateway-config configMap Jan 15, 2025
@zhaohuabing zhaohuabing marked this pull request as ready for review January 15, 2025 07:56
@zhaohuabing zhaohuabing force-pushed the fix-5065 branch 3 times, most recently from d9edb07 to 36b31e2 Compare January 15, 2025 11:36
test/e2e/e2e_test.go Outdated Show resolved Hide resolved
tools/make/kube.mk Outdated Show resolved Hide resolved
@@ -128,6 +128,10 @@ func getConfigByPath(cfgPath string) (*config.Server, error) {
// setupRunners starts all the runners required for the Envoy Gateway to
// fulfill its tasks.
func setupRunners(ctx context.Context, cfg *config.Server) (err error) {
// The Elected channel is used to block the tasks that are waiting for the leader to be elected.
// It will be closed once the leader is elected in the controller manager.
cfg.Elected = make(chan struct{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot use one channel to block multiple threads (status updater and infra runner)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The channel is closed after an EG instance is elected, so all the blocked threads will be notified.

@arkodg
Copy link
Contributor

arkodg commented Jan 15, 2025

@zhaohuabing can you help explain why can't we use a waitgroup anymore ?

@zhaohuabing
Copy link
Member Author

zhaohuabing commented Jan 16, 2025

@zhaohuabing can you help explain why can't we use a waitgroup anymore ?

A WaitGroup blocks the goroutines until WaitGroup.Done() is called. The blocked goroutines can't be terminated after reloading the envoy-gateway-config ConfigMap, result in goroutine leaks.

Using a channel allows the blocked goroutines select on both the Elected and Context.Done channels, so they can be terminated after envoy-gateway-config ConfigMap is reloaded by cancelling the Context.

@zhaohuabing zhaohuabing requested review from arkodg and zirain January 16, 2025 00:30
@arkodg
Copy link
Contributor

arkodg commented Jan 16, 2025

thanks for the explaination @zhaohuabing , can we achieve the same by

cc @zirain

@zhaohuabing
Copy link
Member Author

zhaohuabing commented Jan 16, 2025

thanks for the explaination @zhaohuabing , can we achieve the same by

There will be two issues in this approach:

  • We don't know if cfg.Elected.Done() has been called or not, and calling it repeately will result in panic.
  • Unblocking it will result in the gorotines to continue execution when they shouldn't.

cc @zirain

Goroutine leaks.

Channel is a common approach for blocking multiple goroutines while they wait for a signal. This allows these goroutines to exit gracefully when the associated context is canceled. The k8s controller manager also uses a similar approach to broadcast the Elected status.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Envoy Gateway panic when updating the envoy-gateway-config configMap
3 participants