
refactor maestro agent flags. #112

Merged: 3 commits merged into openshift-online:main from br_enhance_agent on Jun 11, 2024

Conversation

morvencao
Contributor

@morvencao morvencao commented Jun 6, 2024

Refactor maestro agent flags:

  1. Enable the manifest & manifestbundle codecs for the maestro agent.
  2. Disable leader election by default; it can still be enabled explicitly via flags when running the agent with multiple replicas (see the flag sketch below).
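
For reference, a minimal sketch of what the refactored flag wiring could look like, assuming a pflag FlagSet and a simplified options struct; only --disable-leader-election, its default, and the codec default/help text are taken from this PR, while the codec flag name and the struct itself are illustrative:

package main

import "github.com/spf13/pflag"

// agentOptions is a simplified stand-in for the real maestro agent options.
type agentOptions struct {
	CloudEventsClientCodecs []string
	DisableLeaderElection   bool
}

// addFlags registers the refactored agent flags on a pflag FlagSet.
func (o *agentOptions) addFlags(fs *pflag.FlagSet) {
	// "manifest" stays the default codec; "manifestbundle" can be added explicitly.
	// The flag name below is illustrative, not necessarily the one maestro uses.
	fs.StringSliceVar(&o.CloudEventsClientCodecs, "cloudevents-client-codecs",
		[]string{"manifest"}, "The codecs of the agent client. The valid codecs are manifest and manifestbundle")
	// Leader election is off by default; deployments running multiple replicas
	// opt back in by passing --disable-leader-election=false.
	fs.BoolVar(&o.DisableLeaderElection, "disable-leader-election",
		true, "Disable leader election.")
}

With this default, a single-replica agent skips the leader election bootstrap entirely, and multi-replica deployments explicitly pass --disable-leader-election=false.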

@@ -70,7 +70,6 @@ func runAgent(cmd *cobra.Command, args []string) {
cfg := spoke.NewWorkAgentConfig(commonOptions, agentOption)
cmdConfig := commonOptions.CommoOpts.
NewControllerCommandConfig("maestro-agent", version.Get(), cfg.RunWorkloadAgent)
cmdConfig.DisableLeaderElection = true
Contributor Author

@clyang82 I found this was added in #20; is there any reason to disable leader election explicitly?

Contributor

@skeeey skeeey Jun 6, 2024

Would you consider using cmdConfig.NewCommandWithContext(context.TODO()) to build the command, e.g. https://github.com/open-cluster-management-io/ocm/blob/main/pkg/cmd/spoke/work.go#L24, so that we can pick up more of the basic flags from library-go?
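
For context, a rough sketch of what that wiring could look like, combining the hunks in this PR with the NewCommandWithContext suggestion; the import paths and option constructors are assumptions drawn from the linked ocm work.go, not the exact maestro code:

package agent

import (
	"context"

	"github.com/spf13/cobra"

	commonoptions "open-cluster-management.io/ocm/pkg/common/options"
	"open-cluster-management.io/ocm/pkg/version"
	"open-cluster-management.io/ocm/pkg/work/spoke"
)

// newAgentCommand builds the maestro agent command on top of library-go's
// controller command config, so its basic flags come along automatically.
func newAgentCommand() *cobra.Command {
	commonOptions := commonoptions.NewAgentOptions()
	agentOption := spoke.NewWorkloadAgentOptions()
	agentOption.CloudEventsClientCodecs = []string{"manifest", "manifestbundle"}

	cfg := spoke.NewWorkAgentConfig(commonOptions, agentOption)
	cmdConfig := commonOptions.CommoOpts.
		NewControllerCommandConfig("maestro-agent", version.Get(), cfg.RunWorkloadAgent)

	cmd := cmdConfig.NewCommandWithContext(context.TODO())
	cmd.Use = "agent"
	cmd.Short = "Start the Maestro Agent"
	return cmd
}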

Contributor

There was no specific reason to disable it at the beginning; it was just to make things work for the initial version. We can enable it based on the current use case.

Contributor Author

@morvencao morvencao Jun 6, 2024

Updated this to follow https://github.com/open-cluster-management-io/ocm/blob/main/pkg/cmd/spoke/work.go#L24

One question: we have added some flag aliases for the maestro agent that differ from the work-agent, e.g. --consumer-name instead of --spoke-cluster-name. Do we still need these alias flags?

Contributor

What does leader election provide that the advisory locks from Postgres do not?

Locks give us concurrency and parallelism in execution; leader election ties us to a single pod at a time and has proven troublesome with CS in the past.
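
For illustration, a minimal sketch of the advisory-lock approach on the Postgres-backed server side, using database/sql with the lib/pq driver; the function and key are hypothetical, not part of maestro:

package dblock

import (
	"context"
	"database/sql"

	_ "github.com/lib/pq" // Postgres driver assumed for this sketch
)

// tryAdvisoryLock attempts a session-level Postgres advisory lock for lockID.
// Any replica may try it, but exactly one session holds a given key at a time,
// so work can be partitioned across pods without electing a single leader.
func tryAdvisoryLock(ctx context.Context, db *sql.DB, lockID int64) (bool, error) {
	var acquired bool
	err := db.QueryRowContext(ctx, "SELECT pg_try_advisory_lock($1)", lockID).Scan(&acquired)
	return acquired, err
}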

Contributor

Ah, I think this is in the context of the agent, which runs in a cluster without the Postgres database. Never mind my question.

Contributor Author

Yes, the agent uses in-cluster leader election.
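
For reference, the in-cluster leader election that library-go wires up is Lease-based under the hood; a minimal client-go sketch of the same idea follows (the lock name, namespace, and identity below are placeholders, not what maestro actually uses):

package leader

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runWithLeaderElection runs `run` only while this replica holds the Lease.
func runWithLeaderElection(ctx context.Context, cfg *rest.Config, run func(context.Context)) error {
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "maestro-agent-lock", Namespace: "default"}, // placeholder names
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: "maestro-agent-pod"}, // placeholder identity
	}
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: run,
			OnStoppedLeading: func() {
				// Lost the lease: stop doing work, typically by exiting the process.
			},
		},
	})
	return nil
}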

@morvencao morvencao force-pushed the br_enhance_agent branch 2 times, most recently from 5ef1b8f to 5507940 on June 6, 2024 at 10:40
@markturansky
Contributor

I'd advise against leader election and instead suggest using the advisory locks provided by Postgres.

@morvencao morvencao changed the title from "enable leader election and manifest & manifestbundle codec for agent." to "refactor maestro agent flags." on Jun 7, 2024
@morvencao morvencao force-pushed the br_enhance_agent branch 2 times, most recently from 5844132 to 8a14c54 on June 7, 2024 at 03:32
@morvencao
Contributor Author

morvencao commented Jun 7, 2024

Did some testing with the new agent:

  1. No permission (RBAC) issues.
  2. The DisableLeaderElection flag works.
  3. MaxJSONRawLength works.

/cc @clyang82 @skeeey

@clyang82
Contributor

clyang82 commented Jun 7, 2024

How about the standalone maestro agent, does it work?

agentOption.CloudEventsClientCodecs = []string{"manifest", "manifestbundle"}
cfg := spoke.NewWorkAgentConfig(commonOptions, agentOption)
cmdConfig := commonOptions.CommoOpts.
NewControllerCommandConfig("work-agent", version.Get(), cfg.RunWorkloadAgent)
Contributor

maybe use maestro-agent here.

Contributor Author

updated to maestro-agent.

[]string{"manifest"}, "The codecs of the agent client. The valid codecs are manifest and manifestbundle")

fs.BoolVar(&commonOptions.CommoOpts.CmdConfig.DisableLeaderElection, "disable-leader-election",
true, "Disable leader election.")
Contributor

How long will it take to start the agent if leader election is enabled by default?

Contributor Author

I didn't see any difference compared with the case when leader election is disabled.

Contributor

So can we enable leader election by default?

Contributor Author

As we discussed, we disable leader election by default.

@morvencao
Contributor Author

@clyang82
The maestro-agent can run standalone:

# go run ./cmd/maestro/main.go agent --kubeconfig=/tmp/kubeconfig  --workload-source-config=/tmp/config.yaml --workload-source-driver=mqtt --cloudevents-client-id=cluster1-agent --consumer-name=cluster1
W0607 08:08:33.125558  385318 cmd.go:245] Using insecure, self-signed certificates
I0607 08:08:33.544479  385318 observer_polling.go:159] Starting file observer
W0607 08:08:33.545032  385318 builder.go:260] unable to identify the current namespace for events: open /var/run/secrets/kubernetes.io/serviceaccount/namespace: no such file or directory
W0607 08:08:33.554245  385318 builder.go:267] unable to get owner reference (falling back to namespace): pods "test" not found
I0607 08:08:33.554465  385318 builder.go:299] maestro-agent version v0.0.0-master+$Format:%H$-$Format:%H$
I0607 08:08:34.038681  385318 secure_serving.go:57] Forcing use of http/1.1 only
W0607 08:08:34.038755  385318 secure_serving.go:69] Use of insecure cipher 'TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256' detected.
W0607 08:08:34.038763  385318 secure_serving.go:69] Use of insecure cipher 'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256' detected.
I0607 08:08:34.042255  385318 factory.go:58] Executor caches enabled: false
{"level":"info","ts":1717747714.0423555,"logger":"fallback","caller":"[email protected]/protocol.go:124","msg":"subscribing to topics: map[sources/maestro/consumers/cluster/sourceevents:{1 0 false false}]"}
I0607 08:08:34.043134  385318 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0607 08:08:34.043191  385318 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
I0607 08:08:34.043207  385318 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0607 08:08:34.043255  385318 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0607 08:08:34.043282  385318 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0607 08:08:34.043309  385318 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0607 08:08:34.043818  385318 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/tmp/serving-cert-491928347/tls.crt::/tmp/serving-cert-491928347/tls.key"
W0607 08:08:34.044270  385318 agentoptions.go:63] the agent broadcast topic not set, fall back to the agent events topic
I0607 08:08:34.044494  385318 base_controller.go:67] Waiting for caches to sync for AvailableStatusController
I0607 08:08:34.044524  385318 base_controller.go:67] Waiting for caches to sync for AppliedManifestWorkFinalizer
I0607 08:08:34.044554  385318 base_controller.go:67] Waiting for caches to sync for ManifestWorkAddFinalizerController
I0607 08:08:34.044617  385318 base_controller.go:67] Waiting for caches to sync for AppliedManifestWorkController
I0607 08:08:34.044639  385318 base_controller.go:67] Waiting for caches to sync for ManifestWorkAgent
I0607 08:08:34.044655  385318 base_controller.go:67] Waiting for caches to sync for UnManagedAppliedManifestWork
I0607 08:08:34.044671  385318 base_controller.go:67] Waiting for caches to sync for ManifestWorkFinalizer
I0607 08:08:34.044811  385318 secure_serving.go:213] Serving securely on [::]:8443
I0607 08:08:34.044865  385318 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0607 08:08:34.045282  385318 agentoptions.go:63] the agent broadcast topic not set, fall back to the agent events topic
...

Signed-off-by: morvencao <[email protected]>
@clyang82
Contributor

/ok-to-test

Contributor

@clyang82 clyang82 left a comment

LGTM

@clyang82 clyang82 merged commit ae149df into openshift-online:main Jun 11, 2024
5 checks passed
@morvencao morvencao deleted the br_enhance_agent branch June 11, 2024 06:28