Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MGMT-19120: Use service net to connect to hosted API server #7090

Conversation

jhernand
Copy link
Contributor

@jhernand jhernand commented Dec 13, 2024

There are several situations where assisted service needs to connect to the API server of a spoke cluster. To do so it uses the kubeconfig generated during the installation, and that usually contains the external URL of the API server, and that means that the cluster where assisted service runs needs to be configured with a proxy that allows that. But for HyperShift clusters this can be avoided: assisted service can instead connect via the service network, using the kube-apiserver.my-cluster.svc host name, as the API server runs as a pod in the same cluster. Doing that reduces the number of round trips and the potential proxy configuration issues. In order to achieve that this patch changes the spoke client factory so that it checks if the cluster is a HyperShift cluster, and then it replaces the API server URL with https://kube-apiserver.my-cluster.svc:6443.

List all the issues related to this PR

https://issues.redhat.com/browse/MGMT-19120

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@jhernand jhernand changed the title Use service network to talk to hosted api server MGMT-19120: Use service net to connect to hosted API server Dec 13, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 13, 2024

@jhernand: This pull request references MGMT-19120 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.

In response to this:

There are several situations where assisted service needs to connect to the API server of a spoke cluster. To do so it uses the kubeconfig generated during the installation, and that usually contains the external URL of the API server, and that means that the cluster where assisted service runs needs to be configured with a proxy that allows that. But for HyperShift clusters this can be avoided: assisted service can instead connect via the service network, using the kube-apiserver.my-cluster.svc host name, as the API server runs as a pod in the same cluster. During that reduces the number of round trips and the potential proxy configuration issues. In order to achieve that this patch changes the spoke client factory so that it checks if the cluster is a HyperShift cluster, and then it replaces the API server URL
with https://kube-apiserver.my-cluster.svc:6443.

List all the issues related to this PR

https://issues.redhat.com/browse/MGMT-19120

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 13, 2024
@jhernand jhernand marked this pull request as draft December 13, 2024 20:18
@openshift-ci openshift-ci bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Dec 13, 2024
Copy link

openshift-ci bot commented Dec 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhernand

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 13, 2024
@jhernand jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch from 63c3673 to 390c90d Compare December 16, 2024 16:06
@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 16, 2024
@gamli75
Copy link
Contributor

gamli75 commented Dec 16, 2024

@eranco74 can you review this PR?

@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 16, 2024

@jhernand: This pull request references MGMT-19120 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.

In response to this:

There are several situations where assisted service needs to connect to the API server of a spoke cluster. To do so it uses the kubeconfig generated during the installation, and that usually contains the external URL of the API server, and that means that the cluster where assisted service runs needs to be configured with a proxy that allows that. But for HyperShift clusters this can be avoided: assisted service can instead connect via the service network, using the kube-apiserver.my-cluster.svc host name, as the API server runs as a pod in the same cluster. Doing that reduces the number of round trips and the potential proxy configuration issues. In order to achieve that this patch changes the spoke client factory so that it checks if the cluster is a HyperShift cluster, and then it replaces the API server URL
with https://kube-apiserver.my-cluster.svc:6443.

List all the issues related to this PR

https://issues.redhat.com/browse/MGMT-19120

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 16, 2024

@jhernand: This pull request references MGMT-19120 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.

In response to this:

There are several situations where assisted service needs to connect to the API server of a spoke cluster. To do so it uses the kubeconfig generated during the installation, and that usually contains the external URL of the API server, and that means that the cluster where assisted service runs needs to be configured with a proxy that allows that. But for HyperShift clusters this can be avoided: assisted service can instead connect via the service network, using the kube-apiserver.my-cluster.svc host name, as the API server runs as a pod in the same cluster. Doing that reduces the number of round trips and the potential proxy configuration issues. In order to achieve that this patch changes the spoke client factory so that it checks if the cluster is a HyperShift cluster, and then it replaces the API server URL with https://kube-apiserver.my-cluster.svc:6443.

List all the issues related to this PR

https://issues.redhat.com/browse/MGMT-19120

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jhernand jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch from 390c90d to 3e7178a Compare December 16, 2024 19:39
@jhernand jhernand mentioned this pull request Dec 17, 2024
20 tasks
@jhernand jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch 2 times, most recently from 4cd5768 to 92b8bac Compare December 18, 2024 11:58
@openshift-ci openshift-ci bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 18, 2024
@jhernand jhernand marked this pull request as ready for review December 18, 2024 15:54
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2024
Copy link

codecov bot commented Dec 18, 2024

Codecov Report

Attention: Patch coverage is 82.85714% with 18 lines in your changes missing coverage. Please review.

Project coverage is 67.80%. Comparing base (7f77988) to head (167ff48).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...nal/controller/controllers/bmh_agent_controller.go 52.94% 6 Missing and 2 partials ⚠️
...nternal/controller/controllers/agent_controller.go 62.50% 5 Missing and 1 partial ⚠️
internal/spoke_k8s_client/factory.go 95.45% 2 Missing and 1 partial ⚠️
internal/spoke_k8s_client/spoke_k8s_client.go 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7090      +/-   ##
==========================================
+ Coverage   67.70%   67.80%   +0.10%     
==========================================
  Files         296      296              
  Lines       40459    40514      +55     
==========================================
+ Hits        27391    27472      +81     
+ Misses      10607    10576      -31     
- Partials     2461     2466       +5     
Files with missing lines Coverage Δ
...oller/controllers/clusterdeployments_controller.go 72.89% <100.00%> (ø)
...rollers/hypershiftagentserviceconfig_controller.go 76.58% <100.00%> (ø)
...ernal/controller/controllers/spoke_client_cache.go 85.18% <100.00%> (ø)
internal/spoke_k8s_client/spoke_k8s_client.go 35.29% <0.00%> (+35.29%) ⬆️
internal/spoke_k8s_client/factory.go 91.78% <95.45%> (+47.33%) ⬆️
...nternal/controller/controllers/agent_controller.go 76.24% <62.50%> (-0.20%) ⬇️
...nal/controller/controllers/bmh_agent_controller.go 76.68% <52.94%> (-0.43%) ⬇️

... and 1 file with indirect coverage changes

@jhernand
Copy link
Contributor Author

/retest-required

@jhernand jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch from 92b8bac to 7b9e4dc Compare December 19, 2024 08:27
}

// SetHubClient sets the client that will be used to call the API of the hub cluster. This is mandatory.
func (b *SpokeK8sClientFactoryBuilder) SetHubClient(value ctrlclient.Client) *SpokeK8sClientFactoryBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@tsorya tsorya Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry my fault missed
https://github.com/openshift/assisted-service/pull/7090/files#diff-c444f711e9191b53952edb65bfd8c644419fc7695c62611dc0fb304b4fb197d6R625

Though it seems like this is a must parameter and we will get error in build if it was not set, so why not to provide it as param to New? Same actually for logger

Copy link
Contributor Author

@jhernand jhernand Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a way to make the code cleaner, avoiding long lists of parameters. We could pass the the logger, the client (and the transport wrapper, only used currently for tests) as parameters to the "New..." function, but over time that results in long lists of parameters like this.

api = NewManager(common.GetTestLog(), db, testing.GetDummyNotificationStream(ctrl), mockEventApi, nil, nil, nil, nil, &config, &leader.DummyElector{}, nil, nil, true, nil, nil, false)

It is already useful to avoid setting the transport wrapper parameter to nil.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is required params and in this case you left them as optional so i don't understand actually why it is good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is good for several reasons:

  1. It is consistent: all the parameters (required or optional) are provided in the same way.

  2. It makes it clearer what each parameter means. Not in this case, but if you had two parameters that are strings it is not the same to see this:

whatever, err := NewWhatever("foo", "bar")

Than this:

whatever, err := NewWhatever(().
        SetUserName("foo").
        SetPassword("bar").
        Build()

In the first case you have to deep digger to find out what is the meaning of the parameters, and in the second it is explicit.

  1. It gives room for documenting each parameter separately: the documentation goes in the "Set..." method of the builder.

  2. It simplifies building the object in multiple steps, if needed, for example:

builder := NewWhatever()
builder.SetUserName("foo")
if shouldUsePassword {
        builder.SetPassword("bar")
}
whatever, err := builder.Build()
  1. It simplifies adding multiple values for the same parameter:
whatever, err := NewWahtever().
        SetUserName("foo").
        SetUserName("foo-alias").
        Build()
  1. It allows adding new optional parameters without having to change the call sites.

I don't want to bore you with my opinions about this. If you find this unacceptable I will change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idk, maybe it just me, I just believe that if parameter is required it should be provided as part of function call another way if someone will write
whatever, err := NewWahtever().Build()
it will pass compilation but will fail on the run an i think better to find such error in compilation.
Though it is my personal opinion

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure, i like your proposition i just don't think it should be that way with required params

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your point of view, and still think that the benefits outweigh the drawbacks. As that is not the key point of this pull request I am changing it to a plain list of parameters. We can have this discussion another time.

if err != nil {
cf.log.WithError(err).Warnf("Getting kuberenetes config for cluster")
return nil, nil, err
func (f *spokeK8sClientFactory) kubeConfigFromSecret(secret *corev1.Secret) (result []byte, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it common function? We have at least 2 more place that do the same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, at least we have one less place now: I removed similar logic from the spoke client cache in a previous patch. I will try to find where we are doing this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do this in a different patch.

// Try to find the cluster deployment. If we can't, for whatever the reason, explain it in the log and assume
// it isn't a hosted cluster.
clusterDeployment, err := f.findClusterDeploymentForKubeconfigSecret(ctx, kubeconfigSecret)
if err != nil || clusterDeployment == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error and not having clusterDeployment seems to be different issues, maybe we should split the logging at least?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jhernand jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch from 7b9e4dc to cb06b53 Compare December 19, 2024 12:55
log: cf.log,
// findClusterDeploymentForKubeconfigSecret finds the cluster deployment that corresponds to the given kubeconfig
// secret. It returns nil if there is no such cluster deployment.
func (f *spokeK8sClientFactory) findClusterDeploymentForKubeconfigSecret(ctx context.Context,
Copy link
Member

@carbonin carbonin Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we ever in a situation where the caller of this factory doesn't already have a reference to the cluster deployment?

Since (based on the naming) we're talking about "spoke" clusters it seems likely that this could be simplified by either the caller supplying the cluster deployment or by this logic living outside this factory (then we would have an option like "useHubServiceNetwork" or something when creating the client).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, here we don't know what is the cluster deployment:

spokeClient, err := hr.SpokeClients.Get(kubeconfigSecret)
.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks.

Side note though ... can we delete the HASC CRD and controller yet?
@gamli75 that effort isn't happening now, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with that effort. maybe @CrystalChun

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with it either, maybe @danielerez?

@jhernand jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch from cb06b53 to 4d0cd06 Compare December 19, 2024 19:38
@jhernand
Copy link
Contributor Author

jhernand commented Jan 7, 2025

@carbonin @tsorya I think I addressed your concerns. Anything else that needs to be changed before merging?

@jhernand
Copy link
Contributor Author

jhernand commented Jan 7, 2025

/test edge-e2e-ai-operator-disconnected-capi

@gamli75
Copy link
Contributor

gamli75 commented Jan 7, 2025

/test edge-e2e-ai-operator-disconnected-capi

this job is broken again, see: https://issues.redhat.com/browse/MGMT-19358


// findClusterDeploymentForKubeconfigSecret finds the cluster deployment that corresponds to the given kubeconfig
// secret. It returns nil if there is no such cluster deployment.
func (f *spokeK8sClientFactory) findClusterDeploymentForKubeconfigSecret(ctx context.Context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about passing in a ClusterDeployment pointer as a parameter instead of the ctx?

That way we don't need to know the hub client and the ctx, and we don't need this spoke client factory to find the ClusterDeployment. The work will be on the caller to pass in the ClusterDeployment.

So even if the function caller doesn't have a ClusterDeployment, we can already assume it's not a HostedCluster and just use the secret as we already have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify my point the function ClientAndSetFromSecret would be

func (f *spokeK8sClientFactory) ClientAndSetFromSecret(clusterDeployment *hivev1.ClusterDeployment, secret *corev1.Secret) (SpokeK8sClient,
	*kubernetes.Clientset, error) {

Then in isFromHostedCluster

func (f *spokeK8sClientFactory) isFromHostedCluster(clusterDeployment *hivev1.ClusterDeployment, kubeconfigSecret *corev1.Secret) bool {
        if clusterDeployment == nil { 
                return false
        }
	// We assume this is a hosted cluster if it has the 'agentClusterRef' label, no matter the value.
	_, result := clusterDeployment.Labels["agentClusterRef"]
	return result
}

Just thought maybe this could make it simpler and offload the responsibility of getting a ClusterDeployment to the caller as opposed to this client factory.

Let me know your thoughts or if you want me to clarify more offline!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that specifically for context the idiomatic way to pass context is to ensure that it is the first parameter of a function. It is expected that context will be passed from function to function this way.

As this is so specific and idiomatic, I don't think we should hide the context inside a struct.

I do agree with the other encapsulations though, just not the hiding of context.

Copy link
Contributor Author

@jhernand jhernand Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CrystalChun what you suggest makes sense. My only concern would be that adding another optional parameter (the pointer to the cluster deployment) means that it will be easier for someone to write ClientAndSetFromSecret(nil, mySecret), maybe without thinking if the cluster deployment is available or not. That would result in wrong results if that missing cluster deployment was actually a HyperShift cluster. But I am going to apply your suggestion anyhow: seems like a good way to move this forward.

@paul-maidment I agree with you, but I guess that @CrystalChun means that if I apply her suggestion then context parameter isn't strictly needed, as there will be no operation in those functions that supports cancelation.

hack/start_db.sh Outdated

# We store the data files in the `/tmp` directory assuming that it will be a `tmpfs` and therefore
# very fast. But in some environments it may not be, as it may be mapped to a real persistent file
# system. Disabling `fsync` speeds things up in those environment, at the cost of risking data loss,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a tiny bit more detail on the purpose of fysnc to this comment.

The context that this is ensuring a file is fully written even in the case of a node crash/power failure/reboot is relevant here as we don't expect or need our data to survive a reboot. This is the real reason we don't care about "data loss"

Copy link
Contributor Author

@jhernand jhernand Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meaning of the fsync parameter is described in the documentation of PostgreSQL. For example, for version 12 it is here: https://www.postgresql.org/docs/12/runtime-config-wal.html . I was trying to explain that we don't care about data loss because this is used only in unit tests, and crash, power failure or reboot is not a concern in unit tests. Anyhow, this is in this pull request by accident, not really needed. I will remove it in the next version to avoid noise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I think my point was just to mention what you say about unit tests, the original text doesn't fully explain this and leaves some room for interpretation.

@jhernand jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch 2 times, most recently from b92a3ac to 5451fb1 Compare January 13, 2025 19:16
@jhernand
Copy link
Contributor Author

@CrystalChun I applied your suggestion.

@carbonin @tsorya @paul-maidment you may want to review again.

Comment on lines +581 to +584
if k8serrors.IsNotFound(err) {
clusterDeployment = nil
err = nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should still fail and propagate the error here when the cluster deployment is not found?
Reason is I'm not sure we should continue to create a spoke client if the cluster doesn't exist

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to not change the behavior of this code, as I am not sure if the existence of a cluster deployment is a requirement here. If it is I assume that it would be checked before this point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, didn't realize this was the behavior before. That's alright then!

@CrystalChun
Copy link
Contributor

Looks good to me! Thanks Juan! Will wait for others to review

@jhernand
Copy link
Contributor Author

/retest

1 similar comment
@jhernand
Copy link
Contributor Author

/retest

hack/start_db.sh Outdated
@@ -8,4 +8,5 @@ mkdir -p /tmp/postgres/data
mkdir -p /tmp/postgres/sockets

initdb -D /tmp/postgres/data -U postgres

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much going on in this file anymore, I suspect you can just remove this newline to omit it from the commit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is completely right, I will.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

There are several situations where assisted service needs to connect to
the API server of a spoke cluster. To do so it uses the kubeconfig
generated during the installation, and that usually contains the
external URL of the API server, and that means that the cluster where
assisted service runs needs to be configured with a proxy that allows
that. But for HyperShift clusters this can be avoided: assisted service
can instead connect via the service network, using the
`kube-apiserver.my-cluster.svc` host name, as the API server runs as a
pod in the same cluster. Doing that reduces the number of round trips
and the potential proxy configuration issues. In order to achive that
this patch changes the spoke client factory so that it checks if the
cluster is a HyperShift cluster, and then it replaces the API server URL
with `https://kube-apiserver.my-cluster.svc:6443`.

Related: https://issues.redhat.com/browse/MGMT-19120
Signed-off-by: Juan Hernandez <[email protected]>
@jhernand jhernand force-pushed the use_service_network_to_talk_to_hosted_api_server branch from 5451fb1 to 167ff48 Compare January 16, 2025 12:03
@jhernand
Copy link
Contributor Author

/retest

1 similar comment
@jhernand
Copy link
Contributor Author

/retest

Copy link

openshift-ci bot commented Jan 17, 2025

@jhernand: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/edge-e2e-ai-operator-disconnected-capi 167ff48 link false /test edge-e2e-ai-operator-disconnected-capi

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@paul-maidment
Copy link
Contributor

paul-maidment commented Jan 19, 2025

Looks good to me, will leave this for further review by @tsorya @carbonin

@tsorya
Copy link
Contributor

tsorya commented Jan 19, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 19, 2025
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD fa31933 and 2 for PR HEAD 167ff48 in total

@openshift-merge-bot openshift-merge-bot bot merged commit f1dc05f into openshift:master Jan 19, 2025
18 of 19 checks passed
@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-agent-installer-api-server
This PR has been included in build ose-agent-installer-api-server-container-v4.19.0-202501191937.p0.gf1dc05f.assembly.stream.el9.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants