MGMT-19120: Use service net to connect to hosted API server #7090
Conversation
@jhernand: This pull request references MGMT-19120 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jhernand. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Force-pushed from 63c3673 to 390c90d
@eranco74 can you review this PR?
Force-pushed from 390c90d to 3e7178a
Force-pushed from 4cd5768 to 92b8bac
Codecov Report. Attention: Patch coverage is …

Additional details and impacted files:

    @@            Coverage Diff             @@
    ##           master    #7090      +/-   ##
    ==========================================
    + Coverage   67.70%   67.80%   +0.10%
    ==========================================
      Files         296      296
      Lines       40459    40514      +55
    ==========================================
    + Hits        27391    27472      +81
    + Misses      10607    10576      -31
    - Partials     2461     2466       +5
/retest-required
Force-pushed from 92b8bac to 7b9e4dc
internal/spoke_k8s_client/factory.go (outdated)

    // SetHubClient sets the client that will be used to call the API of the hub cluster. This is mandatory.
    func (b *SpokeK8sClientFactoryBuilder) SetHubClient(value ctrlclient.Client) *SpokeK8sClientFactoryBuilder {
Why can't we provide the client while creating the object? We have it here: https://github.com/openshift/assisted-service/pull/7090/files#diff-891ba2cfffd82a8ae4131c88beb092d2a88149f579c931e3a1f7ca77fbfc82a5L166, no?
Sorry, my fault, I missed
https://github.com/openshift/assisted-service/pull/7090/files#diff-c444f711e9191b53952edb65bfd8c644419fc7695c62611dc0fb304b4fb197d6R625
Though it seems like this is a required parameter and Build will return an error if it was not set, so why not provide it as a parameter to New? Same actually for the logger.
This is just a way to make the code cleaner, avoiding long lists of parameters. We could pass the logger, the client (and the transport wrapper, currently only used for tests) as parameters to the "New..." function, but over time that results in long lists of parameters like this:

    api = NewManager(common.GetTestLog(), db, testing.GetDummyNotificationStream(ctrl), mockEventApi, nil, nil, nil, nil, &config, &leader.DummyElector{}, nil, nil, true, nil, nil, false)

It is already useful to avoid setting the transport wrapper parameter to nil.
But these are required parameters, and in this case you left them as optional, so I don't actually understand why this is good.
I believe it is good for several reasons:

- It is consistent: all the parameters (required or optional) are provided in the same way.

- It makes it clearer what each parameter means. Not in this case, but if you had two parameters that are strings it is not the same to see this:

      whatever, err := NewWhatever("foo", "bar")

  than this:

      whatever, err := NewWhatever().
          SetUserName("foo").
          SetPassword("bar").
          Build()

  In the first case you have to dig deeper to find out the meaning of the parameters; in the second it is explicit.

- It gives room for documenting each parameter separately: the documentation goes in the "Set..." method of the builder.

- It simplifies building the object in multiple steps, if needed, for example:

      builder := NewWhatever()
      builder.SetUserName("foo")
      if shouldUsePassword {
          builder.SetPassword("bar")
      }
      whatever, err := builder.Build()

- It simplifies adding multiple values for the same parameter:

      whatever, err := NewWhatever().
          SetUserName("foo").
          SetUserName("foo-alias").
          Build()

- It allows adding new optional parameters without having to change the call sites.

I don't want to bore you with my opinions about this. If you find this unacceptable I will change it.
I don't know, maybe it's just me, but I believe that if a parameter is required it should be provided as part of the function call; otherwise, if someone writes

    whatever, err := NewWhatever().Build()

it will pass compilation but fail at run time, and I think it is better to find such errors at compilation. Though it is my personal opinion.
Just to be sure: I like your proposal, I just don't think it should work that way for required params.
I understand your point of view, and still think that the benefits outweigh the drawbacks. As that is not the key point of this pull request I am changing it to a plain list of parameters. We can have this discussion another time.
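For what it's worth, here is a minimal sketch of that resolution: the required values become plain constructor parameters (so omitting one fails compilation) while the transport wrapper stays optional. The struct fields, names, and exact signature are illustrative assumptions, not the PR's final code:

```go
package spoke_k8s_client

import (
	"errors"
	"net/http"

	"github.com/sirupsen/logrus"
	ctrlclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// Fields assumed for this sketch; the real struct lives in factory.go.
type spokeK8sClientFactory struct {
	log              logrus.FieldLogger
	hubClient        ctrlclient.Client
	transportWrapper func(http.RoundTripper) http.RoundTripper
}

// NewFactory takes the required logger and hub client positionally, so a
// caller cannot compile without supplying them; the transport wrapper
// (currently only used by tests) may be nil.
func NewFactory(log logrus.FieldLogger, hubClient ctrlclient.Client,
	transportWrapper func(http.RoundTripper) http.RoundTripper) (*spokeK8sClientFactory, error) {
	if log == nil {
		return nil, errors.New("logger is mandatory")
	}
	if hubClient == nil {
		return nil, errors.New("hub client is mandatory")
	}
	return &spokeK8sClientFactory{
		log:              log,
		hubClient:        hubClient,
		transportWrapper: transportWrapper,
	}, nil
}
```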
internal/spoke_k8s_client/factory.go (outdated)

    if err != nil {
        cf.log.WithError(err).Warnf("Getting kubernetes config for cluster")
        return nil, nil, err

    func (f *spokeK8sClientFactory) kubeConfigFromSecret(secret *corev1.Secret) (result []byte, err error) {
Can we make this a common function? We have at least two more places that do the same.
Well, at least we have one less place now: I removed similar logic from the spoke client cache in a previous patch. I will try to find where we are doing this.
I will do this in a different patch.
internal/spoke_k8s_client/factory.go (outdated)

    // Try to find the cluster deployment. If we can't, for whatever reason, explain it in the log and assume
    // it isn't a hosted cluster.
    clusterDeployment, err := f.findClusterDeploymentForKubeconfigSecret(ctx, kubeconfigSecret)
    if err != nil || clusterDeployment == nil {
An error and not having a clusterDeployment seem to be different issues; maybe we should split the logging at least?
OK, will do.
Done.
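A hedged sketch of what the split logging might look like after that change; only findClusterDeploymentForKubeconfigSecret appears in the diff, while the wrapper name and the log messages here are assumptions:

```go
// logClusterDeploymentLookup (hypothetical name) handles the error case and
// the "no matching cluster deployment" case separately, instead of one
// combined branch.
func (f *spokeK8sClientFactory) logClusterDeploymentLookup(ctx context.Context,
	kubeconfigSecret *corev1.Secret) *hivev1.ClusterDeployment {
	clusterDeployment, err := f.findClusterDeploymentForKubeconfigSecret(ctx, kubeconfigSecret)
	if err != nil {
		// A real failure talking to the hub API: log the error itself.
		f.log.WithError(err).Warn("Failed to find cluster deployment for kubeconfig secret, assuming it isn't a hosted cluster")
		return nil
	}
	if clusterDeployment == nil {
		// Not an error: the secret simply has no matching cluster deployment.
		f.log.Info("No cluster deployment matches the kubeconfig secret, assuming it isn't a hosted cluster")
		return nil
	}
	return clusterDeployment
}
```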
Force-pushed from 7b9e4dc to cb06b53
internal/spoke_k8s_client/factory.go (outdated)

    log: cf.log,

    // findClusterDeploymentForKubeconfigSecret finds the cluster deployment that corresponds to the given kubeconfig
    // secret. It returns nil if there is no such cluster deployment.
    func (f *spokeK8sClientFactory) findClusterDeploymentForKubeconfigSecret(ctx context.Context,
Are we ever in a situation where the caller of this factory doesn't already have a reference to the cluster deployment?
Since (based on the naming) we're talking about "spoke" clusters, it seems likely that this could be simplified either by the caller supplying the cluster deployment or by this logic living outside this factory (then we would have an option like "useHubServiceNetwork" or something when creating the client).
Yes, here we don't know what the cluster deployment is:

assisted-service/internal/controller/controllers/hypershiftagentserviceconfig_controller.go, line 342 in 9a1b9ec:

    spokeClient, err := hr.SpokeClients.Get(kubeconfigSecret)
Okay, thanks.
Side note though ... can we delete the HASC CRD and controller yet?
@gamli75 that effort isn't happening now, right?
I'm not familiar with that effort. Maybe @CrystalChun?
I'm not familiar with it either, maybe @danielerez?
Force-pushed from cb06b53 to 4d0cd06
/test edge-e2e-ai-operator-disconnected-capi
This job is broken again, see: https://issues.redhat.com/browse/MGMT-19358
internal/spoke_k8s_client/factory.go (outdated)

    // findClusterDeploymentForKubeconfigSecret finds the cluster deployment that corresponds to the given kubeconfig
    // secret. It returns nil if there is no such cluster deployment.
    func (f *spokeK8sClientFactory) findClusterDeploymentForKubeconfigSecret(ctx context.Context,
What do you think about passing in a ClusterDeployment pointer as a parameter instead of the ctx?
That way we don't need to know the hub client and the ctx, and we don't need this spoke client factory to find the ClusterDeployment. The work will be on the caller to pass in the ClusterDeployment.
So even if the function caller doesn't have a ClusterDeployment, we can already assume it's not a HostedCluster and just use the secret as we already have.
To clarify my point, the function ClientAndSetFromSecret would be:

    func (f *spokeK8sClientFactory) ClientAndSetFromSecret(clusterDeployment *hivev1.ClusterDeployment, secret *corev1.Secret) (SpokeK8sClient, *kubernetes.Clientset, error) {

Then in isFromHostedCluster:

    func (f *spokeK8sClientFactory) isFromHostedCluster(clusterDeployment *hivev1.ClusterDeployment, kubeconfigSecret *corev1.Secret) bool {
        if clusterDeployment == nil {
            return false
        }
        // We assume this is a hosted cluster if it has the 'agentClusterRef' label, no matter the value.
        _, result := clusterDeployment.Labels["agentClusterRef"]
        return result
    }

Just thought maybe this could make it simpler and offload the responsibility of getting a ClusterDeployment to the caller as opposed to this client factory.
Let me know your thoughts or if you want me to clarify more offline!
Let me know your thoughts or if you want me to clarify more offline!
I think that, specifically for context, the idiomatic way is to pass it as the first parameter of a function. It is expected that context will be passed from function to function this way.
As this is so specific and idiomatic, I don't think we should hide the context inside a struct.
I do agree with the other encapsulations though, just not the hiding of context.
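As a sketch of that idiom, assuming the hub client is controller-runtime's client.Client (the secret-to-deployment key mapping here is a made-up detail, not the PR's actual lookup):

```go
// Context goes first and is passed straight through to the hub client call,
// which supports cancellation.
func (f *spokeK8sClientFactory) findClusterDeploymentForKubeconfigSecret(ctx context.Context,
	secret *corev1.Secret) (*hivev1.ClusterDeployment, error) {
	clusterDeployment := &hivev1.ClusterDeployment{}
	key := ctrlclient.ObjectKey{
		Namespace: secret.Namespace,
		Name:      secret.Name, // how the secret maps to the deployment name is an assumption
	}
	if err := f.hubClient.Get(ctx, key, clusterDeployment); err != nil {
		return nil, err
	}
	return clusterDeployment, nil
}
```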
@CrystalChun what you suggest makes sense. My only concern is that adding another optional parameter (the pointer to the cluster deployment) means that it will be easier for someone to write ClientAndSetFromSecret(nil, mySecret), maybe without thinking about whether the cluster deployment is available or not. That would produce wrong results if that missing cluster deployment was actually a HyperShift cluster. But I am going to apply your suggestion anyhow: it seems like a good way to move this forward.
@paul-maidment I agree with you, but I guess that @CrystalChun means that if I apply her suggestion then the context parameter isn't strictly needed, as there will be no operation in those functions that supports cancellation.
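To make that concern concrete, a sketch of the two call shapes under the signature proposed above (variable names are illustrative):

```go
// Both calls compile, but the first silently disables the hosted-cluster
// detection, because isFromHostedCluster returns false for a nil deployment.
spokeClient, clientSet, err := factory.ClientAndSetFromSecret(nil, kubeconfigSecret)

// Passing the actual cluster deployment lets the factory check the
// 'agentClusterRef' label and use the service network URL when appropriate.
spokeClient, clientSet, err = factory.ClientAndSetFromSecret(clusterDeployment, kubeconfigSecret)
```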
hack/start_db.sh (outdated)

    # We store the data files in the `/tmp` directory assuming that it will be a `tmpfs` and therefore
    # very fast. But in some environments it may not be, as it may be mapped to a real persistent file
    # system. Disabling `fsync` speeds things up in those environments, at the cost of risking data loss,
Maybe add a tiny bit more detail on the purpose of fsync to this comment.
The context, that fsync ensures a file is fully written even in the case of a node crash/power failure/reboot, is relevant here because we don't expect or need our data to survive a reboot. This is the real reason we don't care about "data loss".
The meaning of the fsync parameter is described in the PostgreSQL documentation; for version 12 it is here: https://www.postgresql.org/docs/12/runtime-config-wal.html. I was trying to explain that we don't care about data loss because this is used only in unit tests, and a crash, power failure, or reboot is not a concern in unit tests. Anyhow, this is in this pull request by accident, not really needed. I will remove it in the next version to avoid noise.
Sure. I think my point was just to mention what you say about unit tests; the original text doesn't fully explain this and leaves some room for interpretation.
Force-pushed from b92a3ac to 5451fb1
@CrystalChun I applied your suggestion. @carbonin @tsorya @paul-maidment you may want to review again.
    if k8serrors.IsNotFound(err) {
        clusterDeployment = nil
        err = nil
    }
Do you think we should still fail and propagate the error here when the cluster deployment is not found?
The reason is that I'm not sure we should continue to create a spoke client if the cluster doesn't exist.
I'd prefer to not change the behavior of this code, as I am not sure if the existence of a cluster deployment is a requirement here. If it is, I assume that it would be checked before this point.
Gotcha, I didn't realize this was the behavior before. That's alright then!
Looks good to me! Thanks Juan! Will wait for others to review.
/retest
1 similar comment
/retest
hack/start_db.sh (outdated)

    @@ -8,4 +8,5 @@ mkdir -p /tmp/postgres/data
    mkdir -p /tmp/postgres/sockets

    initdb -D /tmp/postgres/data -U postgres
Not much going on in this file anymore; I suspect you can just remove this newline to omit it from the commit.
That is completely right, I will.
Done.
There are several situations where assisted service needs to connect to the API server of a spoke cluster. To do so it uses the kubeconfig generated during the installation, which usually contains the external URL of the API server; that in turn means that the cluster where assisted service runs needs to be configured with a proxy that allows such connections. But for HyperShift clusters this can be avoided: assisted service can instead connect via the service network, using the `kube-apiserver.my-cluster.svc` host name, as the API server runs as a pod in the same cluster. Doing that reduces the number of round trips and the potential proxy configuration issues.

In order to achieve that, this patch changes the spoke client factory so that it checks if the cluster is a HyperShift cluster and, if so, replaces the API server URL with `https://kube-apiserver.my-cluster.svc:6443`.

Related: https://issues.redhat.com/browse/MGMT-19120
Signed-off-by: Juan Hernandez <[email protected]>
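The heart of the change can be pictured with a short sketch; the helper name and the way the namespace is derived are assumptions for illustration, and the real logic lives in internal/spoke_k8s_client/factory.go:

```go
package spoke_k8s_client

import (
	"fmt"

	"k8s.io/client-go/rest"
)

// overrideAPIServerURL points the REST config at the service network once
// the factory has decided (via the 'agentClusterRef' label on the cluster
// deployment) that the kubeconfig belongs to a HyperShift hosted cluster,
// whose API server runs as a pod in the same (hub) cluster.
func overrideAPIServerURL(restConfig *rest.Config, clusterNamespace string) {
	restConfig.Host = fmt.Sprintf("https://kube-apiserver.%s.svc:6443", clusterNamespace)
}
```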
Force-pushed from 5451fb1 to 167ff48
/retest
1 similar comment
/retest
@jhernand: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/lgtm
Merged f1dc05f into openshift:master
[ART PR BUILD NOTIFIER] Distgit: ose-agent-installer-api-server
There are several situations where assisted service needs to connect to the API server of a spoke cluster. To do so it uses the kubeconfig generated during the installation, which usually contains the external URL of the API server; that in turn means that the cluster where assisted service runs needs to be configured with a proxy that allows such connections. But for HyperShift clusters this can be avoided: assisted service can instead connect via the service network, using the kube-apiserver.my-cluster.svc host name, as the API server runs as a pod in the same cluster. Doing that reduces the number of round trips and the potential proxy configuration issues. In order to achieve that, this patch changes the spoke client factory so that it checks if the cluster is a HyperShift cluster and, if so, replaces the API server URL with https://kube-apiserver.my-cluster.svc:6443.

List all the issues related to this PR
https://issues.redhat.com/browse/MGMT-19120
What environments does this code impact?
How was this code tested?
Checklist

- … (docs, README, etc)

Reviewers Checklist