
clusterctl inside cluster in pod cannot find management cluster #6286

Open
steve-fraser opened this issue Mar 10, 2022 · 29 comments · May be fixed by #10729
Labels
area/clusterctl: Issues or PRs related to clusterctl
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/bug: Categorizes issue or PR as related to a bug.
priority/backlog: Higher priority than priority/awaiting-more-evidence.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@steve-fraser

What steps did you take and what happened:
[A clear and concise description on how to REPRODUCE the bug.]

  1. Deploy Pod in cluster
  2. Install vsphere provider
  3. Generate configuration
    clusterctl generate cluster $(TEST_CLUSTER_NAME) \
      --infrastructure vsphere \
      -n $(TEST_CLUSTER_NAME) \
      --control-plane-machine-count 1 \
      --worker-machine-count 0 > /tmp/vsphere-test-cluster.yaml

Error: management cluster not available. Cannot auto-discover target namespace. Please specify a target namespace: invalid kubeconfig file; clusterctl requires a valid kubeconfig file to connect to the management cluster: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

What did you expect to happen:

clusterctl is supposed to find the local Cluster API (CAPI) installation, since it is running inside the management cluster.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api version: v1.1.2
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version): v1.21.8
  • OS (e.g. from /etc/os-release):

runner@mvm-runner-2:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 10, 2022
@sbueringer
Member

just fyi @Jont828

/area clusterctl

@k8s-ci-robot k8s-ci-robot added the area/clusterctl Issues or PRs related to clusterctl label Mar 10, 2022
@killianmuldoon
Contributor

Just to clarify - are you running the clusterctl binary inside a container and pod in the Kubernetes cluster? Have you supplied it with a kubeconfig so it knows the address of the API server and has access to the certs?

@steve-fraser
Author

steve-fraser commented Mar 10, 2022

Just to clarify - are you running the clusterctl binary inside a container and pod in the Kubernetes cluster? Have you supplied it with a kubeconfig so it knows the address of the API server and has access to the certs?

Yes, I am running the clusterctl binary inside the management cluster. Specifically, I am using it to run a GitHub runner inside the management cluster. This may be more of a feature request, but I expected not to need a kubeconfig at all; instead, clusterctl would behave like the kubectl binary, which works without dropping a config into the pod by using the Kubernetes API service account and environment variables.
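
For reference, kubectl gets this behavior from client-go: inside a Pod it can fall back to the injected ServiceAccount token plus the KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT environment variables. A minimal Go sketch of that in-cluster mechanism, assuming the standard client-go packages (illustrative only, not clusterctl code):

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// InClusterConfig reads the ServiceAccount token mounted at
	// /var/run/secrets/kubernetes.io/serviceaccount and the API server
	// address from the KUBERNETES_SERVICE_HOST/PORT env vars.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err) // not running inside a Pod
	}

	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// List pods to prove the connection works; the Pod's ServiceAccount
	// needs RBAC permissions for this call to succeed.
	pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("found %d pods\n", len(pods.Items))
}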

@sbueringer
Member

Agreed. I think it would be nice if clusterctl just did in-cluster discovery, as controllers do.

It's not really nice if folks have to generate a kubeconfig somehow even though a Pod has the ServiceAccount credentials injected.

@fabriziopandini
Member

/milestone v1.2

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 8, 2022
@fabriziopandini
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 9, 2022
@Jont828
Contributor

Jont828 commented Jun 14, 2022

So applications like cluster autoscaler that run in the cluster initialize their client with InClusterConfig(), which gets the config of the current cluster or returns an ErrNotInCluster. We could modify the entry code for clusterctl to detect whether it's in a cluster and, if it is, go ahead and use the in-cluster config. Wdyt @fabriziopandini @sbueringer?

@sbueringer
Member

I think something like the following should be fine:

  • if --kubeconfig is set, use that one
  • if running in-cluster, use the in-cluster config

Not sure at which point we should check for the default kubeconfig, but that might be already handled by the client-go util funcs which are usually used for this.
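
A rough sketch of that lookup order, assuming client-go's rest and clientcmd packages (loadRestConfig is a hypothetical helper, not actual clusterctl code):

package config

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// loadRestConfig resolves a *rest.Config in the order sketched above:
// an explicit --kubeconfig flag wins, then the in-cluster config, then
// the default loading rules (KUBECONFIG env var, ~/.kube/config).
func loadRestConfig(kubeconfigFlag string) (*rest.Config, error) {
	// 1. An explicit flag always wins.
	if kubeconfigFlag != "" {
		return clientcmd.BuildConfigFromFlags("", kubeconfigFlag)
	}

	// 2. If running inside a Pod, use the injected ServiceAccount.
	if cfg, err := rest.InClusterConfig(); err == nil {
		return cfg, nil
	}

	// 3. Fall back to client-go's default kubeconfig discovery.
	rules := clientcmd.NewDefaultClientConfigLoadingRules()
	return clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		rules, &clientcmd.ConfigOverrides{},
	).ClientConfig()
}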

@Jont828
Contributor

Jont828 commented Jun 16, 2022

So for the in-cluster config there are two approaches we could take. We could take the approach you outlined, where we check for it, and if we get an ErrNotInCluster we suppress it and move on to the default kubeconfig discovery rules. Alternatively, we could add a flag to opt into the in-cluster config; if it's set, we skip the default kubeconfig discovery rules. I think the benefit of the latter approach is that developers trying to initialize the client can handle the ErrNotInCluster cases themselves instead of having it done in the background. Wdyt?

@sbueringer
Member

sbueringer commented Jun 17, 2022

I would really prefer if it's just auto-discovery and simply works out of the box without anyone having to specify a special flag for it.

Let's take a look at how kubectl does it. Afaik it automatically works in a Pod / on a local env

@Jont828
Contributor

Jont828 commented Jun 23, 2022

Sounds good. I'll take a look at kubectl's implementation when I get the chance and follow up here.

@Jacobious52

We're also in need of this fix. We want to use clusterctl backup in a CronJob in the management cluster.
As @sbueringer mentioned, I'd expect this to work like most other k8s clients, using https://github.com/kubernetes/client-go/blob/master/rest/config.go#L512, which works out of the box when running inside the cluster.

@Jont828
Contributor

Jont828 commented Jul 6, 2022

@sbueringer I'm happy to take a stab at this issue but I'll probably need some help since I'm not very familiar with this code.

I looked at Cluster Autoscaler and here they have some logic that uses the in-cluster config. I believe their idea is to have an interface with one implementation using a kubeconfig file and another using info from InClusterConfig().

The closest thing I can find is proxy.go where we have an interface that implements certain functions like GetConfig() and CurrentNamespace(). Do you know if we could simply make another implementation of the Proxy interface, or is there other code we would want to change as well?

@Jont828
Contributor

Jont828 commented Jul 6, 2022

As for kubectl, I tried running it in a pod, but it seems like it doesn't work out of the box.

root@capi-test-control-plane:/# kubectl get pods -A
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:default:default" cannot list resource "pods" in API group "" at the cluster scope

It seems like we need to set up permissions for it to work, and as a result I'm not too clear on how to find the relevant code in their repo.

@sbueringer
Member

I did a bit more research, and I think the behavior of controller-runtime in general matches relatively closely what we want for clusterctl: https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/client/config/config.go#L43-L61 (unfortunately except for the --kubeconfig flag, because clusterctl has its own)

I think the clusterctl library (cluster.New) should ideally take a *rest.Config as input parameter instead of the path to a kubeconfig file. This way it can be used in various scenarios and it doesn't depend on a literal file on a disk.

But I have no idea if a change like this is acceptable and how much refactoring this would require.
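
For comparison, consuming the controller-runtime helper looks roughly like this; the linked GetConfig already implements the fallback chain (the --kubeconfig flag if registered, then the KUBECONFIG env var, then the in-cluster config, then $HOME/.kube/config):

package main

import (
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
	// GetConfig walks the precedence chain described above and returns
	// a *rest.Config ready to build clients from.
	cfg, err := config.GetConfig()
	if err != nil {
		panic(err)
	}
	fmt.Println("API server:", cfg.Host)
}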

@killianmuldoon
Contributor

killianmuldoon commented Jul 7, 2022

Kubernetes has a kubernetes.NewForConfig(rest.Config) function that does this - we could copy that and add a new function to cover the case where we want to create a clusterctl client from the rest.Config, i.e. cluster.NewForConfig(rest.Config)

@sbueringer
Member

Maybe we can keep the external API the same, by:

  • keeping cluster.New as is
  • adding cluster.NewForConfig which takes rest.Config

And then refactor internally behind the API so that we don't have to write a temporary kubeconfig with credentials somewhere?
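
A sketch of that shape (all names and signatures here are hypothetical; the real clusterctl client types may differ):

package cluster

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// Client and Kubeconfig are stand-ins for the real clusterctl types.
type Client interface{}

type Kubeconfig struct {
	Path    string
	Context string
}

// New keeps the existing kubeconfig-path based entry point unchanged.
func New(kubeconfig Kubeconfig) (Client, error) {
	// Stand-in for the existing file-based loading logic.
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig.Path)
	if err != nil {
		return nil, err
	}
	return NewForConfig(cfg), nil
}

// NewForConfig is the proposed new entry point: it takes a *rest.Config
// directly, so in-cluster callers never need a kubeconfig file on disk.
func NewForConfig(cfg *rest.Config) Client {
	return &client{config: cfg}
}

type client struct{ config *rest.Config }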

@killianmuldoon
Contributor

I'll take a look at this and see what's possible (looking at the code it's not as trivial as I thought 😆)

/assign

@Jont828
Contributor

Jont828 commented Jul 7, 2022

@killianmuldoon Sounds good! I started hacking on some ideas on my end. In proxy.go it seems like if we refactor to initialize it with a *rest.Config we could rework the other functions. One thing I'm not sure about is if we have access to a kubecontext from the rest.Config. For some of the other Proxy interface functions we could try to do something like this (from cluster autoscaler):

// CurrentNamespace returns the namespace the Pod is currently running in.
func (k *inClusterProxy) CurrentNamespace() (string, error) {
	// This path assumes you've set the POD_NAMESPACE environment variable using the downward API.
	// This check has to be done first for backwards compatibility with the way InClusterConfig was originally set up.
	if ns := os.Getenv("POD_NAMESPACE"); ns != "" {
		return ns, nil
	}

	// Fall back to the namespace associated with the service account token, if available.
	if data, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/namespace"); err == nil {
		if ns := strings.TrimSpace(string(data)); len(ns) > 0 {
			return ns, nil
		}
	}

	return "default", nil
}
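
A companion GetConfig for the same hypothetical inClusterProxy could then just delegate to client-go; since there is no kubeconfig file in this mode, context-related methods would presumably be no-ops:

// GetConfig returns the rest.Config of the cluster the Pod is running in.
// Hypothetical sketch for the same inClusterProxy type as above.
func (k *inClusterProxy) GetConfig() (*rest.Config, error) {
	return rest.InClusterConfig()
}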

@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini removed this from the v1.2 milestone Jul 29, 2022
@fabriziopandini fabriziopandini removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini added this to the v1.3 milestone Aug 5, 2022
@fabriziopandini
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Aug 5, 2022
@fabriziopandini fabriziopandini removed this from the v1.3 milestone Nov 2, 2022
@fabriziopandini
Member

dropping from the milestone because not blocking, but nice to have as soon as someone has bandwidth
/help

@k8s-ci-robot
Contributor

@fabriziopandini:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

dropping from the milestone because not blocking, but nice to have as soon as someone has bandwidth
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Nov 2, 2022
@robbie-demuth

Our organization is looking to create vclusters in our CI/CD pipeline, which runs jobs as Kubernetes pods, and clusterctl not being able to detect that it's running in a pod, the way kubectl can, is somewhat blocking us from doing so (we can use vcluster directly)

@fabriziopandini
Member

@robbie-demuth it would be great if someone from your organization could help get this fixed; I will be happy to help get it over the line

@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 27, 2024
@mjnovice

mjnovice commented Mar 2, 2024

Any updates on this?

@fabriziopandini
Member

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Apr 12, 2024
@fabriziopandini
Member

The Cluster API project currently lacks enough contributors to adequately respond to all issues and PRs.

We are keeping this issue open since folks have asked about it recently, but if no one volunteers for the job we will most probably close it at the next iteration

/triage accepted
/remove-lifecycle frozen

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 23, 2024