Could not install Gefyra #173

Open
mbonaci opened this issue May 9, 2023 · 17 comments

Comments

@mbonaci

mbonaci commented May 9, 2023

I'm trying to run this on Windows 11 (with WSL2).
It reads my kubeconfig correctly, connects to the cluster and I can choose a namespace, image and a pod to copy the env from:

image

The extension reports:

  • Kubeconfig changed, restarting Gefyra
  • Checking cluster for existing Gefyra installation
  • Gefyra Operator not found. Installing now
  • Could not install Gefyra

And that's where it ends:

image

I wasn't able to find any errors in the docker extensions log, but I'm not sure I was looking in the right place.
If you specify the log file location (on Windows 11) I can provide more info.

Thank you.

@Schille

Schille commented May 9, 2023

Hi @mbonaci
Thank you for reporting this issue. We're going to look into it.
What's your Docker Desktop version? Are you using the built-in Kubernetes? - or any other option?

@mbonaci
Author

mbonaci commented May 9, 2023

@Schille thanks for the quick response.
I'm not using the built-in k8s, but a k8s cluster (through a VPN).

kubectl versions:

Client version: v1.24.13 (WSL2 Ubuntu-20.04)
Server version: v1.23.15

Docker Desktop version:
image

I probably should've mentioned this in the initial issue, but I forgot: I only ever use kubectl from my WSL, not the one in Windows that comes with Docker Desktop.

Although both the kubectl from WSL and the one from Windows can access the cluster and successfully run e.g. kubectl get pods, the one on Windows is a newer version, outside of the allowable version skew, so maybe that's what's causing the issue:

> kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.25.4
Kustomize Version: v4.5.7
Server Version: v1.23.15
WARNING: version difference between client (1.25) and server (1.23) exceeds the supported minor version skew of +/-1
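
For readers who want to reproduce the skew check from the warning above without running kubectl, here is a minimal sketch. It assumes the rule stated in the warning itself (client and server may differ by at most one minor version); the function names `minor_version` and `within_skew` are hypothetical and not part of kubectl or Gefyra.

```python
import re


def minor_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as 'v1.25.4'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def within_skew(client: str, server: str, max_skew: int = 1) -> bool:
    """Return True if client and server differ by at most max_skew minor versions.

    max_skew=1 mirrors the "+/-1" wording of the kubectl warning.
    """
    c_major, c_minor = minor_version(client)
    s_major, s_minor = minor_version(server)
    return c_major == s_major and abs(c_minor - s_minor) <= max_skew


# The two clients reported in this thread, checked against server v1.23.15.
for client in ("v1.24.13", "v1.25.4"):
    print(client, within_skew(client, "v1.23.15"))
```

With the versions quoted in this thread, the WSL client (v1.24.13) is within the supported skew and the Windows client (v1.25.4) is not, which matches the warning kubectl printed.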

@mbonaci
Author

mbonaci commented May 9, 2023

Just to mention that my colleagues reported they were able to run Gefyra from the command line to debug a Java server app running within our cluster. E.g.:

gefyra up --host 10.4.35.248 # control-plane,master
gefyra run -i st/web-api:latest -N web-api --env-from pod/web-api-7c7d5dd4df-zmhsr/web-api --expose 5006:5006 --rm --env TINI_SUBREAPER=true

@Schille

Schille commented May 10, 2023

Thank you for the follow-up. Well, a VPN in WSL2 seems a bit like a challenge, though it should work, too.
Just one simple question. Did you set the IP in the initial screen under advanced cluster settings?
image
Indeed, the connection is initiated from within WSL2, but only if you run Docker Desktop with the WSL2 backend. Gefyra's extension copies a Windows executable to the host, but the connection is established from a Wireguard endpoint in your local Docker network. I am not sure whether that network (look for gefyra in docker network ls) is part of your VPN setup.
I'd be interested in looking deeper into your setup. If you need help it's probably a better idea to jump on a short call to investigate the issue properly.

@mbonaci
Author

mbonaci commented May 10, 2023

Hi @Schille,
I did try entering the IP in Advanced Cluster Settings to see whether that would fix this issue, but that did not help, so I just kept it empty.

Docker WSL2 backend ✔️
image

Gefyra docker network ✔️

$ docker network ls
...
4160cdb57cda   gefyra               bridge    local
...
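
To go one step beyond confirming that the network exists, the sketch below lists the subnets of the gefyra Docker network so they can be compared by hand against the routes the VPN covers. It relies on `docker network inspect`, which prints a JSON array whose entries carry an `IPAM.Config` list; the helper names `subnets_from_inspect` and `gefyra_subnets` are hypothetical, and whether the subnet actually needs to be routed through the VPN is an open question in this thread, not something the code decides.

```python
import json
import subprocess


def subnets_from_inspect(inspect_json: str) -> list[str]:
    """Collect the subnets from the JSON printed by `docker network inspect`."""
    subnets = []
    for network in json.loads(inspect_json):
        ipam = network.get("IPAM") or {}
        # "Config" can be null or empty for some network drivers.
        for entry in ipam.get("Config") or []:
            if "Subnet" in entry:
                subnets.append(entry["Subnet"])
    return subnets


def gefyra_subnets(network_name: str = "gefyra") -> list[str]:
    """Run `docker network inspect <name>` and return its subnets.

    Raises CalledProcessError if the network does not exist.
    """
    result = subprocess.run(
        ["docker", "network", "inspect", network_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return subnets_from_inspect(result.stdout)
```

Calling `gefyra_subnets()` on a machine where the network exists returns a list of CIDR strings to check against the VPN routing table.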

I'm fine with just working from the WSL2 command line, but I could do a call.

@Schille

Schille commented May 17, 2023

Hi @mbonaci. Well, it seems there is a regression in Gefyra's lib concerning WSL2 (or it never really worked at all). Gefyra uses Wireguard to establish a secure VPN connection into the cluster. The default WSL2 kernel is pre-built by Microsoft and they disabled an important feature (Netfilter Conntrack, see: microsoft/WSL#8149). That's a pity.
The good news is, I found a workaround that would enable at least wireguard-go (that Gefyra employs) to run on WSL2 without compiling a personal build of the Linux kernel for WSL2.

Long story short: there will be a release of the Gefyra CLI soon that should run on WSL2.
By the way, the Gefyra CLI for Windows should work nonetheless. The Docker Desktop extension of Gefyra is not affected by this issue since it is running the Windows build of Gefyra's lib.

@mbonaci
Author

mbonaci commented May 18, 2023

Hi @Schille,
thanks for the info.
If you ping me here after that release I'd gladly try it out and provide feedback.

@SteinRobert
Contributor

The latest release includes some fixes which (should) resolve this one. @mbonaci we'd be super happy to hear your feedback on this. Thank you so much for making us aware of the issue in the first place. We're looking forward to your input on this!

@mbonaci
Author

mbonaci commented May 30, 2023

Hi @SteinRobert,
unfortunately the same issue is still present in my case:
image

As I mentioned before, everything goes fine until this step. I'm able to choose the context, the namespace, copy env from, the image... everything gets populated correctly.

@SteinRobert
Contributor

By any chance - we've seen similar behavior caused by an unusual Docker configuration (in ~/.docker/config.json).
Could you please check your credHelpers / credsStore keys?
I found that this answer on Stack Exchange actually helped.

However, this is more or less guessing. We're working on making the actual errors that occur more visible and helpful.

@mbonaci
Author

mbonaci commented May 31, 2023

I don't have any of those keys in that file.

"We're working on making the actual errors that occur more visible and helpful."

You can notify me when that happens and I'll retry.

@SteinRobert
Contributor

I am afraid this will take a few days or even weeks! In the meantime:
Are you able to build a custom image on your machine? Just from the screenshot it looks like the cargo image cannot be built.
When you create a simple Dockerfile:

FROM alpine
RUN ls

and run docker build . on it, does the command work without any problems? Sorry for the back and forth. Just trying to help you so you can continue working with Gefyra.
Thank you so much for your feedback!

@mbonaci
Author

mbonaci commented May 31, 2023

I build new images fairly often (in my WSL terminal and in Docker UI), so that shouldn't be the problem here.
I'm not blocked by this, since I can still use Gefyra on the command line (if such a need arises), so I can wait until that update happens, no worries.

@SteinRobert
Contributor

Okay, thank you!

@SteinRobert
Contributor

We released version 1.2.12 which now displays errors (if available) during the installation process.

@mbonaci
Author

mbonaci commented Jun 1, 2023

Here's what it says:

Error: Credentials store error: StoreError('docker-credential-gcloud not installed or not available in PATH') - Couldn't install Gefyra.

I searched online and tried a few suggestions from SO and Docker forum.

The original version of Docker's config.json on WSL:

cat ~/.docker/config.json
{
  "auths": {
    "https://index.docker.io/v1/": {}
  },
  "credsStore": "desktop.exe"
}

The original version on Win:

{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud"
  },
  "credsStore": "desktop"
}

I tried renaming credsStore to credStore, first in WSL, then in Win too.
I tried removing credsStore, first in WSL, then in Win too.

During all those attempts the error message did not change.
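
As a way to see which credential helpers a given config.json actually points at, here is a minimal sketch. It assumes Docker's naming convention that a `credsStore` or `credHelpers` value `X` refers to an executable named `docker-credential-X` on PATH, which is consistent with the error text above naming `docker-credential-gcloud`. The function names `referenced_helpers` and `missing_helpers` are hypothetical. Note that `gcloud` comes from the `credHelpers` entries rather than from `credsStore`, which may be why editing `credsStore` alone did not change the message.

```python
import json
import shutil
from pathlib import Path


def referenced_helpers(config_text: str) -> set[str]:
    """Return every helper name referenced by credHelpers and credsStore."""
    config = json.loads(config_text)
    helpers = set((config.get("credHelpers") or {}).values())
    store = config.get("credsStore")
    if store:
        helpers.add(store)
    return helpers


def missing_helpers(config_text: str) -> list[str]:
    """Return the docker-credential-* executables that are not found on PATH."""
    missing = []
    for name in sorted(referenced_helpers(config_text)):
        binary = f"docker-credential-{name}"
        if shutil.which(binary) is None:
            missing.append(binary)
    return missing


if __name__ == "__main__":
    # Default per-user location; adjust if your config lives elsewhere.
    config_path = Path.home() / ".docker" / "config.json"
    if config_path.exists():
        print(missing_helpers(config_path.read_text()))
```

The result depends on the PATH of the process running the check, so it is worth running it once from WSL and once from Windows, since the two environments have separate config files and separate PATHs.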

Then I renamed both files to config.json.bkp, which got me over that error, but then this happened:

Waiting for stowaway to become ready.
Could not confirm Stowaway - fatal error.

Then I decided to install Gefyra on WSL to see whether I have some higher-level issue that's unrelated to Docker Desktop, so I ran:

> gefyra up --host 10.4.35.248
[INFO] Installing Gefyra Operator
[INFO] Created network 'gefyra' (747e5f5fe28f)
[INFO] Container image "quay.io/gefyra/operator:1.1.1" already present on machine
[INFO] Pulling image "quay.io/gefyra/stowaway:1.1.1"
[INFO] Successfully pulled image "quay.io/gefyra/stowaway:1.1.1" in 2.63552402s
[INFO] Operator became ready in 10.5670 seconds
[INFO] Deploying Cargo (network sidecar) with IP 172.30.0.149

> gefyra down
[INFO] Removing running bridges
[INFO] Uninstalling Operator
[INFO] Removing Cargo
[INFO] Removing Docker network gefyra

So all good there, it seems.

Then I added the same master node IP to the Advanced Cluster Settings on the first page of the Gefyra extension and ran it again. This time it seems I got past the stowaway fatal error, but then it just froze on the following line for 10 minutes before I stopped it:

Cargo not found - starting Cargo now...

We're moving forward here and that's the important thing :)

@SteinRobert
Contributor

Thank you for the super detailed feedback! Great thing the error was actually displayed!
I'll dive into the given error message later. Hopefully we'll manage to resolve that one as well. I'll get back to you asap.
