vz: Has been stuck in Waiting for the essential requirement 1 of 3: "ssh" on Start lima #1200
Comments
I just tried this template and it works fine for me. Did you try building from source or the release build? |
I tried both the release build and compiling v0.14.0-beta.1 from source, and the problem is the same. What can I do to help with this issue? |
Is running |
I actually do experience something similar, and I suspect it is related to the gVisor network. I have only experienced a frozen startup once or twice, but what I experience more frequently is frozen SSH sessions. When it happens, everything still works except SSH and everything linked to it, e.g. port forwarding. I am still unable to pinpoint the cause, and it happens with multiple distros, including the default vz template. |
Ah! That's bad. There is also a very slight possibility that the issue is related to the dgram socket pair that is created. |
@kj-creater - Thanks for the video. I was able to reproduce this issue. It looks like the disk size is too low for the VM to even start. I was able to reproduce the same issue with the qemu driver as well, with a disk size of 2GiB. Please increase the disk size, e.g. to 5GiB; that worked fine for me. |
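For reference, the disk size is set with the top-level `disk` field in lima.yaml. A minimal sketch, assuming the 5GiB value suggested above and leaving the rest of the template unchanged:

```yaml
# Assumed value: 5GiB, as suggested in the comment above.
# The reported failure was reproduced with a 2GiB disk.
disk: "5GiB"
```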
I have this issue too, and I believe the cause is the same as what @abiosoft described.
This hang has not occurred in network communication without gvisor-tap-vsock (e.g. examples in vz repository) |
Facing this as well. These seem related:
FWIW, while tinkering with start / stop, I managed to SIGSEGV:
|
Hang on VM stop from #1358 OUTPUT:
|
I tried to reproduce this with #1383, and in one random case the currently running instance froze (ssh didn't work). It looks like it's related to gvisor, as abiosoft mentioned. Once it froze, I tried to start a new instance, but even that didn't work: the connection to gvisor-tap-vsock succeeded, but there was no response. I will try to debug more with this information. Edit: |
I had a somewhat similar error while stopping colima. In some cases, which I unfortunately can't reproduce, this happens while running in the background too: sometimes every 10 minutes, sometimes after 2 hours.
|
So I saw this, too, for the first time. Same template that had been working fine suddenly would no longer start with this error. Prepared a new instance under another name and that worked fine. It seems as though something is retained about the network state of the instance which the new one doesn't have. Unfortunately I couldn't find anything relevant besides old pid files and logs which don't seem to affect this at all 🤔 |
Any news on this? I have the same problem while using Trellis's new Lima-based VM. |
@GianlucaCesari Is the VM stuck during start-up? If so, could you share your lima configuration? If possible, give the latest lima master a try as well. |
Yes, I use the command
This is the lima.yaml of my VM:
vmType: "vz"
rosetta:
  enabled: false
images:
- location: https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
  arch: x86_64
- location: https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-arm64.img
  arch: aarch64
mounts:
- location: <path>/site
  mountPoint: <path>/current
  writable: true
mountType: "virtiofs"
networks:
- vzNAT: true
containerd:
  user: false
provision:
- mode: system
  script: |
    #!/bin/bash
    echo "127.0.0.1 $(hostname)" >> /etc/hosts
|
Just kidding: it gave me 4 hours of error-free work and now it hangs. The VM says it's running, but if I try to shell into it, it keeps hanging, and when I run limactl start after limactl stop -f, this is what it gives me:
Running command => limactl start whuis.com
INFO[0000] Using the existing instance "whuis.com"
INFO[0000] [hostagent] Starting VZ (hint: to watch the boot progress, see "/Users/gianlucacesari/.lima/whuis.com/serial.log")
INFO[0000] SSH Local Port: 49530
INFO[0000] [hostagent] new connection from to
INFO[0000] [hostagent] [VZ] - vm state change: running
INFO[0000] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0003] [hostagent] 2023/05/10 12:10:19 tcpproxy: for incoming conn 127.0.0.1:49532, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: no route to host
INFO[0013] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0013] [hostagent] 2023/05/10 12:10:29 tcpproxy: for incoming conn 127.0.0.1:49533, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: connection was refused
INFO[0023] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0023] [hostagent] 2023/05/10 12:10:39 tcpproxy: for incoming conn 127.0.0.1:49534, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: connection was refused
INFO[0033] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0033] [hostagent] 2023/05/10 12:10:49 tcpproxy: for incoming conn 127.0.0.1:49535, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: connection was refused
INFO[0043] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
INFO[0043] [hostagent] 2023/05/10 12:10:59 tcpproxy: for incoming conn 127.0.0.1:49536, error dialing "192.168.5.15:22": connect tcp 192.168.5.15:22: connection was refused
INFO[0053] [hostagent] Waiting for the essential requirement 1 of 3: "ssh"
|
This doesn't look like a network issue; it looks more like disk corruption. Unfortunately, vz has an open issue where early boot logs are not captured. The change below switches the vmType to QEMU so we can identify a boot failure with the same disk that was booted via vz. Just for testing purposes, can you try the following:
|
Just to find out whether the problem is vz (qemu is not yet supported by Trellis, so I didn't have high hopes), I tried what you asked. But even after changing to qemu and removing mountType and networks: - vzNAT: true (because they were throwing errors), it still hangs with this message:
INFO[0000] Using the existing instance "whuis.com"
INFO[0000] [hostagent] Starting QEMU (hint: to watch the boot progress, see "/Users/gianlucacesari/.lima/whuis.com/serial.log")
INFO[0000] SSH Local Port: 50548
INFO[0000] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
INFO[0017] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
INFO[0027] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
INFO[0037] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
INFO[0047] [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
This is the ha.stderr.log, if it can be helpful. |
Could you please share the serial.log file? |
It's empty, the only lines in it are these:
|
That's odd, after changing to qemu we should get initial boot logs in serial.log file |
serial.log |
Thanks. Unfortunately, from serial.log it looks like the VM booted successfully :( |
Experiencing the same as op after upgrading to the latest released versions of colima / lima, then creating a new vz vm.
|
@terev |
@balajiv113 yes, that's right, though it may be slightly different:
Edit: @balajiv113 fixed, thanks to the suggestion to update to >= 13.3.1 here: abiosoft/colima#684 (comment) |
Fixed with help from @balajiv113. Thanks for the suggestion to update to >= 13.3.1 here: abiosoft/colima#684 (comment) |
I'm seeing this problem with the latest macOS 13.5.1 using colima 0.5.5 and lima 0.17.2 installed from Homebrew. |
Description
oracle-vz.zip
OS: macOS 13.0.1 M1