Add QEMU on Windows to CI #3475

Merged
merged 1 commit into from
Jun 18, 2025

@arixmkii
Contributor

arixmkii commented Apr 26, 2025

For now it will use additional templates, because of incompatible mounts.

This is probably not for 1.1.0.

It is possible to use default.yaml for Windows with the changes from #3318 (this would need a rebase first, but I checked it using a rebased patch in a forked repo; example run: https://github.com/arixmkii/qcw/actions/runs/14681726480/job/41205771124).

@arixmkii
Contributor Author

arixmkii commented Apr 26, 2025

time="2025-04-26T17:20:45Z" level=fatal msg="failed to validate YAML file "C:\\a\\lima\\lima\\templates\\experimental\\default-windows.yaml": can't parse builtin Lima version "cfbffd8": cfbffd8 is not in dotted-tri format"

make/git on Windows resolves the version incorrectly. I will check it (there are no such issues when checkout and build are done with msys2 tools). Update: fixed.

Another topic to check: using Chocolatey to install QEMU, because the msys2 QEMU installation feels slow.
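
For context on the error above: the log says the built-in version must be in dotted-tri form (MAJOR.MINOR.PATCH), and a bare commit hash such as cfbffd8 is not. The following is a minimal sketch of such a check, not Lima's actual validation code. The name isDottedTri, the tolerance for a leading "v", and the handling of "-" and "+" suffixes are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isDottedTri reports whether v looks like MAJOR.MINOR.PATCH with numeric parts.
// Hypothetical helper for illustration only; not taken from the Lima code base.
func isDottedTri(v string) bool {
	// Assumption: a leading "v" (as in git tags) is acceptable.
	v = strings.TrimPrefix(v, "v")
	// Assumption: anything after the first "-" or "+" is a suffix and is ignored.
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return false
	}
	for _, p := range parts {
		// ParseUint rejects empty strings, signs, and non-digits.
		if _, err := strconv.ParseUint(p, 10, 64); err != nil {
			return false
		}
	}
	return true
}

func main() {
	// "v1.0.7-12-gcfbffd8" is a made-up example of a tag-based description.
	for _, v := range []string{"1.1.0", "v1.0.7-12-gcfbffd8", "cfbffd8"} {
		fmt.Printf("%-20s dotted-tri=%v\n", v, isDottedTri(v))
	}
}
```

A check like this rejects a bare hash, which matches the symptom in the log: the build resolved the version to a commit hash rather than to a tag-derived string.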

@arixmkii arixmkii marked this pull request as draft April 26, 2025 17:23
@arixmkii
Contributor Author

I will probably need to move mounts-windows under _default so that the validation script does not fail.

@arixmkii
Contributor Author

The Chocolatey QEMU package is not well maintained, so I chose winget instead, which is a great alternative. A known limitation is that winget is not available out of the box on Windows Server 2022, so the workflow uses a hacky action to add it. That action is now archived and will not be needed at all after the migration to Windows Server 2025; a comment in the workflow highlights this.

@arixmkii arixmkii marked this pull request as ready for review April 28, 2025 18:23
@arixmkii
Contributor Author

@jandubois @AkihiroSuda I would like to know your opinions on how reasonable it is to extend CI to support this (without overloading CI or increasing costs significantly). From my side there is no rush, and I can see reasons to postpone this until #3316 is addressed (via a refresh of #3318 or by other means). It might also be reasonable to wait for the migration to Windows Server 2025 so that we do not have to use the now-archived https://github.com/Cyberboss/install-winget action.

I authored it now to confirm the proof of concept and to potentially provide a reference starting point for its introduction.

@@ -175,6 +175,44 @@ jobs:
$env:_LIMA_WINDOWS_EXTRA_PATH = 'C:\Program Files\Git\usr\bin'
bash.exe -c "./hack/test-templates.sh templates/experimental/wsl2.yaml"

windows-qemu:
Member

Can we now drop these lines?

if runtime.GOOS == "windows" && runtime.GOARCH == "amd64" {
	// https://github.com/lima-vm/lima/pull/3487#issuecomment-2846253560
	// > #931 intentionally prevented the code from setting it to max when running on Windows,
	// > and kept it at qemu64.
	//
	// TODO: remove this if "max" works with the latest qemu
	defaultX8664 = "qemu64"
}

Contributor Author

From my experience, "max" just didn't work well with WHPX acceleration. I tested it on 3 different machines in the past and was only able to make it work by disabling specific CPU features, which were different on every machine. It was not a user-friendly default. I can do some canary testing to see how it works now with newer QEMU/Windows versions and whether the failures are as common as they were before.

@arixmkii
Contributor Author

arixmkii commented May 2, 2025

I tried to limit both Windows builds to the standard windows-2025 runner. The QEMU one failed with errors mounting SSHFS (I observed this instability before with standard runners; the errors definitely recur and can persist across job restarts). WSL2 hit some error and the test is now in a locked state (it will be terminated after the 30-minute timeout, because I can't cancel it manually). WSL2 was less stable on standard runners than on the Lima 8-core runner, but there I mostly saw errors from systemd; this one is new.

I will restart the build with both set to windows-2025-8-cores to compare.

@arixmkii
Contributor Author

arixmkii commented May 2, 2025

It didn't help for QEMU

time="2025-05-02T19:26:57Z" level=info msg="[hostagent] :/c/Users/runneradmin: No such file or directory"
time="2025-05-02T19:26:57Z" level=warning msg="[hostagent] failed to confirm whether /c/Users/runneradmin [remote] is successfully mounted" error="failed to execute script \"wait-for-remote-ready\": stdout=\"\", stderr=\"mux_client_request_session: read from master failed: Connection reset by peer\\r\\nControlSocket /c/Users/runneradmin/.lima/default/ssh.sock already exists, disabling multiplexing\\r\\nsshfs does not seem to be mounted on /c/Users/runneradmin\\n\": exit status 1"

SSHFS is weird on Windows in general and inside runners specifically. Some insights from testing this in GH runners for a month or so: it always (or almost always) failed to mount $TEMP, but most of the time managed to mount $HOME. In the $TEMP case, if the mount was attempted but did not succeed, the integration tests still pass.

While troubleshooting the $TEMP issue locally, I managed to replicate it on my dev machine, and the fix was to clean the contents of the $TEMP folder. It looked like sftp-server might be sensitive to the folder contents, but I didn't test this in detail.

I think I will test the standard runners and disable the mount tests on the Windows platform, with a comment noting that they are flaky, which they indeed are.

I will experiment in my repo on isolated examples and then update this PR again.

@jandubois
Member

> It always (or almost always) failed to mount $TEMP, but most of the time managed to mount $HOME

Is this just another instance of #302? Because $TEMP will be located at $HOME\AppData\Temp?

I always thought the issue was the overlap in the guest, but maybe the overlap on the host is the real problem?

At the time I filed #302 we did not yet have support for specifying a different mountPoint, so it was impossible to tell which side was causing the issue.
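
The host-side overlap raised here can be checked mechanically. The following is a minimal sketch, assuming slash-separated paths like those in the hostagent log lines earlier in this thread. The function isNested and the example TEMP location are hypothetical and are not part of Lima.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isNested reports whether child is the same directory as parent or lies below it.
// Both arguments are slash-separated paths such as "/c/Users/runneradmin".
// Hypothetical helper for illustration only.
func isNested(parent, child string) bool {
	p := path.Clean(parent)
	c := path.Clean(child)
	if p == c {
		return true
	}
	if p == "/" {
		// Every absolute path lies below the root.
		return strings.HasPrefix(c, "/")
	}
	// Compare whole path components so "/a/bc" is not treated as nested in "/a/b".
	return strings.HasPrefix(c, p+"/")
}

func main() {
	home := "/c/Users/runneradmin"
	// Assumed example location; the real TEMP path on a runner may differ.
	temp := "/c/Users/runneradmin/AppData/Local/Temp"
	fmt.Println("TEMP nested in HOME:", isNested(home, temp))
	fmt.Println("moved TEMP nested in HOME:", isNested(home, "/c/lima-temp"))
}
```

If two mount sources overlap like this on the host, moving one of them outside the other is one way to test whether the host-side nesting is the cause.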

@AkihiroSuda AkihiroSuda modified the milestones: v1.1.0, v1.1.x (?) May 12, 2025
@AkihiroSuda
Member

What's the current status of this?

@arixmkii
Contributor Author

I believe I managed to address this when running it locally (to the best of my current understanding of the flaky part). I will experiment in GH CI in a forked repo and will post the status here this week.

Signed-off-by: Arthur Sengileyev <[email protected]>
-name: "Windows tests"
-runs-on: windows-2022-8-cores
+name: "Windows tests (WSL2)"
+runs-on: windows-2025-8-cores
Member

Maybe

Suggested change
-runs-on: windows-2025-8-cores
+runs-on: windows-2025

Contributor Author

We can try this. From my observations it was less stable in a forked repo, where I have only default runners. I will apply this change after I address the comment about switching the QEMU CPU type to max.

If this results in unstable builds, it can be reverted later.

@AkihiroSuda AkihiroSuda modified the milestones: v1.1.x (?), v1.1.2 Jun 18, 2025
Member

@AkihiroSuda AkihiroSuda left a comment

Thanks

@AkihiroSuda AkihiroSuda merged commit db2c41a into lima-vm:master Jun 18, 2025
38 checks passed
@arixmkii
Contributor Author

I spent all day doing experiments with CI to make SSHFS behave on Windows. What I tried:

  • setting HOME and TEMP to different locations to give sftp-server a reduced FS tree
  • setting HOME and TEMP to a non-nested layout (the default on Windows is TEMP nested deep inside HOME)
  • trying different drives, C: and D:, as locations, in case the disks have different I/O settings in the runner

What I failed to test:

  • running the tests as a different user, using psexec or paexec, to fully isolate them from the build. I gave up on this idea because it is difficult to pass all the needed environment configuration to another user session, and there is no good way to redirect logs to stdout for traceability of the test runs

For now, it seems that disabling the sshfs tests on Windows was the way to go.

I have had some success running them in GitHub CI in a forked repo, where I chain 2 VMs: one creates the build as a zip, and the other unzips the artifact and runs the tests on it. As the Lima build is I/O heavy, it might hit some sort of I/O throttling after the build, but I have no way to confirm this. This chaining strategy gives better results but still sometimes results in failures. I will continue to experiment with different setups for the sshfs tests in a forked repo.
