[WIP] Improve networking options for libvirtd target #922
Conversation
Otherwise, deployments with multiple VMs try to write to the same image. Also, do the temp_image_path calculation only once.
`MachineState.run_command()` passes SSH flags to `self.ssh.run_command()`. However, `self.get_ssh_flags()` is already registered as a `ssh_flag_fun` in the class `__init__()` function, so `ssh_util.SSH` already uses it to get the flags when initiating a connection. This led to the SSH flags being duplicated, which causes an error for some flags (e.g. the `-J` flag, which can only be specified once).
Offers better separation, especially when additional features are added later.
This helps in situations where there is no network connectivity to the guest, e.g. when the hypervisor host can be reached via a VPN but the guest host cannot.
This mainly adds virtio driver support to the initial image used for guest provisioning.
This is a WIP!

- Replaced the `deployment.libvirtd.networks` option with a submodule to allow not only (libvirt) network names, but other networking types as well.
- The domain XML was adjusted accordingly to incorporate the parameters from the new `networks` submodule.
- Added the qemu guest agent to guests to allow for out-of-band communication (no need for network connectivity) with the hypervisor.
- The guest IP (for provisioning after the guest has started) is no longer determined by waiting for the guest to get a DHCP lease in the hypervisor's libvirt network. If the guest has a static IP, it won't ask for a DHCP lease; also, for bridged networking, we probably will not have access to the DHCP server.
- Instead, the address of the first interface is retrieved from libvirt using the `VIR_DOMAIN_INTERFACE_ADDRESSES_SRC_AGENT` method, which can now be done because of the newly added qemu guest agent (a rough sketch of this lookup follows below).
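For reference, a minimal sketch of that guest-agent address lookup via the libvirt Python bindings (not this PR's exact code; the domain name `node-1` is hypothetical):

```python
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("node-1")  # hypothetical domain name
# Ask the qemu guest agent (rather than the DHCP lease table) for addresses.
ifaces = dom.interfaceAddresses(
    libvirt.VIR_DOMAIN_INTERFACE_ADDRESSES_SRC_AGENT)
for name, info in ifaces.items():
    print(name, [a["addr"] for a in (info.get("addrs") or [])])
```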
type = types.listOf (types.submodule (import ./network-options.nix {
  inherit lib;
}));
default = [];
`default = [{ source = "default"; type = "bridge"; }];` might be best for backwards compatibility.
k.get("value") | ||
for k in x.findall("attr[@name='networks']/list/string")] | ||
LibvirtdNetwork.from_xml(n) | ||
for n in x.findall("attr[@name='networks']/list/*")] | ||
assert len(self.networks) > 0 |
With the current default we hit the assert
+∞ thanks. Just a quick tip for other testers: you can set up your networks with the `deployment.libvirtd.networks` option.
In fact, you have to define at least one network, otherwise you'll hit the `assert len(self.networks) > 0`.
Did you have the problem
I have tried adding
EDIT: Found the problem, see review.
@teto No I'm sorry... in my setup everything works just fine with the current state. You might search the qemu sources for this error message and try to work your way backwards to the cause. RE reviewing this PR: As you suggested, I'm going to do more work once #824 is merged, so I can be sure I don't have to rebase several times.
maybe_mac(n),
' <source network="{0}"/>',
' <interface type="{interface_type}">',
' <source {interface_type}="{source}"/>',
' </interface>',
I have problems when configuring bridges, like `Cannot get interface MTU on 'default': No such device`.
As a quick hack, since I work with bridges, I kinda reverted the fix with this hybrid change:
def iface(n):
return "\n".join([
' <interface type="network">',
' <source network="{0}"/>',
' </interface>',
]).format(n.source)
which works
Never mind, I used the wrong type. I should have used `virtual` instead (`bridge` seems to expect the host bridge name, while `virtual` expects the libvirt network name, IIUC). I reverted my changes.
I found a fix, thanks ;)
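For readers following along: a small sketch of what the new generic format string from the diff above produces for the two libvirt interface types. This is not the PR's exact code, and the source values `default` and `br0` are just hypothetical examples.

```python
def iface_xml(interface_type, source):
    # The attribute name of <source> matches the interface type, which is why
    # one template can cover both libvirt-managed networks and host bridges.
    return "\n".join([
        '    <interface type="{interface_type}">',
        '      <source {interface_type}="{source}"/>',
        '    </interface>',
    ]).format(interface_type=interface_type, source=source)

# libvirt-managed network (what the old code always generated):
print(iface_xml("network", "default"))   # <source network="default"/>
# host bridge; here the source must be an existing bridge device on the host:
print(iface_xml("bridge", "br0"))        # <source bridge="br0"/>
```

So a `bridge` entry needs the name of an existing host bridge, while a libvirt network name goes with the network type (apparently exposed as `virtual` in this PR's option), which matches the observation above and would also explain the MTU error when a network name is used as a bridge.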
@teto Can you post this fix here?
@teto By
do you mean it should be added as a NixOS module?
Basically I reverted some of your changes when generating the interface XML:
~~I guess you are using the~~ When booting I also get
this might be related to my iface change. I wonder how you set up your VM? Could you share your deployment.libvirtd.networks / nixpkgs version please?
systemd.services.qemu-guest-agent = {
  description = "QEMU Guest Agent";
  bindsTo = [ "dev-virtio\\x2dports-org.qemu.guest_agent.0.device" ];
  after = [ "dev-virtio\\x2dports-org.qemu.guest_agent.0.device" ];
For some reason this creates a problem with my config.
I found the culprit: for some reason the agent won't work, mostly because of
I've opened a PR here: NixOS/nixpkgs#39099, and with it I am able to make use of your PR \o/
addrs = first_iface.get('addrs', [])

return addrs[0]['addr']
* client: 'NoneType' object has no attribute '__getitem__'
* server: 'NoneType' object has no attribute '__getitem__'
Sometimes `addrs[0]` won't have `'addr'` (if the DHCP request was not received), which generates the previous error. I guess this should be `return addrs[0].get('addr', None)`.
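A defensive variant along the lines suggested above (a sketch, not this PR's code; `first_iface` is the interface dict already looked up from the guest agent data):

```python
def first_address(first_iface):
    # Return None instead of raising when the guest agent has not reported
    # any address yet (e.g. no DHCP lease acquired so far).
    addrs = first_iface.get('addrs') or []
    if not addrs:
        return None
    return addrs[0].get('addr')
```

The caller can then retry or fail with a clearer message instead of the `'NoneType' object has no attribute '__getitem__'` error quoted above.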
Is it possible to make virtio the default? Change
How to fix this error:
NixOS/nixpkgs#39099 (comment) got merged.
@teto Thanks. Bridge mode works.
<link xlink:href="https://libvirt.org/storage.html">storage pool</link> which
usually corresponds to the <filename>/var/lib/libvirt/images</filename>
directory. You can choose another storage pool with the
<code>deployment.libvirtd.storagePool</code> option:
When trying out this PR, I got:
libvirt: Storage Driver error : Storage pool not found: no storage pool with matching name 'default'
This looks like simon3z/virt-deploy#8.
And indeed for me:
% virsh pool-list
Name State Autostart
-------------------------------------------
Can we make this work also without the default storage pool, or do we need to instruct the user to set this default storage pool up?
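One way to surface this earlier would be a check against the configured pool name (a sketch using the libvirt Python bindings, not part of this PR; the pool name is whatever `deployment.libvirtd.storagePool` is set to):

```python
import libvirt

def check_storage_pool(pool_name="default"):
    # Fail with a clear message if the configured pool does not exist,
    # instead of libvirt's "Storage pool not found" error at deploy time.
    conn = libvirt.open("qemu:///system")
    try:
        conn.storagePoolLookupByName(pool_name)
    except libvirt.libvirtError:
        raise Exception("storage pool '{0}' not found; create and start it "
                        "on the hypervisor first".format(pool_name))
```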
Also when trying this out, I get this:
node-3..> uploading disk image...
node-3..> starting...
libvirt: QEMU Driver error : Guest agent is not responding: QEMU guest agent is not connected
node-3..> .libvirt: QEMU Driver error : Guest agent is not responding: QEMU guest agent is not connected
node-3..> .libvirt: QEMU Driver error : Guest agent is not responding: QEMU guest agent is not connected
node-3..> .libvirt: QEMU Driver error : Guest agent is not responding: QEMU guest agent is not connected
Looking at the machine with a shell in virt-manager, qemu-guest-agent.service doesn't seem to exist in systemd yet (I think only the base image is running at that point).
Not sure if I'm using it wrong.
@@ -334,7 +334,8 @@ def run_command(self, command, **kwargs):
        # mainly operating in a chroot environment.
        if self.state == self.RESCUE:
            command = "export LANG= LC_ALL= LC_TIME=; " + command
        return self.ssh.run_command(command, self.get_ssh_flags(), **kwargs)

        return self.ssh.run_command(command, **kwargs)
Why is `self.get_ssh_flags()` removed here?
See the commit message for 574ba39. Although this is quick-and-dirty WIP code, I always try to write comprehensible commit messages, so `git blame` can help people understand changes.

`MachineState.run_command()` passes SSH flags to `self.ssh.run_command()`. However, `self.get_ssh_flags()` is already registered as a `ssh_flag_fun` in the class `__init__()` function, so `ssh_util.SSH` already uses it to get the flags when initiating a connection. This led to the SSH flags being duplicated, which causes an error for some flags (e.g. the `-J` flag, which can only be specified once).
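To make the failure mode concrete, here is a toy sketch of the duplication (this is not nixops' actual API, just an illustration; the jump host value is hypothetical):

```python
def build_ssh_argv(registered_flag_funs, per_call_flags, command):
    # Flags from registered functions are gathered automatically ...
    flags = []
    for fun in registered_flag_funs:
        flags.extend(fun())
    # ... so also passing them per call duplicates single-use options like -J.
    flags.extend(per_call_flags)
    return ["ssh"] + flags + [command]

def get_flags():
    return ["-J", "jumphost.example"]  # hypothetical flag function

print(build_ssh_argv([get_flags], get_flags(), "true"))
# ['ssh', '-J', 'jumphost.example', '-J', 'jumphost.example', 'true']
```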
My bad -- I didn't pay enough attention. Great commit message!
return super_flags + ["-o", "StrictHostKeyChecking=no",
"-i", self.get_ssh_private_key_file()]
"-i", self.get_ssh_private_key_file(),
"-J", jumphost]
For me, `jumphost` seems to be `""`, so I get `Invalid -J argument`.
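A possible guard (a sketch, not the PR's code): only add `-J` when a jump host is actually configured, since an empty string makes ssh abort.

```python
def extra_ssh_flags(super_flags, private_key_file, jumphost):
    flags = super_flags + ["-o", "StrictHostKeyChecking=no",
                           "-i", private_key_file]
    if jumphost:  # skip -J entirely when no jump host is set
        flags += ["-J", jumphost]
    return flags
```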
first_iface = next(v for k, v in ifaces.iteritems()
                   if v.get('hwaddr', None) == first_iface_mac)

addrs = first_iface.get('addrs', [])
What is the point behind this logic? You're saying "if `.addrs` doesn't exist, default to `[]`", and then below immediately access `[0]` on it, which will fail if the default you have given actually happens (because `[][0]` can't work).
Yes, you're absolutely right. See my other comment for an explanation: this PR is just a quick-and-dirty POC.
@nh2 As the title says, this is WIP code, which I only pushed after multiple requests from people really wanting to see it. Neither did I put much work into it (currently, I'm not even using this, as my time for NixOS is fairly limited), nor is it any better than quick-and-dirty WIP code. I tried out whether this could possibly work the way I imagined, created a POC for my extremely limited use case, and pushed it for some people to take a look at. If you want to improve it, feel free to build upon it -- but as I said, I currently have no time to work on NixOS, so I won't be able to incorporate your suggestions.
@nh2 Not wanting to be rude... this PR has just developed its own dynamics, while I got quite short on time for bringing this from POC to production-ready in the meantime ;-). So if anyone wants to move this forward and doesn't want to wait until I'm able to get back to it -- feel free :-).
@mbrgm No problem! I think it's great that you share your WIP code, that's exactly how it should be done. I'm testing it because I hope it'll give me some hints on how to get around https://github.com/NixOS/nixops/issues/973. My comments carry no negative connotation, I'm just writing down what I find to work / not work to save myself or others some time for later :)
I've added some commits on top of the networking part.
This is pretty good work overall and not at all "quick-and-dirty" :)
@sorki The underlying PR this was based on just got merged. I would be interested in seeing your PR merged too!
An update of NixOS/nixops#922:

- Replaced the `deployment.libvirtd.networks` option with a submodule to allow not only (libvirt) network names, but other networking types as well.
- The domain XML was adjusted accordingly to incorporate the parameters from the new `networks` submodule.
- Added the qemu guest agent to guests to allow for out-of-band communication (no need for network connectivity) with the hypervisor.
- The guest IP (for provisioning after the guest has started) is no longer determined by waiting for the guest to get a DHCP lease in the hypervisor's libvirt network. If the guest has a static IP, it won't ask for a DHCP lease; also, for bridged networking, we probably will not have access to the DHCP server.
- Instead, the address of the first interface is retrieved from libvirt using the `VIR_DOMAIN_INTERFACE_ADDRESSES_SRC_AGENT` method, which can now be done because of the newly added qemu guest agent.
Hello! Thank you for this PR. In the past several months, some major changes have taken place in
This is all accumulating into what I hope will be a NixOps 2.0.
My hope is that by adding types and more thorough automated testing,
However, because of the major changes, it has become likely that this
If you would like to see this merged, please bring it up to date with
Thank you again for the work you've done here, I am sorry to be
Graham
This builds upon #824 and adds the possibility to connect libvirt guests via bridged networking. It also adds the qemu guest agent to guests, which could be helpful for #881.
There's still work to be done, especially documentation-wise, but also checking whether this introduces regressions and especially keeping backwards compatibility with the existing `deployment.libvirtd.networks` option. However, it already works quite well (at least as far as I tested, for bridged networking).
/cc @erosennin @teto