feat: support additional virtualized NVMe disks #85

Open
wants to merge 2 commits into base: main
Conversation

tserong
Contributor

@tserong tserong commented Nov 4, 2024

Any node for which nvme_size is specified in settings.yml will
automatically have one additional virtual NVMe disk created,
which will appear as /dev/nvme0n1.  This can be useful for
testing additional disks with Longhorn.  The way it works
is a bit obscure because libvirt only seems to support virtio,
scsi and sata disks, so we have to resort to hacking qemu
arguments in as described in this blog post:

  http://blog.frankenmichl.de/2018/02/13/add-nvme-device-to-vm/

This means we have to go create the volume manually first with
`virsh` then pass its path to qemu.  Matters are complicated
further by volumes being owned by root by default, while libvirt
runs VMs as the qemu user.  For normal libvirt volumes, file
ownership changes happen automatically, but for these volumes
we're hacking in by hand, that doesn't happen, so we have to
explicitly specify ownership, and to do _that_ we have to
define the volume via XML, hence the nasty `virsh vol-create-as
... | sed ... | virsh vol-create` invocation.

The other wrinkle is that we need to know the exact path to
the disk image.  Once a volume has been created, you can run
`virsh vol-path --pool default VOLUME_NAME` to get the path,
but we need to know the path when setting libvirt.qemuargs,
whereas we don't want to actually create the volume until the
`vagrant up` trigger for the node in question.  If we create
the volume outside the trigger, it gets created on _every_
evaluation of the Vagrant file, even for unrelated VMs...
So, we call `virsh pool-dumpxml default` and get the pool
path from that, then stick the volume name on the end.

To test, try something like this in settings.yml:

```
harvester_network_config:
  cluster:
    - ip: 192.168.0.30
      # [...]
      nvme_size: 10G
```

Signed-off-by: Tim Serong <[email protected]>
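
For reference, here is a rough sketch of the approach described above, in Vagrantfile terms. This is not the exact diff in this PR; the node name, volume name, drive id and serial are illustrative, and the trigger/`qemuargs` shapes just follow the description and the blog post linked above:

```
# Sketch only: illustrates the qemuargs / manual-volume approach described
# above. Names here are examples, not the exact code in this PR.
require 'nokogiri'

node_name = 'harvester-node-0'
nvme_size = '10G'   # would come from nvme_size in settings.yml
nvme_vol  = "vagrant-pxe-harvester_#{node_name}-nvme0.qcow2"

# Resolve the storage pool path up front, because libvirt.qemuargs needs the
# full image path before the volume actually exists.
pool_path = Nokogiri::XML(%x(virsh pool-dumpxml default))
              .at_xpath('/pool/target/path').content

Vagrant.configure('2') do |config|
  config.vm.define node_name do |node|
    # Create the volume only when this node is brought up, so it isn't
    # created on every evaluation of the Vagrantfile. The sed step injects
    # explicit ownership into the volume XML, since libvirt won't fix up
    # ownership for a volume we hand to qemu ourselves.
    node.trigger.before :up do |trigger|
      trigger.run = { inline: <<~SCRIPT }
        sh -c "virsh vol-info --pool=default #{nvme_vol} >/dev/null 2>&1 ||
               virsh vol-create-as default #{nvme_vol} --capacity #{nvme_size} --format qcow2 --print-xml |
               sed '/<target>/a <permissions><owner>0</owner><group>0</group></permissions>' |
               virsh vol-create default /dev/stdin"
      SCRIPT
    end

    node.vm.provider :libvirt do |libvirt|
      # libvirt has no NVMe disk bus, so hand the image straight to qemu.
      libvirt.qemuargs :value => '-drive'
      libvirt.qemuargs :value => "file=#{pool_path}/#{nvme_vol},format=qcow2,if=none,id=nvme0"
      libvirt.qemuargs :value => '-device'
      libvirt.qemuargs :value => 'nvme,drive=nvme0,serial=nvme0'
    end
  end
end
```
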
@bk201
Member

bk201 commented Nov 11, 2024

Does the network interface name change after adding an NVMe disk? I have the impression that ens5/ens6 (or something similar) changed after adding the second disk.

@tserong
Contributor Author

tserong commented Nov 11, 2024

Doesn't seem to make any difference. I just gave it a quick test checking `ip a` in the installer, first without an NVMe:

rancher@rancher:~> ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:71:b5:1c brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    inet 192.168.121.53/24 brd 192.168.121.255 scope global ens5
       valid_lft forever preferred_lft forever
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:00:00:0d:62:e2 brd ff:ff:ff:ff:ff:ff
    altname enp0s6

...and second with an NVMe:

rancher@rancher:~> ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:5d:85:07 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    inet 192.168.121.66/24 brd 192.168.121.255 scope global ens5
       valid_lft forever preferred_lft forever
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:00:00:0d:62:e2 brd ff:ff:ff:ff:ff:ff
    altname enp0s6

@tserong
Contributor Author

tserong commented Nov 11, 2024

@m-ildefons you might be interested in this, as I see you're adding multiple disk support in #86 :-)

@tserong tserong requested a review from m-ildefons November 11, 2024 04:53
Member

@bk201 bk201 left a comment


Thanks Tim

@votdev
Member

votdev commented Nov 11, 2024

When running the setup script I got this error:

$ ./setup_harvester.sh
~/git/ipxe-examples/vagrant-pxe-harvester ~/git/ipxe-examples/vagrant-pxe-harvester
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match
'all'

PLAY [Setup Harvester] *************************************************************************************************

TASK [debug] ***********************************************************************************************************
ok: [localhost] => 
  msg: Installation Started

TASK [install PXE server] **********************************************************************************************
fatal: [localhost]: FAILED! => changed=true 
  cmd: |-
    vagrant up pxe_server
  delta: '0:00:00.550344'
  end: '2024-11-11 08:27:47.004519'
  msg: non-zero return code
  rc: 1
  start: '2024-11-11 08:27:46.454175'
  stderr: |-
    error: failed to get pool 'default'
    error: Storage pool not found: no storage pool with matching name 'default'
    Vagrant failed to initialize at a very early stage:
  
    There was an error loading a Vagrantfile. The file being loaded
    and the error message are shown below. This is usually caused by
    a syntax error.
  
    Path: <provider config: libvirt>
    Line number: 114
    Message: NoMethodError: undefined method `content' for nil:NilClass
  
              pool_path = Nokogiri::XML(%x(virsh pool-dumpxml default)).at_xpath('/pool/target/path').content
                                                                                                     ^^^^^^^^
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

PLAY RECAP *************************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0  

I have no default pool on my system:

$ virsh pool-list
 Name   State   Autostart
---------------------------

@votdev
Member

votdev commented Nov 11, 2024

I would suggest adding the line `# nvme_size: 10G` to each predefined cluster node in settings.yml, so that everyone can see at first glance that there is an optional setting here.

@albinsun
Contributor

When running the setup script I got this error:
...

Same error as @votdev.
@tserong do you have any additional settings for vagrant to work with virsh?

@albinsun
Contributor

By the way, I think we could use a more general name, because the created qcow2 file can also be used for other disk types, not only NVMe.

@tserong
Contributor Author

tserong commented Nov 12, 2024

OK, that's weird. I don't think I did anything special to be able to see the default pool... I am in the libvirt group if that makes a difference...?

tserong@ra:~> virsh pool-list
 Name      State    Autostart
-------------------------------
 default   active   no
 images    active   yes

tserong@ra:~> id 
uid=1000(tserong) gid=100(users) groups=100(users),108(libvirt),466(docker)

@tserong
Contributor Author

tserong commented Nov 12, 2024

Oh, wait a minute, my ~/.bashrc includes this:

export LIBVIRT_DEFAULT_URI=qemu:///system

I'm going to have to experiment a bit more here. Out of curiosity @votdev @albinsun, what do you see if you run `virsh uri`? (I'm guessing it's qemu:///session.) What about if you run `LIBVIRT_DEFAULT_URI=qemu:///system virsh pool-list`? Does that get a pool list? And does that command require any additional authentication?

@tserong
Contributor Author

tserong commented Nov 12, 2024

Actually, if you don't mind :-) please try the updated version of this PR, which explicitly specifies qemu:///system as the virsh connection URI. This should do the trick, I hope. I tested by manually unsetting my LIBVIRT_DEFAULT_URI environment variable; when I ran the setup scripts I saw the same failure you both saw, then I applied this latest change and it ran through OK.
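
Roughly, the change amounts to something like this (illustrative sketch only, not necessarily the exact diff; the `VIRSH` constant is just for the example):

```
# Illustrative only: the point is that virsh calls no longer rely on the
# caller's environment to pick the system daemon.
require 'nokogiri'

VIRSH = 'virsh --connect qemu:///system'

pool_path = Nokogiri::XML(%x(#{VIRSH} pool-dumpxml default))
              .at_xpath('/pool/target/path').content

# ...and the trigger's inline script can set the URI itself, e.g.
#   export LIBVIRT_DEFAULT_URI=qemu:///system
# before running the virsh vol-info / vol-create-as pipeline.
```
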

@votdev
Member

votdev commented Nov 12, 2024

Oh, wait a minute, my ~/.bashrc includes this:

export LIBVIRT_DEFAULT_URI=qemu:///system

I'm going to have to experiment a bit more here. Out of curiosity @votdev @albinsun, what do you see if you run `virsh uri`? (I'm guessing it's qemu:///session.) What about if you run `LIBVIRT_DEFAULT_URI=qemu:///system virsh pool-list`? Does that get a pool list? And does that command require any additional authentication?

$ virsh uri
qemu:///session
$ LIBVIRT_DEFAULT_URI=qemu:///system virsh pool-list
 Name      State    Autostart
-------------------------------
 default   active   yes

@votdev
Member

votdev commented Nov 12, 2024

I'm sorry, it fails again.

TASK [boot Harvester Node 0] **************************************************************************************************
fatal: [localhost]: FAILED! => changed=true 
  cmd: |-
    vagrant up harvester-node-0
  delta: '0:00:00.606143'
  end: '2024-11-12 09:07:36.016111'
  msg: non-zero return code
  rc: 1
  start: '2024-11-12 09:07:35.409968'
  stderr: |-
    ==> harvester-node-0: Trigger run failed
    ==> harvester-node-0: A script exited with an unacceptable exit code 2.
    A script exited with an unacceptable exit code 2.
  stderr_lines: <omitted>
  stdout: |-
    Bringing machine 'harvester-node-0' up with 'libvirt' provider...
    ==> harvester-node-0: Running action triggers before up ...
    ==> harvester-node-0: Running trigger...
    ==> harvester-node-0: Creating volume vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2
        harvester-node-0: Running local: Inline script
        harvester-node-0: sh -c "              export LIBVIRT_DEFAULT_URI=qemu:///system               virsh vol-info --pool=default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 >/dev/null 2>&1 ||               virsh vol-create-as default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2                     --capacity 10G                     --format qcow2 --print-xml |               sed '/<target>/a <permissions><owner>0</owner><group>0</group></permissions>' |               virsh vol-create default /dev/stdin 2>&1"
  stdout_lines: <omitted>

PLAY RECAP ********************************************************************************************************************
localhost                  : ok=10   changed=3    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

@tserong
Contributor Author

tserong commented Nov 12, 2024

I'm sorry, it fails again.

All good, at least we got slightly further :-) but it's annoying that there's not more detail in the output as to what's failing now :-/ I'm not sure offhand how to further instrument that trigger script... I don't suppose you could try running this nasty thing directly from a shell and see if it prints anything interesting...?

export LIBVIRT_DEFAULT_URI=qemu:///system ; virsh vol-info --pool=default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 >/dev/null 2>&1 || virsh vol-create-as default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 --capacity 10G --format qcow2 --print-xml |sed '/<target>/a <permissions><owner>0</owner><group>0</group></permissions>' | virsh vol-create default /dev/stdin 2>&1

@votdev
Member

votdev commented Nov 12, 2024

I'm sorry, it fails again.

All good, at least we got slightly further :-) but it's annoying that there's not more detail in the output as to what's failing now :-/ I'm not sure offhand how to further instrument that trigger script... I don't suppose you could try running this nasty thing directly from a shell and see if it prints anything interesting...?

export LIBVIRT_DEFAULT_URI=qemu:///system ; virsh vol-info --pool=default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 >/dev/null 2>&1 || virsh vol-create-as default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 --capacity 10G --format qcow2 --print-xml |sed '/<target>/a <permissions><owner>0</owner><group>0</group></permissions>' | virsh vol-create default /dev/stdin 2>&1

This command succeeds when run from the CLI. I'll try how it behaves when there are no nodes deployed.

@albinsun
Contributor

albinsun commented Dec 2, 2024

The latest code successfully creates nvme0.qcow2, but it doesn't seem to appear in the hardware list yet.
Does it need to be added manually via Add Hardware -> Storage?

Tried to manually add the disk, but it fails with the error shown in the attached screenshot.
