feat: support additional virtualized NVMe disks #85

Open
wants to merge 2 commits into base: main
Conversation

tserong
Contributor

@tserong tserong commented Nov 4, 2024

Any node for which nvme_size is specified in settings.yml will
automatically have one additional virtual NVMe disk created,
which will appear as /dev/nvme0n1.  This can be useful for
testing additional disks with Longhorn.  The way it works
is a bit obscure because libvirt only seems to support virtio,
scsi and sata disks, so we have to resort to hacking qemu
arguments in as described in this blog post:

  http://blog.frankenmichl.de/2018/02/13/add-nvme-device-to-vm/

This means we have to go create the volume manually first with
`virsh` then pass its path to qemu.  Matters are complicated
further by volumes being owned by root by default, while libvirt
runs VMs as the qemu user.  For normal libvirt volumes, file
ownership changes happen automatically, but for these volumes
we're hacking in by hand, that doesn't happen, so we have to
explicitly specify ownership, and to do _that_ we have to
define the volume via XML, hence the nasty `virsh vol-create-as
... | sed ... | virsh vol-create` invocation.

The other wrinkle is that we need to know the exact path to
the disk image.  Once a volume has been created, you can run
`virsh vol-path --pool default VOLUME_NAME` to get the path,
but we need to know the path when setting libvirt.qemuargs,
whereas we don't want to actually create the volume until the
`vagrant up` trigger for the node in question.  If we create
the volume outside the trigger, it gets created on _every_
evaluation of the Vagrant file, even for unrelated VMs...
So, we call `virsh pool-dumpxml default` and get the pool
path from that, then stick the volume name on the end.

To test, try something like this in settings.yml:

```
harvester_network_config:
  cluster:
    - ip: 192.168.0.30
      # [...]
      nvme_size: 10G
```

Signed-off-by: Tim Serong <[email protected]>
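
For reference, here is a rough sketch of the approach described above, in Vagrantfile terms. This is not the exact diff in this PR; the node name, volume name, drive id and serial are illustrative, and the trigger/`qemuargs` shapes just follow the description and the blog post linked above:

```
# Sketch only: illustrates the qemuargs / manual-volume approach described
# above. Names here are examples, not the exact code in this PR.
require 'nokogiri'

node_name = 'harvester-node-0'
nvme_size = '10G'   # would come from nvme_size in settings.yml
nvme_vol  = "vagrant-pxe-harvester_#{node_name}-nvme0.qcow2"

# Resolve the storage pool path up front, because libvirt.qemuargs needs the
# full image path before the volume actually exists.
pool_path = Nokogiri::XML(%x(virsh pool-dumpxml default))
              .at_xpath('/pool/target/path').content

Vagrant.configure('2') do |config|
  config.vm.define node_name do |node|
    # Create the volume only when this node is brought up, so it isn't
    # created on every evaluation of the Vagrantfile. The sed step injects
    # explicit ownership into the volume XML, since libvirt won't fix up
    # ownership for a volume we hand to qemu ourselves.
    node.trigger.before :up do |trigger|
      trigger.run = { inline: <<~SCRIPT }
        sh -c "virsh vol-info --pool=default #{nvme_vol} >/dev/null 2>&1 ||
               virsh vol-create-as default #{nvme_vol} --capacity #{nvme_size} --format qcow2 --print-xml |
               sed '/<target>/a <permissions><owner>0</owner><group>0</group></permissions>' |
               virsh vol-create default /dev/stdin"
      SCRIPT
    end

    node.vm.provider :libvirt do |libvirt|
      # libvirt has no NVMe disk bus, so hand the image straight to qemu.
      libvirt.qemuargs :value => '-drive'
      libvirt.qemuargs :value => "file=#{pool_path}/#{nvme_vol},format=qcow2,if=none,id=nvme0"
      libvirt.qemuargs :value => '-device'
      libvirt.qemuargs :value => 'nvme,drive=nvme0,serial=nvme0'
    end
  end
end
```
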
@bk201
Member

bk201 commented Nov 11, 2024

Does the network interface name change after adding an NVMe disk? I have the impression that ens5/ens6 (or something similar) changed after adding the second disk.

@tserong
Contributor Author

tserong commented Nov 11, 2024

Doesn't seem to make any difference. I just gave it a quick test checking `ip a` in the installer, first without an NVMe:

rancher@rancher:~> ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:71:b5:1c brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    inet 192.168.121.53/24 brd 192.168.121.255 scope global ens5
       valid_lft forever preferred_lft forever
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:00:00:0d:62:e2 brd ff:ff:ff:ff:ff:ff
    altname enp0s6

...and second with an NVMe:

rancher@rancher:~> ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:5d:85:07 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    inet 192.168.121.66/24 brd 192.168.121.255 scope global ens5
       valid_lft forever preferred_lft forever
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:00:00:0d:62:e2 brd ff:ff:ff:ff:ff:ff
    altname enp0s6

@tserong
Contributor Author

tserong commented Nov 11, 2024

@m-ildefons you might be interested in this, as I see you're adding multiple disk support in #86 :-)

@tserong tserong requested a review from m-ildefons November 11, 2024 04:53
Member

@bk201 bk201 left a comment


Thanks Tim

@votdev
Member

votdev commented Nov 11, 2024

When running the setup script I got this error:

$ ./setup_harvester.sh
~/git/ipxe-examples/vagrant-pxe-harvester ~/git/ipxe-examples/vagrant-pxe-harvester
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match
'all'

PLAY [Setup Harvester] *************************************************************************************************

TASK [debug] ***********************************************************************************************************
ok: [localhost] => 
  msg: Installation Started

TASK [install PXE server] **********************************************************************************************
fatal: [localhost]: FAILED! => changed=true 
  cmd: |-
    vagrant up pxe_server
  delta: '0:00:00.550344'
  end: '2024-11-11 08:27:47.004519'
  msg: non-zero return code
  rc: 1
  start: '2024-11-11 08:27:46.454175'
  stderr: |-
    error: failed to get pool 'default'
    error: Storage pool not found: no storage pool with matching name 'default'
    Vagrant failed to initialize at a very early stage:
  
    There was an error loading a Vagrantfile. The file being loaded
    and the error message are shown below. This is usually caused by
    a syntax error.
  
    Path: <provider config: libvirt>
    Line number: 114
    Message: NoMethodError: undefined method `content' for nil:NilClass
  
              pool_path = Nokogiri::XML(%x(virsh pool-dumpxml default)).at_xpath('/pool/target/path').content
                                                                                                     ^^^^^^^^
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

PLAY RECAP *************************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0  

I have no default pool on my system:

$ virsh pool-list
 Name   State   Autostart
---------------------------

@votdev
Member

votdev commented Nov 11, 2024

I would suggest adding the line `# nvme_size: 10G` to each predefined cluster node in settings.yml, so that everyone can see at first glance that there is an optional setting here.

@albinsun
Contributor

When running the setup script I got this error:
...

Same error as @votdev.
@tserong do you have any additional settings for vagrant to work with virsh?

@albinsun
Contributor

By the way, I think we could use a more general name, because the created qcow2 file can also be used for other disk types, not only NVMe.

@tserong
Contributor Author

tserong commented Nov 12, 2024

OK, that's weird. I don't think I did anything special to be able to see the default pool... I am in the libvirt group if that makes a difference...?

tserong@ra:~> virsh pool-list
 Name      State    Autostart
-------------------------------
 default   active   no
 images    active   yes

tserong@ra:~> id 
uid=1000(tserong) gid=100(users) groups=100(users),108(libvirt),466(docker)

@tserong
Contributor Author

tserong commented Nov 12, 2024

Oh, wait a minute, my ~/.bashrc includes this:

export LIBVIRT_DEFAULT_URI=qemu:///system

I'm going to have to experiment a bit more here. Out of curiosity @votdev @albinsun, what do you see if you run `virsh uri`? (I'm guessing it's qemu:///session.) What about if you run `LIBVIRT_DEFAULT_URI=qemu:///system virsh pool-list`? Does that get a pool list? And does that command require any additional authentication?

@tserong
Contributor Author

tserong commented Nov 12, 2024

Actually, if you don't mind :-) please try the updated version of this PR, which explicitly specifies qemu:///system as the virsh connection URI. This should do the trick, I hope. I tested by manually unsetting my LIBVIRT_DEFAULT_URI environment variable; when I ran the setup scripts I saw the same failure you both saw, then I applied this latest change and it ran through OK.
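
Roughly, the change amounts to something like this (illustrative sketch only, not necessarily the exact diff; the `VIRSH` constant is just for the example):

```
# Illustrative only: the point is that virsh calls no longer rely on the
# caller's environment to pick the system daemon.
require 'nokogiri'

VIRSH = 'virsh --connect qemu:///system'

pool_path = Nokogiri::XML(%x(#{VIRSH} pool-dumpxml default))
              .at_xpath('/pool/target/path').content

# ...and the trigger's inline script can set the URI itself, e.g.
#   export LIBVIRT_DEFAULT_URI=qemu:///system
# before running the virsh vol-info / vol-create-as pipeline.
```
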

@votdev
Member

votdev commented Nov 12, 2024

Oh, wait a minute, my ~/.bashrc includes this:

export LIBVIRT_DEFAULT_URI=qemu:///system

I'm going to have to experiment a bit more here. Out of curiosity @votdev @albinsun, what do you see if you run `virsh uri`? (I'm guessing it's qemu:///session.) What about if you run `LIBVIRT_DEFAULT_URI=qemu:///system virsh pool-list`? Does that get a pool list? And does that command require any additional authentication?

$ virsh uri
qemu:///session
$ LIBVIRT_DEFAULT_URI=qemu:///system virsh pool-list
 Name      State    Autostart
-------------------------------
 default   active   yes

@votdev
Member

votdev commented Nov 12, 2024

I'm sorry, it fails again.

TASK [boot Harvester Node 0] **************************************************************************************************
fatal: [localhost]: FAILED! => changed=true 
  cmd: |-
    vagrant up harvester-node-0
  delta: '0:00:00.606143'
  end: '2024-11-12 09:07:36.016111'
  msg: non-zero return code
  rc: 1
  start: '2024-11-12 09:07:35.409968'
  stderr: |-
    ==> harvester-node-0: Trigger run failed
    ==> harvester-node-0: A script exited with an unacceptable exit code 2.
    A script exited with an unacceptable exit code 2.
  stderr_lines: <omitted>
  stdout: |-
    Bringing machine 'harvester-node-0' up with 'libvirt' provider...
    ==> harvester-node-0: Running action triggers before up ...
    ==> harvester-node-0: Running trigger...
    ==> harvester-node-0: Creating volume vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2
        harvester-node-0: Running local: Inline script
        harvester-node-0: sh -c "              export LIBVIRT_DEFAULT_URI=qemu:///system               virsh vol-info --pool=default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 >/dev/null 2>&1 ||               virsh vol-create-as default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2                     --capacity 10G                     --format qcow2 --print-xml |               sed '/<target>/a <permissions><owner>0</owner><group>0</group></permissions>' |               virsh vol-create default /dev/stdin 2>&1"
  stdout_lines: <omitted>

PLAY RECAP ********************************************************************************************************************
localhost                  : ok=10   changed=3    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

@tserong
Contributor Author

tserong commented Nov 12, 2024

I'm sorry, it fails again.

All good, at least we got slightly further :-) but it's annoying that there's not more detail in the output as to what's failing now :-/ I'm not sure offhand how to further instrument that trigger script... I don't suppose you could try running this nasty thing directly from a shell and see if it prints anything interesting...?

export LIBVIRT_DEFAULT_URI=qemu:///system ; virsh vol-info --pool=default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 >/dev/null 2>&1 || virsh vol-create-as default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 --capacity 10G --format qcow2 --print-xml |sed '/<target>/a <permissions><owner>0</owner><group>0</group></permissions>' | virsh vol-create default /dev/stdin 2>&1

@votdev
Member

votdev commented Nov 12, 2024

I'm sorry, it fails again.

All good, at least we got slightly further :-) but it's annoying that there's not more detail in the output as to what's failing now :-/ I'm not sure offhand how to further instrument that trigger script... I don't suppose you could try running this nasty thing directly from a shell and see if it prints anything interesting...?

export LIBVIRT_DEFAULT_URI=qemu:///system ; virsh vol-info --pool=default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 >/dev/null 2>&1 || virsh vol-create-as default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 --capacity 10G --format qcow2 --print-xml |sed '/<target>/a <permissions><owner>0</owner><group>0</group></permissions>' | virsh vol-create default /dev/stdin 2>&1

This command succeeds when run from the CLI. I'll try how it behaves when there are no nodes deployed.

@albinsun
Contributor

albinsun commented Dec 2, 2024

The latest code successfully creates nvme0.qcow2, but it doesn't seem to appear in the hardware list yet.
Does it need to be added manually via Add Hardware -> Storage?

Tried to manually add the disk, but it fails with the error shown in the attached screenshot.
