Device: Don't remove an existing target directory when unmounting a disk device if the original dir hasn't been created by LXD #12700

Open
wants to merge 14 commits into
base: main

Contributor

@gabrielmougard gabrielmougard commented Jan 5, 2024

closes #12648
closes #12716


Overall description

When a disk device that is backed by a host directory and mapped to a target path inside a container or a VM is removed,
we detect whether the target directory was created by LXD, so that a pre-existing target directory and its content
are not deleted during the unmount. Additionally, in the VM case, we cleanly unmount the target path inside the VM.

Container case

Example:

  • Let's say we mounted a custom empty host directory (test) on the existing /opt directory of the container.
    When it is unmounted (lxc config device remove ...), the target /opt won't be removed, because we marked it as NOT
    being created by LXD at mount time.

  • If, however, we created a custom empty target directory at mount time:
    lxc config device add u1 test disk source=/home/user/test path=/new_dir,
    the directory new_dir will be created on the target instance, and if we decide to unmount test,
    the target /new_dir will be removed because it has been created by LXD.
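
The bookkeeping behind these two cases can be sketched as follows. This is a minimal illustration in Python, not LXD's Go implementation: the function names `mount_target` and `unmount_target` are hypothetical, no real mount is performed, and a temporary directory stands in for the instance filesystem.

```python
import os
import tempfile


def mount_target(path: str) -> bool:
    """Ensure the target directory exists; return True only if we created it."""
    if os.path.isdir(path):
        return False  # pre-existing directory: it must survive the unmount
    os.makedirs(path)
    return True


def unmount_target(path: str, created_by_us: bool) -> None:
    """Remove the target directory on unmount only if we created it."""
    if created_by_us:
        os.rmdir(path)  # rmdir refuses to delete a non-empty directory


# A temporary directory plays the role of the instance root (assumption).
root = tempfile.mkdtemp()
opt = os.path.join(root, "opt")
os.mkdir(opt)  # pre-existing target, like /opt
new_dir = os.path.join(root, "new_dir")  # does not exist yet, like /new_dir

opt_created = mount_target(opt)
new_created = mount_target(new_dir)

unmount_target(opt, opt_created)
unmount_target(new_dir, new_created)

print(os.path.isdir(opt))       # pre-existing target is kept
print(os.path.exists(new_dir))  # target created at mount time is gone
```

The "created" flag returned at mount time is the piece of state that has to be persisted between the mount and the unmount.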

Particular case for VMs

In addition, this fix also cleanly unmounts the target directory inside the VM through a new LXD-agent API call. Before this change, here is what would happen:

mkdir /tmp/empty-dir
lxc launch ubuntu:jammy v1 --vm
lxc config device add v1 empty-dir disk source=/tmp/empty-dir path=/opt
lxc config device remove v1 empty-dir

lxc shell v1 -- stat /opt
stat: cannot statx '/opt': Transport endpoint is not connected

This happens because the mounted device and its associated char device were removed using QEMU's QMP without unmounting the target.

Now, we inspect the mounts on the VM, including any over-mounts, and unmount them in the right order.
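
One way to picture the ordering is the sketch below. It is a Python illustration under stated assumptions, not the agent's actual code: `unmount_order` is a hypothetical helper, the mount list is assumed to be ordered oldest first (as one would typically read it from the guest's mount table), and "right order" is assumed to mean newest mount first.

```python
def unmount_order(mount_points, target):
    """Return the mounts at or below `target`, newest first.

    `mount_points` is assumed to be ordered oldest first; the same path may
    appear more than once when it has been over-mounted.
    """
    target = target.rstrip("/") or "/"
    prefix = "/" if target == "/" else target + "/"
    affected = [m for m in mount_points if m == target or m.startswith(prefix)]
    return list(reversed(affected))


# /opt is mounted, then over-mounted, then something is mounted below it.
mounts = ["/", "/opt", "/opt", "/opt/data", "/optional"]
order = unmount_order(mounts, "/opt")
print(order)  # ['/opt/data', '/opt', '/opt']
```

Matching on `target + "/"` rather than on the bare prefix keeps unrelated paths such as `/optional` out of the result.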

Benchmark

Each benchmark case was run 10 times (10 starts, 10 stops).

master branch (CONTAINER):

| Disks | Startup time | Stop time (--force) |
|---|---|---|
| 0 | 598.2 ms ± 14.3 ms | 883.5 ms ± 37.1 ms |
| 5 | 595.0 ms ± 8.2 ms | 892.6 ms ± 23.3 ms |
| 10 | 601.0 ms ± 8.5 ms | 900.1 ms ± 22.0 ms |
| 20 | 606.9 ms ± 12.7 ms | 886.8 ms ± 23.4 ms |
| 100 | 634.3 ms ± 10.1 ms | 934.1 ms ± 16.4 ms |

Our fix branch (CONTAINER):

| Disks | Startup time | Stop time (--force) |
|---|---|---|
| 0 | 632.8 ms ± 21.3 ms | 870.3 ms ± 25.6 ms |
| 5 | 624.4 ms ± 15.4 ms | 879.9 ms ± 23.6 ms |
| 10 | 626.6 ms ± 17.2 ms | 877.1 ms ± 19.7 ms |
| 20 | 641.0 ms ± 23.3 ms | 932.0 ms ± 148.6 ms |
| 100 | 660.0 ms ± 18.3 ms | 942.7 ms ± 16.9 ms |

@gabrielmougard
Contributor Author

Working on the integration tests now.

@github-actions github-actions bot added the Documentation Documentation needs updating label Jan 5, 2024
@gabrielmougard gabrielmougard force-pushed the fix/rm-disk-dev-deletes-empty--dir branch from 9ee6693 to 20ba221 Compare January 5, 2024 11:52
@gabrielmougard gabrielmougard marked this pull request as ready for review January 5, 2024 11:52
@gabrielmougard
Contributor Author

Tests should be ready

@gabrielmougard gabrielmougard force-pushed the fix/rm-disk-dev-deletes-empty--dir branch from 20ba221 to 06cfd51 Compare January 8, 2024 14:20
@github-actions github-actions bot added the API Changes to the REST API label Jan 8, 2024
@gabrielmougard
Contributor Author

Also, @tomponline, the deviceVolatileSetFunc function, which calls VolatileSet, updates the DB. I think it is good to persist this information, though. Removing an existing target dir such as /opt (see the issue author's original post) through an unmount seems quite bad.

Or, we can introduce a new device option meaning "do not track this kind of change", so that I can set d.volatileSetPersistDisable = true before my volatile set call and then restore it to false afterwards. This would surely be faster (no DB call), but the user would accept potential data loss on the target instance if they mounted on an existing target dir.

@gabrielmougard gabrielmougard force-pushed the fix/rm-disk-dev-deletes-empty--dir branch 2 times, most recently from 9504543 to db861a7 Compare January 8, 2024 17:08
ru-fu previously approved these changes Jan 9, 2024
Contributor

@ru-fu ru-fu left a comment

Docs look good now. :)

@gabrielmougard gabrielmougard force-pushed the fix/rm-disk-dev-deletes-empty--dir branch from db861a7 to eedfede Compare January 9, 2024 17:52
@gabrielmougard gabrielmougard marked this pull request as draft January 9, 2024 17:54
@gabrielmougard gabrielmougard force-pushed the fix/rm-disk-dev-deletes-empty--dir branch from eedfede to 8838e7c Compare January 10, 2024 15:48
@gabrielmougard gabrielmougard marked this pull request as ready for review January 10, 2024 15:49
@gabrielmougard
Contributor Author

@tomponline I implemented the logic for the VM as well within the lxd-agent. I tested it on my side without an issue.

@tomponline
Member

tomponline commented Jan 10, 2024

Thanks @gabrielmougard please can you add a PR description with what has changed so the reviewer can compare the code to the intended design.

Please do include any new API endpoints you've added to the lxd-agent.

Cheers

@tomponline
Member

Please can you rebase @gabrielmougard

@gabrielmougard gabrielmougard force-pushed the fix/rm-disk-dev-deletes-empty--dir branch 2 times, most recently from 9140527 to 844ac0f Compare January 19, 2024 14:39
Member

@tomponline tomponline left a comment

@gabrielmougard when the disk device is hot-plugged its mount inside the guest is triggered by an event in eventsProcess(). Is there a reason you didn't feel that we could unmount the disk in the same fashion and required the API endpoint instead?

@gabrielmougard gabrielmougard force-pushed the fix/rm-disk-dev-deletes-empty--dir branch from 844ac0f to f4c1453 Compare January 26, 2024 11:26
@gabrielmougard gabrielmougard force-pushed the fix/rm-disk-dev-deletes-empty--dir branch 2 times, most recently from 8820872 to 78fec08 Compare January 16, 2025 15:48
@tomponline
Member

@gabrielmougard in your tests were you stopping with --force btw?

@gabrielmougard
Contributor Author

> @gabrielmougard in your tests were you stopping with --force btw?

I don't think so... Which tests are you talking about?

@tomponline
Member

> I don't think so... Which tests are you talking about?

the stop tests, so the time taken doesn't take into account the guest OS shutdown part.

@gabrielmougard
Contributor Author

> > I don't think so... Which tests are you talking about?
>
> the stop tests, so the time taken doesn't take into account the guest OS shutdown part.

aaah, the benchmark.. No, I'm not using --force

@tomponline
Member

> aaah, the benchmark.. No, I'm not using --force

are you able to try that so we don't get the guest OS shutdown time in the results?

@gabrielmougard
Contributor Author

absolutely. I will update the benchmark taking that into account

@gabrielmougard
Contributor Author

@tomponline It should be updated now

Some devices, such as disk devices with a target path, need to be cleanly unmounted.
In the case of a VM instance, we prefer to handle the unmount logic within the agent,
including the unmounting of any overmounts in the guest.

In some cases, we also need to decide whether to remove a path resource on the guest side. That is
why we also pass some extra metadata contained in the `Volatile` field, so that the resource
removal is handled in the agent and not on the LXD side (which would require spawning an SFTP client, a wasteful approach).

Signed-off-by: Gabriel Mougard <[email protected]>
…k_dev_name>.last_state.created` for the disk device

Signed-off-by: Gabriel Mougard <[email protected]>
When creating a directory within an instance filesystem, we should be able to track
the directory structure that has been created in order to remove the chain of directories
when removing a disk device associated to an instance.

This function takes an sftp client and an absolute path (the directory path that the user wishes to create in the instance)
and produces a path encoding like the following:

<existing-path-part>/./<non-existing-path-part>
                    ~~~
                  SEP_MARK

Using this encoding, when unmounting a device associated with a created directory within an instance, we can remove the chain of created directories, because we know which ones were created and which were there before.
In the end, we remove the path `<existing-path-part>/<non-existing-path-part>[0]` (for the VM case).

For the container case, we remove directories using the sftp client and, since it doesn't have a recursive remove API for directories,
we need to remove directories from the deepest level all the way up:

1) rm `<existing-path-part>/<non-existing-path-part>[:L]`
2) rm `<existing-path-part>/<non-existing-path-part>[:L-1]`
3) rm `<existing-path-part>/<non-existing-path-part>[:L-2]`
...
Signed-off-by: Gabriel Mougard <[email protected]>
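
The encoding and the removal order described in the commit message above can be sketched as follows. This is a Python illustration only, not the Go code from the PR: `encode_created_path` and `removal_order` are hypothetical names, the `exists` callback stands in for a stat call made through the sftp client, and the input path is assumed to be absolute and already cleaned.

```python
import posixpath

SEP_MARK = "/./"


def encode_created_path(path, exists):
    """Split an absolute path into <existing>/./<non-existing>.

    `exists` is a callable telling whether a directory is already present
    (a stand-in for a stat through the sftp client).
    """
    parts = [p for p in path.split("/") if p]
    existing = "/"
    i = 0
    while i < len(parts):
        candidate = posixpath.join(existing, parts[i])
        if not exists(candidate):
            break
        existing = candidate
        i += 1
    return existing.rstrip("/") + SEP_MARK + "/".join(parts[i:])


def removal_order(encoded):
    """List the created directories, deepest first, from an encoded path."""
    existing, _, missing = encoded.partition(SEP_MARK)
    parts = [p for p in missing.split("/") if p]
    return [
        posixpath.join(existing or "/", *parts[:n])
        for n in range(len(parts), 0, -1)
    ]


already_there = {"/opt"}
encoded = encode_created_path("/opt/a/b/c", already_there.__contains__)
print(encoded)                 # /opt/./a/b/c
print(removal_order(encoded))  # ['/opt/a/b/c', '/opt/a/b', '/opt/a']
```

The last entry of the list is the single path removed in the VM case, while the container case walks the whole list because the sftp client has no recursive remove.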
@gabrielmougard gabrielmougard force-pushed the fix/rm-disk-dev-deletes-empty--dir branch from 78fec08 to 5910823 Compare January 27, 2025 13:12
@gabrielmougard gabrielmougard force-pushed the fix/rm-disk-dev-deletes-empty--dir branch 2 times, most recently from b14b3b1 to 33239e8 Compare January 27, 2025 15:02
…ory when unmounting

When a disk device is removed while relying on a host directory and mapped to a target within a container,
we detect if the target directory has been created by LXD or not in order to not delete the content of a target
directory during the unmount.

Example:

- Let's say we mounted a custom empty host directory (`test`) on the existing `/opt` directory of the container.
When it is unmounted (`lxc config device remove ...`), the target `/opt` won't be removed, because we marked it as NOT
being created by LXD at mount time.

- If, however, we created a custom empty target directory at mount time:
`lxc config device add u1 test disk source=/home/user/test path=/new_dir`,
the directory `new_dir` will be created on the target instance, and if we decide to unmount `test`,
the target `/new_dir` will be removed because it has been created by LXD.

Signed-off-by: Gabriel Mougard <[email protected]>
Signed-off-by: Gabriel Mougard <[email protected]>
@gabrielmougard gabrielmougard force-pushed the fix/rm-disk-dev-deletes-empty--dir branch from 33239e8 to d290c7f Compare January 27, 2025 15:49
Labels
API Changes to the REST API Documentation Documentation needs updating
4 participants