Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revert changes on fstype from mount #2049

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dnegreira
Copy link

Revert the changes made on commit 0cc4fdf.
A mount point might not be associated with a device, such as a bind mount, and the property will return None in those cases and fail the installer.

Resolves: LP#2061732

@dnegreira
Copy link
Author

Pinging @dbungert since I am reverting their change, looking for any comment/improvements that are necessary.

@ogayot ogayot requested a review from dbungert August 8, 2024 13:58
Copy link
Collaborator

@dbungert dbungert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Some requests:

  1. Let's improve the commit message a bit - I'm a fan of the style where we explain in at most 50 characters in the first line the essential problem being solved - saying "revert fstype changes" doesn't quite do that, what problem are we fixing? Reverts are fine - talking about the previous commit by hash and saying "revert" are both perfectly reasonable but with 50 characters to work with on the summary of the commit, you probably don't have room, so save that for the body of the commit please.

  2. We seem to need more tests, preferably a unit test, in this area to protect against future regression. Have a look please at subiquity/models/tests/test_filesystem.py and see if you can make something reasonable work here, if this seems like 10x more work than you're expecting to write a unit test let's discuss.

  3. This change breaks ZFS installs as a ZFS object is a Mountlike but has no device, so we need a solution that doesn't harm ZFS. I think the solution is to expand the fstype getter to also include a setter, and the improved getter should handle the device==None case, which means that should_add_swapfile() can be left unchanged.

2024-08-08 17:43:48,653 ERROR root:38 finish: subiquity/Install/install/curtin_install: FAIL: 'ZFS' object has no attribute 'device'
2024-08-08 17:43:48,653 DEBUG subiquity.common.errorreport:398 generating crash report
2024-08-08 17:43:48,655 INFO subiquity.common.errorreport:424 saving crash report 'install failed crashed with AttributeError' to /
var/crash/1723139028.653917551.install_fail.crash
2024-08-08 17:43:48,655 ERROR root:38 finish: subiquity/Install/install: FAIL: 'ZFS' object has no attribute 'device'
2024-08-08 17:43:48,655 INFO root:38 start: subiquity/ErrorReporter/1723139028.653917551.install_fail/add_info:
2024-08-08 17:43:48,656 ERROR subiquity.server.server:498 top level error
Traceback (most recent call last):
  File "/snap/subiquity/x1/lib/python3.10/site-packages/subiquity/server/controllers/shutdown.py", line 72, in _wait_install
    await self.app.controllers.Install.install_task
  File "/snap/subiquity/x1/lib/python3.10/site-packages/subiquitycore/context.py", line 165, in decorated_async
    return await meth(self, **kw)
  File "/snap/subiquity/x1/lib/python3.10/site-packages/subiquity/server/controllers/install.py", line 604, in install
    await self.curtin_install(context=context, source=for_install_path)
  File "/snap/subiquity/x1/lib/python3.10/site-packages/subiquitycore/context.py", line 165, in decorated_async
    return await meth(self, **kw)
  File "/snap/subiquity/x1/lib/python3.10/site-packages/subiquity/server/controllers/install.py", line 398, in curtin_install
    step_config=self.filesystem_config(
  File "/snap/subiquity/x1/lib/python3.10/site-packages/subiquity/server/controllers/install.py", line 129, in filesystem_config
    cfg = self.model.filesystem.render(mode=mode)
  File "/snap/subiquity/x1/lib/python3.10/site-packages/subiquity/models/filesystem.py", line 2010, in render
    elif not self.should_add_swapfile():
  File "/snap/subiquity/x1/lib/python3.10/site-packages/subiquity/models/filesystem.py", line 2318, in should_add_swapfile
    if not can_use_swapfile("/", mount.device.fstype):
AttributeError: 'ZFS' object has no attribute 'device'

Copy link
Collaborator

@dbungert dbungert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see previous comment

@dnegreira dnegreira marked this pull request as draft August 9, 2024 14:29
@dnegreira
Copy link
Author

Hi @dbungert, thanks for your input. I appreciate it.

I will set the draft as WIP while I work through the changes you asked for.

Can you point me to the user-data file, or input file that is passed to subiquity, so that I can test locally and ensure that I won´t break ZFS?

@dbungert
Copy link
Collaborator

dbungert commented Aug 9, 2024

I tested ZFS with the following cloud-config:

#cloud-config

autoinstall:
  version: 1
  source:
    id: ubuntu-server-minimal
  storage:
    layout:
      name: zfs
  identity:
    realname: ''
    hostname: ubuntu
    username: ubuntu
    password: '$6$wdAcoXrU039hKYPd$508Qvbe7ObUnxoj15DRCkzC3qO7edjH0VV7BPNRDYK4QR8ofJaEEF2heacn0QgD.f8pO8SNp83XNdWG6tocBM1'

@dnegreira dnegreira force-pushed the lp2061732 branch 2 times, most recently from 665acd3 to d3b2c98 Compare August 12, 2024 14:50
@dbungert
Copy link
Collaborator

That looks about right.
Pre-commit checks failed - have a look at https://github.com/canonical/subiquity/blob/main/CONTRIBUTING.md#pull-requests for some details there.

@dnegreira
Copy link
Author

Hey @dbungert that was fast, thanks, was going to ask for some feedback :)

I will fix the lining stuff on a later commit, and I will probably have to add another unit test for ZFS.

I am still struggling to fix the issue, though.
The unit test is passing, which for me, it seems that the fstype parameter is being generated/passed correctly, but the error still happens in curtin:

File "/snap/subiquity/5495/lib/python3.10/site-packages/curtin/commands/block_meta.py", line 1251, in mount_data
             raise ValueError(
         ValueError: mount entry without 'device' missing: ['fstype']. ({'path': '/tmp', 'options': 'mode=1777,nosuid,nodev', 'spec': 'none', 'id': 'tmpfs', 'type': 'mount'})

I'm not sure if it's something we should fix in curtin at this point, or what I might be missing in subiquity, I will continue investigating, but happy to hear about any pointers to where I should look at, specially how things are passed around to/from curtin.

@dbungert
Copy link
Collaborator

Do you have a longer form autoinstall that exhibits the problem?

@dnegreira
Copy link
Author

Do you have a longer form autoinstall that exhibits the problem?

#cloud-config
autoinstall:
  version: 1
  identity:
    hostname: ubuntu-server
    password: "$6$exDY1mhS4KUYCE/2$zmn9ToZwTKLhCw.b4/b.ZRTIZM30JZ4QrOQ2aOXJ8yk96xpcCof0kxKwuX1kqLG/ygbJ1f8wxED22bTL4F46P0"
    username: ubuntu
  storage:
    config:
    - type: disk
      id: disk-vda
      path: /dev/vda
      ptable: gpt
      grub_device: true
      preserve: false
    - type: partition
      id: vda-part1
      type: partition
      number: 1
      size: 1M
      device: disk-vda
      flag: bios_grub
      partition_type: EF02
    - type: partition
      id: vda-part2
      flag: linux
      number: 2
      size: 35G
      device: disk-vda
    - fstype: ext4
      id: format-partition-vda2
      type: format
      volume: vda-part2
    - device: format-partition-vda2
      id: mount-partition-vda2
      path: /
      type: mount
    - id: tmpfs1
      type: mount
      spec: "none"
      path: "/tmp"
      size: 4194304
      fstype: "tmpfs"

^ this is what I am using

Copy link
Collaborator

@dbungert dbungert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @dbungert that was fast, thanks, was going to ask for some feedback :)

Welcome!

I am still struggling to fix the issue, though. The unit test is passing, which for me, it seems that the fstype parameter is being generated/passed correctly, but the error still happens in curtin:

         ValueError: mount entry without 'device' missing: ['fstype']. ({'path': '/tmp', 'options': 'mode=1777,nosuid,nodev', 'spec': 'none', 'id': 'tmpfs', 'type': 'mount'})

I'm not sure if it's something we should fix in curtin at this point, or what I might be missing in subiquity, I will continue investigating, but happy to hear about any pointers to where I should look at, specially how things are passed around to/from curtin.

That would be a Subiquity side fix - https://curtin.readthedocs.io/en/latest/topics/storage.html#mount-command is pretty clear here that on a mount action, if we don't have a device we do need fstype, and this action was emitted with neither device nor fstype. So Subiquity needs a fix for that, as the mount action supplied is out of spec.

I have a suggested addition to the test that illustrates the problem without a curtin invocation - the serialization code seems to not like the property I think? If that's true, maybe it makes more sense to abandon fstype as a property and fix the other code that handles the mountlikes to call to a new method that the serialization code is unaware of.

)
rendered_mount = model.all_mounts()
self.assertEqual(rendered_mount[0].fstype, "tmpfs")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try adding something like this:

Suggested change
self.assertEqual(model.render()["storage"]["config"][0]["fstype"], "tmpfs")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dbungert

Try adding something like this:

Suggested change
self.assertEqual(model.render()["storage"]["config"][0]["fstype"], "tmpfs")

With that code I could reproduce the issue, enabling me to debug further.

I have a suggested addition to the test that illustrates the problem without a curtin invocation - the serialization code seems to not like the property I think?

Indeed, the serialization code skips if the attribute starts with _, which is true in this specific case, and the attribute is not added to the serialization list.

If that's true, maybe it makes more sense to abandon fstype as a property [...]
I'm not sure I fully understand what you mean by this.
If we abandoned fstype as property, then should we revert back the change to fstype: Optional[str] = None ?

and fix the other code that handles the mountlikes to call to a new method that the serialization code is unaware of.

  1. IIUC, I believe you mean this code ? And append the fstype property to r ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, I believe you mean this code ? And append the fstype property to r ?

As the serialization code doesn't look at underscore attributes, or the property apparently, I'm suggesting that we revert fstype from a property like you had originally suggested - but fix the code that breaks on the ZFS codepath.

Copy link
Author

@dnegreira dnegreira Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try adding something like this:

This is now added on d4e2cf5

As for the ZFS unit test case, I'm not sure how to proceed; if you can point me to a similar test already done or some pointers, I would be grateful.

@dnegreira
Copy link
Author

Hi @dbungert,

I am on vacation, so my replies will get slower during the next two weeks.
I tried a separate testing branch by reverting the commits on 0cc4fdf and was unable to reproduce the issue you mentioned with ZFS on #2049 (review) with #2049 (comment) cloud-config.

Unit tests seem fine, with the new unit test as well, so LMK how would you like to proceed, if you have a reproducer I can try to set it up and fix it, or if you want me to just send the revert with the new unit test. I am unsure what would be a way to write a unit test to test the ZFS issue you mentioned, but if we get a reproducer, I am happy to have a stab at it.

@dbungert
Copy link
Collaborator

Using your testing branch and scripts/quick-test-this-branch.sh and my supplied cloud-config, the following install ISOs fail to install with the listed tracebacks I show above:

  • oracular daily build 20240812 of live-server
  • noble daily build 20240808 of live-server
  • jammy release build 24.04.4 of live-server

Further, this is reproducible in dryrun - just fix the path to the cloud-config

#!/bin/bash

set -x

rm -fr .subiquity

export PYTHONPATH=$PWD:$PWD/probert:$PWD/curtin
export SUBIQUITY_DEBUG="has-drivers"
export SUBIQUITY_REPLAY_TIMESCALE=1000
export SNAP_NAME="subiquity"

args=(
	--machine-config examples/machines/simple.json
	--storage-version 2
	--kernel-cmdline="autoinstall"
	--autoinstall ../ai-zfs.yaml
	--source-catalog="examples/sources/install.yaml"
)

python3 -m subiquity --dry-run "${args[@]}"

Hope that helps.

@dnegreira dnegreira force-pushed the lp2061732 branch 3 times, most recently from 6b15510 to 5d602a3 Compare September 5, 2024 14:05
@dnegreira
Copy link
Author

Thanks for that. I could replicate with dry run, and with my latest committed code, unit tests and the dry run do not generate an error.

I have done a simple check to verify if it is ZFS and skip the check.
I'm not sure if this is the right thing to do, but I'm just sending it out for you to review and comment on.

Currently the fstype parameter is not passed when its not associated
with a device.

This fix aims to pass the fstype mount parameter when there is no device
associated with the mount, this is needed when defining bind-mounts for
example. We also add an exception for ZFS, since there is no device
associated with a ZFS mount.

Signed-off-by: David Negreira <[email protected]>
@@ -2318,8 +2315,9 @@ def can_install(self) -> bool:
def should_add_swapfile(self):
mount = self._mount_for_path("/")
if mount is not None:
if not can_use_swapfile("/", mount.fstype):
return False
if mount.type != "zfs":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I really don't want this bit here. I want this as part of the ZFS class.
So to abstract that we can add something to Mount and ZFS that isn't a property that allows retrieving the fstype - like the property originally, but with a different name (and not a property) so serialization isn't broken, and then ZFS can handle the special case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants