Usroverlay causes (non-fatal) reference counting fail #3303

Open
antheas opened this issue Sep 14, 2024 · 8 comments
Labels
bug, needinfo (This issue needs more information from the reporter), triaged (This issue has been evaluated and is valid)

Comments

@antheas

antheas commented Sep 14, 2024

Probably a cousin of issue secureblue/secureblue#369.

Running sudo rpm-ostree usroverlay --hotfix emits assertion failures about invalid reference counting, but the command still completes successfully.

I think there is a chance the rpm-ostree version here is patched to fix the deployment bug that was found recently, but it is otherwise unchanged.

bazzite@bazzite:~$ sudo rpm-ostree usroverlay --hotfix

(ostree admin unlock:7956): GLib-CRITICAL **: 23:57:45.069: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(ostree admin unlock:7956): GLib-CRITICAL **: 23:57:45.179: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(ostree admin unlock:7956): GLib-CRITICAL **: 23:57:45.416: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(ostree admin unlock:7956): GLib-CRITICAL **: 23:57:45.418: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(ostree admin unlock:7956): GLib-CRITICAL **: 23:57:45.418: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(ostree admin unlock:7956): GLib-CRITICAL **: 23:57:45.423: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(ostree admin unlock:7956): GLib-CRITICAL **: 23:57:45.439: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(ostree admin unlock:7956): GLib-CRITICAL **: 23:57:45.440: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(ostree admin unlock:7956): GLib-CRITICAL **: 23:57:45.444: g_atomic_ref_count_dec: assertion 'old_value > 0' failed
Copying /etc changes: 19 modified, 2 removed, 90 added
bootfs is sufficient for calculated new size: 0 bytes
Transaction complete; bootconfig swap: no; bootversion: boot.0.1, deployment count change: 0
Hotfix mode enabled.  A writable overlayfs is now mounted on /usr
for this booted deployment.  A non-hotfixed clone has been created
as the non-default rollback target.

bazzite@bazzite:~$ ostree --version
libostree:
 Version: '2024.7'
 Git: 684652bdaa25ae16a551e5cfef82c8896cca5725
...
bazzite@bazzite:~$ rpm-ostree --version
rpm-ostree:
 Version: '2024.7'
 Git: 7d3dce6a6eaac546d9078d7d42f4d5df28f5f0fd
...
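
When comparing runs such as the one above, it can help to tally the CRITICAL lines instead of counting them by eye. The following is a minimal Python sketch, assuming the log has been captured to a string. The helper name tally_criticals and the regular expression are illustrative and not part of ostree or GLib.

```python
import re
from collections import Counter

# Matches lines such as:
# (ostree admin unlock:7956): GLib-CRITICAL **: 23:57:45.069: g_atomic_ref_count_dec: assertion 'old_value > 0' failed
CRITICAL_RE = re.compile(
    r"^\((?P<prog>.+):(?P<pid>\d+)\): (?P<domain>[\w-]+)-CRITICAL \*\*: "
    r"(?P<time>[\d:.]+): (?P<message>.*)$"
)


def tally_criticals(log_text):
    """Count CRITICAL messages per (program, pid, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CRITICAL_RE.match(line.strip())
        if match:
            key = (match["prog"], int(match["pid"]), match["message"])
            counts[key] += 1
    return counts
```

Calling tally_criticals() on the captured output returns a Counter keyed by program name, PID, and message, so repeated warnings from the same process collapse into a single entry with a count.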
@cgwalters
Member

Can you get a stack trace here, e.g. using env G_DEBUG=fatal-warnings rpm-ostree usroverlay?

@antheas
Author

antheas commented Sep 14, 2024

Hm, it does not seem to generate one:

bazzite@bazzite:~$ sudo G_DEBUG=fatal-warnings rpm-ostree usroverlay --hotfix

(ostree admin unlock:7580): GLib-CRITICAL **: 00:40:26.705: g_atomic_ref_count_dec: assertion 'old_value > 0' failed
Trace/breakpoint trap
bazzite@bazzite:~$ sudo coredumpctl dump /usr/bin/ostree
           PID: 7580 (ostree)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 5 (TRAP)
     Timestamp: Sun 2024-09-15 00:40:26 EEST (3min 6s ago)
  Command Line: ostree admin unlock --hotfix
    Executable: /usr/bin/ostree
 Control Group: /user.slice/user-1000.slice/session-4.scope
          Unit: session-4.scope
         Slice: user-1000.slice
       Session: 4
     Owner UID: 1000 (bazzite)
       Boot ID: 133c87fa3e2649419c6c4e4dbaccc514
    Machine ID: 26b347316e3f40818770598196fe5a8f
      Hostname: bazzite
       Storage: none
       Message: Process 7580 (ostree) of user 0 terminated abnormally without generating a coredump.
Coredump entry has no core attached (neither internally in the journal nor externally on disk).

@cgwalters
Member

Is this with that alternative hardened malloc? If so, the fact that we're seeing SIGTRAP here makes me think it's somehow hooking things earlier than the g_critical(). Or, hmm, is it changing what abort() does?

@antheas
Author

antheas commented Sep 15, 2024

No, this is the stock version on bazzite. There is a chance we are patching rpm-ostree to fix the deployment error that is fixed in the next release, but I don't think that would affect this.

Malloc should be the same as the one in Kinoite.

The trap only occurred after adding the environment variable.

These errors happen only with usroverlay; normal updates are fine.
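
For context on the "Signal: 5 (TRAP)" entry in the coredumpctl output: signal 5 is SIGTRAP on Linux, which the shell reports as "Trace/breakpoint trap". As far as GLib's behavior is understood here, a message made fatal through G_DEBUG=fatal-warnings ends in a breakpoint trap rather than a plain abort(), which would match the report that the trap only appeared with the variable set. The Python sketch below only shows how to tell a signal death from a normal exit. The child process is a hypothetical stand-in that traps itself, not the ostree command.

```python
import signal
import subprocess
import sys


def run_and_describe(argv):
    """Run argv and describe how it ended (exit status or fatal signal)."""
    proc = subprocess.run(argv)
    if proc.returncode < 0:
        # On POSIX, a negative return code means "killed by signal N".
        sig = signal.Signals(-proc.returncode)
        return f"killed by {sig.name} ({sig.value})"
    return f"exited with status {proc.returncode}"


# Hypothetical stand-in for the failing command: a child that sends
# SIGTRAP to itself, mimicking a process stopped by a breakpoint trap.
TRAP_SELF = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGTRAP)",
]
```

Passing the real command line to run_and_describe() instead of TRAP_SELF would report the same kind of result for the actual ostree invocation.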

@antheas
Author

antheas commented Sep 15, 2024

(gdb) bt 15
#0  0x00007fb49c72ec28 in g_logv () from /lib64/libglib-2.0.so.0
#1  0x00007fb49c72eea3 in g_log () from /lib64/libglib-2.0.so.0
#2  0x00007fb49c73f5da in g_atomic_ref_count_dec () from /lib64/libglib-2.0.so.0
#3  0x00007fb49c774853 in g_variant_unref () from /lib64/libglib-2.0.so.0
#4  0x00007fb49ca39db6 in checkout_tree_at_recurse () from /lib64/libostree-1.so.1
#5  0x00007fb49ca3aa10 in checkout_tree_at_recurse () from /lib64/libostree-1.so.1
#6  0x00007fb49ca3b3e1 in checkout_tree_at () from /lib64/libostree-1.so.1
#7  0x00007fb49ca3b70f in ostree_repo_checkout_at () from /lib64/libostree-1.so.1
#8  0x00007fb49cab1840 in prepare_deployment_etc.isra () from /lib64/libostree-1.so.1
#9  0x00007fb49caa97b7 in sysroot_initialize_deployment.constprop ()
   from /lib64/libostree-1.so.1
#10 0x00007fb49ca75465 in ostree_sysroot_deploy_tree_with_options ()
   from /lib64/libostree-1.so.1
#11 0x00007fb49ca75532 in ostree_sysroot_deploy_tree () from /lib64/libostree-1.so.1
#12 0x00007fb49ca6c249 in ostree_sysroot_deployment_unlock ()
   from /lib64/libostree-1.so.1
#13 0x0000564770576bd2 in ot_admin_builtin_unlock (argc=<optimized out>, 
    argv=<optimized out>, invocation=<optimized out>, cancellable=0x0, 
    error=0x7ffd1e77f9d0) at src/ostree/ot-admin-builtin-unlock.c:73
#14 0x00005647705683c8 in ostree_builtin_admin (argc=<optimized out>, 
    argv=<optimized out>, invocation=0x7ffd1e77f9d8, cancellable=0x0, 
    error=0x7ffd1e77f9d0) at src/ostree/ot-builtin-admin.c:178
(More stack frames follow...)

I ran gdb on it. debuginfo does not work, so I cannot see full symbols.
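
For readers unfamiliar with the message: frames #2 and #3 of the backtrace show g_variant_unref() reaching g_atomic_ref_count_dec(), and the assertion 'old_value > 0' fails when a reference is dropped on an object whose count had already reached zero. The following is a toy Python model of that check, not GLib's implementation. The class RefCounted and its warnings list are illustrative assumptions.

```python
class RefCounted:
    """Toy stand-in for a reference-counted object (not GLib's API)."""

    def __init__(self):
        self.refs = 1        # a new object starts with one reference
        self.warnings = []   # collected instead of printed, for inspection

    def ref(self):
        self.refs += 1
        return self

    def unref(self):
        # Decrement first, then check the previous value.
        old_value = self.refs
        self.refs -= 1
        if old_value <= 0:
            # Corresponds to the "assertion 'old_value > 0' failed" warning:
            # someone released a reference that no longer existed.
            self.warnings.append("assertion 'old_value > 0' failed")
            return False
        return old_value == 1  # True when the last reference was dropped
```

In this model a second unref() on an object created with a single reference records the warning, which is the pattern an extra g_variant_unref() somewhere under checkout_tree_at_recurse would produce.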

@antheas
Author

antheas commented Sep 15, 2024

Figured it out. It happens while checking out /etc. The backtrace has two checkout_tree_at_recurse frames.

They have these args:

(gdb) info args
self = 0x55f9912ccb20
options = 0x7ffd811fab10
state = 0x7ffd811faa80
destination_parent_fd = 12
destination_name = <optimized out>
dirtree_checksum = 0x7ffd811fa7e0 "3ed2a6f136bf47dbc493a1eb7cacb97ec5b9805d6192abc0d97e0eb65a5ed5cb"
dirmeta_checksum = <optimized out>
cancellable = <optimized out>
error = <optimized out>
(gdb) info args
self = 0x556f67fa3b20
options = 0x7ffd77afeb90
state = 0x7ffd77afeb00
destination_parent_fd = 13
destination_name = 0x7f95af8f5cde "etc"
dirtree_checksum = 0x556f681a29b0 "996d76ae160ea0e6512f4af45cbe76ac85c9594d99547b03356a1826a6b1853d"
dirmeta_checksum = 0x556f681a2a00 "c1fa8905c02e0199c4f6f215923914173d7030e336c0922fb2f0d800bf7a9b40"
cancellable = 0x0
error = 0x7ffd77aff3c0
ostree ls...
d00755 0 0      0 3ed2a6f136bf47dbc493a1eb7cacb97ec5b9805d6192abc0d97e0eb65a5ed5cb ec90a49ea284b4c39846e1f440091f539709d01dc7986e6f61a090afb2a8c6ac /usr/etc/fonts
cannot find 996d76ae160ea0e6512f4af45cbe76ac85c9594d99547b03356a1826a6b1853d

EDIT: It seems that 996d76ae160ea0e6512f4af45cbe76ac85c9594d99547b03356a1826a6b1853d is common to all of them. Could the ref count failure be caused by this object being missing?

@cgwalters
Member

cgwalters commented Sep 18, 2024

Unfortunately there's a ton of g_variant_unref() invocations there and a whole lot of code, so it'd be really helpful to narrow this down further.

@antheas any chance you can run with a build of ostree with debuginfo? Even better, build with CFLAGS="-ggdb -O0".

cgwalters added the bug, triaged, and needinfo labels on Sep 18, 2024
@antheas
Author

antheas commented Oct 27, 2024

I tried to do that, but unfortunately compiling ostree on a system without dev packages is not particularly easy...

I have a command you can use to reproduce it though:

> sudo bootc switch ghcr.io/ublue-os/bazzite:unstable-41.20241027
layers already present: 58; layers needed: 14 (1.0 GB)
Fetched layers: 974.69 MiB in 2 minutes (6.86 MiB/s)

(process:9057): GLib-CRITICAL **: 21:46:13.583: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(process:9057): GLib-CRITICAL **: 21:46:13.945: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(process:9057): GLib-CRITICAL **: 21:46:13.947: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(process:9057): GLib-CRITICAL **: 21:46:13.948: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(process:9057): GLib-CRITICAL **: 21:46:13.952: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(process:9057): GLib-CRITICAL **: 21:46:13.967: g_atomic_ref_count_dec: assertion 'old_value > 0' failed

(process:9057): GLib-CRITICAL **: 21:46:13.969: g_atomic_ref_count_dec: assertion 'old_value > 0' failed
Pruned images: 0 (layers: 0, objsize: 1.3 GB)
Queued for next boot: ghcr.io/ublue-os/bazzite:unstable-41.20241027
  Version: unstable-41.20241027
  Digest: sha256:7260cc16cff9624393c751bf25749f7a6f5dc34ca8a56a82166fa95918626a31

I am pretty sure you don't need to boot that image; just pulling it should be enough to reproduce this.

Bootc looks a lot nicer to work with, really nice in F41.
