fsck --reconstruct_alloc segfaults after interrupted run #240
Comments
This is all running on Linux 6.7.2; let me try to build archiso with Linux 6.7.5 and run off that...
edit: no meaningful change as far as i can tell; though a run without
here is a run of latest git, built in debug mode with optimizations disabled, on the latest Arch packages and kernel; it seems to be more populated with proper debuginfo
And stdout/stderr:
...in my case here, i wonder if running reset-counters would help, but i don't think i want to do something stupid that may cause even more damage, so please tell me if that would be a good idea, a bad one, or irrelevant.
i don't mind losing some data if i can at least restore a meaningful amount of it, to be honest.
edit: nevermind, i made a full copy of the filesystem partition on another drive and did
the segfault seems to be at inserting a key into the journal, which suggests to me the journal itself might have gotten messed up.
is there a way to just delete the journal entries without committing them or doing anything with them?
i don't care much if this messes up some filesystem structure, as long as i can get this thing to run and make it at least mountable; and if anything, i made a full copy of the filesystem partition to test on lol
Issue persists with latest commit 25e84a9.
stdout/stderr:
gdb:
it appears the issue is that in
Adding a small if statement right before the
if(keys->gap > keys->nr) //bad hack?
    keys->gap = keys->nr;
I'll wait for it to finish now...
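For anyone following along, the clamp in the hack above can be sketched standalone. This is only an illustration under assumptions: `keys_buf` and `clamp_gap` are made-up names, and only the two fields `gap` and `nr` come from this thread. It is not the actual bcachefs-tools code, and it says nothing about why `gap` ended up larger than `nr` in the first place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in struct: only the field names gap and nr
 * are taken from the thread, everything else is an assumption. */
struct keys_buf {
    size_t nr;   /* number of live entries */
    size_t gap;  /* position used for the next insert */
};

/* Clamp gap so it never points past the live entries.
 * Returns 1 if a correction was needed, 0 otherwise. */
static int clamp_gap(struct keys_buf *keys)
{
    assert(keys != NULL);
    if (keys->gap > keys->nr) {
        keys->gap = keys->nr;
        return 1;
    }
    return 0;
}
```

Returning whether a correction happened makes it easy to log the event, since a clamp like this hides an inconsistent state rather than explaining it.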
...Now it looks like it deadlocked on futexes. I think I experienced the same problem before, but didn't bother with it and just reran fsck with different params until it worked. That doesn't seem feasible here, as I need to make it finish with
Interrupting and rerunning it seems to just give me the same effect. Here are the gdb backtraces:
May be related to #118?
edit: it appears
I suppose it was deadlocking while checking lrus and is now very slowly munching through checking extents to backpointers lol
It's another deadlock. The gdb:
edit: i wrote the backup (made before reset-counters) onto the raw nvme partition (removing the LUKS layer), and it now prints the next 3 lines (as in the comment above) too, but still deadlocks.
edit: i copied the recompiled tool to a fat32 partition, did a reboot, and without mounting any bcachefs, got the tool from that partition and tried fsck --reconstruct_alloc -fy again, and this time i didn't get the next 3 lines. Those lines may just be random luck i guess lol
IT MOUNTED! after trying fsck with and without reconstruct_alloc a few times, and running into the deadlock every single time, i decided to try and just mount it, and it worked.
...god, now to clean up all the mess i've made lol
Currently in a bit of a pickle here. I ran
bcachefs fsck --reconstruct_alloc -pf /dev/myroot
out of boredom, saw it printing a load of messages quickly, assumed that's a normal part of operation in this mode, and decided to ctrl+c it and run it again with -r. Now the bcachefs tool segfaults when I try to do it again, and the filesystem won't mount. Both latest stable in the Arch Linux repos (3:1.6.2-1) and latest master commit
6ff5313cbe0432
segfault.
Writing from my phone and operating from the Arch Linux installer ISO environment at the moment, as I don't have another machine to do stuff with.
Here's a run with gdb, with
thr apply all bt
Or building in debug mode and running the same thing (random output messages from the tool itself are missing, I guess gdb logging doesn't capture stderr):
Here's the output (stdout+stderr) with just -p (it doesn't segfault but may still be of note):
I really hope I can get this data un-eaten lol