mergerfs pool mount fails. Best way to monitor and self-recovery via script? #1098
-
I'm running a tiered cache setup (a ZFS cache pool, with slow disks in a second mergerfs pool). The primary pool is the NFS mount point for Linux hosts that copy a lot of data via rsync. I have found that the FUSE mount sometimes fails, causing NFS to fail as well, and it doesn't recover from this failure. On the host running the mergerfs pools, this is what I see:
Here's /etc/fstab (edit: I just realized I misplaced "msplfs" in the wrong mount as I write this post. However, my question about monitoring and self-healing still stands.):
The ZFS pool and mount point is fine.
How I recover from this is remounting and restarting NFS:
Info
This tells me the /mnt/cached mergerfs cache pool failed. I am not yet sure how best to troubleshoot the root cause of the mergerfs crash, but I would like a recommended way to set up a monitoring and self-healing script, as it seems that mergerfs does not detect its own failure.
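A minimal sketch of the kind of watchdog asked about here, assuming the manual fix is a lazy unmount, a remount from /etc/fstab, and an NFS restart. The exact recovery commands are not shown in the post, so the `umount -l`, `mount`, and `systemctl restart` calls, the `nfs-kernel-server` unit name, and the 30-second interval are all assumptions, and the function names are hypothetical. It only papers over the crash; it does not address the root cause.

```python
import os
import subprocess
import time

MOUNTPOINT = "/mnt/cached"       # pool named in the post
NFS_UNIT = "nfs-kernel-server"   # assumption: NFS server unit name on Ubuntu
CHECK_INTERVAL_S = 30            # assumption: arbitrary polling interval


def mount_is_healthy(path, ismount=os.path.ismount, stat_fn=os.stat):
    """Return True if path is a mount point that still answers stat().

    A crashed FUSE daemon usually leaves the mount point failing with
    "Transport endpoint is not connected"; any OSError counts as unhealthy.
    """
    try:
        if not ismount(path):
            return False
        stat_fn(path)
        return True
    except OSError:
        return False


def recovery_commands(path, nfs_unit=NFS_UNIT):
    """Commands mirroring the assumed manual fix, in order."""
    return [
        ["umount", "-l", path],              # lazily detach the dead endpoint
        ["mount", path],                     # remount from the /etc/fstab entry
        ["systemctl", "restart", nfs_unit],  # restart NFS so exports work again
    ]


def recover(path, run=subprocess.run):
    """Run each recovery command, continuing even if one of them fails."""
    for cmd in recovery_commands(path):
        run(cmd, check=False)


def watch(path=MOUNTPOINT, interval=CHECK_INTERVAL_S):
    """Poll forever and attempt recovery whenever the mount looks dead."""
    while True:
        if not mount_is_healthy(path):
            recover(path)
        time.sleep(interval)
```

Calling `watch()` as root (for example from a systemd service) starts the loop. Note that a mount that hangs rather than crashes can block the `stat()` call indefinitely, so this sketch only covers the case where the FUSE process has died.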
Replies: 7 comments 14 replies
-
Why? It shouldn't fail and shouldn't need to be repaired. There is nothing special about mergerfs vs any other filesystem.
-
There are no "errors" to be reported. Things mount or not. If there are config errors, they get reported like any other. If there is a bug and it crashes, nothing would be logged except maybe in kernel messages, depending on the system setup.
-
The odd thing is that it keeps crashing.
edit 11/21: some kernel errors show up in the logs. But could these be caused by the FUSE mount no longer working?
-
Just to rule out an issue with ZFS 2.1.3 (stable on Ubuntu 10.04) I have upgraded it to ZFS 2.1.6. Also upgraded nfs (2.6.1-1ubuntu1.2). It's odd that out of my two mergerfs pools the one that keeps crashing is the one which has ZFS+mergerfs_slow_disks, but the latter never crashes. BUT I am not sure this is the root cause, because if the primary branch, which is ZFS (/cache), were to fail, I think mergerfs would just handle the requests towards the secondary branch (/mnt/slow-storage), which is the other mergerfs pool that's been stable. The system hangs continue. dmesg:
stack
These crashes seem to occur more frequently when the NFS server is configured to operate in NFSv4-only mode. Here is the setup I used to force v4.2: https://github.com/TheLinuxGuy/free-unraid/blob/main/miscelaneous_sysadmin.md#nfs
-
I'm going to try
-
If it's crashing then strace is pretty useless. Need a stack trace from gdb.
gdb path/to/mergerfs
run -f -o options branches mountpoint
when it crashes
thread apply all bt
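If sitting at an interactive gdb prompt until the crash is impractical, the same steps can probably be scripted with gdb's batch mode. This is only a sketch: `gdb_backtrace_argv` is a hypothetical helper, and `path/to/mergerfs`, `options`, `branches`, and `mountpoint` are the same placeholders as in the commands above and still need to be filled in. It relies on the standard gdb flags `-batch`, `-ex`, and `--args`.

```python
import shlex


def gdb_backtrace_argv(mergerfs_bin, options, branches, mountpoint):
    """Build a gdb command line that runs mergerfs in the foreground and
    prints a backtrace of every thread once the process stops."""
    return [
        "gdb", "-batch",
        "-ex", "run",                  # start mergerfs under gdb
        "-ex", "thread apply all bt",  # executed after the process stops
        "--args", mergerfs_bin, "-f", "-o", options, branches, mountpoint,
    ]


# Placeholders copied from the reply above; substitute real values.
argv = gdb_backtrace_argv("path/to/mergerfs", "options", "branches", "mountpoint")
print(shlex.join(argv))
```

The printed line can be pasted into a shell, or `argv` can be passed to `subprocess.run` with stdout redirected to a file so the backtrace is captured whenever the crash happens.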
-
Even with XFS + mdadm the crashes / instability on NFS continue. Not sure if this says anything new or different from the past one but here it goes.