Fuse filesystem returns EIO on close #135
Comments
If you see a call to Normally, whenever the FUSE file system in bb_worker returns Can you check the worker logs, the bb-browser page, etc. to see if there's anything useful in there?
Worker logs have nothing useful.
The bb-browser page (http://localhost:7984/fuse/blobs/sha256/historical_execute_response/352d5bd473d8692eb27f7d6b2599bed46957885b85c114a12e668e686b40fff5-1067/) has nothing useful either. I'll see what else I can find.
I captured an ftrace of the failure and I can see the syscalls, but don't see where it goes wrong. I see an
Probably a good idea to patch up bb_worker to set the
Thanks for the ideas! I'm not seeing the difference between a good and bad run. Bad one:
Working one:
The read at the end of the bad run is where glog is reading the shared libraries to generate a backtrace. (I'm happy to share the full log if that would be helpful.)
A random guess, but could you maybe try a workaround like this?
diff --git a/pkg/filesystem/virtual/fuse_handle_allocator.go b/pkg/filesystem/virtual/fuse_handle_allocator.go
index 0540488a..69be9c17 100644
--- a/pkg/filesystem/virtual/fuse_handle_allocator.go
+++ b/pkg/filesystem/virtual/fuse_handle_allocator.go
@@ -300,7 +300,7 @@ func (l *fuseStatefulNativeLeaf) Unlink() {
 func (l *fuseStatefulNativeLeaf) injectAttributes(attributes *Attributes) {
 	attributes.SetInodeNumber(l.inodeNumber)
-	attributes.SetLinkCount(l.linkCount.Load())
+	attributes.SetLinkCount(StatelessLeafLinkCount)
 	// The change ID should normally also be affected by the link
 	// count. We don't bother overriding the change ID here, as FUSE
 	// does not depend on it.
I know that Linux has some logic to 'kill' file descriptors if the link count of a file reaches zero. For local file systems this is perfectly fine, but for FUSE/NFS/... this logic might backfire.
I tried this for Austin and unfortunately it didn't seem to have any effect on either the behavior or the logs. I'm still trying to see if I can use kprobe to find where exactly the I/O error on flush is being generated. I did manage to confirm that it's happening on flush, as calling
I've got a unit test (https://github.com/frc971/971-Robot-Code/blob/master/aos/starter/subprocess_reliable_test.cc in SubprocessTest.CanSlowlyStopGracefully) which works locally, but fails on buildbarn with the runner set up to use FUSE on Linux (Debian Bookworm). When run under strace, it reports that close() fails, whereas it never fails when run under linux_sandbox.
Adding the write (LOG(INFO)) reduces the probability of failure.
I'll keep debugging, but any advice on where to start looking would be wonderful. I can work on creating a smaller reproducer if that would be helpful.