Tests fail on NFS-mounted temp drive #60
Comments
I'm not surprised that there are issues with locking on NFS. IIRC the normal
Yes, although my understanding is that the kernel version I'm using is supposed to emulate local locking behavior using fcntl(2) byte-range locks when flock() is called, at least for nfs4. I wrote some test programs that seem to confirm that flock() (via Perl) does work, at least for simple cases (one process appending to a file and grabbing an exclusive lock will block a second process trying to grab an exclusive lock on the same file). I was able to get this to work by going back to the old-style local-only flock locking behavior (i.e. local_lock=flock, which also requires downgrading to nfs3). I'm still kind of baffled; I'll have to read up on how locks work with nfs4 sometime. 🤷🏻‍♀️ For what I'm doing, I don't need network-aware flock(), so I guess it'll work.
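The simple case described above (one process holding an exclusive lock while a second one tries to take it) could be sketched roughly like this. It is a Python illustration, not the Perl test programs mentioned, which are not shown in this thread; the function name `second_locker_wait` and the `hold_seconds` parameter are hypothetical.

```python
import fcntl
import os
import tempfile
import time


def second_locker_wait(path, hold_seconds=0.5):
    """Return how many seconds a second process waited for an exclusive flock().

    A forked child takes LOCK_EX on `path`, appends a line and holds the lock
    for `hold_seconds`. The parent then requests LOCK_EX on the same file and
    measures how long the call blocks.
    """
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: grab the lock first, signal the parent, then hold the lock.
        status = 1
        try:
            os.close(ready_r)
            with open(path, "a") as fh:
                fcntl.flock(fh, fcntl.LOCK_EX)
                os.write(ready_w, b"x")
                fh.write("child\n")
                fh.flush()
                time.sleep(hold_seconds)
                fcntl.flock(fh, fcntl.LOCK_UN)
            status = 0
        finally:
            os._exit(status)

    # Parent: wait until the child reports that it holds the lock.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)

    start = time.monotonic()
    with open(path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until the child unlocks
        waited = time.monotonic() - start
        fh.write("parent\n")
        fh.flush()
        fcntl.flock(fh, fcntl.LOCK_UN)
    os.waitpid(pid, 0)
    return waited


if __name__ == "__main__":
    # Point `dir=` at an NFS-mounted directory to compare against local disk.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    print(f"second locker waited {second_locker_wait(path):.2f}s")
    os.remove(path)
```

If the lock excludes the second process, the reported wait should be close to `hold_seconds`, and the file should contain the child's line before the parent's.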
Hmm, I really don't know what's going on here. One of the child processes appears to either have not run or to not have exited after 10 seconds. So there's a couple possibilities:
All of which is to say I'm confused.
The processes are just taking a long time. I was seeing sometimes 8, sometimes 9 processes running right. The locking is working fine on nfs4...sort of. The locks are properly excluding concurrent access, but they are SLOW. I also validated that all the processes start, but it sometimes takes a long time to do the write to the file. If I bump the 10s alarm to 30s in t/file-locked.t, it runs just fine for me (albeit slowly). 20s was usually good enough, but every once in a while it caused an issue. I am not sure why it is so slow. This is a NAS connected via 10G to the server, in the same rack, so there isn't much latency or the like, but something makes these locks slow (I'll dig into that sometime, since it seems slower than it should be; I wonder how the server manages multiple lock requests). Regardless, the speed issue isn't the module's fault. But in the meantime, maybe just bump the alarm time up? I also tried to drop the number of simultaneous processes down, but that didn't help (i.e. 2 children instead of 10 children still needed the 30s alarm).
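To check how long the lock-and-write step takes on a given mount, something along these lines might help. This is a Python sketch of the general pattern (several children appending under an exclusive lock, guarded by an alarm), not the actual contents of t/file-locked.t; `run_lock_children` and its parameters are hypothetical names.

```python
import fcntl
import os
import signal
import tempfile
import time


def run_lock_children(path, children=10, alarm_seconds=30):
    """Fork children that each append one line under an exclusive flock().

    Returns the wall-clock seconds until every child has exited, or raises
    TimeoutError if the alarm fires first. Must be called from the main
    thread, because it installs a SIGALRM handler.
    """
    def on_alarm(signum, frame):
        raise TimeoutError(f"children not done after {alarm_seconds}s")

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    start = time.monotonic()
    pids = []
    signal.alarm(alarm_seconds)
    try:
        for n in range(children):
            pid = os.fork()
            if pid == 0:
                # Child: lock, append, unlock, then exit without cleanup.
                status = 1
                try:
                    with open(path, "a") as fh:
                        fcntl.flock(fh, fcntl.LOCK_EX)
                        fh.write(f"child {n}\n")
                        fh.flush()
                        fcntl.flock(fh, fcntl.LOCK_UN)
                    status = 0
                finally:
                    os._exit(status)
            pids.append(pid)
        # Parent: wait for every child; the alarm interrupts a long wait.
        for pid in pids:
            os.waitpid(pid, 0)
        return time.monotonic() - start
    finally:
        # Cancel the alarm and restore the previous handler. On timeout the
        # remaining children are not reaped here; this is only a sketch.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)


if __name__ == "__main__":
    # Point `dir=` at the NFS-mounted directory to measure it.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    print(f"all children finished in {run_lock_children(path):.2f}s")
    os.remove(path)
```

Running it with `children=2` and `children=10` against both a local and an NFS-mounted path would show whether the elapsed time depends on the number of contending processes or on the mount itself.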
Maybe the locking emulation that NFS provides is slow? And yes, we could increase the alarm time a bit.
I'm seeing an issue with t/file-locked.t failing on an NFS-mounted temp directory. It does not fail with local disks. I wouldn't be surprised if my locking configuration isn't quite right on this host (nfs4, Linux kernel 4.4.0).
Interesting output (I set the tmp directory manually to an NFS-mounted subdir under my home directory):