-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle filesystem latency when creating RUN_COMPLETE
file
#803
Handle filesystem latency when creating RUN_COMPLETE
file
#803
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like throwing an exception here.
I think we should try harder to force a cache update. Instead of just doing an exists
, you might try tossing in a path.parent.iterdir()
. Apparently opening the parent directory should cause the directory cache to refresh.
If that doesn't seem to be working, drop a symlink to the temp file location where you were going to move the original file, and walk away instead of throwing an exception.
Also, CREATION_ERROR is explicitly for test creation problems. I'd create a new 'WARNING' label for cases like this.
File system quirks mean that the RUN_COMPLETE file may not yet exist on the file system after it is created. Since the file is used by the same process relatively soon after creation, this can occassionally cause a FileNotFoundError. It was therefore necessary to wait for the file to be synced to the file system, and to impose a timeout. The previous timeout value was too short and resulted in TimeoutErrors relatively frequently. This commit increases the timeout value from 2 seconds to 30 seconds, per the suggestion of Paul Ferrell.
The idiosyncrocies of NFS means that when the RUN_COMPLETE file is written, it may not appear on the local file system, which results in a FileNotFoundError when that file is accessed. This commit adds a fallback when the file does not exist on the local file system, in the process reverting the previous solution which involved waiting and eventually raising a timeout. The fallback consists of the following steps: 1. Create the file as normal, and check whether it exists. 2. If it does not exist, attempt to force an NFS cache reset by listing the contents of the parent directory. 3. If the file still does not exist, create a symlink to the location where the file is expected eventually to be.
25f6f34
to
f7b6152
Compare
RUN_COMPLETE
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Code review checklist: