NFS caching causes file_exists() to return wrong result #458

Open
epruesse opened this issue Jun 7, 2024 · 1 comment
Labels
feature a feature request or enhancement

Comments

@epruesse

epruesse commented Jun 7, 2024

The file_exists() method relies on stat() to determine whether or not a file exists. Since stat() is subject to the NFS attribute cache, this information can easily be a full minute out of date. A more appropriate file existence test might be a call to open() followed by an immediate close(). This is not cached. Notably, the open()/close() sequence also flushes the NFS cache, making a subsequent call to access() (fs::file_access()) or stat() (fs::file_info()) yield fresh results.
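A minimal sketch of the open()/close() existence test described above, in C and assuming a POSIX system. The name `exists_via_open` is hypothetical and not part of fs. How errors other than "no such file" are treated is an assumption of this sketch, not something the issue specifies.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper (not part of fs): test for existence by opening
 * the path and closing it again right away, instead of calling stat(). */
bool exists_via_open(const char *path) {
    /* O_NONBLOCK keeps open() from blocking on a FIFO with no writer. */
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd >= 0) {
        close(fd);
        return true;
    }
    /* ENOENT and ENOTDIR mean the path is not there. Assumption of this
     * sketch: any other error (for example EACCES) is reported as
     * "exists", although existence cannot always be determined then. */
    return errno != ENOENT && errno != ENOTDIR;
}
```

Unlike stat(), this test depends on being allowed to open the file, which is why the error branch has to inspect `errno` rather than treat every failure as "missing".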

I would propose changing file_exists() to the open/close pattern. Here, caching is both unexpected (other implementations use open/close to detect file presence) and not usually performance-relevant. At least, I can't think of many scenarios where an API user would check for the existence of a million files. The situation is different for file_access() and file_info(): it is more obvious that they return metadata that may get cached (on NFS specifically), and they are more likely to be called on large input vectors or many times. The rest can be handled in the docs, and file_exists() can also double as a cache flush. This is also a much smaller code change than adding an extra flag to all the functions plus a dedicated cache flush function.

(Happy to open a PR to this effect if desired.)

@gaborcsardi
Member

I am sorry for the late response.

I assume that open() + close() is much (?) more costly than stat(), especially if the latter is cached. So it does not seem like a good idea to have the former as the default. Maybe we could have another implementation, selected by an optional argument.
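One way the optional argument could look at the C level is sketched below, assuming a POSIX system. `path_exists` and its `bypass_cache` parameter are hypothetical names, not an existing fs interface. The default branch keeps the stat() test, and the open()/close() test is used only on request.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if the errno value says the path itself is missing. */
static bool errno_means_missing(int err) {
    return err == ENOENT || err == ENOTDIR;
}

/* Hypothetical function; `bypass_cache` stands in for the optional
 * argument suggested above. */
bool path_exists(const char *path, bool bypass_cache) {
    if (!bypass_cache) {
        /* Default: stat(), which on NFS may be answered from the
         * client's attribute cache. */
        struct stat sb;
        if (stat(path, &sb) == 0) {
            return true;
        }
        return !errno_means_missing(errno);
    }
    /* Opt-in: open() followed by an immediate close(). */
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd >= 0) {
        close(fd);
        return true;
    }
    return !errno_means_missing(errno);
}
```

Keeping stat() as the default leaves the cost for existing callers unchanged. Whether open() plus close() is in fact much slower would need to be measured on the file systems of interest.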

Personally I am not sure if this is worth it, because file_exists() is a race condition by definition, anyway. But if it is still important for you, I'll be happy to review a PR.
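To illustrate the race: between an existence check and the action that follows it, another process can create or remove the file. Where the goal is "create this file unless it already exists", a common way to avoid the separate check is to let open() decide atomically with `O_CREAT | O_EXCL`. The sketch below assumes POSIX; `create_if_absent` is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create `path` only if it does not exist yet.
 * Returns 1 if the file was created, 0 if it already existed, and
 * -1 on any other error (errno is left set by open()). */
int create_if_absent(const char *path) {
    /* O_EXCL makes open() fail with EEXIST when the path exists, so
     * the check and the creation happen in a single call. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == EEXIST ? 0 : -1;
}
```

This covers only the create-if-missing case. For reading, the equivalent is to open the file directly and handle the failure rather than checking first.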

@gaborcsardi gaborcsardi added the feature a feature request or enhancement label Apr 25, 2025