Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3FileSystem.exists returns False instead of error if credentials are wrong. #750

Open
kasuteru opened this issue Jun 21, 2023 · 6 comments · May be fixed by #757
Open

S3FileSystem.exists returns False instead of error if credentials are wrong. #750

kasuteru opened this issue Jun 21, 2023 · 6 comments · May be fixed by #757

Comments

@kasuteru
Copy link

Short version: When calling exists("/path/to/file") on an s3fs.S3FileSystem with wrong credentials, instead of throwing an error that the credentials are wrong, it instead responds with False. This is misleading, since the file might actually exist but the credentials are simply wrong.

Tested with version 2023.6.0, but existed in previous versions at least dating back to 2022 as well.

Was this a conscious decision or is this a bug? In case of the former, I heavily disagree with it...

@kasuteru
Copy link
Author

On closer inspection, this part of the code seems to be responsible:

s3fs/s3fs/core.py

Lines 1024 to 1028 in 10c6747

try:
await self._call_s3("get_bucket_location", Bucket=bucket, **self.req_kw)
return True
except Exception:
return False

It is looking for a generic exception - meaning that the expected exception (This file does not exist) and the unexpected exception (You entered the wrong credentials and we cannot even check whether the file exists) get mixed together... I propose looking at the error message, or maybe even test whether the file system even exists in the very beginning.

Reason for this is, I initially used fs.exists("/") to check whether connectivity was there (because, as mentioned in the description for this function, "/" always exists, so it is safe to call) and only started to question this when I realized the function returns True even if I have no connection at all...

@kasuteru kasuteru changed the title S3FileSystem.exists returns false instead of error if credentials are wrong. S3FileSystem.exists returns False instead of error if credentials are wrong. Jun 21, 2023
@martindurant
Copy link
Member

Your reasoning makes sense, you can try to implement two Except branches, specifically to the "credentials are bad" case (or "couldn't connect at all"). I'm not sure how those exceptions look, though.

@kasuteru
Copy link
Author

I will investigate and attempt to fix this.

@kasuteru
Copy link
Author

@martindurant Reason for deleting my previous comments was that they were incorrect and I didn't want to confuse future readers, sorry if this lead to confusion.

Upon further investigation, I have come to the conclusion that only the FileNotFoundError needs to be caught. It is actually correct to raise a PermissionError in case one is raised by S3 - this happens on private instances where buckets are hidden. I tested this with our internal S3 (where only admin can create new buckets), and currently, the following is an actual misbehavior, in my opinion:

# Test a private bucket that exists, but user has no access:
private_fs.exists("/bucket_exists_no_acccess")   # -> Returns False

The current response is False (From catching PermissionError) - However, this is incorrect. The bucket might exist, but we are not allowed to check that. Correct response would be to raise PermissionError.

I will do a bit more testing and the raise a PR.

@martindurant
Copy link
Member

I believe you are right. As you have found, it isn't exactly straight-forward compared to a real filesystem... Indeed, it is possible to only be able to see some contents of a bucket, and the user would have no real way to know about other paths they can't see. I think the current interpretation of exists() is "can I see?", but passing on any information we have to the user is valuable, especially for the case that the user cannot list at all.

@kasuteru
Copy link
Author

kasuteru commented Jul 20, 2023

So the question is, what behavior do users expect? The most precise answer in this case would be "You are not able to see this bucket, but it might exists - you do not have the rights to check that". Potential behaviours are (for PermissionError):

  1. Keep responding with False
  2. Keep responding with False, but raise a Warning
  3. Raise the PermissionError raised by boto3
  4. First do 2) with a deprecation message, then in a few months move to 3)

Number 2. would be the least intrusive, because it would not break current behavior. It is also what Amazon itself proposes: https://docs.aws.amazon.com/AmazonS3/latest/userguide/example_s3_HeadBucket_section.html

Ultimately, I am happy to implement any of those, but since I am not a maintainer of this library, I feel like the decision which one should be implemented is not up to me.

@kasuteru kasuteru linked a pull request Jul 20, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants