
Endpoint URL and S3Config is used when checking if the dir is empty #214

Closed
1 task done
amitani opened this issue Jul 2, 2024 · 1 comment
Labels: bug (Something isn't working)


@amitani

amitani commented Jul 2, 2024

s3torchconnector version

s3torchconnector-1.2.3

s3torchconnectorclient version

s3torchconnectorclient-1.2.3

AWS Region

No response

Describe the running environment

Running locally on an M1 Mac.

What happened?

This is related to the experimental feature in 1.2.3 for specifying an endpoint URL.

Specifying an S3 URL as the checkpoint `dirpath` for the Trainer triggered a check of whether the directory is empty. That check uses fsspec instead of the plugin.

This code accesses S3 directly from Lightning without going through the plugin, so the configured endpoint and S3ClientConfig are ignored, leading to an authentication error.
https://github.com/Lightning-AI/pytorch-lightning/blob/37e04d075a5532c69b8ac7457795b4345cca30cc/src/lightning/pytorch/callbacks/model_checkpoint.py#L274

Instead, all access to S3 when using the connector should go through the plugin.

The plugin configuration I tried to use:

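        from s3torchconnector import S3ClientConfig
        from s3torchconnector.lightning import S3LightningCheckpoint

        s3_lightning_checkpoint = S3LightningCheckpoint(
            "us-west-2",
            # Send unsigned (anonymous) requests to the local endpoint
            s3client_config=S3ClientConfig(unsigned=True),
            # Experimental in 1.2.3: custom S3-compatible endpoint
            endpoint="http://localhost:9090",
        )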
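To make the routing problem concrete, here is a toy model, loosely based on the linked `model_checkpoint.py` code, of which layer receives each request. Everything in it is hypothetical: `FakeCheckpointIO` stands in for a CheckpointIO plugin such as `S3LightningCheckpoint`, `FakeFilesystem` stands in for the fsspec filesystem that Lightning resolves from `dirpath`, and `fit_with_checkpointing` and the bucket `s3://my-bucket/checkpoints` are invented for illustration. The sketch does not use Lightning, fsspec, or s3torchconnector. It only records calls, showing that the emptiness check never reaches the plugin.

```python
from typing import Any, Dict, List, Tuple

Call = Tuple[str, str]


class FakeCheckpointIO:
    """Hypothetical stand-in for a CheckpointIO plugin (e.g. S3LightningCheckpoint).

    In the real plugin, these calls would use the configured endpoint and
    S3ClientConfig. Here they are only recorded.
    """

    def __init__(self) -> None:
        self.calls: List[Call] = []

    def save_checkpoint(self, checkpoint: Dict[str, Any], path: str) -> None:
        self.calls.append(("save", path))

    def load_checkpoint(self, path: str) -> Dict[str, Any]:
        self.calls.append(("load", path))
        return {}

    def remove_checkpoint(self, path: str) -> None:
        self.calls.append(("remove", path))


class FakeFilesystem:
    """Hypothetical stand-in for the fsspec filesystem resolved from dirpath.

    It has no access to the plugin's endpoint or client configuration.
    """

    def __init__(self) -> None:
        self.calls: List[Call] = []

    def isdir(self, path: str) -> bool:
        self.calls.append(("isdir", path))
        return True

    def ls(self, path: str) -> List[str]:
        self.calls.append(("ls", path))
        return []


def fit_with_checkpointing(dirpath: str, plugin: FakeCheckpointIO, fs: FakeFilesystem) -> None:
    """Toy flow: a setup-time emptiness check, then one checkpoint save."""
    # Setup-time check: handled by the filesystem layer, not the plugin.
    if fs.isdir(dirpath) and len(fs.ls(dirpath)) > 0:
        print(f"warning: {dirpath} is not empty")
    # Checkpoint save: routed through the CheckpointIO plugin.
    plugin.save_checkpoint({"epoch": 0}, f"{dirpath}/epoch=0.ckpt")


plugin, filesystem = FakeCheckpointIO(), FakeFilesystem()
fit_with_checkpointing("s3://my-bucket/checkpoints", plugin, filesystem)
print("plugin saw:", plugin.calls)
print("fsspec saw:", filesystem.calls)
```

In this model, only the save reaches the plugin. The `isdir`/`ls` calls go to a filesystem object that was never given the custom endpoint or unsigned configuration, which mirrors the failure described in this report.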

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
amitani added the bug label on Jul 2, 2024
@IsaevIlya
Contributor

Hello @amitani,
Thank you for your interest in the Amazon S3 Connector for PyTorch.

You're correct that PyTorch Lightning's current design limits the CheckpointIO implementation provided by our library to checkpoint read, write, and delete operations. For other S3 requests, such as listing directories, PyTorch Lightning uses fsspec directly. These requests bypass our library's implementation, including its endpoint and client configuration, which leads to the authentication error you encountered.

Because this is an architectural limitation within PyTorch Lightning itself, and the CheckpointIO interface is not designed to handle every type of S3 request, there is little we can do on our end to address this issue directly.

I appreciate you taking the time to explain the root cause and the constraints imposed by PyTorch Lightning's current architecture. While we cannot resolve this issue within our library, your feedback helps us better understand these limitations.

I'm closing this issue for the time being. Please feel free to reopen it if you have any additional questions or concerns regarding this topic.
