_PickleableListObjectStream expects invalid XML element in "ListAllMyBucketsResult" #211

Closed
1 task done
N-o-Z opened this issue Jun 18, 2024 · 3 comments
Labels
bug Something isn't working

Comments


N-o-Z commented Jun 18, 2024

s3torchconnector version

s3torchconnector==1.2.3

s3torchconnectorclient version

s3torchconnectorclient==1.2.3

AWS Region

No response

Describe the running environment

Using a lakeFS server (S3-compatible) as the backing store.
Trying to read images from an S3 path.

What happened?

Reading from an S3 prefix fails because the client expects an XML element that is not required in that specific API response.
According to the AWS response structure, ListAllMyBucketsResult does not contain the IsTruncated element (see the AWS reference).
As an S3-compatible endpoint, lakeFS correctly omits this element from the response.

Code snippet:

    import s3torchconnector

    IMAGES_URI = "s3://test/main/"
    REGION = "us-east-1"

    # lakeFS endpoint used as the S3-compatible backing store
    dataset = s3torchconnector.S3MapDataset.from_prefix(IMAGES_URI, region=REGION, endpoint='http://localhost:8000')
    obj = dataset[0]  # the first access triggers the object listing
    content = obj.read()

Relevant log output

/dev/python-wrapper-venv/lib/python3.10/site-packages/s3torchconnector/s3map_dataset.py:144: in __getitem__
    return self._transform(self._get_object(i))
/dev/python-wrapper-venv/lib/python3.10/site-packages/s3torchconnector/s3map_dataset.py:138: in _get_object
    bucket_key = self._dataset_bucket_key_pairs[i]
/dev/python-wrapper-venv/lib/python3.10/site-packages/s3torchconnector/s3map_dataset.py:56: in _dataset_bucket_key_pairs
    self._bucket_key_pairs = list(self._get_dataset_objects(self._get_client()))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <s3torchconnector._s3_bucket_iterable._PickleableListObjectStream object at 0x725c6e345b40>

    def __next__(self) -> ListObjectResult:
>       return next(self._list_stream)
E       s3torchconnectorclient._mountpoint_s3_client.S3Exception: Client error: Internal S3 client error: Missing field IsTruncated from XML element Element { prefix: None, namespace: None, namespaces: None, name: "ListAllMyBucketsResult", attributes: {}, children: [Element(Element { prefix: None, namespace: None, namespaces: None, name: "Buckets", attributes: {}, children: [Element(Element { prefix: None, namespace: None, namespaces: None, name: "Bucket", attributes: {}, children: [Element(Element { prefix: None, namespace: None, namespaces: None, name: "CreationDate", attributes: {}, children: [Text("2024-06-18T18:40:00.731Z")] }), Element(Element { prefix: None, namespace: None, namespaces: None, name: "Name", attributes: {}, children: [Text("test")] })] })] }), Element(Element { prefix: None, namespace: None, namespaces: None, name: "Owner", attributes: {}, children: [Element(Element { prefix: None, namespace: None, namespaces: None, name: "DisplayName", attributes: {}, children: [] }), Element(Element { prefix: None, namespace: None, namespaces: None, name: "ID", attributes: {}, children: [] })] })] }

/dev/python-wrapper-venv/lib/python3.10/site-packages/s3torchconnector/_s3_bucket_iterable.py:50: S3Exception

Code of Conduct

  • I agree to follow this project's Code of Conduct
@N-o-Z N-o-Z added the bug Something isn't working label Jun 18, 2024
Contributor

fuatbasik commented Jun 21, 2024

Hi @N-o-Z. Thanks a lot for creating this issue.

I think this is related to your lakeFS implementation, as the request and the response do not match.

S3 Connector for PyTorch makes a ListObjectsV2 request when the elements of a map dataset are accessed for the first time:

https://github.com/awslabs/s3-connector-for-pytorch/blob/main/s3torchconnectorclient/rust/src/list_object_stream.rs#L55

and the client call is https://github.com/awslabs/mountpoint-s3/blob/main/mountpoint-s3-client/src/s3_crt_client.rs#L1144

but it seems that lakeFS is returning a ListAllMyBucketsResult response.

A ListObjectsV2 response does indeed require the IsTruncated field.
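
To make the mismatch concrete, here is a minimal sketch that inspects a response body and reports its root element and whether it carries an IsTruncated child. The two XML bodies are trimmed-down illustrations rather than captured traffic, and `local_name` and `describe_list_response` are hypothetical helper names, not part of s3torchconnector or lakeFS.

```python
import xml.etree.ElementTree as ET

# Trimmed-down bodies for illustration only (real responses also carry an xmlns).
LIST_BUCKETS_BODY = """<ListAllMyBucketsResult>
  <Buckets><Bucket><Name>test</Name><CreationDate>2024-06-18T18:40:00.731Z</CreationDate></Bucket></Buckets>
  <Owner><DisplayName/><ID/></Owner>
</ListAllMyBucketsResult>"""

LIST_OBJECTS_V2_BODY = """<ListBucketResult>
  <Name>test</Name><Prefix>main/</Prefix><KeyCount>0</KeyCount>
  <MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated>
</ListBucketResult>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def describe_list_response(body):
    """Return (root element name, True if a direct IsTruncated child exists)."""
    root = ET.fromstring(body)
    has_is_truncated = any(local_name(child.tag) == "IsTruncated" for child in root)
    return local_name(root.tag), has_is_truncated


if __name__ == "__main__":
    print(describe_list_response(LIST_BUCKETS_BODY))
    print(describe_list_response(LIST_OBJECTS_V2_BODY))
```

A client that asked for ListObjectsV2 looks for IsTruncated under the root, so receiving the first body instead of the second would plausibly surface as the "Missing field IsTruncated" error in the log above.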

Please let me know if this addresses your question.

Author

N-o-Z commented Jun 21, 2024

@fuatbasik thank you for the response.
I was sure the S3 connector was making a list-buckets request.
Please allow me to look into it further, and I'll close the issue if needed.

Author

N-o-Z commented Jun 21, 2024

@fuatbasik Hi,

I verified that the problem was on my side: I was expecting a path-style URL. I modified our server to use virtual-host-style addressing and it works.
I also noticed that an issue had already been opened regarding this, so I'm closing this one.
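
For readers hitting the same error, the following sketch shows how the same ListObjectsV2 call looks under the two addressing styles. It is an illustration under assumptions, not the connector's actual request-building code: `list_objects_v2_request` is a hypothetical helper, and the endpoint, bucket, and prefix are taken from the snippet in the issue description.

```python
from urllib.parse import urlencode, urlsplit


def list_objects_v2_request(endpoint, bucket, prefix, virtual_host):
    """Return (Host header, request target) for a ListObjectsV2 call."""
    parts = urlsplit(endpoint)
    query = urlencode({"list-type": "2", "prefix": prefix})
    if virtual_host:
        # Virtual-host style: the bucket lives in the Host header, the path is "/".
        return f"{bucket}.{parts.netloc}", f"/?{query}"
    # Path style: the bucket is the first path segment.
    return parts.netloc, f"/{bucket}?{query}"


if __name__ == "__main__":
    for style in (True, False):
        print(list_objects_v2_request("http://localhost:8000", "test", "main/", style))
```

Under virtual-host style the path is only `/`, so a server that looks for the bucket in the path finds none and may treat the call as a service-level `GET /`, which in the S3 API is ListBuckets. That would be consistent with the ListAllMyBucketsResult response seen here.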

@N-o-Z N-o-Z closed this as completed Jun 21, 2024