Cost regarding the usage of s3-connector-for-pytorch #203
Hi Tommaso, thanks for reaching out and for your interest in our connector!
Hi Tommaso,

Following up on your question: before jumping into costs, let me give you a bit of context on how our connector works for dataset creation. There are two options for building a dataset (a short usage sketch follows the list):

1. S3MapDataset
2. S3IterableDataset
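For reference, here is a minimal sketch of creating each dataset type from a prefix; the bucket name, prefix, and region below are placeholders, not values from this thread.

```python
# A minimal sketch of the two dataset options, assuming a hypothetical
# bucket, prefix, and region (replace with your own).
from s3torchconnector import S3MapDataset, S3IterableDataset

REGION = "us-east-1"                  # assumed example region
PREFIX_URI = "s3://my-bucket/train/"  # hypothetical bucket/prefix

# Option 1: map-style dataset. Objects under the prefix are listed to
# build an index; items are then fetched by index.
map_dataset = S3MapDataset.from_prefix(PREFIX_URI, region=REGION)

# Option 2: iterable-style dataset. Objects under the prefix are listed
# and their contents streamed as you iterate.
iterable_dataset = S3IterableDataset.from_prefix(PREFIX_URI, region=REGION)

# Each item is an object handle exposing the key and the object bytes.
for obj in iterable_dataset:
    key, data = obj.key, obj.read()
```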
Please note that in this context, for both dataset types, listing the objects under the prefix once is not necessarily equivalent to a single LIST request to S3. Now, let's see what listing the objects under the prefix once means in terms of the requests that actually reach S3.
In your specific scenario of 100,000 objects, we'll issue 100,000 / 1,000 = 100 ListObjectsV2 requests to list the objects under the given prefix once, since ListObjectsV2 returns at most 1,000 keys per call. Then, depending on your implementation, this number gets multiplied as explained above. In addition, there are also the costs associated with the actual retrieval of the object content when iterating over the dataloader's items.
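To make the arithmetic concrete, here is a back-of-the-envelope sketch. The per-request prices are the figures quoted in this thread (verify them against current S3 pricing for your region), and the GET count assumes one GET per object per epoch, which is a simplification.

```python
# Back-of-the-envelope request math for the 100,000-object, 10-epoch scenario.
# Prices are the figures quoted in this thread; check current S3 pricing
# for your region before relying on them.
objects = 100_000
keys_per_list_page = 1_000          # ListObjectsV2 returns at most 1,000 keys per call
list_requests_per_pass = objects // keys_per_list_page  # 100 requests per full listing
epochs = 10

price_per_1k_list = 0.0005          # USD per 1,000 LIST requests (figure used in the question)
price_per_1k_get = 0.0004           # USD per 1,000 GET requests (S3 Standard, us-east-1)

list_cost = epochs * list_requests_per_pass * price_per_1k_list / 1_000
get_cost = epochs * objects * price_per_1k_get / 1_000   # assumes one GET per object per epoch

print(f"LIST requests per epoch: {list_requests_per_pass}")
print(f"Approximate LIST cost over {epochs} epochs: ${list_cost:.4f}")
print(f"Approximate GET cost over {epochs} epochs: ${get_cost:.2f}")
# LIST requests per epoch: 100
# Approximate LIST cost over 10 epochs: $0.0005
# Approximate GET cost over 10 epochs: $0.40
```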
Thank you for your detailed response.
Hi Tommaso,

Thanks a lot for your interest in using the S3 Connector for PyTorch. Caching is an interesting topic for the PyTorch connectors. Are you interested in caching object keys (to avoid LIST requests) or object data? If the former, you can always use the S3MapDataset.from_objects() or S3IterableDataset.from_objects() methods, which let you pass the list of S3 keys you would like to access (see the sketch below). We would love to learn more about your use case so we can understand how caching would benefit your workload. You would not pay for data transfer if your S3 bucket and your compute are located in the same AWS region. There would still be a request cost, though, which is 0.0004 USD per 1,000 requests for S3 Standard in us-east-1. Please let us know if you have any follow-up questions.

Fuat.
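As an illustration of the from_objects() route, here is a minimal sketch; the bucket and keys are hypothetical placeholders, and in practice the list would come from a cached listing or a manifest file.

```python
# A minimal sketch of skipping LIST requests by supplying object URIs
# directly; the bucket and keys below are hypothetical placeholders.
from s3torchconnector import S3MapDataset

REGION = "us-east-1"  # assumed example region

# Object URIs cached from an earlier listing, or built from a manifest.
object_uris = [
    "s3://my-bucket/train/sample-000.pt",
    "s3://my-bucket/train/sample-001.pt",
    # ... one entry per object you want in the dataset
]

# No ListObjectsV2 calls are needed to create the dataset; you only pay
# for the GETs issued when the objects are actually read.
dataset = S3MapDataset.from_objects(object_uris, region=REGION)
```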
Hi @TommasoBendinelli Has my answer above addressed your questions? I am going to close this issue for now; please let us know if you have any other questions.
Dear AWS Labs Team,
I'm writing to inquire about the underlying mechanism used by the S3 connector. Specifically, does it rely on LIST requests to access objects within an S3 bucket?
If so, is the following cost estimate for iterating over objects during neural network training correct? Consider a bucket containing 100,000 objects and a standard LIST request cost of 0.0005 USD per 1,000 requests. If I train a network for 10 epochs, doing a full pass over the dataset in each epoch, would the total cost for these requests be approximately 100,000 * 10 * 0.0005 USD / 1,000 = 0.5 USD?
Thank you for your time and assistance.
Best regards,