Setting LIMIT on Dynamo scan doesn't give expected read throughput. #60

Open
@si-robinson

When backing up a table, setting the read-percentage doesn't result in expected throughput:

The calculation of the target throughput is correct, but using that value as the limit on a scan does not produce that throughput.
Setting the limit on a scan merely caps the number of items a single scan operation can read.
Items are not equivalent to units of throughput, which measure data size rather than item count.
Scans are not limited to one per second.

So, with a table capacity of 10 and a read-percentage of 0.8, the expected throughput would be 8 units/second.
But setting the scan limit to 8 only tells it to read at most 8 items (or at most 1 MB of data) per scan operation. Those items could be of any size. From the DynamoDB docs:
"One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size".
So, if each item in our example were a little bigger than 4 KB, consuming 8 items could actually use up to 16 units of capacity.
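To make the arithmetic above concrete, here is a minimal sketch that applies the quoted definition item by item. The function name is hypothetical, and 4 KB is assumed to mean 4096 bytes. As far as I understand, DynamoDB may bill a Scan page on the total size of the data read rather than rounding each item up, so treat the per-item figure as an upper bound rather than an exact cost.

```python
import math

BLOCK_BYTES = 4096  # assumption: "4 KB" in the quoted definition means 4096 bytes


def read_units_per_item(item_size_bytes, strongly_consistent=True):
    """Read capacity units for one item, per the quoted definition."""
    blocks = math.ceil(item_size_bytes / BLOCK_BYTES)
    # An eventually consistent read costs half as much as a strongly consistent one.
    return blocks if strongly_consistent else blocks / 2


# Target throughput from the example: table capacity 10, read-percentage 0.8.
target_units_per_sec = 10 * 0.8

# Eight items, each one byte over 4 KB, read with strong consistency.
worst_case_units = 8 * read_units_per_item(4097)
```

With these numbers the target is 8 units/second, while a single scan of 8 such items can cost up to twice that, and nothing stops the next scan from starting immediately.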

So, while memory is being protected by limiting the scan, it doesn't result in the desired read-percentage throughput.

I think what this needs is a delay of some sort. There's a nice approach described here:
https://aws.amazon.com/blogs/developer/rate-limited-scans-in-amazon-dynamodb/
using Google Guava's RateLimiter class...
Essentially, it uses the metadata passed back by DynamoDB to calculate the consumed throughput, and the rate limiter to hold the scan to the required throughput.
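The idea can be sketched roughly as follows. This is not the blog post's code (which is Java and uses Guava's RateLimiter) and not this project's implementation, only a simplified pacing loop under stated assumptions: `scan_page` is a hypothetical callable that performs one scan request and returns a dict shaped like a DynamoDB Scan response, with `ConsumedCapacity.CapacityUnits` populated (which requires asking for it via `ReturnConsumedCapacity` on the request).

```python
import time


def rate_limited_scan(scan_page, target_units_per_sec, page_limit,
                      clock=time.monotonic, sleep=time.sleep):
    """Yield scanned items, pacing requests to a target consumed capacity.

    scan_page(limit, exclusive_start_key) is a hypothetical wrapper around
    one Scan request; it must return a dict with "Items" and, optionally,
    "LastEvaluatedKey" and "ConsumedCapacity": {"CapacityUnits": float}.
    """
    start_key = None
    next_allowed = clock()  # earliest time the next request may be sent
    while True:
        now = clock()
        if now < next_allowed:
            sleep(next_allowed - now)  # pay off capacity used by earlier pages
        response = scan_page(limit=page_limit, exclusive_start_key=start_key)
        consumed = response.get("ConsumedCapacity", {}).get("CapacityUnits", 0.0)
        # A page that consumed N units "costs" N / target seconds of budget.
        next_allowed = max(next_allowed, clock()) + consumed / target_units_per_sec
        yield from response.get("Items", [])
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:
            break  # no more pages
```

The scan limit still bounds how much is held in memory per page, while the delay derived from the reported consumed capacity is what keeps the average near the read-percentage target. A token-bucket limiter such as Guava's would additionally smooth out bursts.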

I've attached a graph showing consumed throughput on a table during 2 backup operations:

  1. Back up a table with read-percentage set to 0.8 and a table capacity of 10
  2. Back up a table with read-percentage set to 0.8 and a table capacity of 5

As you can see, this isn't quite what was expected :)

[Image "staging": graph of consumed read throughput during the two backup operations]

Any chance this functionality could be added?
