Description
When backing up a table, setting the read-percentage doesn't result in the expected throughput:
The calculation of the target throughput is correct, but using that value as the limit on a scan does not actually produce that throughput.
Setting the limit on a scan merely prevents a single scan operation from reading more than that many items.
Items are not equivalent to units of throughput (which is more a measure of data size than count).
Scans are not limited to 1 per second.
So, with a table capacity of 10 and a read-percentage of 0.8, the expected throughput would be 8 units/second.
BUT setting the scan limit to 8 only tells it to read a max of 8 items (or up to the 1 MB page limit). Those items could be of any size. From the DynamoDB docs:
"One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size".
So, if each item in our example were a little bigger than 4 KB, reading 8 items would actually use 16 units of capacity.
So, while limiting the scan protects memory, it doesn't produce the desired read-percentage throughput.
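To make that arithmetic concrete, here's a small, hypothetical illustration (names and sizes are mine, not the tool's) using the per-item cost from the quote above:

```java
// Hypothetical illustration of the example above: a strongly consistent read
// of an item costs ceil(itemSizeBytes / 4096) read capacity units.
public class CapacityExample {
    static double targetUnitsPerSecond(double tableReadCapacity, double readPercentage) {
        return tableReadCapacity * readPercentage;          // 10 * 0.8 = 8
    }

    static long unitsForItems(int itemCount, long itemSizeBytes) {
        long unitsPerItem = (itemSizeBytes + 4095) / 4096;  // round up to the next 4 KB
        return itemCount * unitsPerItem;
    }

    public static void main(String[] args) {
        System.out.println(targetUnitsPerSecond(10, 0.8)); // 8.0 -> intended rate
        System.out.println(unitsForItems(8, 4200));        // 16  -> what a scan limit of 8 can consume
    }
}
```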
I think what this needs is a delay of some sort. There's a nice approach described here:
https://aws.amazon.com/blogs/developer/rate-limited-scans-in-amazon-dynamodb/
using Google Guava's RateLimiter class...
Essentially, it uses the consumed-capacity metadata returned by DynamoDB to work out how much throughput each scan page used, and the rate limiter to hold consumption to the required rate.
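A minimal sketch of that approach (not the tool's actual code), assuming the AWS SDK for Java v1 and Guava; the table name and capacity numbers are placeholders:

```java
import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ReturnConsumedCapacity;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import com.google.common.util.concurrent.RateLimiter;

public class RateLimitedBackupScan {
    public static void main(String[] args) {
        double tableReadCapacity = 10.0; // provisioned RCUs (placeholder)
        double readPercentage = 0.8;     // fraction of capacity the backup may use
        RateLimiter limiter = RateLimiter.create(tableReadCapacity * readPercentage);

        AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();
        Map<String, AttributeValue> lastKey = null;

        do {
            // Ask DynamoDB to report the capacity each page actually consumed.
            ScanRequest request = new ScanRequest()
                    .withTableName("my-table") // placeholder table name
                    .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
                    .withExclusiveStartKey(lastKey);

            ScanResult result = dynamo.scan(request);

            // ... write result.getItems() to the backup here ...

            // Block until enough permits accumulate to cover what this page
            // consumed, delaying the next page so average consumption stays
            // around readPercentage * capacity units/second.
            double consumed = result.getConsumedCapacity().getCapacityUnits();
            limiter.acquire(Math.max(1, (int) Math.round(consumed)));

            lastKey = result.getLastEvaluatedKey();
        } while (lastKey != null && !lastKey.isEmpty());
    }
}
```

Because the limiter is fed the capacity actually consumed rather than an item count, oversized items simply cause a longer pause before the next page, which is exactly the behaviour the read-percentage setting implies.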
I've attached a graph showing consumed throughput on a table during 2 backup operations:
- Back up a table with read-percentage set to 0.8 and a table capacity of 10
- Back up a table with read-percentage set to 0.8 and a table capacity of 5
As you can see, this isn't quite what was expected :)
Any chance this functionality could be added?