Description
When backing up a table, setting the read-percentage doesn't result in the expected throughput:
The calculation of the target throughput is correct, but using that value as the limit on a scan does not actually produce that throughput.
Setting the limit on a scan merely prevents a single scan operation from reading more than that many items.
Items are not equivalent to units of throughput (which is more a measure of data size than count).
Scans are not limited to 1 per second.
So, with a table capacity of 10 and a read-percentage of 0.8, the expected throughput would be 8 units/second.
BUT setting the scan limit to 8 only tells it to read a max of 8 items (or up to the 1 MB page limit). Those items could be of any size. From the DynamoDB docs:
"One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size".
So, if each item in our example were a little bigger than 4 KB, reading 8 items would actually use 16 units of capacity.
So, while limiting the scan protects memory, it doesn't produce the desired read-percentage throughput.
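To make that arithmetic concrete, here's a small, hypothetical illustration (names and sizes are mine, not the tool's) using the per-item cost from the quote above:

```java
// Hypothetical illustration of the example above: a strongly consistent read
// of an item costs ceil(itemSizeBytes / 4096) read capacity units.
public class CapacityExample {
    static double targetUnitsPerSecond(double tableReadCapacity, double readPercentage) {
        return tableReadCapacity * readPercentage;          // 10 * 0.8 = 8
    }

    static long unitsForItems(int itemCount, long itemSizeBytes) {
        long unitsPerItem = (itemSizeBytes + 4095) / 4096;  // round up to the next 4 KB
        return itemCount * unitsPerItem;
    }

    public static void main(String[] args) {
        System.out.println(targetUnitsPerSecond(10, 0.8)); // 8.0 -> intended rate
        System.out.println(unitsForItems(8, 4200));        // 16  -> what a scan limit of 8 can consume
    }
}
```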
I think what this needs is a delay of some sort. There's a nice approach described here:
https://aws.amazon.com/blogs/developer/rate-limited-scans-in-amazon-dynamodb/
using Google Guava's RateLimiter class...
Essentially, it uses the consumed-capacity metadata returned by DynamoDB to work out how much throughput each scan page used, and the rate limiter to hold consumption to the required rate.
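A minimal sketch of that approach (not the tool's actual code), assuming the AWS SDK for Java v1 and Guava; the table name and capacity numbers are placeholders:

```java
import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ReturnConsumedCapacity;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import com.google.common.util.concurrent.RateLimiter;

public class RateLimitedBackupScan {
    public static void main(String[] args) {
        double tableReadCapacity = 10.0; // provisioned RCUs (placeholder)
        double readPercentage = 0.8;     // fraction of capacity the backup may use
        RateLimiter limiter = RateLimiter.create(tableReadCapacity * readPercentage);

        AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();
        Map<String, AttributeValue> lastKey = null;

        do {
            // Ask DynamoDB to report the capacity each page actually consumed.
            ScanRequest request = new ScanRequest()
                    .withTableName("my-table") // placeholder table name
                    .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
                    .withExclusiveStartKey(lastKey);

            ScanResult result = dynamo.scan(request);

            // ... write result.getItems() to the backup here ...

            // Block until enough permits accumulate to cover what this page
            // consumed, delaying the next page so average consumption stays
            // around readPercentage * capacity units/second.
            double consumed = result.getConsumedCapacity().getCapacityUnits();
            limiter.acquire(Math.max(1, (int) Math.round(consumed)));

            lastKey = result.getLastEvaluatedKey();
        } while (lastKey != null && !lastKey.isEmpty());
    }
}
```

Because the limiter is fed the capacity actually consumed rather than an item count, oversized items simply cause a longer pause before the next page, which is exactly the behaviour the read-percentage setting implies.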
I've attached a graph showing consumed throughput on a table during 2 backup operations:
- Back up a table with read-percentage set to 0.8 and a table capacity of 10
- Back up a table with read-percentage set to 0.8 and a table capacity of 5
As you can see, this isn't quite what was expected :)
Any chance this functionality could be added?