DynamoDB: Add pagination support for full-load table loader #252

Merged: amotl merged 3 commits into main from dynamodb-full-load-batch on Sep 2, 2024

amotl (Member) commented Sep 2, 2024

About

To make DynamoDB full-load operations more efficient, use CrateDB's bulk operations. This patch implements the batching towards CrateDB by paginating over the DynamoDB source table. For other strategies, see the backlog.
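
As a rough illustration of the pagination idea, here is a minimal sketch, not the code from this patch. It assumes a boto3-style table object whose `scan()` accepts `Limit` and `ExclusiveStartKey` and returns `Items` plus an optional `LastEvaluatedKey`. The names `scan_pages` and `FakeTable` are hypothetical and exist only for this example. Each yielded page could then be handed to a single bulk insert.

```python
from typing import Any, Dict, Iterator, List, Optional


def scan_pages(table: Any, page_size: int = 1000) -> Iterator[List[Dict[str, Any]]]:
    """Yield one list of items per DynamoDB Scan page (hypothetical helper)."""
    key: Optional[Dict[str, Any]] = None
    while True:
        scan_kwargs: Dict[str, Any] = {"Limit": page_size}
        # Resume after the last key of the previous page, if there was one.
        if key is not None:
            scan_kwargs.update({"ExclusiveStartKey": key})
        response = table.scan(**scan_kwargs)
        yield response.get("Items", [])
        # A missing LastEvaluatedKey means the scan reached the end of the table.
        key = response.get("LastEvaluatedKey")
        if key is None:
            break


class FakeTable:
    """In-memory stand-in for a boto3 Table, for illustration only."""

    def __init__(self, items: List[Dict[str, Any]]):
        self.items = items

    def scan(self, Limit: int, ExclusiveStartKey: Optional[Dict[str, Any]] = None):
        # Items are assumed to be ordered by an integer "id" that equals their index.
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["id"] + 1
        response: Dict[str, Any] = {"Items": self.items[start:start + Limit]}
        if start + Limit < len(self.items):
            response["LastEvaluatedKey"] = {"id": start + Limit - 1}
        return response


if __name__ == "__main__":
    table = FakeTable([{"id": i} for i in range(5)])
    for page in scan_pages(table, page_size=2):
        # Each page would be sent to the target database as one bulk operation.
        print(len(page), page)
```

With a real table, a page may come back empty while still carrying a `LastEvaluatedKey`, so a consumer should tolerate empty batches.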

References

Backlog

For subsequent iterations:

/cc @wierdvanderhaar, @hlcianfagna

amotl force-pushed the dynamodb-full-load-batch branch 2 times, most recently from d7195d3 to 5b595d9, on September 2, 2024 09:49
pyproject.toml Outdated
"commons-codec>=0.0.12",
"commons-codec @ git+https://github.com/crate/commons-codec.git@dynamodb-full-load-batch",
amotl (Member Author) commented Sep 2, 2024

commons-codec needs to be released first, once crate/commons-codec#43 has been merged.

amotl (Member Author) replied

commons-codec v0.0.14 has been released.

amotl force-pushed the dynamodb-full-load-batch branch from 5b595d9 to 3a98776 on September 2, 2024 10:24
amotl force-pushed the dynamodb-full-load-batch branch 3 times, most recently from 46bbc16 to 0807320, on September 2, 2024 10:39
amotl requested review from seut and surister on September 2, 2024 10:45
amotl marked this pull request as ready for review on September 2, 2024 10:46
amotl force-pushed the dynamodb-full-load-batch branch 2 times, most recently from f853e90 to ee5ecd8, on September 2, 2024 10:52
amotl force-pushed the dynamodb-full-load-batch branch 2 times, most recently from a063a5b to 5747aed, on September 2, 2024 11:37
amotl force-pushed the dynamodb-full-load-batch branch from 5747aed to d32cabd on September 2, 2024 14:00
amotl mentioned this pull request on Sep 2, 2024
11 tasks
Comment on lines +39 to +40
if key is not None:
    scan_kwargs.update({"ExclusiveStartKey": key})
amotl (Member Author) commented Sep 2, 2024

This spot is not covered yet and certainly needs a software test. I didn't pay enough attention, and Codecov didn't have access to the repository beforehand, so it did not raise a corresponding admonition.
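
A test for this branch could look roughly like the sketch below. It is an assumption about how such a test might be shaped, not the test that was eventually added. `scan_all` is a hypothetical stand-in for the loader's scan loop, and the table is a `unittest.mock.MagicMock` whose `scan()` returns two canned pages, so the second call has to carry `ExclusiveStartKey`.

```python
from unittest.mock import MagicMock


def scan_all(table):
    """Hypothetical stand-in for the loader: collect items across all Scan pages."""
    items, key = [], None
    while True:
        scan_kwargs = {}
        if key is not None:
            scan_kwargs.update({"ExclusiveStartKey": key})
        response = table.scan(**scan_kwargs)
        items.extend(response.get("Items", []))
        key = response.get("LastEvaluatedKey")
        if key is None:
            return items


def test_scan_continues_from_last_evaluated_key():
    table = MagicMock()
    # First page announces a continuation key, second page ends the scan.
    table.scan.side_effect = [
        {"Items": [{"id": 1}], "LastEvaluatedKey": {"id": 1}},
        {"Items": [{"id": 2}]},
    ]

    assert scan_all(table) == [{"id": 1}, {"id": 2}]

    # The first call starts from the beginning, the second resumes after the key.
    first, second = table.scan.call_args_list
    assert first.kwargs == {}
    assert second.kwargs == {"ExclusiveStartKey": {"id": 1}}
```

Mocking `scan()` keeps the test independent of a running DynamoDB instance. An integration test against a local DynamoDB with more items than one page holds would exercise the same branch end to end.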

amotl merged commit 622edb3 into main on Sep 2, 2024
32 of 33 checks passed
amotl deleted the dynamodb-full-load-batch branch on September 2, 2024 14:22