From 29e82e10448663afe60d19a85266394c42a9ee99 Mon Sep 17 00:00:00 2001
From: Andreas Motl
Date: Mon, 16 Sep 2024 17:50:55 +0200
Subject: [PATCH] ETL/CDC: Add information about AWS Database Migration Service (AWS DMS)

---
 docs/_include/links.md        |  2 ++
 docs/integrate/cdc/index.md   | 19 +++++++++++++++-
 docs/integrate/etl/index.md   | 42 ++++++++++++++++++++++++-----------
 docs/migrate/rockset/index.md |  7 ++++--
 4 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/docs/_include/links.md b/docs/_include/links.md
index 6ffedcd..4694b55 100644
--- a/docs/_include/links.md
+++ b/docs/_include/links.md
@@ -1,5 +1,7 @@
 [Amazon DynamoDB Streams]: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
 [Amazon Kinesis Data Streams]: https://docs.aws.amazon.com/streams/latest/dev/introduction.html
+[AWS Database Migration Service (AWS DMS)]: https://aws.amazon.com/dms/
+[AWS DMS Integration with CrateDB]: https://cratedb-toolkit.readthedocs.io/io/dms/
 [BM25]: https://en.wikipedia.org/wiki/Okapi_BM25
 [cloud-datashader-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/amo/cloud-datashader/topic/timeseries/explore/cloud-datashader.ipynb
 [cloud-datashader-github]: https://github.com/crate/cratedb-examples/blob/amo/cloud-datashader/topic/timeseries/explore/cloud-datashader.ipynb
diff --git a/docs/integrate/cdc/index.md b/docs/integrate/cdc/index.md
index 82d1a07..14276fa 100644
--- a/docs/integrate/cdc/index.md
+++ b/docs/integrate/cdc/index.md
@@ -17,7 +17,24 @@ to use them optimally.
 Please also have a look at support for [generic ETL](#etl) solutions.
 :::

-## Amazon Kinesis
+(cdc-dms)=
+## AWS DMS
+
+:::{div}
+[AWS Database Migration Service (AWS DMS)] is a managed migration and replication
+service that helps move your database and analytics workloads between different
+kinds of databases quickly, securely, and with minimal downtime and zero data
+loss. It supports migration between 20-plus database and analytics engines.
+
+AWS DMS supports both `full-load` and `cdc` operation modes, often used in
+combination with each other (`full-load-and-cdc`).
+
+The [AWS DMS Integration with CrateDB] uses Amazon Kinesis Data Streams as
+a DMS target, combined with a CrateDB-specific downstream processor element.
+:::
+
+(cdc-kinesis)=
+## AWS Kinesis

 You can use Amazon Kinesis Data Streams to collect and process large streams of data records in real time.
 A typical Kinesis Data Streams application reads data from a data stream as data records.
diff --git a/docs/integrate/etl/index.md b/docs/integrate/etl/index.md
index 10b8c51..80686bf 100644
--- a/docs/integrate/etl/index.md
+++ b/docs/integrate/etl/index.md
@@ -17,19 +17,6 @@ to use them optimally.
 Please also have a look at support for [](#cdc) solutions.


-## Amazon Kinesis
-
-Amazon Kinesis Data Streams is a serverless streaming data service that
-simplifies the capture, processing, and storage of data streams at any
-scale, such as application logs, website clickstreams, and IoT telemetry
-data, for machine learning (ML), analytics, and other applications.
-:::{div}
-The [DynamoDB CDC Relay] pipeline uses Amazon Kinesis to relay a table
-change stream from a DynamoDB table into a CrateDB table, see also
-[DynamoDB CDC](#cdc-dynamodb).
-:::
-
-
 ## Apache Airflow / Astronomer

 A set of starter tutorials.
@@ -86,6 +73,35 @@ kafka-connect
 - [Connecting to CrateDB from Apache NiFi]


+## AWS DMS
+
+:::{div}
+[AWS Database Migration Service (AWS DMS)] is a managed migration and replication
+service that helps move your database and analytics workloads between different
+kinds of databases quickly, securely, and with minimal downtime and zero data
+loss. It supports migration between 20-plus database and analytics engines.
+
+AWS DMS supports both `full-load` and `cdc` operation modes, often used in
+combination with each other (`full-load-and-cdc`).
+
+The [AWS DMS Integration with CrateDB] uses Amazon Kinesis Data Streams as
+a DMS target, combined with a CrateDB-specific downstream processor element.
+:::
+
+
+## AWS Kinesis
+
+Amazon Kinesis Data Streams is a serverless streaming data service that
+simplifies the capture, processing, and storage of data streams at any
+scale, such as application logs, website clickstreams, and IoT telemetry
+data, for machine learning (ML), analytics, and other applications.
+:::{div}
+The [DynamoDB CDC Relay] pipeline uses Amazon Kinesis to relay a table
+change stream from a DynamoDB table into a CrateDB table, see also
+[DynamoDB CDC](#cdc-dynamodb).
+:::
+
+
 ## Azure Functions

 - {ref}`azure-functions`
diff --git a/docs/migrate/rockset/index.md b/docs/migrate/rockset/index.md
index 4e4d126..f13f3be 100644
--- a/docs/migrate/rockset/index.md
+++ b/docs/migrate/rockset/index.md
@@ -273,13 +273,16 @@ Learn how to migrate your database use cases and workloads from Rockset to Crate
 ::::{grid-item-card}

 :::
-:::{rubric} Migrating DynamoDB workloads from Rockset to CrateDB
+:::{rubric} Migrating data using AWS DMS
+:::
+- [AWS DMS Integration with CrateDB]
+:::{rubric} Migrating data from DynamoDB to CrateDB
 :::
 - [DynamoDB Table Loader]
 - [DynamoDB CDC Relay]
 - [DynamoDB CDC Relay with AWS Lambda]
 - Blog: [Replicating CDC events from DynamoDB to CrateDB]
-:::{rubric} Migrating MongoDB workloads from Rockset to CrateDB
+:::{rubric} Migrating data from MongoDB to CrateDB
 :::
 - [MongoDB Table Loader]
 - [MongoDB CDC Relay]
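The DMS operation modes named in this patch (`full-load`, `cdc`, `full-load-and-cdc`) correspond to the `MigrationType` parameter of the DMS `CreateReplicationTask` API, and a Kinesis data stream becomes a DMS target through an endpoint with engine `kinesis`. The following is a minimal sketch, assuming Python and the boto3 DMS client, that only assembles the request parameters for such an endpoint and task without contacting AWS. The helper names `kinesis_target_endpoint` and `replication_task`, all identifiers and ARNs, and the `public` source schema are hypothetical placeholders; this is not the API of the CrateDB integration, and the CrateDB-specific downstream processor element is not modeled.

```python
"""Hypothetical sketch: boto3 DMS request parameters for a Kinesis target.

All identifiers, ARNs, and the schema name are placeholders, not values
used by the CrateDB integration.
"""
import json

# Operation modes accepted by the DMS CreateReplicationTask `MigrationType`.
MIGRATION_TYPES = ("full-load", "cdc", "full-load-and-cdc")


def kinesis_target_endpoint(identifier: str, stream_arn: str, role_arn: str) -> dict:
    """Keyword arguments for `dms.create_endpoint()` with Kinesis as target."""
    return {
        "EndpointIdentifier": identifier,
        "EndpointType": "target",
        "EngineName": "kinesis",
        "KinesisSettings": {
            "StreamArn": stream_arn,
            "MessageFormat": "json",
            "ServiceAccessRoleArn": role_arn,
        },
    }


def replication_task(
    identifier: str,
    source_arn: str,
    target_arn: str,
    instance_arn: str,
    migration_type: str = "full-load-and-cdc",
    schema: str = "public",
) -> dict:
    """Keyword arguments for `dms.create_replication_task()`."""
    if migration_type not in MIGRATION_TYPES:
        raise ValueError(f"unsupported migration type: {migration_type!r}")
    # Select every table in one (hypothetical) source schema.
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-all-tables",
                "object-locator": {"schema-name": schema, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": identifier,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": migration_type,
        # DMS expects the table mappings as a JSON string, not a dict.
        "TableMappings": json.dumps(table_mappings),
    }


if __name__ == "__main__":
    # Placeholder ARNs for illustration only.
    task = replication_task(
        identifier="example-task",
        source_arn="arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLESOURCE",
        target_arn="arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLETARGET",
        instance_arn="arn:aws:dms:us-east-1:123456789012:rep:EXAMPLEINSTANCE",
    )
    print(json.dumps(task, indent=2))
```

With AWS credentials configured, the dictionaries could be unpacked into `boto3.client("dms").create_endpoint(**...)` and `create_replication_task(**...)`. Defaulting to `full-load-and-cdc` mirrors the combined mode the text describes: DMS first copies the existing rows, then keeps streaming changes into Kinesis for the downstream processor to apply to CrateDB.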