A simple process that pulls data from the Elasticsearch `-raw-*` indices, reshapes each hit into a FHIR bundle, and pushes it to Kafka for consumption into ClickHouse.
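Conceptually, each batch of Elasticsearch hits is wrapped in a FHIR Bundle envelope before being produced to Kafka. A minimal sketch of that step (the function name, index name, and exact bundle shape are assumptions for illustration, not the service's actual code):

```python
# Hypothetical sketch of the hit-to-bundle step. The real migrator's
# field mapping may differ; this only shows the general shape.

def hits_to_fhir_bundle(hits):
    """Wrap raw Elasticsearch hits in a minimal FHIR Bundle envelope."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        # Each hit's _source is assumed to already be a FHIR resource.
        "entry": [{"resource": hit["_source"]} for hit in hits],
    }

# Example: two hits from an imaginary -raw-* index.
hits = [
    {"_index": "fhir-raw-patient", "_source": {"resourceType": "Patient", "id": "1"}},
    {"_index": "fhir-raw-patient", "_source": {"resourceType": "Patient", "id": "2"}},
]
bundle = hits_to_fhir_bundle(hits)
print(len(bundle["entry"]))  # prints 2
```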
- Scale the reverse proxy down to 0. ClickHouse and the ClickHouse mapper must keep running during the process, and we do not want new data to interfere with the migration.

```sh
docker service scale reverse-proxy_reverse-proxy-nginx=0
```
- Create a `migration` topic in Kafka.

```sh
docker exec kafka_kafka-01.1. ... /opt/bitnami/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic migration --partitions 3
```
- Deploy clickhouse
- Set `KAFKA_2XX_TOPIC` to `migration` in the relevant .env file.
- Set `RAW_CONSUMER_GROUP_ID` to `clickhouse-migration` in the relevant .env file.

These must be set before deploying the kafka-mapper-consumer service so that it consumes from the `migration` topic and does not affect the offsets of the default `clickhouse-2xx` group id.
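For the migration, the relevant .env entries would look like this (the file's location varies per deployment):

```sh
KAFKA_2XX_TOPIC=migration
RAW_CONSUMER_GROUP_ID=clickhouse-migration
```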
- Deploy kafka-mapper-consumer
- Update `ELASTIC_PASSWORD` under docker/docker-compose.yml to the correct password.
- Copy this folder to the QA / PROD server.

```sh
GLOBIGNORE='.git:.vscode' scp -r /path/to/folder/elastic-clickhouse-migrator/* user@ip-address:~/elastic-clickhouse-migrator
```
- Deploy this code base as a service on the server. This is necessary because the existing networks are not attachable; rather than creating a temporary attachable network, we deploy as a service, which is allowed to connect to those networks.

```sh
docker stack deploy -c docker/docker-compose.yml migration
```
Once the deployed service has exited, double-check the logs to make sure it exited because it finished and not because of an error. If it exited on an error, investigate, resolve any issues, and retry from step 1. Before restarting, remove ClickHouse and its volumes on all nodes (so you start from a fresh ClickHouse instance rather than partial data), and also remove the migration stack, since the exited service will not restart if you simply redeploy the stack.
- Check that the Kafka topic has drained, i.e. all messages have been consumed into ClickHouse.

```sh
docker exec kafka_kafka-01.1. ... /opt/bitnami/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group clickhouse-migration --describe
```
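The topic is drained when the `LAG` column is 0 for every partition. A small sketch of that check, summing the `LAG` column of the `--describe` output (the sample offsets below are made up; your partition counts and offsets will differ):

```python
# Sum the LAG column of kafka-consumer-groups.sh --describe output.
# A total of 0 means every partition has been fully consumed.

def total_lag(describe_output: str) -> int:
    lines = [l for l in describe_output.strip().splitlines() if l.strip()]
    header = lines[0].split()
    lag_idx = header.index("LAG")  # locate the LAG column by name
    return sum(int(row.split()[lag_idx]) for row in lines[1:])

# Sample output mimicking the tool's tabular format (values invented).
sample = """\
GROUP                 TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
clickhouse-migration  migration  0          120             120             0    -            -     -
clickhouse-migration  migration  1          98              98              0    -            -     -
clickhouse-migration  migration  2          105             105             0    -            -     -
"""
print(total_lag(sample))  # prints 0, i.e. the topic has drained
```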
- Remove the migration service we just deployed.

```sh
docker stack rm migration
```
- Set `KAFKA_2XX_TOPIC` back to `2xx` and set `RAW_CONSUMER_GROUP_ID` to `clickhouse-2xx`.

```sh
docker service update kafka-mapper-consumer_kafka-mapper-consumer --env-add=KAFKA_2XX_TOPIC=2xx --env-add=RAW_CONSUMER_GROUP_ID=clickhouse-2xx
```
- Remove the `migration` topic.

```sh
docker exec kafka_kafka-01.1. ... /opt/bitnami/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic migration
```
- Scale the reverse proxy back up.

```sh
docker service scale reverse-proxy_reverse-proxy-nginx=1
```
While not specific to the migration of data from Elasticsearch to ClickHouse, ClickHouse under the CDR implementation requires the dbt Ofelia job to be running, so you will need to redeploy that service. Make sure you set the `SUBDOMAINS` environment variable to include both the NDR-specific subdomains (clickhouse, superset) and the CDR-specific subdomains (kibana).
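For example, assuming `SUBDOMAINS` takes a comma-separated list (check your deployment's convention for the exact format), the combined value might look like:

```sh
SUBDOMAINS=clickhouse,superset,kibana
```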