arran-standish/eth-elastic-clickhouse-migrator

About

A simple process that pulls data from the Elasticsearch -raw-* indices, repackages the hits as FHIR bundles, and pushes them to Kafka for consumption into ClickHouse.

HOW TO RUN IN QA / PROD

Step 1: Start migration / duplication process

  1. Scale down the reverse proxy to 0 (we need clickhouse + the clickhouse mapper running throughout the process, and we do not want new data to interfere with the migration).
docker service scale reverse-proxy_reverse-proxy-nginx=0
  2. Create the migration topic in kafka
docker exec kafka_kafka-01.1. ... /opt/bitnami/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic migration --partitions 3
  3. Deploy clickhouse
  4. Set KAFKA_2XX_TOPIC to migration in the relevant .env file
  5. Set RAW_CONSUMER_GROUP_ID to clickhouse-migration in the relevant .env file (both entries are sketched after the note below)

These two settings are required before deploying the kafka-mapper-consumer service so that it consumes from the migration topic instead of the default 2xx topic, and so that it does not touch the default clickhouse-2xx consumer group.
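A minimal sketch of the two .env entries during the migration (the file location depends on your deployment):

# migration values for the kafka-mapper-consumer .env file
KAFKA_2XX_TOPIC=migration
RAW_CONSUMER_GROUP_ID=clickhouse-migration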

  6. Deploy kafka-mapper-consumer
  7. Update ELASTIC_PASSWORD in docker/docker-compose.yml to the correct password
  8. Copy this folder to the QA / PROD server
GLOBIGNORE='.git:.vscode'; scp -r /path/to/folder/elastic-clickhouse-migrator/* user@ip-address:~/elastic-clickhouse-migrator
  9. Deploy this code base as a service on the server. This is necessary because the networks are not attachable; deploying as a swarm service lets the containers join those networks without creating a temporary attachable network (see the compose sketch after the deploy command below).
docker stack deploy -c docker/docker-compose.yml migration
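For reference, a hypothetical sketch of how docker/docker-compose.yml can join the existing overlay networks as a swarm service; the image, service, and network names here are illustrative assumptions, not the repo's actual values:

version: '3.8'
services:
  migrator:
    image: elastic-clickhouse-migrator:latest  # assumed image name
    environment:
      ELASTIC_PASSWORD: change-me  # step 7 above: set to the correct password
    networks:
      - elastic
      - kafka
networks:
  elastic:
    external: true  # pre-existing non-attachable overlay network, name assumed
  kafka:
    external: true  # name assumed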

Step 2: Cleanup

Once the deployed service has exited, double-check the logs to make sure it exited because it finished and not because of an error. If it exited with an error, investigate, resolve the issue, and retry from step 1. Before retrying, remove the clickhouse stack and the clickhouse volumes on all nodes (so you start from a fresh clickhouse instance rather than partial data), and also remove the migration stack, since its service won't restart if you simply redeploy the stack.
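A sketch of that reset between attempts, assuming the clickhouse stack is named clickhouse and its volume names contain "clickhouse" (run the volume removal on every node once the stack's containers have stopped):

docker stack rm migration
docker stack rm clickhouse
# on each node, after the stack is gone:
docker volume ls --quiet --filter name=clickhouse | xargs -r docker volume rm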

  1. Check that the kafka topic has drained, i.e. that all messages have been sent to clickhouse (the LAG column in the output should read 0 for every partition)
docker exec kafka_kafka-01.1. ... /opt/bitnami/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group clickhouse-migration --describe
  2. Remove the service we just deployed to migrate the data
docker stack rm migration
  3. Set KAFKA_2XX_TOPIC back to 2xx and RAW_CONSUMER_GROUP_ID back to clickhouse-2xx (a verification sketch follows this list).
docker service update kafka-mapper-consumer_kafka-mapper-consumer --env-add=KAFKA_2XX_TOPIC=2xx --env-add=RAW_CONSUMER_GROUP_ID=clickhouse-2xx
  4. Remove the migration topic
docker exec kafka_kafka-01.1. ... /opt/bitnami/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic migration
  5. Scale the reverse proxy back up
docker service scale reverse-proxy_reverse-proxy-nginx=1
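To confirm the env update in item 3 took effect, you can inspect the service spec (a sketch; the service name is taken from the update command above):

docker service inspect kafka-mapper-consumer_kafka-mapper-consumer --format '{{.Spec.TaskTemplate.ContainerSpec.Env}}'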

Step 3: Other

While this is not specific to the migration of data from elastic to clickhouse: under the cdr implementation, clickhouse requires the dbt ofelia job to be running, so you'll need to redeploy that service. Just make sure you set the SUBDOMAINS environment variable to include both the ndr-specific subdomains (clickhouse, superset) and the cdr-specific subdomains (kibana).
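For example, assuming SUBDOMAINS is a comma-separated list in that service's .env file (the separator format is an assumption; the subdomain names come from the note above):

# ndr subdomains (clickhouse, superset) plus cdr subdomain (kibana)
SUBDOMAINS=clickhouse,superset,kibana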
