This README contains the steps needed to run the data pipeline with the Assignment 3 configuration.
- Run `vagrant up` to start everything; from there, Ansible takes over the provisioning.
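The first step is just the standard Vagrant workflow; a minimal sketch, assuming the Vagrantfile is in the repository root:

```shell
# Bring up all VMs defined in the Vagrantfile; the Ansible
# provisioner runs automatically as part of `vagrant up`.
vagrant up

# Confirm all machines are running before moving on
vagrant status
```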
- Once the VMs have started, SSH into VM2 and create the private registry, then pull the official CouchDB and Ubuntu 18.04 Docker images, then build, tag, and push all the necessary Docker images (except the consumer). Make sure to build the Kafka image before the ZooKeeper one, and to tag the Kafka image as `plainkafka`.
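The registry and image steps above might look like the following sketch; the registry port (5000) and the image/directory names are assumptions, so adjust them to match the repository:

```shell
# Start a private registry on VM2 (5000 is the registry's default port)
docker run -d -p 5000:5000 --restart=always --name registry registry:2

# Pull the official base images
docker pull couchdb
docker pull ubuntu:18.04

# Build the Kafka image first and tag it "plainkafka", since the
# ZooKeeper image is built on top of it (paths are assumptions)
docker build -t localhost:5000/plainkafka ./kafka
docker push localhost:5000/plainkafka

docker build -t localhost:5000/zookeeper ./zookeeper
docker push localhost:5000/zookeeper
```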
- Once the images are ready, you can start up Kubernetes on VM2 and then connect VM3 using the join command it provides.
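Assuming the cluster is built with `kubeadm` (not stated above, so treat this as a sketch), the master/worker setup looks roughly like:

```shell
# On VM2 (the control plane node)
sudo kubeadm init

# Make kubectl usable for the current user
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# On VM3: paste the join command that `kubeadm init` printed, e.g.
# sudo kubeadm join <vm2-ip>:6443 --token <token> \
#   --discovery-token-ca-cert-hash sha256:<hash>
```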
- Then, using the provided YAML files, start up ZooKeeper and then the brokers (the brokers' YAML files must be edited to add ZooKeeper's cluster IP).
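A sketch of that ordering with `kubectl`; the YAML file and service names are assumptions:

```shell
# Deploy ZooKeeper first, then look up its ClusterIP
kubectl apply -f zookeeper.yaml
kubectl get svc zookeeper   # note the CLUSTER-IP column

# Edit the broker YAMLs to use that ClusterIP, then deploy the brokers
kubectl apply -f kafka1.yaml
kubectl apply -f kafka2.yaml
```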
- Then start up CouchDB using its YAML file.
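The CouchDB deployment follows the same pattern (file name is an assumption):

```shell
kubectl apply -f couchdb.yaml
kubectl get pods   # wait until the CouchDB pod shows Running
```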
- Then run the bootstrap command from a separate VM (we used the VM where we ran one of the producers) to create the topics:
bin/kafka-topics.sh --create --topic utilization1 --bootstrap-server 129.114.25.96:30000
(we ran this command and then the same command with `utilization2`)
- Finally, build, tag, and push the consumer Docker image, then start the consumer job using its YAML file and run the producers from separate VMs (the consumer code must contain the correct cluster IP for CouchDB).
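The consumer step above might look like this sketch; the image name, directory, YAML file, and producer script name are all assumptions:

```shell
# Build, tag, and push the consumer image to the private registry
docker build -t localhost:5000/consumer ./consumer
docker push localhost:5000/consumer

# Start the consumer job on the cluster
kubectl apply -f consumer.yaml

# On each separate VM, start a producer
python3 producer.py
```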
- We can then check whether it worked by running:
curl -X GET http://admin:[email protected]:30005/assignment3/_all_docs?include_docs=true