This README contains the steps needed to run the data pipeline with the Assignment 3 configuration.
- Run `vagrant up` to start everything; from there, Ansible takes over the provisioning.
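The first step is just the standard Vagrant workflow; a minimal sketch, assuming the Vagrantfile is in the repository root:

```shell
# Bring up all VMs defined in the Vagrantfile; the Ansible
# provisioner runs automatically as part of `vagrant up`.
vagrant up

# Confirm all machines are running before moving on
vagrant status
```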
- Once the VMs have started, SSH into VM2 and create the private registry, then pull the official CouchDB and Ubuntu 18.04 Docker images, then build, tag, and push all the necessary Docker images (except the consumer). Make sure to build the Kafka image before the ZooKeeper one, and to tag the Kafka image as `plainkafka`.
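The registry and image steps above might look like the following sketch; the registry port (5000) and the image/directory names are assumptions, so adjust them to match the repository:

```shell
# Start a private registry on VM2 (5000 is the registry's default port)
docker run -d -p 5000:5000 --restart=always --name registry registry:2

# Pull the official base images
docker pull couchdb
docker pull ubuntu:18.04

# Build the Kafka image first and tag it "plainkafka", since the
# ZooKeeper image is built on top of it (paths are assumptions)
docker build -t localhost:5000/plainkafka ./kafka
docker push localhost:5000/plainkafka

docker build -t localhost:5000/zookeeper ./zookeeper
docker push localhost:5000/zookeeper
```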
- Once the images are ready, you can start up Kubernetes on VM2 and then connect VM3 using the join command it provides.
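Assuming the cluster is built with `kubeadm` (not stated above, so treat this as a sketch), the master/worker setup looks roughly like:

```shell
# On VM2 (the control plane node)
sudo kubeadm init

# Make kubectl usable for the current user
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# On VM3: paste the join command that `kubeadm init` printed, e.g.
# sudo kubeadm join <vm2-ip>:6443 --token <token> \
#   --discovery-token-ca-cert-hash sha256:<hash>
```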
- Then, using the provided YAML files, start up ZooKeeper and then the brokers (the brokers' YAML files must be edited to add ZooKeeper's cluster IP).
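A sketch of that ordering with `kubectl`; the YAML file and service names are assumptions:

```shell
# Deploy ZooKeeper first, then look up its ClusterIP
kubectl apply -f zookeeper.yaml
kubectl get svc zookeeper   # note the CLUSTER-IP column

# Edit the broker YAMLs to use that ClusterIP, then deploy the brokers
kubectl apply -f kafka1.yaml
kubectl apply -f kafka2.yaml
```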
- Then start up CouchDB using its YAML file.
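The CouchDB deployment follows the same pattern (file name is an assumption):

```shell
kubectl apply -f couchdb.yaml
kubectl get pods   # wait until the CouchDB pod shows Running
```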
- Then run the bootstrap command from a separate VM (we used the VM where we ran one of the producers) to create the topics:
bin/kafka-topics.sh --create --topic utilization1 --bootstrap-server 129.114.25.96:30000
(we ran this command and then the same command with `utilization2`)
- Finally, build, tag, and push the consumer Docker image, then start the consumer job using its YAML file and run the producers from separate VMs (the consumer code must contain the correct cluster IP for CouchDB).
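The consumer step above might look like this sketch; the image name, directory, YAML file, and producer script name are all assumptions:

```shell
# Build, tag, and push the consumer image to the private registry
docker build -t localhost:5000/consumer ./consumer
docker push localhost:5000/consumer

# Start the consumer job on the cluster
kubectl apply -f consumer.yaml

# On each separate VM, start a producer
python3 producer.py
```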
- We can then check whether it worked by running:
curl -X GET http://admin:[email protected]:30005/assignment3/_all_docs?include_docs=true