Ansible Kafka + Spark pipeline

Basic steps to get started:

  1. Bring up the Vagrant machine:

vagrant up

  2. Run the Ansible playbook:

ansible-playbook kafka.yml

This playbook installs, configures, and starts the required services (ZooKeeper and Kafka).

  3. Create the input and output topics:

/usr/local/lib/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181/kafka --replication-factor 1 --partitions 1 --topic Inbound

/usr/local/lib/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181/kafka --replication-factor 1 --partitions 1 --topic Outbound

  4. List the existing topics:

/usr/local/lib/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181/kafka

  5. Compile the Scala application:

cd /vagrant/sp/

mvn clean install

  6. Run the compiled jar:

java -jar target/spark-streaming-kafka-0-10-inbound-1.0.jar

If starting the jar fails with "Error: A JNI error has occurred, please check your installation and try again", the shaded jar most likely still contains signature files copied from signed dependencies. Strip them and retry:

zip -d target/spark-streaming-kafka-0-10-inbound-1.0.jar META-INF/*.RSA META-INF/*.DSA META-INF/*.SF

Once the jar is compiled and running, you can push data into the 'Inbound' topic and pull the processed data from the 'Outbound' topic.

Push data:

cat /vagrant/sample-data.json | /usr/local/lib/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic Inbound

Pull data:

/usr/local/lib/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic Outbound --from-beginning
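
The Scala sources for the streaming job live under /vagrant/sp/ and are not reproduced here, but a job matching the jar name above would follow the standard spark-streaming-kafka-0-10 pattern: consume from 'Inbound', process each micro-batch, and produce to 'Outbound'. Below is a minimal sketch of that pattern; the object name InboundToOutbound, the batch interval, and the consumer group id are illustrative, not taken from this repository.

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import java.util.Properties

object InboundToOutbound {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("inbound-to-outbound").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches (illustrative)

    // Consumer settings for the Kafka broker provisioned by the playbook.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "inbound-consumer", // hypothetical group id
      "auto.offset.reset"  -> "earliest"
    )

    // Direct stream over the Inbound topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("Inbound"), kafkaParams))

    // For every micro-batch, forward each record's value to the Outbound topic.
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)
        records.foreach(r => producer.send(new ProducerRecord[String, String]("Outbound", r.value())))
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Opening a producer per partition per batch keeps the sketch self-contained; a real job would reuse a long-lived producer and apply whatever transformation the pipeline actually performs instead of passing values through unchanged.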
