A starting point for an Apache Flink project that powers a data pipeline involving Kafka and the ELK stack, along with various administrative tools like Kafdrop. All of these systems run under Docker.
- Docker on Mac
- Gradle - You have a few options here:
  - If you're using IntelliJ, just make sure it's enabled.
  - Run `brew install gradle`
Let's first clone the repo and fire up our system,
```
git clone git@github.com:aedenj/apache-flink-starter.git ~/projects/apache-flink-starter
cd ~/projects/apache-flink-starter; ./gradlew kafkaUp
```
Now you have a single-node Kafka cluster with various admin tools to make life a little easier. See the Kafka cluster repo for its operating details.
The sample job in this repo reads from a topic named `source` and writes to a topic named `destination`.
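For orientation, a pass-through job like that boils down to a Kafka source piped straight into a Kafka sink. Here's a minimal sketch using Flink's `KafkaSource`/`KafkaSink` connector API; the class name, broker address, and group id are assumptions for illustration, not necessarily how this repo's job is written:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PassThroughJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read raw strings from the `source` topic.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092") // assumed broker address
                .setTopics("source")
                .setGroupId("flink-starter")           // assumed group id
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Write them unchanged to the `destination` topic.
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("destination")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
                .sinkTo(sink);

        env.execute("source-to-destination");
    }
}
```

The sketch uses value-only deserialization for brevity; a real job may want to carry the record keys through as well.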
There are a couple of ways of running this job depending on what you're trying to accomplish.
First, let's set up the Kafka topics by running `./gradlew createTopics`.
For quick feedback it's easiest to run the job locally,
- If you're using IntelliJ, use the usual methods.
- On the command line, run `./gradlew shadowJar run`
To run the job within a Flink job cluster instead, run `./gradlew shadowJar startJob`. This starts the job in the job cluster defined in `flink-job-cluster.yml`, which runs against the Kafka cluster started earlier.
After starting the job with one of the methods above, let's observe it reading and writing a message from one Kafka topic to another.
- Start the job using one of the methods above.
- In a new terminal, start a Kafka producer by running `./scripts/start-kafka-producer.sh`
- You'll see the prompt `>`. Enter the message `1:{ message: "Hello World!" }` (a programmatic equivalent is sketched below).
- Navigate to Kafdrop and view the messages in both the `source` and `destination` topics. Be sure to change the format to default or else you will not see any messages.

You should see the message `1:{ message: "Hello World!" }` in both topics.
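If you'd rather publish that message programmatically than type it at the console prompt, here's a minimal sketch using the plain Kafka Java client; the class name and broker address are assumptions for illustration:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HelloWorldProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // `1:{ message: "Hello World!" }` at the console prompt is a keyed
            // record: key "1", value `{ message: "Hello World!" }`.
            producer.send(new ProducerRecord<>("source", "1", "{ message: \"Hello World!\" }"));
            producer.flush();
        }
    }
}
```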
Live reload is a great feature to have in your development loop as it can save you time. The closest I've come to it is the command `./gradlew -t shadowJar startJob`. This approach attempts to simulate live reload using Gradle's `-t` (continuous build) flag to rebuild on source changes and restart the containers of the Flink job cluster in `flink-job-cluster.yml`.
Another approach I've explored is using JRebel, which can make classes reloadable with existing class loaders: only changed classes are recompiled and instantly reloaded in the running application. However, JRebel will not re-run `main` for you, so I've had mixed results with its effectiveness.
If you've found a better way, please drop me an email.
This repo also comes with the ability to spin up Grafana, Elasticsearch and Logstash, which lets you try out the common use case of using Grafana as a visual tool for querying data in Elasticsearch. Simply run `./gradlew grafanaUp`. There are additional containers present to support administrative tasks,
- Dejavu - A UI for browsing data in Elasticsearch.
- Cerebro - A cluster management UI for Elasticsearch.
Some things to note about the setup,
- Elasticsearch has already been set up as a datasource for Grafana.
- Logstash has a basic configuration that reads from the Kafka cluster and writes to Elasticsearch (a sketch of that style of pipeline follows this list).
- No dashboard has been set up in Grafana.
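For reference, here's a minimal sketch of the kind of Logstash pipeline that wiring implies. The topic, broker address, and index pattern are assumptions for illustration; check the repo's actual Logstash config for the real values.

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"    # assumed broker address
    topics => ["destination"]            # assumed topic
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "messages-%{+YYYY.MM.dd}"   # assumed index pattern
  }
}
```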
To browse the data in Elasticsearch,

- Run `./gradlew grafanaUp` if you haven't already.
- Open Dejavu.
- Enter `http://elasticsearch:9200` into the input box with the hint text of `URL for Cluster`.
- Enter `*` in the input box with the hint text of `Appname`.
- Open Cerebro and give it a spin. It's feature rich.