A starting point for an Apache Flink project that powers a data pipeline involving Kafka and the ELK stack, along with various administrative tools like Kafdrop. All of these systems run under Docker.
- Docker on Mac
- Gradle - You have a few options here:
  - If you're using IntelliJ, just make sure it's enabled.
  - Run `brew install gradle`
Let's first clone the repo and fire up our system,
```
git clone git@github.com:aedenj/apache-flink-starter.git ~/projects/apache-flink-starter
cd ~/projects/apache-flink-starter; ./gradlew kafkaUp
```
Now you have a single-node Kafka cluster with various admin tools to make life a little easier. See the Kafka cluster repo for its operating details.
The sample job in this repo reads from a topic named `source` and writes to a topic named `destination`.
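For orientation, a pass-through job like that boils down to a Kafka source piped straight into a Kafka sink. Here's a minimal sketch using Flink's `KafkaSource`/`KafkaSink` connector API; the class name, broker address, and group id are assumptions for illustration, not necessarily how this repo's job is written:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PassThroughJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read raw strings from the `source` topic.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092") // assumed broker address
                .setTopics("source")
                .setGroupId("flink-starter")           // assumed group id
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Write them unchanged to the `destination` topic.
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("destination")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
                .sinkTo(sink);

        env.execute("source-to-destination");
    }
}
```

The sketch uses value-only deserialization for brevity; a real job may want to carry the record keys through as well.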
There are a couple of ways of running this job depending on what you're trying to accomplish.
First, let's set up the Kafka topics by running `./gradlew createTopics`.
For quick feedback it's easiest to run the job locally,
- If you're using IntelliJ, use the usual methods.
- On the command line, run `./gradlew shadowJar run`
To run the job within a Flink job cluster instead, run `./gradlew shadowJar startJob`. This starts the job in the job cluster defined in `flink-job-cluster.yml`, which runs against the Kafka cluster started earlier.
After starting the job with one of the methods above, let's observe it reading and writing a message from one Kafka topic to another.
- Start the job using one of the methods above.
- In a new terminal, start a Kafka producer by running `./scripts/start-kafka-producer.sh`
- You'll see the prompt `>`. Enter the message `1:{ message: "Hello World!" }` (a programmatic equivalent is sketched below).
- Navigate to Kafdrop and view the messages in both the `source` and `destination` topics. Be sure to change the format to default or else you will not see any messages.

You should see the message `1:{ message: "Hello World!" }` in both topics.
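If you'd rather publish that message programmatically than type it at the console prompt, here's a minimal sketch using the plain Kafka Java client; the class name and broker address are assumptions for illustration:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HelloWorldProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // `1:{ message: "Hello World!" }` at the console prompt is a keyed
            // record: key "1", value `{ message: "Hello World!" }`.
            producer.send(new ProducerRecord<>("source", "1", "{ message: \"Hello World!\" }"));
            producer.flush();
        }
    }
}
```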
Live reload is a great feature to have in your development loop as it can save you time. The closest I've come to it is the command `./gradlew -t shadowJar startJob`. This approach attempts to simulate live reload using Gradle's `-t` (continuous build) flag to rebuild on source changes and restart the containers of the Flink job cluster in `flink-job-cluster.yml`.
Another approach I've explored is using JRebel, which can make classes reloadable with existing class loaders: only changed classes are recompiled and instantly reloaded in the running application. However, JRebel will not re-run `main` for you, so I've had mixed results with its effectiveness.
If you've found a better way, please drop me an email.
This repo also comes with the ability to spin up Grafana, Elasticsearch and Logstash, which lets you try out the common use case of using Grafana as a visual tool for querying data in Elasticsearch. Simply run `./gradlew grafanaUp`. There are additional containers present to support administrative tasks,
- Dejavu - A UI for browsing data in Elasticsearch.
- Cerebro - A cluster management UI for Elasticsearch.
Some things to note about the setup,
- Elasticsearch has already been set up as a datasource for Grafana.
- Logstash has a basic configuration that reads from the Kafka cluster and writes to Elasticsearch (a sketch of that style of pipeline follows this list).
- No dashboard has been set up in Grafana.
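For reference, here's a minimal sketch of the kind of Logstash pipeline that wiring implies. The topic, broker address, and index pattern are assumptions for illustration; check the repo's actual Logstash config for the real values.

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"    # assumed broker address
    topics => ["destination"]            # assumed topic
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "messages-%{+YYYY.MM.dd}"   # assumed index pattern
  }
}
```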
To browse the data in Elasticsearch,

- Run `./gradlew grafanaUp` if you haven't already.
- Open Dejavu.
- Enter `http://elasticsearch:9200` into the input box with the hint text of `URL for Cluster`.
- Enter `*` in the input box with the hint text of `Appname`.
- Open Cerebro and give it a spin. It's feature rich.