A Spring Cloud Data Flow demo on Kubernetes with Kafka provided by the Confluent Operator and VMware Tanzu Kubernetes Grid.
TKGI:
To demo the full stack, refer to the jbogie-vmware/tkgi-confluent-operator repo to set up a Confluent Operator environment running on TKGI.
TKG:
COMING SOON
To walk through SCDF's streaming, batch, analytics, and observability features, the following components will be provisioned in the Kubernetes cluster.
- Spring Cloud Data Flow
- Spring Cloud Skipper
- Confluent Operator
- Prometheus
- Grafana
- MariaDB
- `trucks` — generates truck data at random intervals
- `brake-temperature` — computes the moving average of each truck's brake temperature over 10-second intervals
- `brake-logs` — prints the truck data
- `thumbinator` — a task/batch job that creates thumbnails from images
The demo relies on Spring Cloud Data Flow's Bitnami chart. Students will have to run the `scripts/deploy-scdf.sh` script that is available at jbogie-vmware/SCDF-ConfluentOperator-Demo.
To get started with the demo, first, you will have to prepare the environment.
Test your environment by running the `scripts/test-env.sh` script — see the example below.
❯ ./test-env.sh
helm is ready
kubectl is not ready
k9s is ready
java is ready
mvn is ready
Log into your TKG cluster and ensure you have the proper context selected with `kubectl`.
❯ tkgi login -a api.pks.foo.bar.com -u admin -k
Password: ********************************
API Endpoint: api.pks.foo.bar.com
User: admin
Login successful.
❯ kctx
cotkgi
It is now time to deploy SCDF! To get started, run the `scripts/deploy-scdf.sh` script.
[ec2-user@ip-172-31-19-111 ~]$ sh SpringOne2020/scripts/deploy-scdf.sh
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈ Happy Helming!⎈
Error from server (NotFound): namespaces "monitoring" not found
A namespace called monitoring for prometheus should exist, creating it
A namespace called monitoring exists in the cluster
Error: release: not found
Install bitnami/prometheus-operator prometheus_release_name=prom prometheus_namespace=monitoring
....
....
....
This command takes ~5-6 minutes to finish. Behind the scenes, the script uses Bitnami's Prometheus and Grafana operators to provision the monitoring stack. With that foundation up and running, the script runs SCDF's Bitnami chart to deploy the following.
- MariaDB
- Apache Kafka + Zookeeper
- Spring Cloud Data Flow
- Spring Cloud Skipper
While all this is starting, open a terminal session in a new tab and run the `k9s` command to verify the current deployment status.
[ec2-user@ip-172-31-19-111 ~]$ k9s
When the script completes successfully, you will see the following output.
[ec2-user@ip-172-31-19-111 ~]$ sh SpringOne2020/scripts/deploy-scdf.sh
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "bitnami" chart repository
...
...
...
### Stack succesfully deployed ###
Connect to Data Flow
$ helm status scdf
Grafana password
$ kubectl -n monitoring get secret graf-grafana-admin -o jsonpath={.data.GF_SECURITY_ADMIN_PASSWORD} | base64 --decode
Forward grafana
$ kubectl port-forward -n monitoring svc/graf-grafana 3000:3000
Likewise, you can verify which version of the Spring Cloud Data Flow Bitnami Helm chart is currently in use.
[ec2-user@ip-100 SpringOne2020]$ helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
scdf default 1 2020-08-20 21:19:45.728756477 +0000 UTC deployed spring-cloud-dataflow-0.6.1 2.6.0
Before we start the next lab, though, let's verify that SCDF is up and running. We will apply port-forwarding rules for the following pods so that we can test the newly deployed components.
- SCDF: press `shift+f` on the `scdf-spring-cloud-dataflow-server-***` pod to open the port-forward window in `k9s`; change the "Address" to `0.0.0.0`.
- Grafana: press `shift+f` on the `graf-grafana-b6bf96c9c-***` pod to open the port-forward window in `k9s`; change the "Address" to `0.0.0.0`.
- Prometheus: press `shift+f` on the `prometheus-prom-prometheus-operator-prometheus-**` pod to open the port-forward window in `k9s`; change the "Address" to `0.0.0.0`.
Now that the stack is ready and the port-forwarding rules are active, we can access SCDF's dashboard by clicking the "SCDF Dashboard" tab in the Strigo platform. If the page doesn't load, hit the "refresh page" button inside the tab.
Alternatively, you can access SCDF through its shell. To do that, switch to the terminal tab and run the following.
[ec2-user@ip-172-31-19-111 ~]$ cd scdf-shell
[ec2-user@ip-172-31-19-111 scdf-shell]$ java -jar spring-cloud-dataflow-shell-2.6.0.jar --dataflow.uri=http://localhost:8080
This command might take a few seconds to start. Hang tight. You will then see the `dataflow:>` prompt.
Congrats! You have completed lab #1. 🙂
This lab will review how to build and deploy event-streaming applications using Spring Cloud Data Flow. Let's begin by reviewing the applications we will use in this lab.
To open VS Code, you will have to start the `code-server` process inside the Strigo lab VM from the terminal window.
[ec2-user@ip-172-31-19-111 ]$ cd code
[ec2-user@ip-172-31-19-111 code]$ bin/code-server &
[1] 103335
[ec2-user@ip-172-31-19-111 code]$ info Using config file ~/.config/code-server/config.yaml
info Using user-data-dir ~/.local/share/code-server
info code-server 3.4.1 48f7c2724827e526eeaa6c2c151c520f48a61259
info HTTP server listening on http://0.0.0.0:1111
info - No authentication
info - Not serving HTTPS
The `code-server` process takes ~5-10 seconds to start. You can verify that port `1111` is listening by running the `sudo lsof -i -P -n | grep 1111` command in the terminal.
When the `code-server` process is running, click the "IDE" tab in Strigo to open VS Code. Go to the `/home/ec2-user/SpringOne2020` folder to review the applications we will be using in the labs.
If the window doesn't load, please click the "refresh page" button inside the "IDE" tab.
Now that we have had a look at the applications, let's build them.
[ec2-user@ip-172-31-19-111 ~]$ cd SpringOne2020
[ec2-user@ip-172-31-19-111 SpringOne2020]$ git pull
Already up to date.
[ec2-user@ip-172-31-19-111 SpringOne2020]$ ls -ltr
total 28
-rw-rw-r-- 1 ec2-user ec2-user 6608 Aug 12 20:17 mvnw.cmd
-rwxrwxr-x 1 ec2-user ec2-user 10070 Aug 12 20:17 mvnw
drwxrwxr-x 2 ec2-user ec2-user 47 Aug 14 20:16 scripts
-rw-rw-r-- 1 ec2-user ec2-user 3520 Aug 17 16:18 README.md
-rw-rw-r-- 1 ec2-user ec2-user 2609 Aug 17 19:02 pom.xml
drwxrwxr-x 5 ec2-user ec2-user 104 Aug 17 19:06 trucks
drwxrwxr-x 5 ec2-user ec2-user 104 Aug 17 19:06 brake-temperature
drwxrwxr-x 5 ec2-user ec2-user 104 Aug 17 19:06 brake-logs
drwxrwxr-x 5 ec2-user ec2-user 104 Aug 17 19:07 thumbinator
The build creates container images for these applications using the `jib` Maven plugin.
To configure your local environment to reuse the Docker daemon running inside the Minikube instance, run `eval $(minikube docker-env)` before building the code and generating the Docker images.
[ec2-user@ip-172-31-19-111 SpringOne2020]$ eval $(minikube docker-env)
Run the Maven build and generate Docker images with the `mvn clean install com.google.cloud.tools:jib-maven-plugin:dockerBuild -DskipTests` command.
[ec2-user@ip-172-31-19-111 SpringOne2020]$ mvn clean install com.google.cloud.tools:jib-maven-plugin:dockerBuild -DskipTests
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] labs [pom]
[INFO] trucks [jar]
[INFO] brake-temperature [jar]
[INFO] brake-logs [jar]
[INFO] thumbinator [jar]
...
...
[INFO] Container entrypoint set to [java, -cp, /app/resources:/app/classes:/app/libs/*, com.springone.trucks.TrucksApplication]
[INFO]
[INFO] Built image to Docker daemon as dev.local/trucks, dev.local/trucks:0.0.1-SNAPSHOT
[INFO] Executing tasks:
[INFO] [==============================] 100.0% complete
...
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for labs 0.0.1-SNAPSHOT:
[INFO]
[INFO] labs ............................................... SUCCESS [ 6.388 s]
[INFO] trucks ............................................. SUCCESS [ 42.855 s]
[INFO] brake-temperature .................................. SUCCESS [ 10.615 s]
[INFO] brake-logs ......................................... SUCCESS [ 4.851 s]
[INFO] thumbinator ........................................ SUCCESS [ 5.697 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:12 min
[INFO] Finished at: 2020-08-19T18:36:29Z
[INFO] ------------------------------------------------------------------------
Verify that the application images are in the local registry under the `dev.local` repository. (Jib pins image creation timestamps to the Unix epoch for reproducible builds, which is why Docker reports the images as created "50 years ago".)
[ec2-user@ip-172-31-19-111 SpringOne2020]$ docker images | grep dev.local
dev.local/thumbinator 0.0.1-SNAPSHOT a2f62650b367 50 years ago 233MB
dev.local/thumbinator latest a2f62650b367 50 years ago 233MB
dev.local/brake-logs 0.0.1-SNAPSHOT e914bf6237ab 50 years ago 250MB
dev.local/brake-logs latest e914bf6237ab 50 years ago 250MB
dev.local/trucks 0.0.1-SNAPSHOT f1da7c4eb7aa 50 years ago 250MB
dev.local/trucks latest f1da7c4eb7aa 50 years ago 250MB
dev.local/brake-temperature 0.0.1-SNAPSHOT 61e84d293eb9 50 years ago 264MB
dev.local/brake-temperature latest 61e84d293eb9 50 years ago 264MB
If you haven't already prepared the environment, please review Lab-1-Prepare-Environment.
This lab also assumes that you have built the applications locally and that the container images are available in the Docker daemon running inside the single-node Minikube cluster.
Use case: imagine hundreds of freight trucks on the road, and you want to track the fleet's performance in real time. More specifically, suppose you want to understand each truck's peak performance given the load it is currently carrying. A vital factor to consider is brake condition: brakes suffer significant wear and tear depending on the freight, and that wear can be dangerous if it goes unnoticed.
Given that background, we will deploy a streaming data pipeline in SCDF with three applications:
- `trucks` — generates truck data at random intervals
- `brake-temperature` — computes the moving average of each truck's brake temperature over 10-second intervals
- `brake-logs` — logs the output in real time
The applications rely on Spring Cloud Stream's Apache Kafka binder implementation. The real-time computation inside the `brake-temperature` application, however, uses Spring Cloud Stream's Kafka Streams binder.
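For orientation, here is a minimal sketch of what a Kafka Streams binder function computing such a 10-second average can look like. This is not the exact workshop code: the payload is simplified to a `Double` temperature keyed by truck id, and the class and accumulator names are assumptions.

```java
import java.time.Duration;
import java.util.function.Function;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.support.serializer.JsonSerde;

@Configuration
public class MovingAverageSketch {

    // Simple accumulator; Jackson-serializable so it can live in the state store.
    public static class Avg {
        public long count;
        public double sum;
    }

    // Input: brake-temperature readings keyed by truck id.
    // Output: the average temperature per truck for each 10-second window.
    @Bean
    public Function<KStream<String, Double>, KStream<String, Double>> processBrakeTemperature() {
        return readings -> readings
            .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
            // Tumbling 10-second windows, matching the lab's 10s interval.
            .windowedBy(TimeWindows.of(Duration.ofSeconds(10)))
            .aggregate(Avg::new,
                    (truckId, temperature, agg) -> {
                        agg.count++;
                        agg.sum += temperature;
                        return agg;
                    },
                    Materialized.with(Serdes.String(), new JsonSerde<>(Avg.class)))
            .toStream()
            // Unwrap the windowed key back to the plain truck id and emit the average.
            .map((windowedId, agg) -> KeyValue.pair(windowedId.key(), agg.sum / agg.count));
    }
}
```

The JSON payload you will see from `brake-logs` later in this lab (average, count, window start/end times) corresponds to exactly this kind of windowed accumulator.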
The source code for `trucks`, `brake-temperature`, and `brake-logs` is under the `SpringOne2020` directory. You can open the code in the IDE as described in the Explore-Application-Code section.
Given the lab's time constraints, we will only build and deploy the applications rather than extend or customize their behavior. If you get through the lab quickly, feel free to take a crack at customizations and redeploy as you see fit.
- Let's open SCDF and register the three new applications that are already available in the Docker registry.
The coordinates for the three applications are:
source.trucks=docker:dev.local/trucks:0.0.1-SNAPSHOT
processor.brake-temperature=docker:dev.local/brake-temperature:0.0.1-SNAPSHOT
sink.brake-log=docker:dev.local/brake-logs:0.0.1-SNAPSHOT
If you aren't able to build the applications locally, you can register them from Docker Hub instead:
source.trucks=docker:sabby/trucks:0.0.1-SNAPSHOT
processor.brake-temperature=docker:sabby/brake-temperature:0.0.1-SNAPSHOT
sink.brake-log=docker:sabby/brake-logs:0.0.1-SNAPSHOT
Navigate to "Application(s)" section and select "Bulk import application" as the option. On the right-frame, copy+paste the three application coordinates to import the applications to SCDF's application registry.
- Now that we have the applications registered, it is time to create and deploy a stream.
Click the "Streams" link from the left-navigation and open the "Create Stream(s)" page. Copy the following streaming DSL command into the DSL text area in the dashboard. Alternatively, you can drag +drop the apps in the canvas and interactively configure the desired properties.
truck-performance = trucks --spring.cloud.stream.function.bindings.generateTruck-out-0=output | brake-temperature --spring.cloud.stream.function.bindings.processBrakeTemperature-in-0=input --spring.cloud.stream.function.bindings.processBrakeTemperature-out-0=output | brake-log --spring.cloud.stream.function.bindings.log-in-0=input
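To make the binding names in the DSL above less mysterious, here is a hedged sketch (simplified `String` payloads, assumed class name) of the kind of function beans that produce them. Spring Cloud Stream derives binding names from the bean names using the `<functionName>-<in|out>-<index>` convention, so a `Supplier` bean named `generateTruck` gets the default output binding `generateTruck-out-0`, and a `Consumer` bean named `log` gets `log-in-0`. In the workshop these beans live in separate applications; they are shown together here only for illustration.

```java
import java.util.Random;
import java.util.function.Consumer;
import java.util.function.Supplier;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class BindingNamesSketch {

    private final Random random = new Random();

    // In the trucks app: the output binding name generateTruck-out-0
    // is derived from this bean's name. The binder polls the supplier
    // and publishes each result to the bound Kafka topic.
    @Bean
    public Supplier<String> generateTruck() {
        return () -> "truck-" + random.nextInt(100);
    }

    // In the brake-logs app: the input binding name log-in-0 is
    // derived from this bean's name.
    @Bean
    public Consumer<String> log() {
        return payload -> System.out.println("received: " + payload);
    }
}
```

The in-line `--spring.cloud.stream.function.bindings.*` properties in the DSL simply remap these default binding names onto the `input`/`output` channels that SCDF wires between the applications.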
In case you're wondering about the `--spring.cloud.stream.function...` in-line properties (sketched above) and what they mean, you can learn more about Spring Cloud Stream's function bindings and naming conventions in the reference guide.

Click the "Create Stream(s)" button and deploy the `truck-performance` stream from the list page.

SCDF now programmatically creates the Kubernetes deployment manifests for the three applications and deploys them onto Kubernetes. You can switch to the "Terminal" tab and review the new pods in the `k9s` output.

The applications take ~1-2 minutes to start and for the liveness/readiness probes to report ready.
- Let's review the results by tailing the logs of the `truck-performance-brake-log-v1-***` application. Alternatively, you can open the logs from SCDF's dashboard, too. You will notice the computed moving average in the logs, in real time. Below is an example that includes the average brake temperature for the truck with ID "JH4KA8170MC002642".
{ "average": 14.967257499694824, "count": 3, "end": 1597875620000, "id": "JH4KA8170MC002642", "start": 1597875610000, "totalValue": 44.90177 }
Now that the real-time streaming data pipeline is running in Kubernetes, we will next review the steps to monitor the streaming applications using Prometheus and Grafana.
All the heavy lifting of configuring Prometheus and Grafana and preparing the applications for metrics scraping is handled automatically by SCDF.
Click the "Grafana Dashboard" tab to navigate to the Grafana GUI. To log in, you need to find the admin password in the Kubernetes secrets. To retrieve and decode the password, run the following in the terminal window.
kubectl -n monitoring get secret graf-grafana-admin -o jsonpath={.data.GF_SECURITY_ADMIN_PASSWORD} | base64 --decode
Example:
[ec2-user@ip-172-31-30-179 ~]$ kubectl -n monitoring get secret graf-grafana-admin -o jsonpath={.data.GF_SECURITY_ADMIN_PASSWORD} | base64 --decode
MaHOlr9Vxv
In this example, the credentials to log in to the Grafana dashboard are admin/MaHOlr9Vxv.
The user is `admin`, but the password is a randomly generated token that is different for every student. You will have to run the above `kubectl` command to retrieve yours.
The SCDF-specific metrics dashboards are preloaded in the Grafana service running in your Kubernetes cluster. The first time you log in, go to the "Manage" section to find the dashboards.
Click the "Applications" dashboard to view the real-time performance of the streaming applications.
That's it! You have completed lab #2. 💥 🚀
If you haven't already prepared the environment, please review Lab-1-Prepare-Environment.
This lab assumes that you have already built the applications and that the container images are available in the Docker daemon running inside the single-node Minikube cluster.
This lab aims to highlight how short-lived, ephemeral batch applications can be orchestrated, scheduled, and monitored using Spring Cloud Data Flow.
To demonstrate the features, we will build a Task application (`thumbinator`) that includes two batch jobs internally.
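As a point of reference, the essential shape of a Spring Cloud Task application looks roughly like the sketch below (the class name is an assumption, not necessarily the workshop code). The `@EnableTask` annotation is what records each run's start time, end time, and exit code so SCDF can track executions.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;

@SpringBootApplication
@EnableTask // records task executions (start/end time, exit code) for SCDF
public class ThumbinatorApplication {

    public static void main(String[] args) {
        SpringApplication.run(ThumbinatorApplication.class, args);
    }
}
```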
The first job includes three steps to simulate extracting, transforming, and loading data (an ETL job).
@Bean
public Job extractImage() {
// extract an image
}
@Bean
public Job transformImage() {
// create a thumbnail for the image
}
@Bean
public Job loadImage() {
// load the thumbnail to a different directory
}
The second job queries and prints the result.
@Bean
public Job statusImage() {
// print the size of the original and the thumbnail images
}
To keep things simple and make the lab easy to repeat, this workshop includes multiple jobs inside the same application. However, you could create a separate Task application for each job instead, which would let you evolve them independently with bug fixes and improvements, so you can continuously deliver them. A minimal sketch of how such a job could be wired is shown below.
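This is a hedged sketch, not the workshop's actual `thumbinator` code: the stubs above declare each phase as its own `Job` bean, while a common alternative wiring (shown here under assumed names) is a single job with three sequential steps, where each tasklet merely prints a message in place of the real image work.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class EtlJobSketch {

    private final JobBuilderFactory jobs;
    private final StepBuilderFactory steps;

    public EtlJobSketch(JobBuilderFactory jobs, StepBuilderFactory steps) {
        this.jobs = jobs;
        this.steps = steps;
    }

    // Each tasklet stands in for the real work of a step.
    private Step step(String name, String message) {
        return steps.get(name)
                .tasklet((contribution, chunkContext) -> {
                    System.out.println(message);
                    return RepeatStatus.FINISHED;
                })
                .build();
    }

    // Three sequential steps: extract -> transform -> load.
    @Bean
    public Job etlJob() {
        return jobs.get("etlJob")
                .start(step("extractImage", "extracting the image"))
                .next(step("transformImage", "creating a thumbnail"))
                .next(step("loadImage", "copying the thumbnail to its destination"))
                .build();
    }
}
```

When a task with several jobs launches, Spring Boot runs every configured job by default; the `spring.batch.job.names` property can restrict which jobs run.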
In your Strigo VM, you can follow the Explore-Application-Code steps to review the code of the `thumbinator` application.
Click the "SCDF Dashboard" tab in Strigo to launch SCDF's dashboard.
- First, you will have to register the locally built application in SCDF's application registry.
The Docker coordinate for the `thumbinator` application is:
task.thumbinator=docker:dev.local/thumbinator:0.0.1-SNAPSHOT
If you aren't able to build the application locally, you can register it from Docker Hub instead:
task.thumbinator=docker:sabby/thumbinator:0.0.1-SNAPSHOT
- Open "Tasks" -> "Create Task(s)". Give the task definition a name and create the task.
- Let's launch the task. Click the "play" button on the task list page to manually launch the task with the default parameters and arguments. Look for the newly launched task pod in the `k9s` terminal.
- Verify the results by tailing the logs of the task pod running in Kubernetes. Switch to the `k9s` terminal and look for the pod with the name you assigned to the task. You will find the results of both jobs in the log.
In SCDF, there's an out-of-the-box option to schedule tasks/batch jobs to launch on a recurring cadence. To do that, SCDF builds on the primitives of the `cronjob` spec in Kubernetes.
From the task list page, select the dropdown to choose "Schedule Task".
Give your schedule a name and schedule the task to launch every minute with the `*/1 * * * *` cron expression.
Switch back to the `k9s` terminal and verify that the scheduled pods launch automatically once every minute.
Alternatively, you can use the `kubectl` command to query the `cronjob` and `job` resources running in the Kubernetes cluster.
[ec2-user@ip-100 ~]$ kubectl get cronjob
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
sch-thumbnails */1 * * * * False 1 54s 4m27s
[ec2-user@ip-100 ~]$ kubectl get job
NAME COMPLETIONS DURATION AGE
sch-thumbnails-1597961580 1/1 39s 3m57s
sch-thumbnails-1597961640 1/1 49s 2m57s
sch-thumbnails-1597961700 1/1 40s 117s
sch-thumbnails-1597961760 0/1 57s 57s
Let's also verify the task execution details from SCDF's dashboard.
As with the monitoring steps discussed in the event-streaming lab, task monitoring with Prometheus and Grafana is preloaded, and the metrics dashboard is ready to monitor the tasks running in the Kubernetes cluster.
Go to "Grafana Dashboard" -> "Tasks".
Kudos to you for completing lab-3! 🏆 😄
- Source code is at: sabbyanandan/SpringOne2020
- Slides: SpeakerDeck