This project is a demo of Hazelcast Jet, a data streaming engine based on Hazelcast IMDG.
It displays the positions of public transport vehicles in the Bay Area in near real-time.
Note: It used to showcase Switzerland's public transport. Unfortunately, the Swiss data provider doesn't provide the GTFS-RT feed anymore.
The technology stack consists of:

- Kotlin for the code
- Spring Boot for the webapp
- Maven for the build system
The project contains several modules with dedicated responsibilities:
| Name | Description |
|---|---|
| | Code shared across modules |
| `infrastructure` | Contains the static data files, as well as configuration files for Docker Compose and Kubernetes |
| `local-jet` | As an alternative to the previous module, starts a local Jet instance to be able to debug inside the IDE |
| `load-static` | Loads GTFS static data from files into memory. Those files contain reference data that is used later to enrich the data pipeline |
| `stream-dynamic` | Calls an OpenData endpoint to get dynamic data, transforms it, enriches it, and stores it into an IMDG map |
| `web` | Subscribes to the aforementioned IMDG map and publishes changes to a web-socket endpoint. The UI subscribes to the endpoint and displays each data point on an OpenStreetMap map |
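For illustration, here is a minimal Kotlin sketch of the last hop described above (IMDG map update → web-socket), assuming Spring's STOMP messaging. The class, map, and topic names are placeholders, not the project's actual code:

```kotlin
import com.hazelcast.core.EntryEvent
import com.hazelcast.core.HazelcastInstance
import com.hazelcast.map.listener.EntryUpdatedListener
import org.springframework.messaging.simp.SimpMessagingTemplate

// Illustrative only: forward each update of the IMDG map to a web-socket topic.
class PositionRelay(hazelcast: HazelcastInstance, private val messaging: SimpMessagingTemplate) {
    init {
        val positions = hazelcast.getMap<String, String>("positions")     // map name is an assumption
        positions.addEntryListener(EntryUpdatedListener<String, String> { event: EntryEvent<String, String> ->
            messaging.convertAndSend("/topic/positions", event.value)     // topic name is an assumption
        }, true)                                                          // true = include the value in events
    }
}
```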
The data provider releases data compliant with the General Transit Feed Specification (GTFS, by Google).
Two types of data are available:

- Static files that contain reference data that doesn't change often, e.g. schedules, stops, etc.
- A REST endpoint that serves dynamic data, e.g. vehicle positions
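The dynamic data typically comes as a GTFS-realtime (GTFS-RT) feed, i.e. Protocol Buffers payloads. As a hedged sketch (not the project's actual pipeline code; it requires the `com.google.transit:gtfs-realtime-bindings` dependency, and the URL is a placeholder), reading vehicle positions could look like this:

```kotlin
import com.google.transit.realtime.GtfsRealtime.FeedMessage
import java.net.URL

// Illustrative only: parse a GTFS-RT VehiclePositions payload and print each vehicle's coordinates.
fun printVehiclePositions(feedUrl: String) {
    val feed = URL(feedUrl).openStream().use { FeedMessage.parseFrom(it) }
    feed.entityList
        .filter { it.hasVehicle() }
        .forEach { entity ->
            val position = entity.vehicle.position
            println("${entity.id}: ${position.latitude}, ${position.longitude}")
        }
}
```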
The demo is based on data provided by 511 SF Bay’s Open Data Portal.
Every day, new reference data (e.g. expected stop times) are published.
Hence, the infrastructure project that contains said data needs to be updated with new files.
Note that only four files are required for the demo: `agency.txt`, `routes.txt`, `stops.txt`, and `trips.txt`.
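These are plain CSV files defined by the GTFS specification. As a hypothetical illustration (not the project's actual loading code), reading `stops.txt` into memory could look like this:

```kotlin
import java.io.File

// Illustrative sketch: parse stops.txt into a map of stop_id -> stop_name.
// Column positions assume the header starts with stop_id,stop_name; a robust
// loader should locate columns by header name, as GTFS doesn't fix their order.
fun loadStops(dataPath: String): Map<String, String> =
    File("$dataPath/stops.txt").useLines { lines ->
        lines.drop(1)                                   // skip the CSV header row
            .map { it.split(",") }
            .associate { columns -> columns[0] to columns[1] }
    }
```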
GTFS Feed Download allows the user to download a ZIP file containing the GTFS dataset for the specified operator/agency. It also contains additional files, called the GTFS+ files, that provide information not contained in the GTFS files, such as direction names, fare zone names, etc.

Allowable parameters: `api_key` (mandatory), `operator_id` (mandatory), and `historic` (optional).
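Once you have an API key (see the steps below), a minimal Kotlin call might look like the following sketch. The `https://api.511.org/transit/datafeeds` URL and the `SF` operator ID are assumptions; check the 511 documentation for the exact values:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.nio.file.Path

// Illustrative only: download the GTFS dataset ZIP for one operator.
fun downloadFeed(apiKey: String, operatorId: String = "SF") {
    val request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.511.org/transit/datafeeds?api_key=$apiKey&operator_id=$operatorId"))
        .build()
    HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofFile(Path.of("gtfs-$operatorId.zip")))
}
```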
Calling the endpoint requires an API key.

- First, register
- You'll receive a confirmation email
- When you've confirmed the email, you'll receive a new email with the token
- The token should be used as an argument when launching the `com.hazelcast.jettrain.data.MainKt` class from the `stream-dynamic` module: `java com.hazelcast.jettrain.data.MainKt $TOKEN`
Note: There's a rate limiter on the server side: the endpoint returns a 429 status if it's queried more than 60 times per hour. In order not to go over this limit too soon, the Jet job is configured to run only once every 31 seconds.
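As a rough illustration of one way such pacing can be expressed with Jet's `SourceBuilder` (a hypothetical sketch, not necessarily how this project implements it):

```kotlin
import com.hazelcast.jet.pipeline.SourceBuilder
import com.hazelcast.jet.pipeline.StreamSource

// Hypothetical placeholder for the actual HTTP call to the 511 endpoint.
fun fetchVehiclePositions(): String = TODO("call the 511 VehiclePositions endpoint")

// A custom source that emits at most one poll result every 31 seconds.
val throttledSource: StreamSource<String> = SourceBuilder
    .stream("vehicle-positions") { longArrayOf(0L) }   // state: timestamp of the last poll
    .fillBufferFn<String> { lastPoll, buf ->
        val now = System.currentTimeMillis()
        if (now - lastPoll[0] >= 31_000) {
            lastPoll[0] = now
            buf.add(fetchVehiclePositions())
        }
    }
    .build()
```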
If you’re a Java developer, this approach will be fastest as you probably have all the tools ready.
- Git (with the LFS extension installed; on Ubuntu, it isn't installed by default)
- A Java IDE, e.g. IntelliJ IDEA, Eclipse, etc.
- Clone the repo
- Import the code into your IDE
- In the `local-jet` module, run the `com.hazelcast.jettrain.LocalJet.kt` class inside the IDE with the following parameters:

```
-Xmx8g \                                                             (1)
-XX:+UseStringDeduplication \                                        (2)
--add-modules java.se \                                              (3)
--add-exports java.base/jdk.internal.ref=ALL-UNNAMED \               (3)
--add-opens java.base/java.lang=ALL-UNNAMED \                        (3)
--add-opens java.base/java.nio=ALL-UNNAMED \                         (3)
--add-opens java.base/sun.nio.ch=ALL-UNNAMED \                       (3)
--add-opens java.management/sun.management=ALL-UNNAMED \             (3)
--add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED   (3)
```

1. Reserve extra memory
2. Improve memory efficiency when storing strings
3. Necessary when working with Java 9+
- To import static data files, run the `MainKt` class from inside the `load-static` module: `java -Ddata.path=/path/to/local/infrastructure/data com.hazelcast.jettrain.refs.MainKt`
- To query dynamic data, run the `MainKt` class from inside the `stream-dynamic` module: `java -Dtoken=$YOUR_511_TOKEN com.hazelcast.jettrain.data.MainKt`
- In the `web` module, run: `java com.hazelcast.jettrain.JetDemoKt`
The webapp is available at http://localhost:8080.
With this setup, you’ll build the demo from source.
- Start Docker
- Get the webapp image: `docker pull nfrankel/jettrain:latest`
- Adapt the `docker-compose.yml` file to your file hierarchy. I found no way to use relative file paths in Docker Compose (hints/PRs welcome), so you need to update the file to use the correct paths: look for paths starting with `/Users/nico/projects/hazelcast/` and update them accordingly.
- Start the containers: in the `infrastructure/compose` folder, run `docker-compose up`
- Get the latest "static" JAR
- To load static data, run the following commands in the Hazelcast Jet distribution folder:

```
./jet submit -n Agencies -v -c com.hazelcast.jettrain.refs.Agencies $PROJECT_ROOT/load-static/target/load-static-1.0-SNAPSHOT.jar
./jet submit -n Stops -v -c com.hazelcast.jettrain.refs.Stops $PROJECT_ROOT/load-static/target/load-static-1.0-SNAPSHOT.jar
./jet submit -n Routes -v -c com.hazelcast.jettrain.refs.Routes $PROJECT_ROOT/load-static/target/load-static-1.0-SNAPSHOT.jar
./jet submit -n Trips -v -c com.hazelcast.jettrain.refs.Trips $PROJECT_ROOT/load-static/target/load-static-1.0-SNAPSHOT.jar
```
- Get the latest "dynamic" JAR
- To query dynamic data, run the following command in the Hazelcast Jet distribution folder: