Abrane is a big-data-as-a-service cloud that provides the platforms and infrastructure you need to ingest, store, and process your data.
You must connect to Abrane over a VPN. Create a VPN connection to "demo.abrane.ir" using L2TP/IPSec as the protocol and the credentials provided to you. You can find the details here. To test your setup, ping check-ping.your-domain.abrane.ir. For your convenience, set the search domain of this connection to your-domain.abrane.ir. You should then be able to ping check-ping successfully and open http://dashboard in your browser. The rest of this guide assumes this setup; otherwise, append your-domain.abrane.ir to the hostnames.
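If you prefer not to change the search domain, a small helper can expand the short hostnames used in this guide. The following is only a sketch: the function name `abrane_host` and the variable `ABRANE_DOMAIN` are made up for illustration and are not part of Abrane.

```shell
# Hypothetical helper: expand a short Abrane hostname to its fully qualified form.
# ABRANE_DOMAIN is an assumed variable name; set it to your own domain.
ABRANE_DOMAIN="${ABRANE_DOMAIN:-your-domain.abrane.ir}"

abrane_host() (
  host="$1"
  case "$host" in
    *.*) printf '%s\n' "$host" ;;                     # already qualified, keep as-is
    *)   printf '%s.%s\n' "$host" "$ABRANE_DOMAIN" ;; # append the domain
  esac
)

# Usage (requires an active VPN connection):
#   ping -c 3 "$(abrane_host check-ping)"
#   curl -sI "http://$(abrane_host dashboard)/"
```

With the search domain configured as described above, the helper is unnecessary and the short names work directly.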
Note: After connecting to Abrane, your machine must be able to route 172.{16,17,18}.0.0/16 traffic through the VPN. If your setup already uses these subnets (e.g. a Docker or VM bridge interface), please adjust your routing rules.
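To spot a local interface that may clash with these subnets, you could run a check like the one below before connecting. It is a sketch that assumes a Linux machine with the `ip` tool from iproute2; the function names are illustrative, and the matching is a simple prefix test rather than full CIDR arithmetic.

```shell
# Succeed (exit 0) if the IPv4 address, with or without a /prefix suffix,
# starts with 172.16., 172.17., or 172.18.
in_abrane_range() (
  case "$1" in
    172.16.*.*|172.17.*.*|172.18.*.*) exit 0 ;;
    *) exit 1 ;;
  esac
)

# Read `ip -4 -o addr show` output on stdin and print every interface
# whose address falls in one of those ranges.
find_overlaps() (
  while read -r idx iface family addr rest; do
    if in_abrane_range "$addr"; then
      printf '%s %s\n' "$iface" "$addr"
    fi
  done
)

# Usage, before connecting to the VPN:
#   ip -4 -o addr show | find_overlaps
```

Any interface it prints (a Docker bridge is a common case) is a candidate for the routing conflict described in the note.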
Now let's create a sample Spark application and deploy it to Abrane. This application consumes messages from one Kafka topic and echoes them to another Kafka topic.
- Input
To ingest data into your cluster, you can use the provided Kafka service for streaming ingestion or the provided HDFS service for batch ingestion. In this example we use Kafka for streaming ingestion. As described in the Kafka service section, you can work with the Kafka service through its REST API or connect your Kafka producer directly to the exposed brokers. Let's take the direct approach. To this end, you need a Kafka distribution, version 0.10 or later, downloaded on your machine. Sending messages to the cluster is then easy:
<kafka-home>/bin/kafka-console-producer.sh --broker-list kafka-1:9092,kafka-2:9092 --topic input
Note that the Kafka service is configured to create topics automatically with the required partitions and replicas. Because of this, you might see a LEADER_NOT_AVAILABLE warning on your first message; it can be safely ignored. The Kafka service is configured to retain your data for at least 3 days.
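The console producer reads messages from standard input, one per line, so you can also feed it a file instead of typing interactively. The sketch below wraps the command shown above; `KAFKA_HOME`, its default path, and the function name `produce_lines` are assumptions made for illustration.

```shell
# KAFKA_HOME is an assumed variable pointing at your local Kafka distribution.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
BROKERS="kafka-1:9092,kafka-2:9092"

# Send every line of a file to a topic, one message per line.
produce_lines() (
  topic="$1"
  file="$2"
  "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --broker-list "$BROKERS" --topic "$topic" < "$file"
)

# Usage:
#   produce_lines input messages.txt
```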
- Process
In the samples directory of this repo, there is a project named spark-stream-k2k: a Java-based Spark Streaming application that consumes messages from a Kafka topic named "input" and produces the same messages to a Kafka topic named "output". Now let's deploy and run this application on the cluster. First clone the repository:
git clone git@github.com:sahabpardaz/abrane.git
To build the project you need Maven >= 3 and JDK >= 8 installed on your machine.
cd abrane/samples/spark-stream-k2k
mvn clean package
The build process creates a jar named "sample-spark-stream-k2k-1.0-SNAPSHOT-jar-with-dependencies.jar" in the "target" directory. We must first upload it to the HDFS service, using the provided WebHDFS REST service. A script in the scripts directory makes this REST call:
./../../scripts/upload_file.sh "target/sample-spark-stream-k2k-1.0-SNAPSHOT-jar-with-dependencies.jar" "k2k-1.0.jar"
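The upload_file.sh script in the repository is the authoritative way to upload. Purely to illustrate what a WebHDFS upload generally involves, here is a sketch of the standard two-step CREATE flow from the Apache Hadoop WebHDFS API. `WEBHDFS_BASE` is a placeholder, the function names are invented, and Abrane's own endpoint (http://hdfs/api/v1) may use different paths or parameters.

```shell
# WEBHDFS_BASE is a placeholder, not Abrane's documented endpoint.
WEBHDFS_BASE="${WEBHDFS_BASE:-http://namenode.example:50070}"

# Build the WebHDFS URL for creating (or overwriting) a file.
webhdfs_create_url() (
  path="${1#/}"   # drop a leading slash so the URL has no doubled slash
  printf '%s/webhdfs/v1/%s?op=CREATE&overwrite=true\n' "$WEBHDFS_BASE" "$path"
)

# Upload a local file using the two-step CREATE flow.
webhdfs_put() (
  local_file="$1"
  url="$(webhdfs_create_url "$2")"
  # Step 1: ask the namenode where to write; it answers with a redirect.
  location="$(curl -s -i -X PUT "$url" | tr -d '\r' | sed -n 's/^[Ll]ocation: //p')"
  if [ -z "$location" ]; then
    echo "no redirect received from $url" >&2
    exit 1
  fi
  # Step 2: send the file contents to the returned location.
  curl -s -f -X PUT -T "$local_file" "$location"
)

# Usage (the remote path here is hypothetical):
#   webhdfs_put target/sample-spark-stream-k2k-1.0-SNAPSHOT-jar-with-dependencies.jar /user/me/k2k-1.0.jar
```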
With our jar uploaded to HDFS under the name "k2k-1.0.jar", we can run our Spark application using the provided Spark submission REST API. Again, a script (submit_job.sh) does the work for us:
./../../scripts/submit_job.sh "k2k-1.0.jar" "ir.sahab.abrane.sample.spark.k2k.Main"
As a shortcut for the above steps, you can call the run.sh script inside the spark-stream-k2k directory, which performs essentially the same steps.
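For reference, the build, upload, and submit steps above can be chained so that a failure stops the sequence. This is a sketch of what such a wrapper might look like, not the contents of the actual run.sh; the `DRY_RUN` switch and the function names are assumptions added for illustration. It is meant to be run from the spark-stream-k2k directory.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch for this sketch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

JAR="target/sample-spark-stream-k2k-1.0-SNAPSHOT-jar-with-dependencies.jar"
HDFS_NAME="k2k-1.0.jar"
MAIN_CLASS="ir.sahab.abrane.sample.spark.k2k.Main"

# Build, upload, and submit; stop at the first failing step.
deploy_k2k() {
  run mvn clean package &&
  run ../../scripts/upload_file.sh "$JAR" "$HDFS_NAME" &&
  run ../../scripts/submit_job.sh "$HDFS_NAME" "$MAIN_CLASS"
}

# Usage:
#   DRY_RUN=1 deploy_k2k   # print the three commands without running them
#   deploy_k2k             # run them for real
```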
To see your application's metrics and logs, point your browser to http://spark-master/ and navigate to the Web UI of your application. You can also view your previous application runs at http://spark-history/.
- Output
Again, you can use Kafka or HDFS to retrieve your data from the cluster. For instance, to consume your Kafka topics directly:
<kafka-home>/bin/kafka-console-consumer.sh --bootstrap-server kafka-1:9092,kafka-2:9092 --topic output
Now, when you send some input from your producer, you should see it echoed in the consumer.
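For a quick non-interactive check, the console consumer can read a fixed number of messages and then exit, using its `--from-beginning` and `--max-messages` options. The wrapper below is a sketch; `KAFKA_HOME`, its default path, and the function name `consume_n` are assumptions.

```shell
# KAFKA_HOME is an assumed variable pointing at your local Kafka distribution.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
BROKERS="kafka-1:9092,kafka-2:9092"

# Print the first N messages of a topic, then exit.
consume_n() (
  topic="$1"
  count="$2"
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
    --bootstrap-server "$BROKERS" --topic "$topic" \
    --from-beginning --max-messages "$count"
)

# Usage:
#   consume_n output 5
```

Note that the command blocks until the requested number of messages has arrived, so run it after the application has started echoing.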
The following services are currently provided in Abrane (you can view them at http://dashboard).
- Kafka
  - Installed version: 0.10.2.2
  - Brokers address: kafka-1:9092,kafka-2:9092
  - Kafka manager: http://kafka-manager
  - REST: http://kafka/api/v2 (Guide) (will be available soon)
- HDFS
  - Installed version: 2.7.7
  - Namenode service address (inside the cluster): nn-cluster:8020
  - Namenode UI: http://namenode
  - REST: http://hdfs/api/v1 (Guide)
- Spark
  - Installed version: 2.3.2
  - Master UI: http://spark-master/
  - History Server UI: http://spark-history/
  - REST: http://spark/api/v1 (Guide)
- Zeppelin
  - Installed version: 0.8.0
  - http://zeppelin
- Kibana
  - Installed version: 6.2.4
  - http://kibana
- Zabbix
  - Installed version: 3.4.14
  - http://zabbix