From f612b44279b1a718e7d0ffb02e0d266efa3087a3 Mon Sep 17 00:00:00 2001
From: Theofilos Kakantousis
Date: Mon, 23 Apr 2018 17:35:18 +0200
Subject: [PATCH] remove spark readme from tensorflow

---
 tensorflow/README.md | 44 --------------------------------------------
 1 file changed, 44 deletions(-)
 delete mode 100644 tensorflow/README.md

diff --git a/tensorflow/README.md b/tensorflow/README.md
deleted file mode 100644
index 73cc2e07..00000000
--- a/tensorflow/README.md
+++ /dev/null
@@ -1,44 +0,0 @@
-# Spark & Kafka
-To help you get started, *StreamingExample* provides the code for a basic streaming Spark application. HopsWorks makes use of the latest Spark-Kafka [API](http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html). To run the example, you need to provide the following parameters when creating a Spark job for Kafka in HopsWorks:
-```
-Usage: (producer|consumer) [<sink>]
-```
-* **type**: Defines whether the job produces to or consumes from Kafka.
-* **sink**: Used only by a consumer job; it defines the path to the Dataset or folder to which the Spark job appends its streaming output, i.e. the Avro records consumed from Kafka. The folder name is suffixed with the YARN applicationId to differentiate between multiple jobs writing to the same Dataset. In this example, the sink file contains the data from the latest microbatch. The default microbatch period is set to two (2) seconds.
-
-**MainClass** is io.hops.examples.spark.kafka.StreamingExample
-
-**Topics** are provided via the HopsWorks Job UI: the user checks the *Kafka* box and selects the topics from the drop-down menu. When consuming from multiple topics using a single Spark directStream, all topics must use the same Avro schema. Create a new directStream for topics that use a different Avro schema.
-
-**Consumer groups** are an advanced option for consumer jobs. A default one is set by HopsWorks, and the user can add further ones via the Jobservice UI.
-
-## Example:
-**Producer**
-
-```
-producer
-
-```
-
-**Consumer**
-```
-consumer /Projects/KafkaProject/Resources/Data
-```
-
-## Avro Records
-This example produces String pairs which HopsWorks **KafkaUtil** converts into Avro records and serializes into bytes. Similarly, when consuming from a Spark directStream, messages are deserialized into Avro records. **The Avro schema used in this example is the following**:
-
-```
-{
-    "fields": [
-        { "name": "platform", "type": "string" },
-        { "name": "program", "type": "string" }
-    ],
-    "name": "myrecord",
-    "type": "record"
-}
-```
-
-## Libraries
-
-*StreamingExample* makes use of the Hops API available [here](https://github.com/hopshadoop/hops-util). This library is automatically provided by HopsWorks with every Job/Notebook. If the user wants to implement custom functionality, it must be added to the job when creating it in HopsWorks.
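
For context on the directStream consumer pattern the removed README describes, a minimal sketch is given below. It uses the plain spark-streaming-kafka-0-10 API that the README links to, not the Hops KafkaUtil helper; the class name, broker address, topic, and consumer group are placeholder assumptions (on HopsWorks the Job UI supplies the topics and a default consumer group), and the Avro deserialization step handled by KafkaUtil is replaced here with plain String values.

```
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class DirectStreamSketch {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("DirectStreamSketch");
    // Two-second batch interval, matching the default microbatch period in the README.
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

    // Placeholder consumer configuration; broker address and group id are illustrative only.
    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "broker:9092");
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "example-consumer-group");

    // All topics consumed by a single directStream must share the same schema.
    Collection<String> topics = Arrays.asList("mytopic");

    JavaInputDStream<ConsumerRecord<String, String>> stream =
        KafkaUtils.createDirectStream(
            jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

    // Print the key/value pairs of each microbatch instead of appending them to a sink Dataset.
    stream.map(record -> record.key() + ": " + record.value()).print();

    jssc.start();
    jssc.awaitTermination();
  }
}
```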