Build Gimel

How to Build using Maven

Clone the git repository

git clone git@github.com:paypal/gimel.git
OR
git clone https://github.com/paypal/gimel.git
cd gimel

Run one of the commands below to build (-T 8 runs 8 Maven tasks in parallel, which reduces the build time considerably).

| Profile | Command | Notes |
| --- | --- | --- |
| General | `mvn clean install -T 8 -B -Pgeneral` | Builds with all dependencies pulled from Maven Central |
| HWX releases | `mvn clean install -T 8 -B -Phwx-2.6.3.11-1` | Builds with all dependencies pulled from the Hortonworks repo, if available |
| Stand Alone | `mvn clean install -T 8 -B -Pstandalone` | Builds Gimel with Scala packages bundled in the jar; used for standalone execution of the Gimel jar / polling services |

Get Gimel

Components

| Component | Purpose |
| --- | --- |
| gimel-tools | Tools (core + sql + runnables such as copyDataSet, etc.) |
| gimel-sql | SQL support (core + Gimel SQL functionality); see the sketch below |
| gimel-core | Core (contains all connectors and the unified Data API) |
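
For illustration, gimel-sql lets a read/write be expressed as a single Gimel SQL statement instead of Data API calls. The sketch below is hedged: it assumes the query-processor entry point com.paypal.gimel.sql.GimelQueryProcessor with an executeBatch(sql, sparkSession) method, and the dataset names are placeholders; verify the exact class and signature against the Gimel SQL documentation for your version.

    import org.apache.spark.sql.SparkSession
    // Assumed gimel-sql entry point; confirm the package path for your Gimel version.
    import com.paypal.gimel.sql.GimelQueryProcessor

    val sparkSession = SparkSession.builder().enableHiveSupport().getOrCreate()

    // Placeholder dataset names; replace them with entries from your catalog.
    val gsql = "insert into pcatalog.target_table select * from pcatalog.source_table"
    GimelQueryProcessor.executeBatch(gsql, sparkSession)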

Refer as Maven Dependency

Tools

    <dependency>
      <groupId>com.paypal.gimel</groupId>
      <artifactId>gimel-tools</artifactId> <!--Pick the version that matches the required Spark version-->
      <version>1.2.0</version> <!--provides Spark 2.2.0 compiled code-->
      <scope>provided</scope> <!--Ensure scope is provided as the gimel libraries will be added at runtime-->
    </dependency>

SQL

    <dependency>
      <groupId>com.paypal.gimel</groupId>
      <artifactId>gimel-sql</artifactId> <!--Pick the version that matches the required Spark version-->
      <version>1.2.0</version> <!--provides Spark 2.2.0 compiled code-->
      <scope>provided</scope> <!--Ensure scope is provided as the gimel libraries will be added at runtime-->
    </dependency>

Core

    <dependency>
      <groupId>com.paypal.gimel</groupId>
      <artifactId>gimel-core</artifactId> <!--Pick the version that matches the required Spark version-->
      <version>1.2.0</version> <!--provides Spark 2.2.0 compiled code-->
      <scope>provided</scope> <!--Ensure scope is provided as the gimel libraries will be added at runtime-->
    </dependency>
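
If your build uses sbt instead of Maven, the same artifact, version, and provided scope can be expressed in one line. This is a sketch based on the Maven coordinates above; swap in gimel-tools or gimel-sql as needed.

    // build.sbt: Gimel libraries are supplied at runtime, so keep them out of the assembly via the provided scope
    libraryDependencies += "com.paypal.gimel" % "gimel-core" % "1.2.0" % "provided"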

Data API Usage

A quick starter for the DataSet and DataStream APIs. Please refer to the individual storage system documentation for details.

import com.paypal.gimel._
import org.apache.spark.sql._
import scala.collection.immutable.Map

// Initiate DataSet
val sparkSession = SparkSession.builder().enableHiveSupport().getOrCreate()
val dataSet = DataSet(sparkSession)

// Read Data
val readOptions = Map[String, Any]()
val data1: DataFrame = dataSet.read("pcatalog.table1", readOptions)
val data2: DataFrame = dataSet.read("pcatalog.table2")

// Write Data
val writeOptions = Map[String, Any]()
dataSet.write("pcatalog.table3", data1, writeOptions)
dataSet.write("pcatalog.table4", data2)

// Initiate DataStream
val dataStream = DataStream(sparkSession)

// Get Reference to Stream
val datasetName = "pcatalog.kafka_stream_dataset" // placeholder: use a streaming dataset from your catalog
val streamingResult: StreamingResult = dataStream.read(datasetName)
// Clear CheckPoint if necessary
streamingResult.clearCheckPoint("some message")
// Helper for Clients
streamingResult.dStream.foreachRDD { rdd =>
  val count = rdd.count()
  if (count > 0) {
    /**
      * Mandatory | Get Offset for Current Window, so we can checkpoint at the end of this window's operation
      */
    streamingResult.getCurrentCheckPoint(rdd)
    /**
      * Begin | User's Use Cases
      */
    // dataSet.write("pcatalog.targetDataSet", derivedDF)
    streamingResult.saveCurrentCheckPoint()
  }
}
// Start the Context
dataStream.streamingContext.start()
dataStream.streamingContext.awaitTermination()
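
Putting the batch pieces together, a minimal sketch of a standalone job built only from the DataSet calls shown above could look like the following (the object name and dataset names are placeholders; replace the names with entries from your catalog):

    import com.paypal.gimel._
    import org.apache.spark.sql._

    // Minimal batch skeleton: read one dataset and write it to another via the Data API.
    object GimelBatchCopyJob {
      def main(args: Array[String]): Unit = {
        val sparkSession = SparkSession.builder()
          .appName("gimel-batch-copy")
          .enableHiveSupport()
          .getOrCreate()

        val dataSet = DataSet(sparkSession)

        // Placeholder dataset names
        val sourceDF: DataFrame = dataSet.read("pcatalog.source_table")
        dataSet.write("pcatalog.target_table", sourceDF)

        sparkSession.stop()
      }
    }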

Gimel Modules (UML)

Below is the dependency graph of Gimel Modules.