Run Spline examples

  1. Make sure the Spline Producer instance is running (see instructions)

  2. Download the Spline source code from GitHub and switch to the examples directory

    git clone git@github.com:AbsaOSS/spline-spark-agent.git
    cd spline-spark-agent
    mvn install -DskipTests
    cd examples

Python (PySpark)

  1. Execute pyspark with a Spline Spark Agent Bundle corresponding to the Spark and Scala versions in use:

      pyspark \
        --packages za.co.absa.spline.agent.spark:spark-3.1-spline-agent-bundle_2.12:0.6.1 \
        --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener \
        --conf spark.spline.producer.url=http://localhost:8080/producer

    This example uses the so-called codeless initialization method, i.e. one that requires no changes to your Spark application code.

    Alternatively, you can enable Spline manually by calling the SparkLineageInitializer.enableLineageTracking() method. See python_example.py.

  2. Execute your PySpark code as normal.
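When a PySpark script is launched directly rather than through the pyspark command, the same three settings can in principle be passed to the session builder instead. The following is a minimal sketch under that assumption, not the project's own example: the helper names `spline_spark_conf` and `build_session` are hypothetical, the bundle coordinate pattern is inferred from the single coordinate shown above, and `spark.jars.packages` is the standard Spark property behind the `--packages` flag.

```python
from typing import Dict

# Listener class name copied from the pyspark command above.
LISTENER = "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"


def spline_spark_conf(
    spark_version: str = "3.1",
    scala_version: str = "2.12",
    agent_version: str = "0.6.1",
    producer_url: str = "http://localhost:8080/producer",
) -> Dict[str, str]:
    """Return Spark properties equivalent to the pyspark flags above.

    Hypothetical helper: the coordinate pattern is an assumption
    generalized from the one bundle coordinate shown in this README.
    """
    bundle = (
        "za.co.absa.spline.agent.spark:"
        f"spark-{spark_version}-spline-agent-bundle_{scala_version}:{agent_version}"
    )
    return {
        # --packages on the command line corresponds to spark.jars.packages
        "spark.jars.packages": bundle,
        "spark.sql.queryExecutionListeners": LISTENER,
        "spark.spline.producer.url": producer_url,
    }


def build_session(app_name: str = "spline-example"):
    """Create a SparkSession with the Spline settings applied.

    Requires pyspark; it is imported lazily so that spline_spark_conf()
    stays usable on machines without Spark installed.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in spline_spark_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Print the properties so they can be compared with the CLI flags.
    for key, value in spline_spark_conf().items():
        print(f"{key}={value}")
```

These properties only take effect if they are set before the Spark session starts. In an interactive pyspark shell the session already exists, so the command-line flags shown above remain the safer choice there.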

Spark Shell

Same as the PySpark example above, but use the spark-shell command instead of pyspark.

Scala / Java

To run all available examples

mvn test -P examples

To run the examples with a specific Spark 2.x version (i.e. 2.2, 2.3 or 2.4)

mvn test -P examples -P spark-2.4

To run the examples with a specific Spark 3.x version (i.e. 3.0, 3.1 or newer)

# switch the project to Scala 2.12 mode
mvn scala-cross-build:change-version -Pscala-2.12
# then run Maven with the `-Pspark-xxx` argument as above 
mvn test -P examples -P spark-3.1

To run a selected example job (e.g. Example1Job)

mvn test -P examples -D exampleClass=za.co.absa.spline.example.batch.Example1Job

To change the Spline Producer URL (default is http://localhost:8080/producer)

mvn test -P examples -D spline.producer.url=http://localhost:8888/producer

To change the Spline Mode

mvn test -P examples -D spline.mode=ENABLED
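The options above are shown one at a time. Assuming they can be combined in a single Maven invocation (the usual behaviour for independent profiles and properties, but not something this README states), the following sketch assembles the commands for a given combination. The function name `example_commands` is hypothetical. It also encodes the rule from the Spark 3.x section that the project has to be switched to Scala 2.12 first.

```python
import shlex
from typing import List, Optional


def example_commands(
    spark_profile: Optional[str] = None,
    example_class: Optional[str] = None,
    producer_url: Optional[str] = None,
    spline_mode: Optional[str] = None,
) -> List[List[str]]:
    """Build the Maven command lines for running the examples.

    Hypothetical helper: combining several options in one run is an
    assumption; each flag itself is taken from the sections above.
    """
    commands: List[List[str]] = []

    # Spark 3.x profiles need the project switched to Scala 2.12 first.
    if spark_profile and spark_profile.startswith("spark-3"):
        commands.append(["mvn", "scala-cross-build:change-version", "-Pscala-2.12"])

    test = ["mvn", "test", "-P", "examples"]
    if spark_profile:
        test += ["-P", spark_profile]
    if example_class:
        test += ["-D", f"exampleClass={example_class}"]
    if producer_url:
        test += ["-D", f"spline.producer.url={producer_url}"]
    if spline_mode:
        test += ["-D", f"spline.mode={spline_mode}"]
    commands.append(test)
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for command in example_commands(
        spark_profile="spark-3.1",
        example_class="za.co.absa.spline.example.batch.Example1Job",
    ):
        print(shlex.join(command))
```

Returning argument lists rather than one string keeps the result safe to pass to `subprocess.run` without shell quoting issues.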

Examples source code

Run Spline examples using the Docker image

Recommended Docker settings: cpu=2, memory=4096M

docker run --rm -e "SPLINE_PRODUCER_URL=http://localhost:8080/producer" absaoss/spline-spark-agent

Available environment variables:

Variable name            Description
SPLINE_PRODUCER_URL      Spline Producer REST API endpoint URL
SPLINE_MODE              (see Spline mode)
DISABLE_SSL_VALIDATION   If true, disables validation of the server SSL certificate in the HttpLineageDispatcher
HTTP_PROXY_HOST          (see Java Networking and Proxies)
HTTP_PROXY_PORT          (see Java Networking and Proxies)
HTTP_NON_PROXY_HOSTS     (see Java Networking and Proxies)

(The default values can be seen in the respective Dockerfile)
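To illustrate how several of these variables might be passed at once, here is a small sketch that builds the `docker run` argument list and rejects variable names that are not listed above. The function name `docker_run_args` is hypothetical. The image name and the `--rm` and `-e` flags are taken from the command shown earlier in this section.

```python
from typing import Dict, List

IMAGE = "absaoss/spline-spark-agent"

# Variable names as listed in this README.
KNOWN_VARIABLES = {
    "SPLINE_PRODUCER_URL",
    "SPLINE_MODE",
    "DISABLE_SSL_VALIDATION",
    "HTTP_PROXY_HOST",
    "HTTP_PROXY_PORT",
    "HTTP_NON_PROXY_HOSTS",
}


def docker_run_args(env: Dict[str, str]) -> List[str]:
    """Build a `docker run` argument list for the examples image.

    Hypothetical helper: it only checks variable names against the
    list above and does not validate the values.
    """
    unknown = sorted(set(env) - KNOWN_VARIABLES)
    if unknown:
        raise ValueError(f"Unknown environment variable(s): {', '.join(unknown)}")

    args = ["docker", "run", "--rm"]
    for name, value in env.items():
        # Each variable becomes its own -e NAME=value pair.
        args += ["-e", f"{name}={value}"]
    args.append(IMAGE)
    return args


if __name__ == "__main__":
    print(docker_run_args({
        "SPLINE_PRODUCER_URL": "http://localhost:8080/producer",
        "SPLINE_MODE": "ENABLED",
    }))
```

The resulting list can be handed to `subprocess.run` as is. Rejecting unknown names early catches typos such as a misspelled variable that the container would otherwise silently ignore.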


Copyright 2019 ABSA Group Limited

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.