- Make sure the Spline Producer instance is running (see instructions)
- Download the Spline source code from GitHub and switch to the `examples` directory:

      git clone git@github.com:AbsaOSS/spline-spark-agent.git
      cd spline-spark-agent
      mvn install -DskipTests
      cd examples
- Execute `pyspark` with a Spline Spark Agent Bundle corresponding to the Spark and Scala versions in use:

      pyspark \
        --packages za.co.absa.spline.agent.spark:spark-3.1-spline-agent-bundle_2.12:0.6.1 \
        --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener \
        --conf spark.spline.producer.url=http://localhost:8080/producer

  In this example we used the so-called codeless initialization method, i.e. one that requires no changes to your Spark application code. Alternatively, you can enable Spline manually by calling the `SparkLineageInitializer.enableLineageTracking()` method. See python_example.py.
- Execute your PySpark code as normal.
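
The codeless settings above can also be collected in Python and applied when the session is built. The following is a minimal sketch, not the project's own code: the function names are hypothetical, the bundle coordinate pattern is inferred from the single coordinate shown above (check that a bundle actually exists for your Spark and Scala versions), and `spark.jars.packages` is assumed to be an acceptable stand-in for `--packages`.

```python
from typing import Dict

# Listener class name taken from the pyspark command above.
SPLINE_LISTENER = "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"


def bundle_coordinate(spark_version: str, scala_version: str, agent_version: str) -> str:
    """Build the agent bundle coordinate (pattern inferred from the example above)."""
    return (
        "za.co.absa.spline.agent.spark:"
        f"spark-{spark_version}-spline-agent-bundle_{scala_version}:{agent_version}"
    )


def spline_conf(
    spark_version: str,
    scala_version: str,
    agent_version: str,
    producer_url: str = "http://localhost:8080/producer",
) -> Dict[str, str]:
    """Return the Spark settings used by the codeless initialization method."""
    return {
        # Assumption: spark.jars.packages replaces the --packages flag.
        "spark.jars.packages": bundle_coordinate(spark_version, scala_version, agent_version),
        "spark.sql.queryExecutionListeners": SPLINE_LISTENER,
        "spark.spline.producer.url": producer_url,
    }


def create_session(conf: Dict[str, str], app_name: str = "spline-example"):
    """Hypothetical helper: apply the settings before the session is created."""
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

For example, `create_session(spline_conf("3.1", "2.12", "0.6.1"))` would request the same bundle and listener as the command above. These settings only take effect if no Spark session is already running in the process.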
Same as the `pyspark` example above, but use the `spark-shell` command instead.
To run all available examples:

    mvn test -P examples

To run examples with a specific Spark 2.x version (i.e. 2.2, 2.3, 2.4):

    mvn test -P examples -P spark-2.4

To run examples with a specific Spark 3.x version (i.e. 3.0, 3.1 or newer):

    # switch the project to Scala 2.12 mode
    mvn scala-cross-build:change-version -Pscala-2.12
    # then run Maven with the `-Pspark-xxx` argument as above
    mvn test -P examples -P spark-3.1

To run a selected example job (e.g. Example1Job):

    mvn test -P examples -D exampleClass=za.co.absa.spline.example.batch.Example1Job

To change the Spline Producer URL (default is http://localhost:8080/producer):

    mvn test -P examples -D spline.producer.url=http://localhost:8888/producer

To change the Spline Mode:

    mvn test -P examples -D spline.mode=ENABLED

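
The Maven options above can be combined in a single invocation. As a rough sketch of how they fit together, the hypothetical helper below assembles the command lines as argument lists. It assumes that every Spark 3.x run needs the Scala 2.12 switch first, as described above, and it does not check that the given profile or class exists.

```python
from typing import List, Optional


def example_commands(
    spark_version: Optional[str] = None,
    example_class: Optional[str] = None,
    producer_url: Optional[str] = None,
    mode: Optional[str] = None,
) -> List[List[str]]:
    """Return the Maven commands (as argument lists) for running the examples."""
    commands: List[List[str]] = []

    # Spark 3.x examples need the project switched to Scala 2.12 mode first.
    if spark_version is not None and spark_version.startswith("3."):
        commands.append(["mvn", "scala-cross-build:change-version", "-Pscala-2.12"])

    test_command = ["mvn", "test", "-P", "examples"]
    if spark_version is not None:
        test_command += ["-P", f"spark-{spark_version}"]
    if example_class is not None:
        test_command += ["-D", f"exampleClass={example_class}"]
    if producer_url is not None:
        test_command += ["-D", f"spline.producer.url={producer_url}"]
    if mode is not None:
        test_command += ["-D", f"spline.mode={mode}"]

    commands.append(test_command)
    return commands
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the `examples` directory, in order.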
- Scala
- Java
- Python
- Shell script - custom, non-Spark example, using REST API
Recommended docker settings: cpu=2, memory=4096M

    docker run --rm -e "SPLINE_PRODUCER_URL=http://localhost:8080/producer" absaoss/spline-spark-agent

Available environment variables:
| Variable name | Description |
|---|---|
| SPLINE_PRODUCER_URL | Spline Producer REST API endpoint URL |
| SPLINE_MODE | (see Spline mode) |
| DISABLE_SSL_VALIDATION | If `true`, disables validation of the server SSL certificate in the HttpLineageDispatcher |
| HTTP_PROXY_HOST | (see Java Networking and Proxies) |
| HTTP_PROXY_PORT | (see Java Networking and Proxies) |
| HTTP_NON_PROXY_HOSTS | (see Java Networking and Proxies) |
(The default values can be seen in the respective Dockerfile)
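
When several of these variables are set at once, a small script can catch a misspelled name before the container starts. The sketch below is illustrative only: the helper name is hypothetical, the set of accepted names is copied from the table above, and the image name is the one used in the `docker run` example.

```python
from typing import Dict, List

# Variable names copied from the table above.
KNOWN_VARIABLES = {
    "SPLINE_PRODUCER_URL",
    "SPLINE_MODE",
    "DISABLE_SSL_VALIDATION",
    "HTTP_PROXY_HOST",
    "HTTP_PROXY_PORT",
    "HTTP_NON_PROXY_HOSTS",
}


def docker_run_command(env: Dict[str, str], image: str = "absaoss/spline-spark-agent") -> List[str]:
    """Build a `docker run` argument list, rejecting undocumented variable names."""
    unknown = sorted(set(env) - KNOWN_VARIABLES)
    if unknown:
        raise ValueError(f"Unknown environment variables: {', '.join(unknown)}")

    command = ["docker", "run", "--rm"]
    # Sort the names so the resulting command is deterministic.
    for name in sorted(env):
        command += ["-e", f"{name}={env[name]}"]
    command.append(image)
    return command
```

Calling `docker_run_command({"SPLINE_PRODUCER_URL": "http://localhost:8080/producer"})` produces the same arguments as the command shown earlier.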
Copyright 2019 ABSA Group Limited
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.