Dockerized_Demo
The following procedure describes how to bring up a Spark Oracle demo environment in a docker container. The prerequisite is that you have Docker set up on your host machine. This has only been tested on Macs (it will probably work on a Linux host).
This is purely for demo purposes; we are not supporting the docker environment.
The dockerbuilder sub-project is a utility to construct a Dockerfile for spark-on-oracle.
In an empty folder, unzip the docker builder artifact. You should also copy the spark-oracle-0.1.0-SNAPSHOT.zip to this folder.
Then run ./sparkOraDockerBuilder-0.1.0-SNAPSHOT without any options to see what is required.
(You may need to run the command outside of the VPN for the Spark and Zeppelin download URL checks to work.)
Usage: sparkOraDockerBuilder [options]
-m, --spark_mem <value> memory in Mb/Gb for spark; for example 512m or 2g.
when running the tpcds demo set it to 4g
-c, --spark_cores <value>
num_cores for spark.
when running the tpcds demo set it to at-least 4
-j, --oracle_instance_jdbc_url <value>
jdbc connection information for the oracle instance.
for example: "jdbc:oracle:thin:@10.89.206.230:1531/cdb1_pdb7.regress.rdbms.dev.us.oracle.com
specify the ip-addr of host; otherwise you may need
to edit the /etc/resolv.conf of the docker container."
-u, --oracle_instance_username <value>
Oracle username to connect to the oracle instance.
-p, --oracle_instance_password <value>
Oracle password for the oracle user.
Either provide the password or location of a wallet.
-w, --oracle_instance_wallet_loc <value>
Location of an Oracle wallet for the oracle user.
Either provide the password or the location of a wallet.
-s, --spark_download_url <value>
url to download apache spark. Spark version must be 3.1.0 or above.
for example: https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
-z, --zeppelin_download_url <value>
url to download apache zeppelin. Zeppelin version must be 0.9.0 or above.
for example: https://downloads.apache.org/zeppelin/zeppelin-0.9.0/zeppelin-0.9.0-bin-netinst.tgz
-o, --spark_ora_zip <value>
location of spark-oracle package.
for example: ~/Downloads/spark-oracle-0.1.0-SNAPSHOT.zip
Provide the specified options:
- for example, if you want to use our dev. environment, run:
./sparkOraDockerBuilder-0.1.0-SNAPSHOT -c 4 -m 4g \
-j jdbc:oracle:thin:@10.89.206.230:1531/cdb1_pdb7.regress.rdbms.dev.us.oracle.com \
-u tpcds -p tpcds \
-s https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz \
-z https://mirrors.ocf.berkeley.edu/apache/zeppelin/zeppelin-0.9.0/zeppelin-0.9.0-bin-netinst.tgz \
-o spark-oracle-0.1.0-SNAPSHOT.zip
This will create a Dockerfile and associated files for the specified options.
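Once the container is running and you are in its spark-shell (see the notes below), you can sanity-check that the options you passed to the builder were wired into the Spark configuration. This is only a sketch: the spark.sql.catalog.oracle.* prefix is an assumption based on Spark 3 catalog-plugin naming and on the oracle catalog used in the demo; check the configuration files the builder generates for the exact keys.

```scala
// List the Spark conf entries for the oracle catalog.
// The "spark.sql.catalog.oracle" prefix is an assumption (Spark 3 catalog-plugin
// convention); consult the generated Spark configuration for the exact keys.
spark.conf.getAll
  .filter { case (key, _) => key.startsWith("spark.sql.catalog.oracle") }
  .foreach { case (key, value) => println(s"$key = $value") }
```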
Notes on the Dockerfile and docker container:
- for the oracle instance jdbc url, specify an ip-addr instead of a hostname.
If you specify a hostname you will have to edit the /etc/resolv.conf in the docker container. For example, you may need to add to /etc/resolv.conf:
search us.oracle.com
nameserver 2606:B400:300:D:FEED::1
nameserver 2606:B400:300:D:FEED::2
nameserver 206.223.27.1
nameserver 206.223.27.2
- the container is set up with Apache Spark, the Spark-Oracle extension, and Apache Zeppelin.
- Currently we have not enabled Apache Zeppelin. We are working on notebooks for the Demo.
- The default command for the container is spark-shell; you can follow the steps in the Demo.
- To build the docker image, issue something like:
docker image build -t spark_ora_demo:latest .
- Then to run the container, issue something like:
docker run -it -p 8080:8080 -p 4040:4040 --rm spark_ora_demo:latest
- give the port options -p 8080:8080 -p 4040:4040 so that you can see the Spark UI and, when available, the Zeppelin notebooks from a host browser.
- The Dockerfile is set up with a CMD to start the spark-shell, so you will be in the spark-shell when your terminal enters the container.
- Once there, start by issuing sql("use oracle") and then follow the steps in the demo (see the sketch after these notes).
- The container starts with an empty metadata-cache, so you will notice that the first time you execute a query (even in pushdown=true mode) it takes several seconds more than usual. This is because oracle table metadata (including partition information) is pulled into the metadata-cache on demand, the first time a query is issued against a table.
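As a concrete starting point, here is a minimal spark-shell sketch of those first steps. The store_sales table name is an assumption (a TPC-DS table, matching the tpcds user in the example above); substitute any table that exists in your Oracle instance. The timing helper is only there to make the cold-versus-warm metadata-cache behaviour visible.

```scala
// Run inside the container's spark-shell (`spark` and `sql` are predefined there).
sql("use oracle")                           // switch to the Oracle catalog
sql("show tables").show(truncate = false)   // list tables visible through the catalog

// Crude timer, just to make the metadata-cache warm-up visible.
def timed[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(f"$label took ${(System.nanoTime() - start) / 1e9}%.1f s")
  result
}

// store_sales is an assumed TPC-DS table name; use any table in your instance.
timed("cold run") { sql("select count(*) from store_sales").show() } // pulls table metadata into the cache
timed("warm run") { sql("select count(*) from store_sales").show() } // metadata already cached
```

The second run of the same query should be noticeably faster, since the table's metadata is already in the cache.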