External Tables

We will support the creation of Spark external tables, which are defined in the Spark grammar as:

    | createTableHeader ('(' colTypeList ')')? tableProvider
        createTableClauses
        (AS? query)?                                                   #createTable
  • We will focus on the parquet provider. We may also certify orc, avro, and csv.
  • These kinds of Spark DDLs will be translated to set up Oracle External Tables.

Creation of Hive external tables in Spark DDL is routed to the Session Catalog, so we will not support translating these. In the Spark grammar they are of the form:

    | createTableHeader ('(' columns=colTypeList ')')?
        (commentSpec |
        (PARTITIONED BY '(' partitionColumns=colTypeList ')' |
        PARTITIONED BY partitionColumnNames=identifierList) |
        bucketSpec |
        skewSpec |
        rowFormat |
        createFileFormat |
        locationSpec |
        (TBLPROPERTIES tableProps=tablePropertyList))*
        (AS? query)?                                                   #createHiveTable

Creation of Non-partitioned Tables

Something like this:

create table t2(id long, p string)
using parquet
location "https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t1.parquet"

We will translate the Spark DDL specification into an equivalent DBMS_CLOUD.CREATE_EXTERNAL_TABLE invocation (see the sketch after this list):

  • The CatalogPlugin must be configured with the credential_name to use with the object store.
    • TBD: parameter_name and an example.
  • The location parameter must be provided; it will be interpreted as an 'Oracle Cloud Infrastructure Object Storage Native URI Format'.
  • The file_uri_list will be populated based on the contents of the folder.
    • Contents are inferred using:
      • the dbms_cloud functions list_objects or list_files
      • alternatively, if we make the Spark Catalog aware of object store credentials, this can be done using oci-hdfs functionality. This is probably the preferred approach (TBD).
  • The column_list parameter is populated from the schema specified in the Spark DDL.
  • The format parameter is populated from the options provided in the Spark DDL.
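
As a rough sketch, the non-partitioned DDL above could map to a call like the following. The credential name, the Oracle column types derived from the Spark schema, and the format options are illustrative assumptions, not a finalized mapping:

    BEGIN
      DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name      => 'T2',
        -- assumed credential; supplied via the CatalogPlugin configuration
        credential_name => 'OBJ_STORE_CRED',
        -- expanded from the folder contents (here a single parquet file)
        file_uri_list   => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t1.parquet',
        -- derived from the Spark provider ("using parquet")
        format          => '{"type" : "parquet"}',
        -- derived from the Spark schema; the Spark-to-Oracle type mapping shown is illustrative
        column_list     => 'ID NUMBER(19), P VARCHAR2(4000)'
      );
    END;
    /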

Creation of Partitioned Tables

Something like this:

create table t2(id long, p string)
using parquet
partitioned by (p)
location "https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t2/"

We will translate the Spark DDL specification into an equivalent DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE invocation (see the sketch after this list):

  • The partitioning_clause is constructed by introspecting the object store location.
    • Using the dbms_cloud functions list_objects or list_files, or oci-hdfs.
    • oci-hdfs will probably be preferred (TBD).
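
A rough sketch of the corresponding call follows. The credential name, the type mapping, and the single discovered partition are illustrative assumptions; the actual partition list would come from introspecting the t2/ folder:

    BEGIN
      DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE(
        table_name      => 'T2',
        -- assumed credential; supplied via the CatalogPlugin configuration
        credential_name => 'OBJ_STORE_CRED',
        format          => '{"type" : "parquet"}',
        column_list     => 'ID NUMBER(19), P VARCHAR2(4000)',
        -- partition values and locations would be discovered by listing the t2/ folder;
        -- only one illustrative partition is shown
        partitioning_clause =>
          'partition by list (P) (
             partition p1 values (''p1'') location (
               ''https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t2/p=p1/*.parquet''
             )
           )'
      );
    END;
    /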

Scanning External Tables

Once the table is set up in the Oracle dictionary, no special action is needed in normal cases. Spark queries involving only external tables can be set up with physical plans that directly read the data using oci-hdfs. TBD when and if we do this.
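
For example, once t2 is registered, an ordinary Spark SQL query over it needs no special handling (the query below is purely illustrative):

    select p, count(*)
    from t2
    where id > 100
    group by p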

Writing External Tables

  • There doesn't seem to be a way to write parquet/orc/avro from Oracle.
  • So the write part of the plan would have to be done in Spark (see the sketch below):
    • Move input data to Spark: initially to Spark on Oracle; later to Spark in Oracle.
    • Have Spark tasks write the OCI files. This implies that, for write support, the Catalog must be configured with access to the object store.
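
As a rough sketch of the eventual write path, an ordinary Spark SQL insert would be executed by Spark tasks that write new parquet files under the table's object store folder (the statement and the file layout are assumptions about the eventual design):

    -- executed by Spark tasks; new parquet files land under .../o/t2/,
    -- where the external table definition picks them up
    insert into t2 values (10, 'p1')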