External Tables
We will support creation of Spark external tables, which are defined in the Spark grammar as:
```
| createTableHeader ('(' colTypeList ')')? tableProvider
    createTableClauses
    (AS? query)?                                        #createTable
```
- We will focus on the `parquet` provider. We may certify for `orc`, `avro` and `csv`.
- These kinds of Spark DDLs will be translated to set up Oracle External Tables.
Creation of Hive external tables in Spark DDL is routed to the Session Catalog, so we will not support translating these. In the Spark grammar these are of the form:
```
| createTableHeader ('(' columns=colTypeList ')')?
    (commentSpec |
     (PARTITIONED BY '(' partitionColumns=colTypeList ')' |
      PARTITIONED BY partitionColumnNames=identifierList) |
     bucketSpec |
     skewSpec |
     rowFormat |
     createFileFormat |
     locationSpec |
     (TBLPROPERTIES tableProps=tablePropertyList))*
    (AS? query)?                                        #createHiveTable
```
Something like this:
```sql
create table t2(id long, p string)
using parquet
location "https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t1.parquet"
```
We will translate the Spark DDL specification into an equivalent DBMS_CLOUD.CREATE_EXTERNAL_TABLE invocation:
- The CatalogPlugin must be configured with the `credential_name` to use with the object store. TBD: parameter_name and example.
- The `location` parameter must be provided; it will be interpreted as a URI in 'Oracle Cloud Infrastructure Object Storage Native URI Format'.
- The `file_uri_list` will be populated based on the contents of the folder.
  - Contents are inferred using:
    - the `dbms_cloud` functions `list_objects` or `list_files`, or
    - alternatively, if we make the Catalog in Spark aware of object store credentials, oci-hdfs functionality. This is probably preferred (TBD).
- The `column_list` parameter is populated from the schema specified in the Spark DDL.
- The `format` parameter is populated from the options provided in the Spark DDL (see the sketch after this list).
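As a rough illustration, the generated call for the non-partitioned example above might look like the following. This is a minimal sketch only: the credential name `SPARK_CRED`, the Spark-to-Oracle type mappings (`long` to `NUMBER(19)`, `string` to `VARCHAR2(4000)`), and the exact `format` options are assumptions, not the final translation.

```sql
-- Sketch only: SPARK_CRED, the type mappings and the format options below are
-- illustrative assumptions about what the translation could emit.

-- Inspect the folder contents (one way to build file_uri_list).
SELECT object_name, bytes
FROM   DBMS_CLOUD.LIST_OBJECTS(
         'SPARK_CRED',
         'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/');

-- Create the external table corresponding to the Spark DDL above.
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'T2',
    credential_name => 'SPARK_CRED',       -- from the CatalogPlugin configuration
    file_uri_list   => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t1.parquet',
    column_list     => 'id NUMBER(19), p VARCHAR2(4000)',  -- from the Spark schema
    format          => '{"type":"parquet"}'                -- from the Spark provider
  );
END;
/
```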
For a partitioned table, something like this:
```sql
create table t2(id long, p string)
using parquet
partitioned by (p)
location "https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t2/"
```
We will translate the Spark DDL specification into an equivalent DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE invocation:
- The `partitioning_clause` is constructed by introspecting the object store location (see the sketch after this list).
  - Using the `dbms_cloud` functions `list_objects` or `list_files`, or oci-hdfs; oci-hdfs will probably be preferred (TBD).
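Again as a sketch only: assuming introspection found hive-style partition folders `p=a/` and `p=b/` under the location, the generated call might resemble the following. The partition names, values, and file patterns are illustrative assumptions.

```sql
-- Sketch only: partition names/values and file patterns below are assumptions
-- derived from introspecting the t2/ folder (e.g. via DBMS_CLOUD.LIST_OBJECTS).
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE(
    table_name      => 'T2',
    credential_name => 'SPARK_CRED',
    column_list     => 'id NUMBER(19), p VARCHAR2(4000)',
    format          => '{"type":"parquet"}',
    partitioning_clause =>
      'PARTITION BY LIST (p) (
         PARTITION p_a VALUES (''a'') LOCATION
           (''https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t2/p=a/*.parquet''),
         PARTITION p_b VALUES (''b'') LOCATION
           (''https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t2/p=b/*.parquet'')
       )'
  );
END;
/
```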
Once set up in the Oracle dictionary, no special action is needed in normal cases. Spark queries involving only external tables can be set up with physical plans that directly read data using oci-hdfs; TBD when and if we do this.
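For example, once the dictionary entry exists, an ordinary query over the external table needs nothing special; the open item is only whether the scan is pushed down to Oracle or executed by Spark reading the parquet files directly through oci-hdfs.

```sql
-- An ordinary query over the external table; nothing special is needed once
-- the table exists in the dictionary. Whether this scan runs in Oracle or is
-- planned as a direct oci-hdfs read in Spark is the TBD above.
SELECT p, COUNT(*) AS cnt
FROM   t2
GROUP  BY p;
```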
- For writes, there doesn't seem to be a way to write parquet/orc/avro from Oracle.
- So the write part of the plan would have to be done in Spark (see the sketch after this list):
  - Move the input data to Spark: initially to Spark on Oracle; later to Spark in Oracle.
  - Have Spark tasks write the OCI files. This implies that, for write support, the Catalog must be configured with access to the object store.
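A sketch of what a write might look like from Spark SQL. The statement itself is ordinary; the point is that, since Oracle cannot produce parquet/orc/avro, its write side would have to be planned as Spark tasks writing files to the object store. The source table `src` is purely illustrative.

```sql
-- Hypothetical write against the external table t2. The write side of this
-- plan would run as Spark tasks appending parquet files under .../o/t2/,
-- using the object store credentials configured on the Catalog; Oracle
-- itself cannot write parquet/orc/avro. 'src' is an illustrative source.
INSERT INTO t2
SELECT id, p
FROM   src;
```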