External Tables
Harish Butani edited this page Jan 13, 2022 · 2 revisions
We will support creation of Spark external tables, which are defined in the Spark grammar as:

```
| createTableHeader ('(' colTypeList ')')? tableProvider
    createTableClauses
    (AS? query)?                                      #createTable
```
- We will focus on the `parquet` provider. We may also certify `orc`, `avro`, and `csv`.
- These kinds of Spark DDLs will be translated into the setup of Oracle External Tables.
Creation of Hive external tables via Spark DDL is routed to the Session Catalog, so we will not support translating these. In the Spark grammar they have the form:

```
| createTableHeader ('(' columns=colTypeList ')')?
    (commentSpec |
    (PARTITIONED BY '(' partitionColumns=colTypeList ')' |
    PARTITIONED BY partitionColumnNames=identifierList) |
    bucketSpec |
    skewSpec |
    rowFormat |
    createFileFormat |
    locationSpec |
    (TBLPROPERTIES tableProps=tablePropertyList))*
    (AS? query)?                                      #createHiveTable
```
Something like this:

```sql
create table t2(id long, p string)
using parquet
location "https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t1.parquet"
```
We will translate the Spark DDL specification into an equivalent DBMS_CLOUD.CREATE_EXTERNAL_TABLE
invocation:
- The CatalogPlugin must be configured with the `credential_name` to use with the object store. TBD: parameter_name and example.
- The location parameter must be provided; it will be interpreted as an 'Oracle Cloud Infrastructure Object Storage Native URI Format'.
- The `file_uri_list` parameter will be populated based on the contents of the folder. Contents are inferred using:
  - the `dbms_cloud` functions `list_objects` or `list_files`, or
  - alternatively, if we make the Catalog in Spark aware of object store credentials, oci-hdfs functionality. This is probably preferred. TBD
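The folder introspection step might look like the following `dbms_cloud` query. This is a sketch: the credential name `SPARK_CATALOG_CRED` is a placeholder for whatever credential the CatalogPlugin is configured with, and the URI is the example bucket from this page.

```sql
-- Hypothetical: enumerate the folder's contents to build file_uri_list.
-- DBMS_CLOUD.LIST_OBJECTS is a table function returning one row per object
-- (object_name, bytes, checksum, created, last_modified).
SELECT object_name, bytes
FROM   DBMS_CLOUD.LIST_OBJECTS(
         credential_name => 'SPARK_CATALOG_CRED',  -- assumed catalog credential
         location_uri    => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t2/');
```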
- The `column_list` parameter will be populated from the schema specified in the Spark DDL.
- The `format` parameter will be populated from the options provided in the Spark DDL.
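Putting these pieces together, the translated invocation for the `t2` example above might look like the following sketch. The credential name is a placeholder, and the Spark-to-Oracle type mapping shown (`long` → `NUMBER(19)`, `string` → `VARCHAR2(4000)`) is an assumption, not a settled design.

```sql
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'T2',
    credential_name => 'SPARK_CATALOG_CRED',   -- assumed; from CatalogPlugin config
    file_uri_list   => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t1.parquet',
    column_list     => 'id NUMBER(19), p VARCHAR2(4000)',   -- from the Spark schema
    format          => json_object('type' value 'parquet')  -- from the Spark provider
  );
END;
/
```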
Something like this:

```sql
create table t2(id long, p string)
using parquet
partitioned by (p)
location "https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t2/"
```
We will translate the Spark DDL specification into an equivalent DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE
invocation:
- The `partitioning_clause` will be constructed by introspecting the object store location, using:
  - the `dbms_cloud` functions `list_objects` or `list_files`, or
  - oci-hdfs, which will probably be preferred. TBD
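A sketch of the resulting `DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE` call for the partitioned `t2` example follows. The credential name, the type mappings, and the partition value (`'a'`, inferred from a hypothetical `p=a/` folder) are all illustrative assumptions.

```sql
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE(
    table_name      => 'T2',
    credential_name => 'SPARK_CATALOG_CRED',   -- assumed catalog credential
    format          => json_object('type' value 'parquet'),
    column_list     => 'id NUMBER(19), p VARCHAR2(4000)',
    -- partitioning_clause built by introspecting the object store folder layout
    partitioning_clause =>
      'partition by list (p) (
         partition p_a values (''a'') location (
           ''https://objectstorage.us-ashburn-1.oraclecloud.com/n/idlxex3qf8sf/b/SparkTest/o/t2/p=a/''
         )
       )'
  );
END;
/
```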
Once set up in the Oracle dictionary, no special action is needed in normal cases. Spark queries involving only external tables can be set up with physical plans that read data directly using oci-hdfs; TBD when and if we do this.
- There doesn't seem to be a way to write parquet/orc/avro files from Oracle.
  - So the write part of the plan would have to be done in Spark:
    - Move the input data to Spark: initially to Spark on Oracle; later to Spark in Oracle.
    - Have Spark tasks write the OCI files. This implies that, for write support, the Catalog must be configured with access to the object store.
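Under that scheme, a plain Spark SQL write against the external table would be executed entirely by Spark tasks (assuming the Catalog holds the object store credentials), with the new files landing in the table's object store folder rather than passing through Oracle:

```sql
-- Sketch: executed by Spark tasks writing parquet to the object store;
-- Oracle only sees the files on subsequent reads of the external table.
INSERT INTO t2 VALUES (1, 'a');
```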