Write Path Flow

There are 2 write paths:

Write from a Spark Native Plan

These are Spark plans whose write sink is an OracleTable. The Spark optimizer sets up such a plan in all cases.
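
For illustration, a write like the following lands on this path (a rough sketch; the catalog name `oracle` and the table `oracle.sales_schema.sales` are made-up names, assuming an Oracle catalog has been configured for the connector):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("oracle-write-example").getOrCreate()

// A DataFrame built in Spark (not read back from Oracle) appended to an
// Oracle table: the sink of the write is an OracleTable, so the optimizer
// sets up a Spark native write plan over it.
val updates = spark.range(0, 1000).selectExpr("id", "id * 2 as amount")
updates.writeTo("oracle.sales_schema.sales").append()
```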

The execution of such a plan is divided into three stages: driver-side initialization, executor-side task execution, and driver-side job completion. Three kinds of writes are performed on the Oracle side against the destination table: APPEND rows, UPDATE rows (done as delete existing rows + insert new rows), and partition exchange. The assumption is that the Oracle JDBC connection configured in Spark has the privilege to create tables.
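
As a rough sketch of what these three write kinds amount to on the Oracle side (the table, column, and partition names below are made up; the SQL the connector actually generates may differ):

```scala
// APPEND rows: plain inserts into the destination table.
val appendSql = "insert into SALES (ID, AMOUNT) values (?, ?)"

// UPDATE rows: done as delete existing rows + insert new rows.
val deleteSql = "delete from SALES where ID = ?"
val insertSql = "insert into SALES (ID, AMOUNT) values (?, ?)"

// Partition exchange: rows are staged into a separate table (this is where
// the create-table privilege on the JDBC connection comes in) and then
// swapped into the destination partition.
val createStageSql =
  "create table SALES_STAGE as select * from SALES where 1 = 0"
val exchangeSql =
  "alter table SALES exchange partition P_2022_01 with table SALES_STAGE"
```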

Driver Side Setup flow:

[Diagram: write path driver setup]

Executor Side task execution:

[Diagram: write path executor]

Driver Side Finish flow:

[Diagram: write path driver finish]

  • One logical optimization we may add in the future is to inject a shuffle when writing to a partitioned table. This ensures each task writes to only a few Oracle partitions (see the first sketch after this list).

  • A physical rewrite rule we plan to add checks whether the input plan of such a write plan is a DataSourceV2ScanExecBase on an OraScan; if it is, we rewrite the write into the second case (see the second sketch after this list).
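
The shuffle mentioned in the first item would have roughly the effect of repartitioning the input by the Oracle table's partition key before the write, so that the rows of any one Oracle partition land in as few Spark tasks as possible. A sketch (the column name `sale_date` and the table name are illustrative):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// df is assumed to carry a column matching the Oracle table's partition key.
def partitionAlignedAppend(df: DataFrame): Unit =
  df.repartition(col("sale_date"))          // the injected shuffle
    .writeTo("oracle.sales_schema.sales")
    .append()
```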
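
And a sketch of the check behind the second item (not the connector's actual rule; OraScan is the connector's Scan implementation and is assumed to be in scope, DataSourceV2ScanExecBase is standard Spark):

```scala
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase

// If the input of the write plan is a V2 scan backed by an OraScan, the
// write can be rewritten into the second write path.
def inputIsOraScan(writeInput: SparkPlan): Boolean = writeInput match {
  case scanExec: DataSourceV2ScanExecBase => scanExec.scan.isInstanceOf[OraScan]
  case _                                  => false
}
```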