Read Path Flow
Harish Butani edited this page Nov 11, 2020 · 4 revisions
This package contains the structures and functions for read plans.
Read Path Planning flow:
Read Execution:
- responsible for setting up an OraScan
- for an OracleTable with optional filter pushdowns and `requiredSchema`, it sets up an OraPlan that is passed to the OraScan.
- it is not required for the OraPlan to apply all filters, as these are applied on top of the org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanRelation. Ensuring these can be pushed to Oracle will be done in the Oracle pushdown rules.
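
The pushdown contract described above can be illustrated with a minimal sketch. It is written in Python for brevity and does not use the actual Scala classes: `Filter`, `Plan`, `build_plan`, `run_plan` and `reapply` are hypothetical names invented for this illustration, and the table and rows are made-up sample data. It assumes that a plan pushes only the filters it can translate to SQL, and that every filter is evaluated again on top of the scan, so the result is correct regardless of how much was pushed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class Filter:
    # Hypothetical filter: `sql` is None when it cannot be translated to SQL.
    sql: Optional[str]
    predicate: Callable[[Row], bool]


@dataclass(frozen=True)
class Plan:
    # Hypothetical stand-in for an OraPlan: a projection plus the pushed filters.
    table: str
    columns: Tuple[str, ...]
    applied: Tuple[Filter, ...]

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.applied:
            sql += " WHERE " + " AND ".join(f.sql for f in self.applied)
        return sql


def build_plan(table: str, required_schema: Sequence[str],
               filters: Sequence[Filter]) -> Plan:
    # Push only the filters that have a SQL translation; the rest are skipped.
    pushed = tuple(f for f in filters if f.sql is not None)
    return Plan(table, tuple(required_schema), pushed)


def run_plan(plan: Plan, table_rows: List[Row]) -> List[Row]:
    # Emulates the database: apply the pushed filters, then project the columns.
    kept = [r for r in table_rows if all(f.predicate(r) for f in plan.applied)]
    return [{c: r[c] for c in plan.columns} for r in kept]


def reapply(filters: Sequence[Filter], rows: List[Row]) -> List[Row]:
    # Emulates the filters that sit on top of the scan relation: all of them run again.
    return [r for r in rows if all(f.predicate(r) for f in filters)]


rows = [
    {"id": 1, "region": "EU", "amt": 10},
    {"id": 2, "region": "US", "amt": 11},
    {"id": 3, "region": "EU", "amt": 13},
]
filters = [
    Filter("region = 'EU'", lambda r: r["region"] == "EU"),
    Filter(None, lambda r: r["amt"] % 2 == 1),  # treated as untranslatable here
]
plan = build_plan("sales", ["id", "amt", "region"], filters)
scanned = run_plan(plan, rows)
result = reapply(filters, scanned)
print(plan.to_sql())
print(result)
```

In this sketch the plan pushes only the `region` filter, so the scan returns two rows, and the filter that was not pushed removes one more when all filters are reapplied.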
OraScan:
- acts like a FileScan, so the org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions rule can apply on this scan, and partition and data filter expressions can be pushed to it.
- but implementation behavior is completely overridden.
  - it has an empty `fileIndex`
  - it reports `partitionFilters` and `dataFilters` to be empty. The filters pushed into the OraPlan are reapplied on top of the org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanRelation.
- For physical planning:
  - it uses OraQuerySplitting to infer how to parallelize the OraPlan
  - each OracleDBSplit has an enhanced OraPlan.
  - An OraPartition is set up for each split with its Oracle query, bind values and preferred locations.
- stats estimation: try to use a table's stats; otherwise estimate as unknown
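
The physical planning and stats steps can be sketched roughly as follows, again in Python rather than with the real Scala classes. This page does not describe the strategies OraQuerySplitting actually uses, so the key-range split below is only an assumed stand-in. `Partition`, `split_by_range`, `estimate_row_count`, the `?` bind placeholders, the `row_count` key and the host name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Partition:
    # Hypothetical stand-in for an OraPartition.
    index: int
    query: str
    bind_values: Tuple[int, ...]
    preferred_locations: Tuple[str, ...]


def split_by_range(base_query: str, key: str, lo: int, hi: int,
                   num_splits: int, hosts: Sequence[str] = ()) -> List[Partition]:
    # Assumed strategy: cover [lo, hi) on a numeric key with equal-width ranges.
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    step = -(-(hi - lo) // num_splits)  # ceiling division
    # Each split wraps the base query (its "enhanced" plan) in a range predicate.
    query = f"SELECT * FROM ({base_query}) WHERE {key} >= ? AND {key} < ?"
    parts: List[Partition] = []
    start = lo
    while start < hi:
        end = min(start + step, hi)
        parts.append(Partition(len(parts), query, (start, end), tuple(hosts)))
        start = end
    return parts


def estimate_row_count(table_stats: Optional[Dict[str, int]]) -> Optional[int]:
    # Use the table's stats when present; None stands for "unknown".
    if table_stats and "row_count" in table_stats:
        return table_stats["row_count"]
    return None


for p in split_by_range("SELECT id, amt FROM sales", "id", 0, 10, 4,
                        hosts=["db-node-1"]):
    print(p.index, p.bind_values, p.preferred_locations)
print(estimate_row_count({"row_count": 1000}), estimate_row_count(None))
```

Each partition carries the same parameterized query with different bind values, which keeps the SQL text identical across splits.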