# Expression Translation
Currently only Oracle catalog tables with the following Oracle datatypes are supported; the Catalyst DataType column lists the Catalyst datatype each is mapped to.
Oracle Type | Catalyst DataType |
---|---|
CHAR(n) | StringType |
VARCHAR(n) | StringType |
NCHAR(n) | StringType |
NVARCHAR(n) | StringType |
NUMBER(p), p <= 2 | ByteType |
NUMBER(p), p <= 4 | ShortType |
NUMBER(p), p <= 9 | IntegerType |
NUMBER(p), p <= 18 | LongType |
NUMBER(p), p > 18 | DecimalType(p, 0) |
NUMBER(p, s), s > p | Not supported |
NUMBER(p, s), s < 0 | Not supported |
NUMBER(p, s), p < 38 | DecimalType(p, s) |
NUMBER(p, s), p >= 38 | DecimalType(38, 18) |
FLOAT | broken, TODO |
LONG | StringType |
BINARY_FLOAT | FloatType |
BINARY_DOUBLE | DoubleType |
DATE | DateType |
TIMESTAMP(s) | TimestampType |
TIMESTAMP WITH TZ | TODO |
TIMESTAMP WITH LOCAL TZ | TODO |
Spark literal values are converted to Oracle literals using the following rules, based on the Oracle SQL guide:

- Strings are wrapped in single quotes (`'`), and embedded single quotes are escaped as `''`.
- Date and Timestamp values are converted to strings using the format `yyyy-MM-dd HH:mm:ss.SSSSSS` and emitted as DATE/TIMESTAMP literals, for example `TIMESTAMP '2000-03-11 08:00:00.000000'` and `TRUNC(TIMESTAMP '2000-03-11 08:00:00.000000')`.
- Other datatypes are converted to their string representation.
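For instance, applying these rules to a string literal containing a quote and to a timestamp literal yields SQL like the following (illustrative values; the table and column names are hypothetical):

```sql
-- Spark string literal O'Brien is escaped as 'O''Brien'
select * from EMP where ENAME = 'O''Brien';

-- Spark timestamp literal rendered as an Oracle TIMESTAMP literal
select * from EMP where HIRE_TS = TIMESTAMP '2000-03-11 08:00:00.000000';
```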
The following expression types are currently translated: `Cast`, `PromotePrecision`, and `CheckOverflow`.

- For `PromotePrecision`, just return the child translation.
- For `CheckOverflow`:
  - if `nullOnOverflow` is true, add a case check (see the sketch below);
  - if `nullOnOverflow` is false, do nothing; the translated oExpr will throw on overflow.
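As a sketch of the `nullOnOverflow` case check, assuming a hypothetical `DecimalType(10, 2)` target (the actual bounds come from the Catalyst datatype):

```sql
case when {childOE} > -99999999.99 and {childOE} < 99999999.99
     then cast({childOE} as number(10, 2))
     else null
end
```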
The following helper SQL snippet generator functions are referenced below:

```scala
val epochTS = {
  // osql"to_timestamp_tz('1970-01-01 00:00:00 00:00', 'YYYY-MM-DD HH24:MI:SS TZH:TZM')"
  osql"from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC')"
}

val epochDt = osql"date '1970-01-01'"

val epochTSAtSessionTZ = osql"to_timestamp('1970-01-01', 'YYYY-MM-DD')"

val true_bool_TS =
  osql"from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC') + interval '0.001' second(0,3)"
```
dtToTimestamp(): casts a date to a timestamp.

- If there is no `catalystExpr#zoneId`, generate `cast(${oraE} as timestamp)`.
- Else generate `cast(${oraE} as timestamp) at time zone ${zoneOE}`.
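For example, with a hypothetical column `C_DATE` and a zone of `'UTC'`, the generated SQL is:

```sql
cast(C_DATE as timestamp) at time zone 'UTC'
```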
timestampToDt(): casts a timestamp to a date.

- If there is a `catalystExpr#zoneId`, cast the input to a timestamp and set its timezone, then cast the result to a date; the translation expression is `cast(from_tz(cast({oraE} as timestamp), {zoneOE}) as date)`.
- Otherwise the translation expression is `cast({oraE} as date)`.
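For example, with a hypothetical column `C_TS`:

```sql
-- with a zoneId of 'UTC':
cast(from_tz(cast(C_TS as timestamp), 'UTC') as date)

-- without a zoneId:
cast(C_TS as date)
```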
epochToTimestamp(): converts millis-since-epoch to a timestamp. Translation logic is:

```
millisToInterval = numtodsinterval({oraE}/1000, 'SECOND')
millisToIntervalWithTZOffset = {millisToInterval} + {epochTS} - {epochTSAtSessionTZ}
result = {epochTSAtSessionTZ} + {millisToIntervalWithTZOffset}
```

For example, for `oraE = 1603425037802` the SQL is:

```sql
to_timestamp('1970-01-01', 'YYYY-MM-DD') +
  (numtodsinterval(1603425037802/1000, 'SECOND') +
   from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC') -
   to_timestamp('1970-01-01', 'YYYY-MM-DD')
  )
```
To convert millis-since-epoch to a date, the translation logic is:

```
millisToInterval = numtodsinterval({oraE}/1000, 'SECOND')
millisToIntervalWithTZOffset = {millisToInterval} + {epochTS} - {epochTSAtSessionTZ}
epoch_ts = {epochTSAtSessionTZ} + {millisToIntervalWithTZOffset}
result = trunc({epoch_ts}, 'DD')
```

For example, for `oraE = 1603425037802` the SQL is:

```sql
trunc(
  to_timestamp('1970-01-01', 'YYYY-MM-DD') +
    (numtodsinterval(1603425037802/1000, 'SECOND') +
     from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC') -
     to_timestamp('1970-01-01', 'YYYY-MM-DD')
    ),
  'DD'
)
```
timestampToEpoch(): converts a timestamp to millis-since-epoch. Translation logic is:

```
// using ora date arithmetic: ora_ts - ora_ts -> ora_interval
days   = extract(day from ({oraE} - {epochTS})) * 24 * 60 * 60
hours  = extract(hour from ({oraE} - {epochTS})) * 60 * 60
mins   = extract(minute from ({oraE} - {epochTS})) * 60
secs   = extract(second from ({oraE} - {epochTS}))
result = ({days} + {hours} + {mins} + {secs}) * 1000
```

For example, for `oraE = systimestamp` the SQL is:

```sql
(
  extract(day from (systimestamp - from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC'))) * 24 * 60 * 60 +
  extract(hour from (systimestamp - from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC'))) * 60 * 60 +
  extract(minute from (systimestamp - from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC'))) * 60 +
  extract(second from (systimestamp - from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC')))
) * 1000
```
To convert a date to millis-since-epoch, the translation logic is:

```
trunc_to_days = trunc(sysdate, 'DD')
// using ora date arithmetic: ora_date - ora_ts -> ora_interval
interval_from_epoch = trunc_to_days - epoch_ts
num_hours = extract(day from interval_from_epoch) * 24 +
            extract(hour from interval_from_epoch)
result = num_hours * 60 * 60 * 1000
```

For example, for `sysdate` the SQL is:

```sql
(
  extract(day from (trunc(sysdate, 'DD') - from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC'))) * 24 +
  extract(hour from (trunc(sysdate, 'DD') - from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC')))
) * 60 * 60 * 1000
```
- For a widening Cast: do nothing, return `childOE`.
- For a narrowing Cast: use the following SQL expression template:

```sql
case when {childOE} > {toDT.MinValue} and {childOE} < {toDT.MaxValue}
     then cast({childOE} as {toDT})
     else null
end
```
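For example, a narrowing cast to ByteType (range -128 to 127) over a hypothetical column `C_NUM`, assuming ByteType renders back to Oracle's `number(3)`, would look like:

```sql
case when C_NUM > -128 and C_NUM < 127
     then cast(C_NUM as number(3))
     else null
end
```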
From string:

- to numeric: apply the TO_NUMBER Oracle function, so the SQL template is `to_number({childOE})`.
- to date: Spark uses `DateTimeUtils.stringToDate`, which tries a number of date formats. When translating we use the default date format of the Oracle connection; per the Oracle TO_DATE function, the default date format is determined implicitly by the NLS_TERRITORY initialization parameter, or can be set explicitly by the NLS_DATE_FORMAT parameter. So the translation is `to_date({childOE})`.
- to timestamp: Spark uses `DateTimeUtils.stringToTimestamp`, which tries a number of timestamp formats. When translating we use the default timestamp format of the Oracle connection; per the Oracle TO_TIMESTAMP function, the default format of the TIMESTAMP data type is determined by the NLS_TIMESTAMP_FORMAT initialization parameter. So the translation is `to_timestamp({childOE})`.
- to boolean: Spark uses `StringUtils.isTrueString` to translate to `true`, `StringUtils.isFalseString` to translate to `false`, and `null` otherwise. The SQL template is:

  ```sql
  (case when {childOE} in ('t', 'true', 'y', 'yes', '1') then 1
        when {childOE} in ('f', 'false', 'n', 'no', '0') then 0
        else null
   end) = 1
  ```

  The examples below show the boolean translation in action:

  ```sql
  -- This returns 1 row
  select 1 from dual
  where (case when 't' in ('t', 'true', 'y', 'yes', '1') then 1
              when 't' in ('f', 'false', 'n', 'no', '0') then 0
              else null end) = 1;

  -- These return 0 rows:
  select 1 from dual
  where (case when 'f' in ('t', 'true', 'y', 'yes', '1') then 1
              when 'f' in ('f', 'false', 'n', 'no', '0') then 0
              else null end) = 1;

  select 1 from dual
  where (case when 'u' in ('t', 'true', 'y', 'yes', '1') then 1
              when 'u' in ('f', 'false', 'n', 'no', '0') then 0
              else null end) = 1;
  ```
To string:

- from numeric: Spark applies `UTF8String.fromString({childExpr.val}.toString)`. Translate using the TO_CHAR(number) Oracle function, so the translation template is `to_char({childOE})`. For example, `to_char(12345678912345.345678900000)` returns `12345678912345.345678900000`.
- from date: Spark uses `DateFormatter` for the `timeZoneId` of the `castExpr`; the date pattern used is `defaultPattern: String = "yyyy-MM-dd"`. Translate to a SQL template using the TO_CHAR(date) Oracle function: `to_char({childOE})`. This uses the default date format of the Oracle connection, which can be changed by setting the NLS_TERRITORY initialization parameter or set explicitly via the NLS_DATE_FORMAT parameter.
- from timestamp: Spark uses `FractionTimestampFormatter` for the `timeZoneId` of the `castExpr`; the formatter is `DateTimeFormatterHelper.fractionFormatter`, which parses/formats timestamps according to the pattern `yyyy-MM-dd HH:mm:ss.[..fff..]`. Translate to a SQL template using the TO_CHAR(datetime) Oracle function: `to_char({childOE})`. This uses the default timestamp format of the Oracle connection, which can be changed by setting the NLS_TERRITORY initialization parameter or set explicitly via the NLS_TIMESTAMP_FORMAT parameter. For example: `to_char('22-OCT-20 01.16.32.740812 PM')`.
- from boolean: use the template `case when {childOE} then 'true' else 'false' end`.
From boolean:

- to numeric: `{oraE} != 0`
- to string: same as `Boolean -> String` in StringCasting.
- to date: same as `Boolean -> Date` in DateCasting.
- to timestamp: same as `Boolean -> Timestamp` in TimestampCasting.
To boolean:

- from numeric: `{oraE}`
- from string: same as `String -> Boolean` in StringCasting.
- from date: same as `Date -> Boolean` in DateCasting.
- from timestamp: same as `Timestamp -> Boolean` in TimestampCasting.
From date:

- to numeric: in Spark, the number of days since the epoch. Translate to `{oraE} - {epochDt}`; based on Oracle's date arithmetic (`oraDt - oraDt -> number`), this represents the number of days since the start of the epoch (see the worked example after this list).
- to string: same as `Date -> String` in StringCasting.
- to timestamp: in Spark, `DateTimeUtils.daysToMicros(d, zoneId)` converts days since `1970-01-01` at the given zone ID to microseconds since `1970-01-01 00:00:00Z`. Translate to `cast({oraE} as timestamp)`, with an additional `at time zone {castE.timeZoneId}`; see the dtToTimestamp() method above.
- to boolean: in Spark, `null`; translate to `null`.
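A worked instance of the date-to-numeric rule, using a hypothetical column `C_DATE`:

```sql
-- days since epoch, via Oracle date arithmetic (date - date -> number)
C_DATE - date '1970-01-01'
```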
To date:

- from numeric: in Spark it is undefined. Translate to `{epochDt} + {oraE}`; based on Oracle's date arithmetic, this represents the date that is `{oraE}` days from the epoch (see the example after this list).
- from string: same as `String -> Date` in StringCasting.
- from timestamp: in Spark, convert the timestamp at the given time zone to a date. Translate to `cast({oraE} as date)`; if `{castE.timeZoneId}` is specified, first convert to a timestamp in that time zone. See timestampToDt() above.
- from boolean: in Spark it is undefined; we throw during translation.
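For example, a numeric value of 18500 translates to the following, i.e. the date 18500 days after the epoch:

```sql
date '1970-01-01' + 18500
```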
From timestamp:

- to numeric: in Spark, convert to `millis_since_epoch`. Translate using timestampToEpoch() above.
- to string: same as `Timestamp -> String` in StringCasting.
- to date: same as `Timestamp -> Date` in DateCasting.
- to boolean: in Spark, `millis_since_epoch != 0`. Translate to `timestampToEpoch({oraE}) != 0`; see timestampToEpoch() above.
To timestamp:

- from numeric: in Spark it is undefined. Translate using epochToTimestamp() above.
- from string: same as `String -> Timestamp` in StringCasting.
- from date: in Spark, convert the date at the given time zone to a timestamp. Translate using dtToTimestamp() above.
- from boolean: in Spark, `true` is interpreted as `1L millis_since_epoch` and `false` as `0L millis_since_epoch`. Translate to `case when {oraE} then ${true_bool_TS} else ${epochTS} end` (expanded in the example below).
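Substituting the helper snippets defined earlier, and writing the boolean-valued Oracle expression as a hypothetical predicate `C_FLAG = 1`, the generated SQL would look like:

```sql
case when C_FLAG = 1
     then from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC') + interval '0.001' second(0,3)
     else from_tz(to_timestamp('1970-01-01', 'YYYY-MM-DD'), 'UTC')
end
```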
Arithmetic expressions: the following are currently translated: `UnaryMinus`, `UnaryPositive`, `Abs`, `+`, `-`, `*`, `/`, `Remainder`, `Pmod`, `Least`, and `Greatest`.

- `AttributeReference` is translated to a column reference.
- `Alias` is translated to an alias expression.
Conditional expressions: the following are currently translated: `If` and `CaseWhen`.

- `IsNull` and `IsNotNull` are translated into `is null` and `is not null` Oracle predicates.
- `IsNull SubqueryExpression` is translated into `not exists (subquery sql)`.
- `IsNotNull SubqueryExpression` is translated into `exists (subquery sql)`.
Predicate expressions: the following are currently translated: `Not`, `And`, `Or`, `In`, `InSet`, `EqualNullSafe`, and `BinaryComparison`.
Subquery expressions: the following are currently translated: `ScalarSubquery` and `ListQuery`.
- If `spark.sql.caseSensitive=false` (the default setting), table, schema, and column names are resolved in a case-insensitive manner.
- If `spark.sql.caseSensitive=true`:
  - Table and schema names are resolved in a case-sensitive manner; if resolution fails, a case-insensitive resolution is attempted.
  - Column names are resolved in a case-sensitive manner.
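To experiment with case-sensitive resolution, the flag can be toggled with the standard Spark SQL configuration command:

```sql
-- default is false; switch identifier resolution to case-sensitive
SET spark.sql.caseSensitive=true;
```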