Skip to content

Latest commit

 

History

History
18 lines (13 loc) · 1.54 KB

README.md

File metadata and controls

18 lines (13 loc) · 1.54 KB

Spark NullIntolerant Operators may cause schema mismatches in ABRiS

This repo is to demonstrate how NullIntolerant expressions make a previously nullable expression non-nullable after query optimization. The nullability of the expression is only changed when the query is evaluated. This means that printSchema can still show nullable=true, even if the column will later be non-nullable.

This is problematic for ABRiS because it relies on the AvroSerializer which relies on the nullability information, which is lazily evaluated, i.e. after query optimization.

Therefore, an innocent looking === fails the query in this example:

      inputDf.filter(col("value1") === lit(42)) // causes IncompatibleSchemaException later on

because === extends the NullIntolerant trait and makes value1 non-nullable after optimization. To retain the nullability, the eqNullSafe operator can be used.