This repo is to demonstrate how NullIntolerant
expressions make a previously nullable expression non-nullable after query optimization. The nullability of the expression is only changed when the query is evaluated. This means that printSchema
can still show nullable=true
, even if the column will later be non-nullable.
This is problematic for ABRiS because it relies on the AvroSerializer which relies on the nullability information, which is lazily evaluated, i.e. after query optimization.
Therefore, an innocent looking ===
fails the query in this example:
inputDf.filter(col("value1") === lit(42)) // causes IncompatibleSchemaException later on
because ===
extends the NullIntolerant
trait and makes value1
non-nullable after optimization. To retain the nullability, the eqNullSafe
operator can be used.
- See the Actions tab for test output
- Note that this particular problem does not occur anymore in Spark 3.0, thanks to https://issues.apache.org/jira/browse/SPARK-27838)