[Spark] Add dataframe reader options to unblock non-additive schema changes #4126

Open · wants to merge 1 commit into master
Conversation

johanl-db
Collaborator

Description

Non-additive schema changes (DROP/RENAME and, since https://github.com/databricks-eng/runtime/pull/124363, type changes) in a streaming source block the stream until the user sets a SQL conf to unblock them:

spark.databricks.delta.streaming.allowSourceColumnRename
spark.databricks.delta.streaming.allowSourceColumnDrop
spark.databricks.delta.streaming.allowSourceColumnTypeChange
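
Before this change, unblocking required setting one of these confs at the session level, for example (conf name taken from the list above; the checkpoint-scoped variants are shown in the error message further down):

```sql
-- Session-level unblock for all streams (pre-existing mechanism)
SET spark.databricks.delta.streaming.allowSourceColumnDrop = "always";
```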

This change adds dataframe reader options as an alternative to SQL confs for unblocking non-additive schema changes:

spark.readStream
  .option("allowSourceColumnRename", "true")
  .option("allowSourceColumnDrop", "true")
  .option("allowSourceColumnTypeChange", "true")
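
For reference, each new reader option maps to one of the existing SQL confs. A minimal pure-Python sketch of that mapping (the `spark.readStream` and `spark.conf.set` calls are shown only in comments, since they need a live Spark session):

```python
# Mapping between the new per-stream reader options (this PR) and the
# pre-existing session-level SQL confs, as listed in the PR description.
CONF_PREFIX = "spark.databricks.delta.streaming."

READER_OPTION_TO_SQL_CONF = {
    "allowSourceColumnRename": CONF_PREFIX + "allowSourceColumnRename",
    "allowSourceColumnDrop": CONF_PREFIX + "allowSourceColumnDrop",
    "allowSourceColumnTypeChange": CONF_PREFIX + "allowSourceColumnTypeChange",
}

# With a live SparkSession, either mechanism unblocks the change, e.g.:
#   spark.readStream.format("delta") \
#       .option("allowSourceColumnDrop", "true") \
#       .load(source_path)   # per-stream scope (new in this PR)
#   spark.conf.set(READER_OPTION_TO_SQL_CONF["allowSourceColumnDrop"],
#                  "always")  # session-wide scope (pre-existing)

for option, conf in READER_OPTION_TO_SQL_CONF.items():
    print(f"{option} -> {conf}")
```

The reader option scopes the unblock to a single stream definition, while the SQL conf applies to every stream in the session.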

How was this patch tested?

Extended existing tests in DeltaSourceMetadataEvolutionSupportSuite to also cover the dataframe reader options.

This PR introduces the following user-facing changes:

The error thrown on non-additive schema changes during streaming is updated to suggest dataframe reader options in addition to SQL confs to unblock the stream:

[DELTA_STREAMING_CANNOT_CONTINUE_PROCESSING_POST_SCHEMA_EVOLUTION]
We've detected one or more non-additive schema change(s) (DROP) between Delta version 1 and 2 in the Delta streaming source.
Please check if you want to manually propagate the schema change(s) to the sink table before we proceed with stream processing using the finalized schema at version 2.
Once you have fixed the schema of the sink table or have decided there is no need to fix, you can set the following configuration(s) to unblock the non-additive schema change(s) and continue stream processing.

<NEW>
Using dataframe reader option(s):
  .option("allowSourceColumnDrop", "true")
<NEW>

Using SQL configuration(s):
To unblock for this particular stream just for this series of schema change(s):
  SET spark.databricks.delta.streaming.allowSourceColumnDrop.ckpt_123456 = 2;
To unblock for this particular stream:
  SET spark.databricks.delta.streaming.allowSourceColumnDrop.ckpt_123456 = "always";
To unblock for all streams:
  SET spark.databricks.delta.streaming.allowSourceColumnDrop = "always";

The user can set the corresponding reader option to unblock a given type of non-additive schema change:

spark.readStream
  .option("allowSourceColumnRename", "true")
  .option("allowSourceColumnDrop", "true")
  .option("allowSourceColumnTypeChange", "true")

@johanl-db changed the title from "Add dataframe reader options to unblock non-additive schema changes" to "[Spark] Add dataframe reader options to unblock non-additive schema changes" on Feb 6, 2025
@johanl-db force-pushed the streaming-non-additive-reader-option branch from ed23bd6 to 5bdabda on February 6, 2025 at 10:42