Skip to content

Conversation

gustavodemorais
Copy link
Contributor

What is the purpose of the change

The planner currently considers the union of both the unique and upsert keys from the left and from the right to be a valid resulting upsert key. That's true for inner joins but for left/right/full joins that leads to a resulting unique key that contains columns that can be null, which is not valid.

Brief change log

  • Check for null generating columns when creating superset of unique keys

Verifying this change

  • Adjusted existing tests

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Copy link
Collaborator

flinkbot commented Oct 8, 2025

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

assertEquals(
toBitSet(Array(1), Array(1, 5), Array(1, 5, 6)),
mq.getUpsertKeys(logicalLeftJoinOnUniqueKeys).toSet)
assertEquals(toBitSet(Array(1)), mq.getUpsertKeys(logicalLeftJoinOnUniqueKeys).toSet)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am i missing something, I do not see tests for nulls on each side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above is the test: 5, and 6 were columns coming from the right side of the join. So they can't be included in unique keys adn that's why "Array(1, 5), Array(1, 5, 6)" were remove as possible unique keys.

The same for all the other tests that were updated, they test the new behaviour

@gustavodemorais
Copy link
Contributor Author

CI is failling on unrelated e2e tests issue happening across PRs

@xuyangzhong
Copy link
Contributor

Hi, @gustavodemorais. I'm wondering why we can't have null columns as part of unique keys? From what I see in the SQL standard, it seems that unique keys can contain null columns.
Here is the specific description from the SQL 2016 standard:

A unique constraint that does not include a <without overlap specification> on a table T is satisfied if and only
if there do not exist two rows R1 and R2 of T such that R1 and R2 have the same non-null values in the unique
columns. If a unique constraint UC on a table T includes a <without overlap specification>WOS, then let ATPN
be the <application time period name> contained in WOS. UC is satisfied if and only if there do not exist two
rows R1 and R2 of T such that R1 and R2 have the same non-null values in the unique columns and the ATPN
period values of R1 and R2 overlap. In addition, if the unique constraint was defined with PRIMARY KEY,
then it requires that none of the values in the specified column or columns be a null value.

@github-actions github-actions bot added the community-reviewed PR has been reviewed by the community. label Oct 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-reviewed PR has been reviewed by the community.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants