
juntaozhang
Contributor

Purpose

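A Spark data-evolution table can return inconsistent results before and after compaction. Once the table is compacted, rows that were partially updated by MERGE INTO are no longer merged by _ROW_ID, so a single logical row is returned as several rows. Example: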

CREATE TABLE s (id INT, b INT);
INSERT INTO s VALUES (1, 11), (2, 22);

CREATE TABLE t (id INT, b INT, c INT) TBLPROPERTIES ('row-tracking.enabled' = 'true', 'data-evolution.enabled' = 'true');
INSERT INTO t VALUES (2, 2, 2), (3, 3, 3);
MERGE INTO t
USING s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.b = s.b
WHEN NOT MATCHED THEN INSERT (id, b, c) VALUES (id, b, 0);
SELECT *, _ROW_ID, _SEQUENCE_NUMBER FROM t ORDER BY _ROW_ID ASC;
CALL sys.compact(table => 't');
SELECT *, _ROW_ID, _SEQUENCE_NUMBER FROM t ORDER BY _ROW_ID ASC;

Before compaction:

+----+----+---+---------+------------------+
| id |  b | c | _ROW_ID | _SEQUENCE_NUMBER |
+----+----+---+---------+------------------+
|  2 | 22 | 2 |       0 |                2 |
|  3 |  3 | 3 |       1 |                2 |
|  1 | 11 | 0 |       2 |                2 |
+----+----+---+---------+------------------+

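After compaction, rows with _ROW_ID 0 and 1 each appear twice. One copy is the original row with _SEQUENCE_NUMBER 1. The other is a b-only row with NULL id and c: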

+--------+----+--------+---------+------------------+
|     id |  b |      c | _ROW_ID | _SEQUENCE_NUMBER |
+--------+----+--------+---------+------------------+
| <NULL> | 22 | <NULL> |       0 |                2 |
|      2 |  2 |      2 |       0 |                1 |
| <NULL> |  3 | <NULL> |       1 |                2 |
|      3 |  3 |      3 |       1 |                1 |
|      1 | 11 |      0 |       2 |                2 |
+--------+----+--------+---------+------------------+
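The following Python model reproduces both tables above. It is a simplified sketch and not Paimon code: the Fragment type and the merged_read/unmerged_read functions are hypothetical.

It rests on two assumptions read off the output:
- MERGE INTO wrote an extra file containing only column b for _ROW_ID 0 and 1, with sequence number 2.
- A data-evolution read stitches fragments that share a _ROW_ID, taking each column from the highest-sequence fragment that has it and reporting the maximum sequence number.

Under these assumptions, merged_read yields the "before" table. unmerged_read emits every fragment row on its own, with missing columns as None, and yields the "after" table.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

COLUMNS = ["id", "b", "c"]
# Result row: (id, b, c, _ROW_ID, _SEQUENCE_NUMBER)
Row = Tuple[Optional[Any], Optional[Any], Optional[Any], int, int]


@dataclass
class Fragment:
    """Hypothetical model of one data file in a data-evolution table."""
    seq: int                   # sequence number of the commit that wrote this file
    first_row_id: int          # _ROW_ID of the file's first row
    rows: List[Dict[str, Any]]  # per row: only the columns this file contains


def merged_read(fragments: List[Fragment]) -> List[Row]:
    """Stitch fragments by _ROW_ID; per column, the highest-sequence value wins."""
    best: Dict[int, Dict[str, Tuple[int, Any]]] = {}
    for frag in fragments:
        for offset, row in enumerate(frag.rows):
            cols = best.setdefault(frag.first_row_id + offset, {})
            for col, value in row.items():
                if col not in cols or frag.seq > cols[col][0]:
                    cols[col] = (frag.seq, value)
    result: List[Row] = []
    for row_id in sorted(best):
        cols = best[row_id]
        values = tuple(cols[c][1] if c in cols else None for c in COLUMNS)
        result.append(values + (row_id, max(s for s, _ in cols.values())))
    return result


def unmerged_read(fragments: List[Fragment]) -> List[Row]:
    """Return every fragment row as a separate row (no stitching by _ROW_ID)."""
    result: List[Row] = []
    for frag in fragments:
        for offset, row in enumerate(frag.rows):
            values = tuple(row.get(c) for c in COLUMNS)
            result.append(values + (frag.first_row_id + offset, frag.seq))
    # Order by _ROW_ID, newest sequence first, matching the table above
    return sorted(result, key=lambda r: (r[3], -r[4]))


# Files assumed to exist after the INSERT and MERGE INTO statements above
FRAGMENTS = [
    Fragment(seq=1, first_row_id=0, rows=[{"id": 2, "b": 2, "c": 2},
                                          {"id": 3, "b": 3, "c": 3}]),
    Fragment(seq=2, first_row_id=0, rows=[{"b": 22}, {"b": 3}]),  # b-only file
    Fragment(seq=2, first_row_id=2, rows=[{"id": 1, "b": 11, "c": 0}]),
]

if __name__ == "__main__":
    for row in merged_read(FRAGMENTS):
        print(row)
    print("---")
    for row in unmerged_read(FRAGMENTS):
        print(row)
```

In this model, the inconsistency comes entirely from skipping the per-_ROW_ID stitch.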

This PR disables compaction of data-evolution tables in Spark, aligning Spark with Flink's behavior (#6112).

Tests

API and Format

Documentation
