
0.12.5 (2024-12-03)

Released by github-actions on 03 Dec 2024, 09:32 (commit 57754e4).

Improvements

  • Use sipHash64 instead of md5 in ClickHouse when reading data with {"partitioning_mode": "hash"}, as it is 5 times faster.
  • Use hashtext instead of md5 in Postgres when reading data with {"partitioning_mode": "hash"}, as it is 3-5 times faster.
  • Use BINARY_CHECKSUM instead of HASHBYTES in MSSQL when reading data with {"partitioning_mode": "hash"}, as it is 5 times faster.
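Each bullet above swaps the hash function applied to the partition column before the modulo is taken. As a rough illustration, here is a minimal Python sketch of how a per-dialect partition expression could be assembled. The function name `hash_partition_expression`, the `HASH_EXPRESSIONS` mapping, and the exact SQL text (the `::text` cast for Postgres, the use of `%`) are hypothetical and are not taken from onETL's source; the generated SQL may differ.

```python
# Hypothetical sketch: NOT onETL's actual implementation.
# Maps a dialect name to a hash expression template for the partition column.
HASH_EXPRESSIONS = {
    # ClickHouse: sipHash64 returns an unsigned 64-bit integer
    "clickhouse": "sipHash64({column})",
    # Postgres: hashtext takes text, so the column is cast first
    "postgres": "hashtext({column}::text)",
    # MSSQL: BINARY_CHECKSUM returns a signed int
    "mssql": "BINARY_CHECKSUM({column})",
}


def hash_partition_expression(dialect: str, column: str, num_partitions: int) -> str:
    """Build a SQL expression mapping each row to a bucket in [0, num_partitions)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    hashed = HASH_EXPRESSIONS[dialect].format(column=column)
    # ABS(...) keeps bucket numbers non-negative even when the hash is signed
    return f"ABS({hashed} % {num_partitions})"


print(hash_partition_expression("postgres", "id", 10))
# ABS(hashtext(id::text) % 10)
```

Keeping the per-dialect difference in a single template table means only the hash function varies between databases, while the bucketing logic stays shared.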

Bug fixes

  • In JDBC sources, wrap MOD(partitionColumn, numPartitions) with ABS(...) so that all returned values are non-negative. This prevents data skew.
  • Fix reading table data from MSSQL using {"partitioning_mode": "hash"} with a partitionColumn of integer type.
  • Fix reading table data from Postgres using {"partitioning_mode": "hash"}, which led to data skew (all the data was read into one Spark partition).
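The `ABS(...)` fix is easiest to see with signed hashes. In Postgres and MSSQL, MOD (or `%`) returns a result with the sign of the dividend, so a negative hash yields a negative bucket number outside `[0, numPartitions)`. The Python sketch below mimics that behavior with a hypothetical `sql_mod` helper (Python's own `%` follows the divisor's sign, so it would hide the problem). The helper names and hash values are made up for illustration.

```python
def sql_mod(dividend: int, divisor: int) -> int:
    """Mimic SQL MOD semantics: the result takes the sign of the dividend."""
    remainder = abs(dividend) % abs(divisor)
    return -remainder if dividend < 0 else remainder


def buckets(hashes: list[int], num_partitions: int, use_abs: bool) -> list[int]:
    """Compute the bucket number of each hash, optionally wrapped in ABS."""
    result = []
    for h in hashes:
        bucket = sql_mod(h, num_partitions)
        if use_abs:
            bucket = abs(bucket)
        result.append(bucket)
    return result


# Made-up signed 32-bit hash values, including the int32 extremes
hashes = [1_234_567, -987_654, 42, -7, 2_147_483_647, -2_147_483_648]

print(buckets(hashes, 4, use_abs=False))  # [3, -2, 2, -3, 3, 0]
print(buckets(hashes, 4, use_abs=True))   # [3, 2, 2, 3, 3, 0]
```

Applying ABS to the MOD result rather than to the hash itself also avoids an overflow: in Postgres and MSSQL, taking the absolute value of the 32-bit integer -2147483648 raises an out-of-range error, whereas the MOD result is always smaller in magnitude than numPartitions.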