Repartition by event name before writing to the lake
Previously, our Iceberg writer used the [hash write distribution mode][1] because that is the default for Iceberg. In this mode, Spark repartitions the dataframe immediately before writing to the lake.

After this commit, we explicitly repartition the dataframe as part of the existing Spark task for preparing the final dataframe. This means we can change the Iceberg write distribution mode to `none`.

We partition by the combination `event_name + event_id`: the former because it matches the lake partitioning, and the latter because it adds salt to ensure equally sized partitions.

Overall this seems to improve the time taken to write a window of events to Iceberg. This fixes a problem we found in which the write phase could get too slow under high load (Iceberg only): specifically, a write was taking longer than the loader's "window", which caused periods of low CPU usage while the loader's processing phase waited for the write phase to catch up.

Note: this improvement will not help Snowplow users who have changed the partition key to something different from our default. We might want to make a follow-up change that auto-discovers the lake's partition key. For example, some users might want to partition by `app_id` instead of `event_name`.

[1]: https://iceberg.apache.org/docs/1.7.1/spark-writes/#writing-distribution-modes
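To illustrate why salting with `event_id` should even out partition sizes, here is a minimal, self-contained Python sketch. It is not the loader's code and does not use Spark: `partition_of` is a hypothetical stand-in for a hash partitioner (SHA-256 modulo a made-up partition count), and the skewed workload of 90% `page_view` events is invented for the example.

```python
import hashlib
import uuid
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count, chosen for illustration


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's actual hash)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_sizes(events, key_fn):
    """Count how many events land in each partition for a given key function."""
    counts = Counter(partition_of(key_fn(event)) for event in events)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


# Invented, skewed workload: 9 in 10 events are page views.
events = [
    ("page_view" if i % 10 else "link_click", str(uuid.UUID(int=i)))
    for i in range(10_000)
]

# Keying on event_name alone sends every page_view to the same partition.
by_name = partition_sizes(events, lambda e: e[0])

# Adding event_id as salt spreads each event name across all partitions.
salted = partition_sizes(events, lambda e: f"{e[0]}|{e[1]}")

print("event_name only:      ", by_name)
print("event_name + event_id:", salted)
```

With the name-only key, one partition holds at least all 9,000 page views while most others stay empty; with the salted key the events spread roughly evenly. The trade-off, under these assumptions, is that rows for one `event_name` are no longer confined to a single partition.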
Showing 11 changed files with 99 additions and 90 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
<?xml version="1.0"?> | ||
<allocations> | ||
<pool name="pool1"> | ||
<schedulingMode>FIFO</schedulingMode> | ||
<weight>1000</weight> | ||
<minShare>1</minShare> | ||
</pool> | ||
</allocations> |