You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We experience unexplainable large delays on the commits from time to time.
The setup is using the kafka-connect connector. Tested both with the latest release (1.7.1) and with main. (1.8.0-SNAPSHOT)
These are the important settings from the connector config:
So we commit every 10 seconds and timeout after 20 seconds. This work well approximately 75% of the time. But occasionally (every 4-5 commits or so) we experience a delay that cannot be explained.
When this happens, a message is received. The connector goes through the commit logic and times out as if there is no message received. This happens a couple of times until it is finally committed (no other messages are received in this time period.
Here's an example:
These messages are from the input topic (The timestamp is in GMT+1)
This is the kafka offset and consumer offset for the source topic, showing the delay in offset commit (matches the iceberg commit)
This is the same for the iceberg connector control topic, which does not have any delays (so that's not the issue)
At 07:40:15 two messages are received. One of them is committed immediately after, but the other one is not. No more messages are received and the commit loop goes through several cyles (4 commits and timeouts), before it is actually committed at 07:41:44.
The logs can be seen below, as well as a filtered part of the log, which only shows when a commit interval starts, times out and completes. Log files are in GMT
Any idea what could be causing this? Another thing I notice, is that when no messages have been received since the last commit, it still has to wait for the timeout. This does not seem like the most feasible solution. Wouldn't it be better to track that and then simply return?
2024-12-16 06:40:07,031 INFO Sending event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:07,056 INFO Commit e2f65021-851f-47a6-b5ca-463d54fccd85 initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:15,146 INFO Refreshing table metadata from new version: abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08740-ba0f43df-a77f-4e5f-a921-550595523a36.metadata.json [org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:40:15,378 INFO Table loaded by catalog: iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay [org.apache.iceberg.BaseMetastoreCatalog]
2024-12-16 06:40:15,390 INFO Saving 1 records. Last offset 151629 [org.apache.iceberg.connect.channel.CommitterImpl]
2024-12-16 06:40:15,429 INFO Sending event of type: DATA_WRITTEN [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,429 INFO Sending event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,438 INFO Handled event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,438 INFO Sending event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,440 INFO Handled event of type: DATA_WRITTEN [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,440 INFO Commit e2f65021-851f-47a6-b5ca-463d54fccd85 not ready, received responses for 0 of 1 partitions, waiting for more [org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:40:15,440 INFO Handled event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,463 INFO Handled event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:15,464 INFO Commit e2f65021-851f-47a6-b5ca-463d54fccd85 ready, received responses for all 1 partitions [org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:40:15,468 INFO Refreshing table metadata from new version: abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08740-ba0f43df-a77f-4e5f-a921-550595523a36.metadata.json [org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:40:15,473 INFO Refreshing table metadata from new version: abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08740-ba0f43df-a77f-4e5f-a921-550595523a36.metadata.json [org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:40:15,614 INFO Table loaded by catalog: iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay [org.apache.iceberg.BaseMetastoreCatalog]
2024-12-16 06:40:15,650 INFO Table loaded by catalog: iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay [org.apache.iceberg.BaseMetastoreCatalog]
2024-12-16 06:40:15,661 INFO Saving 1 records. Last offset 151630 [org.apache.iceberg.connect.channel.CommitterImpl]
2024-12-16 06:40:15,948 INFO Committed to table iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay with the new metadata location abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08741-8579a9ff-8d13-4089-abef-087b19d19f97.metadata.json [org.apache.iceberg.hive.HiveTableOperations]
2024-12-16 06:40:15,948 INFO Successfully committed to table iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay in 206 ms [org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:40:15,948 INFO Committed snapshot 409307506258743637 (MergeAppend) [org.apache.iceberg.SnapshotProducer]
2024-12-16 06:40:15,976 INFO Refreshing table metadata from new version: abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08741-8579a9ff-8d13-4089-abef-087b19d19f97.metadata.json [org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:40:16,147 INFO Received metrics report: CommitReport{tableName=iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay, snapshotId=409307506258743637, sequenceNumber=8740, operation=append, commitMetrics=CommitMetricsResult{totalDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.357961945S, count=1}, attempts=CounterResult{unit=COUNT, value=1}, addedDataFiles=CounterResult{unit=COUNT, value=1}, removedDataFiles=null, totalDataFiles=CounterResult{unit=COUNT, value=8763}, addedDeleteFiles=null, addedEqualityDeleteFiles=null, addedPositionalDeleteFiles=null, addedDVs=null, removedDeleteFiles=null, removedEqualityDeleteFiles=null, removedPositionalDeleteFiles=null, removedDVs=null, totalDeleteFiles=CounterResult{unit=COUNT, value=0}, addedRecords=CounterResult{unit=COUNT, value=1}, removedRecords=null, totalRecords=CounterResult{unit=COUNT, value=80408}, addedFilesSizeInBytes=CounterResult{unit=BYTES, value=17446}, removedFilesSizeInBytes=null, totalFilesSizeInBytes=CounterResult{unit=BYTES, value=625872320}, addedPositionalDeletes=null, removedPositionalDeletes=null, totalPositionalDeletes=CounterResult{unit=COUNT, value=0}, addedEqualityDeletes=null, removedEqualityDeletes=null, totalEqualityDeletes=CounterResult{unit=COUNT, value=0}}, metadata={iceberg-version=Apache Iceberg 1.8.0-SNAPSHOT (commit ff813445916bfd6ec1cc30a02b02f8bade7a26f6)}} [org.apache.iceberg.metrics.LoggingMetricsReporter]
2024-12-16 06:40:16,147 INFO Sending event of type: COMMIT_TO_TABLE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:16,153 INFO Commit complete to table evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay, snapshot 409307506258743637, commit ID e2f65021-851f-47a6-b5ca-463d54fccd85, valid-through null [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:16,162 INFO Sending event of type: COMMIT_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:16,186 INFO Commit e2f65021-851f-47a6-b5ca-463d54fccd85 complete, committed to 1 table(s), valid-through null [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:16,187 INFO Handled event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:17,189 INFO Sending event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:17,194 INFO Commit 931c9e74-e4c9-4549-81df-5f5e2ef9001c initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:37,202 INFO Commit timeout reached. Commit ID: 931c9e74-e4c9-4549-81df-5f5e2ef9001c [org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:40:37,204 INFO Sending event of type: COMMIT_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:37,209 INFO Commit 931c9e74-e4c9-4549-81df-5f5e2ef9001c complete, committed to 0 table(s), valid-through null [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:37,209 INFO Sending event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:37,233 INFO Commit 44b08251-add2-4ccc-9b6a-2732fbe7677f initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:56,818 INFO [AdminClient clientId=adminclient-30] Node 2 disconnected. [org.apache.kafka.clients.NetworkClient]
2024-12-16 06:40:56,834 INFO [AdminClient clientId=adminclient-31] Node 1 disconnected. [org.apache.kafka.clients.NetworkClient]
2024-12-16 06:40:57,243 INFO Commit timeout reached. Commit ID: 44b08251-add2-4ccc-9b6a-2732fbe7677f [org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:40:57,245 INFO Sending event of type: COMMIT_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:57,250 INFO Commit 44b08251-add2-4ccc-9b6a-2732fbe7677f complete, committed to 0 table(s), valid-through null [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:40:57,250 INFO Sending event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:40:57,274 INFO Commit e9000dcd-6e69-4ed0-add1-9dea50e95cba initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:17,281 INFO Commit timeout reached. Commit ID: e9000dcd-6e69-4ed0-add1-9dea50e95cba [org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:17,284 INFO Sending event of type: COMMIT_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:17,289 INFO Commit e9000dcd-6e69-4ed0-add1-9dea50e95cba complete, committed to 0 table(s), valid-through null [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:17,290 INFO Sending event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:17,314 INFO Commit a1ab811b-0ce6-48b8-9bb4-80d2e8a084cf initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:37,323 INFO Commit timeout reached. Commit ID: a1ab811b-0ce6-48b8-9bb4-80d2e8a084cf [org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:37,325 INFO Sending event of type: COMMIT_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:37,329 INFO Commit a1ab811b-0ce6-48b8-9bb4-80d2e8a084cf complete, committed to 0 table(s), valid-through null [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:37,330 INFO Sending event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:37,375 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981 initiated [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:43,372 INFO Sending event of type: DATA_WRITTEN [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,372 INFO Sending event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,380 INFO Handled event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,380 INFO Sending event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,382 INFO Handled event of type: DATA_WRITTEN [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,382 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981 not ready, received responses for 0 of 1 partitions, waiting for more [org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:43,382 INFO Handled event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,426 INFO Handled event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,426 INFO Sending event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,427 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981 not ready, received responses for 0 of 1 partitions, waiting for more [org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:43,427 INFO Handled event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,472 INFO Handled event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,472 INFO Sending event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,474 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981 not ready, received responses for 0 of 1 partitions, waiting for more [org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:43,474 INFO Handled event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,498 INFO Handled event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,498 INFO Sending event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,499 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981 not ready, received responses for 0 of 1 partitions, waiting for more [org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:43,499 INFO Handled event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,523 INFO Handled event of type: START_COMMIT [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:43,525 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981 ready, received responses for all 1 partitions [org.apache.iceberg.connect.channel.CommitState]
2024-12-16 06:41:43,532 INFO Refreshing table metadata from new version: abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08741-8579a9ff-8d13-4089-abef-087b19d19f97.metadata.json [org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:41:43,707 INFO Table loaded by catalog: iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay [org.apache.iceberg.BaseMetastoreCatalog]
2024-12-16 06:41:44,094 INFO Committed to table iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay with the new metadata location abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08742-42fb5ca7-3714-40c7-b511-427dc6bf1612.metadata.json [org.apache.iceberg.hive.HiveTableOperations]
2024-12-16 06:41:44,094 INFO Successfully committed to table iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay in 266 ms [org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:41:44,094 INFO Committed snapshot 6751585210189296099 (MergeAppend) [org.apache.iceberg.SnapshotProducer]
2024-12-16 06:41:44,116 INFO Refreshing table metadata from new version: abfss://[email protected]/iceberg/evaliceberg.db/kc_ingestion_testing_1000300255_long_ingest_delay/metadata/08742-42fb5ca7-3714-40c7-b511-427dc6bf1612.metadata.json [org.apache.iceberg.BaseMetastoreTableOperations]
2024-12-16 06:41:44,261 INFO Received metrics report: CommitReport{tableName=iceberg.evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay, snapshotId=6751585210189296099, sequenceNumber=8741, operation=append, commitMetrics=CommitMetricsResult{totalDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.404934664S, count=1}, attempts=CounterResult{unit=COUNT, value=1}, addedDataFiles=CounterResult{unit=COUNT, value=1}, removedDataFiles=null, totalDataFiles=CounterResult{unit=COUNT, value=8764}, addedDeleteFiles=null, addedEqualityDeleteFiles=null, addedPositionalDeleteFiles=null, addedDVs=null, removedDeleteFiles=null, removedEqualityDeleteFiles=null, removedPositionalDeleteFiles=null, removedDVs=null, totalDeleteFiles=CounterResult{unit=COUNT, value=0}, addedRecords=CounterResult{unit=COUNT, value=1}, removedRecords=null, totalRecords=CounterResult{unit=COUNT, value=80409}, addedFilesSizeInBytes=CounterResult{unit=BYTES, value=29788}, removedFilesSizeInBytes=null, totalFilesSizeInBytes=CounterResult{unit=BYTES, value=625902108}, addedPositionalDeletes=null, removedPositionalDeletes=null, totalPositionalDeletes=CounterResult{unit=COUNT, value=0}, addedEqualityDeletes=null, removedEqualityDeletes=null, totalEqualityDeletes=CounterResult{unit=COUNT, value=0}}, metadata={iceberg-version=Apache Iceberg 1.8.0-SNAPSHOT (commit ff813445916bfd6ec1cc30a02b02f8bade7a26f6)}} [org.apache.iceberg.metrics.LoggingMetricsReporter]
2024-12-16 06:41:44,261 INFO Sending event of type: COMMIT_TO_TABLE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:44,268 INFO Commit complete to table evaliceberg.kc_ingestion_testing_1000300255_long_ingest_delay, snapshot 6751585210189296099, commit ID 14c96a1b-5a1a-4a66-ae2a-b227aeb61981, valid-through null [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:44,278 INFO Sending event of type: COMMIT_COMPLETE [org.apache.iceberg.connect.channel.Channel]
2024-12-16 06:41:44,282 INFO Commit 14c96a1b-5a1a-4a66-ae2a-b227aeb61981 complete, committed to 1 table(s), valid-through null [org.apache.iceberg.connect.channel.Coordinator]
2024-12-16 06:41:44,283 INFO Handled event of type: DATA_COMPLETE [org.apache.iceberg.connect.channel.Channel]
Willingness to contribute
I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time
The text was updated successfully, but these errors were encountered:
Also added some debug logging inside the class org.apache.iceberg.connect.channel.CommitterImpl
@Overridepublicvoidsave(Collection<SinkRecord> sinkRecords) {
if (sinkRecords != null && !sinkRecords.isEmpty()) {
worker.save(sinkRecords);
LOG.info(
"Saving {} records. Last offset {}",
sinkRecords.size(),
sinkRecords.stream()
.reduce((first, second) -> second)
.map(SinkRecord::kafkaOffset)
.orElse(-1L));
}
processControlEvents();
}
Which produced the following logs for the above problem:
2024-12-16 06:40:15,390 INFO Saving 1 records. Last offset 151629 [org.apache.iceberg.connect.channel.CommitterImpl]
2024-12-16 06:40:15,661 INFO Saving 1 records. Last offset 151630 [org.apache.iceberg.connect.channel.CommitterImpl]
So the message is read from the source kafka topic when it should be.
Apache Iceberg version
1.7.1 (latest release)
Query engine
Kafka Connect
Please describe the bug 🐞
We experience unexplainable large delays on the commits from time to time.
The setup is using the kafka-connect connector. Tested both with the latest release (1.7.1) and with main. (1.8.0-SNAPSHOT)
These are the important settings from the connector config:
So we commit every 10 seconds and timeout after 20 seconds. This work well approximately 75% of the time. But occasionally (every 4-5 commits or so) we experience a delay that cannot be explained.
When this happens, a message is received. The connector goes through the commit logic and times out as if there is no message received. This happens a couple of times until it is finally committed (no other messages are received in this time period.
Here's an example:
These messages are from the input topic (The timestamp is in GMT+1)
This is the kafka offset and consumer offset for the source topic, showing the delay in offset commit (matches the iceberg commit)
This is the same for the iceberg connector control topic, which does not have any delays (so that's not the issue)
At 07:40:15 two messages are received. One of them is committed immediately after, but the other one is not. No more messages are received and the commit loop goes through several cyles (4 commits and timeouts), before it is actually committed at 07:41:44.
The logs can be seen below, as well as a filtered part of the log, which only shows when a commit interval starts, times out and completes. Log files are in GMT
Any idea what could be causing this? Another thing I notice, is that when no messages have been received since the last commit, it still has to wait for the timeout. This does not seem like the most feasible solution. Wouldn't it be better to track that and then simply return?
Filtered log:
Complete log:
Willingness to contribute
The text was updated successfully, but these errors were encountered: