
[CORE-8530] Handle out-of-sync producers by writing records to past-schema parquet #24955

Merged

Conversation


@oleiman oleiman commented Jan 28, 2025

Builds on #24862. Interesting commits start at e215bfe.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@oleiman oleiman self-assigned this Jan 28, 2025
@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch from 20ddb11 to 8e9d51a on January 28, 2025 03:26
@oleiman oleiman marked this pull request as ready for review January 28, 2025 03:26
@oleiman oleiman marked this pull request as draft January 28, 2025 06:06
@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch 4 times, most recently from b4ffa8f to 0d4f346 on January 30, 2025 08:07
@oleiman oleiman marked this pull request as ready for review January 30, 2025 08:07

@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch 2 times, most recently from 770eb1d to 4940eea on January 30, 2025 18:37
vbotbuildovich commented Jan 30, 2025

CI test results

test results on build#61406
test_id test_kind job_url test_status passed
rptest.tests.archival_test.ArchivalTest.test_all_partitions_leadership_transfer.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61406#0194b8ca-ef8b-4e94-8cdf-dc65e16620d1 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61406#0194b8dd-56f2-4777-945b-eb3271a519fa FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61406#0194b8dd-56f3-4f40-bac0-2cefaaba54ea FLAKY 1/2
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61406#0194b8ca-ef8b-4e94-8cdf-dc65e16620d1 FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/61406#0194b8ca-ef8c-4348-be15-900097961eff FLAKY 1/2
test results on build#61444
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61444#0194bd88-6b8b-459d-948c-f3adbe590209 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61444#0194bd8d-80f3-4231-bd1c-8143122d9f09 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61444#0194bd88-6b8c-43da-97a1-4558b2638ed0 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61444#0194bd8d-80f0-4139-b9b2-91f2b1163b67 FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/61444#0194bd8d-80f0-4139-b9b2-91f2b1163b67 FLAKY 1/2
test results on build#61469
test_id test_kind job_url test_status passed
gtest_raft_rpunit.gtest_raft_rpunit unit https://buildkite.com/redpanda/redpanda/builds/61469#0194bea3-41a9-42b1-9fae-882f35ba3afd FLAKY 1/2
rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/61469#0194bf02-be31-48a6-8d5f-96391010ac62 FLAKY 1/3

@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch from 4940eea to ac6735d on January 31, 2025 16:47
@andrwng andrwng left a comment

The C++ changes could use some testing, though this functionally looks pretty good to me

src/v/iceberg/table_metadata.h (outdated; resolved)
src/v/iceberg/compatibility_utils.cc (outdated; resolved)
Comment on lines +13 to +15
namespace iceberg {
bool schemas_equivalent(const struct_type& source, const struct_type& dest) {
chunked_vector<const nested_field*> source_stk;
Contributor

Could use some simple tests?

Contributor

Maybe same with table metadata and/or catalog_schema_manager

Member Author

Yeah, fair point. Slipped my mind.

Comment on lines 291 to 292
auto source_copy = schema->schema_struct.copy();
auto compat_res = check_schema_compat(dest_type, schema->schema_struct);
Contributor

Not necessarily related to this PR, but this seems like a really easy footgun to hit. If the solution in general is to make a copy of the struct beforehand, should we make check_schema_compat take the source schema as non-const?

Member Author

Yeah, good point. Perhaps it would be clearest to pass by value.
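
(To make the by-value point concrete, here is a minimal sketch with simplified stand-in types -- not the real iceberg::nested_field / struct_type definitions -- and a hypothetical check_schema_compat_sketch. The sketches further down reuse the same stand-ins.)

#include <string>
#include <vector>

// Simplified stand-ins for the real iceberg types; purely illustrative.
struct nested_field {
    int id{};                            // field ID (ignored by the equivalence check below)
    std::string name;
    std::string type;                    // primitive type name, or "struct"
    std::vector<nested_field> children;  // nested fields when type == "struct"
    bool compat_annotation{false};       // stands in for compat metadata written during the check
};
struct struct_type {
    std::vector<nested_field> fields;
};

// Taking `source` by value means annotations land on a private copy, so a
// struct held in cached table metadata is never mutated by the check.
bool check_schema_compat_sketch(const struct_type& dest, struct_type source) {
    for (auto& f : source.fields) {
        f.compat_annotation = true;      // mutates the copy, not the caller's object
    }
    return source.fields.size() == dest.fields.size(); // placeholder compat rule
}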

Checks whether two structs are precisely equivalent[1] using a simultaneous
depth-first traversal.

The use case is for performing schema lookups on cached table metadata by type
rather than by ID.

[1] - Exclusive of IDs but inclusive of order.

Signed-off-by: Oren Leiman <[email protected]>
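
(A rough illustration of the simultaneous depth-first traversal described above, reusing the simplified stand-in types from the earlier sketch; schemas_equivalent_sketch is not the actual implementation.)

#include <cstddef>
#include <utility>
#include <vector>

// Order-sensitive, ID-agnostic equivalence check via an explicit DFS stack.
bool schemas_equivalent_sketch(const struct_type& source, const struct_type& dest) {
    std::vector<std::pair<const std::vector<nested_field>*, const std::vector<nested_field>*>> stk;
    stk.emplace_back(&source.fields, &dest.fields);
    while (!stk.empty()) {
        auto [s, d] = stk.back();
        stk.pop_back();
        if (s->size() != d->size()) {
            return false;
        }
        for (std::size_t i = 0; i < s->size(); ++i) {
            const auto& sf = (*s)[i];
            const auto& df = (*d)[i];
            // Field IDs are ignored; names, order, and types must match exactly.
            if (sf.name != df.name || sf.type != df.type) {
                return false;
            }
            if (!sf.children.empty() || !df.children.empty()) {
                stk.emplace_back(&sf.children, &df.children);
            }
        }
    }
    return true;
}
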
Search for a schema that matches the provided type.

Signed-off-by: Oren Leiman <[email protected]>
For catalog_schema_manager, we can use this to perform a type-wise schema
lookup on cached metadata, resulting in table_info bound to an arbitrary
schema, possibly other than the current table schema.

Also update catalog_schema_manager::get_ids_from_table_meta to try a type-wise
lookup before performing the usual compat check. This way we can short-circuit
a schema update if the desired schema is already present in the table.

Also pass source struct to check_schema_compat by value to avoid polluting
cached table metadata with compat annotations.

Signed-off-by: Oren Leiman <[email protected]>
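
(A sketch of the type-wise lookup and short-circuit described in this commit message, again reusing the stand-in types from the earlier sketches; get_equivalent_schema and resolve_schema_id_sketch here are simplified, hypothetical analogues of the real table_metadata / catalog_schema_manager code.)

#include <optional>
#include <vector>

struct schema {
    int schema_id{};
    struct_type schema_struct;
};

struct table_metadata_sketch {
    std::vector<schema> schemas;
    int current_schema_id{};

    // Type-wise lookup: return the first cached schema whose struct is
    // equivalent to the requested type, regardless of schema ID.
    const schema* get_equivalent_schema(const struct_type& type) const {
        for (const auto& s : schemas) {
            if (schemas_equivalent_sketch(s.schema_struct, type)) {
                return &s;
            }
        }
        return nullptr;
    }
};

// If the desired schema already exists in the table (currently or historically),
// bind to it directly and skip the compat-check / schema-update path.
std::optional<int> resolve_schema_id_sketch(
  const table_metadata_sketch& table, const struct_type& desired_type) {
    if (const auto* match = table.get_equivalent_schema(desired_type)) {
        return match->schema_id; // short-circuit: no schema update needed
    }
    return std::nullopt;         // fall back to the usual compat check / update
}
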
Rather than the current schema ID. By this point we should have ensured that the
record schema exists in the table (either historically or currently). This
change lets us look past the current schema to build a writer for historical
data.

Signed-off-by: Oren Leiman <[email protected]>
@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch from ac6735d to f5a8ad8 on January 31, 2025 23:11
@oleiman oleiman requested a review from andrwng February 1, 2025 04:19
@oleiman oleiman merged commit b174696 into redpanda-data:dev Feb 3, 2025
18 checks passed
@rockwotj rockwotj left a comment

LGTM!

@@ -230,7 +235,10 @@ catalog_schema_manager::get_table_info(
}
const auto& table = load_res.value();

auto cur_schema = table.get_schema(table.current_schema_id);
const auto* cur_schema = desired_type.has_value()
? table.get_equivalent_schema(
Contributor

I wonder if there are perf implications here? Might be good to check what the impact is and whether there's some other mechanism we want here (or does this get cached per schema ID in the translation path? I will look).

Member Author

Yeah, I've been thinking about this. I got the impression from @andrwng that there's some caching in play, but I still need to chase down the details.

In retrospect, I'm not sure whether we can expect table_metadata::schemas to be in sorted (oldest-first) order. It might be slightly better to look up by current_schema_id up front and compare if necessary, since that's the common case.

Contributor

Yeah, there are other options, like using a fingerprint to quickly sort out mismatches. Hopefully it's all behind a cache and this won't matter.

Contributor

Just to clarify, the caching I was referring to is pretty indirect -- it's caching at the level of the record multiplexer, such that each translation of an offset range will have a shared map of parquet writers that's more or less keyed by schema ID. So we will need to do this lookup once per record schema per translation.
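
(Illustrative only: a rough sketch of the kind of per-translation writer map being described, with a stand-in writer type; the real record multiplexer differs.)

#include <map>

struct parquet_writer_stub {}; // stand-in for the real parquet writer

// One of these per translated offset range: writers are reused per schema ID,
// so the schema lookup above runs once per record schema per translation.
class writer_cache_sketch {
public:
    parquet_writer_stub& writer_for(int schema_id) {
        // On first use of a schema ID, the real code would resolve the schema
        // (the lookup discussed in this thread) and build a writer bound to it.
        return writers_.try_emplace(schema_id).first->second;
    }

private:
    std::map<int, parquet_writer_stub> writers_;
};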

Member Author

Right, right. So, assuming that:

  • the steady state of most translators is a stream of records conforming to the corresponding table's current schema
  • the first thing we look up in get_table_info is the current table schema

Then incurring a single additional equivalence check in the common case seems basically fine?

In principle we should only get into the out-of-sync producer state when the schema changes, but I don't have a good sense of how long we might spend there.

Maybe an on-shard cache keyed by some fingerprint is good enough, vs. the additional work of wiring something into the coordinator?
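
(Purely to illustrate the fingerprint idea floated here -- nothing like this is in the PR: an on-shard memo keyed by a structural fingerprint could front the equivalence lookup, again reusing the stand-in types from the earlier sketches.)

#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// boost-style hash_combine: order-sensitive, like the equivalence check.
inline void hash_combine_sketch(std::size_t& seed, std::size_t v) {
    seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Structural fingerprint over names, types, and order; field IDs are ignored,
// mirroring the equivalence check.
std::size_t fingerprint_sketch(const std::vector<nested_field>& fields) {
    std::size_t h = 0;
    for (const auto& f : fields) {
        hash_combine_sketch(h, std::hash<std::string>{}(f.name));
        hash_combine_sketch(h, std::hash<std::string>{}(f.type));
        hash_combine_sketch(h, fingerprint_sketch(f.children));
    }
    return h;
}

// Per-shard memo of fingerprint -> schema ID; on a hit we'd still want a full
// equivalence check to guard against hash collisions.
class schema_id_cache_sketch {
public:
    std::optional<int> find(const struct_type& type) const {
        auto it = ids_.find(fingerprint_sketch(type.fields));
        if (it == ids_.end()) {
            return std::nullopt;
        }
        return it->second;
    }
    void put(const struct_type& type, int schema_id) {
        ids_[fingerprint_sketch(type.fields)] = schema_id;
    }

private:
    std::unordered_map<std::size_t, int> ids_;
};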

Member Author

@andrwng - completely misread your comment 🤦.

once per record schema per translation

should be totally fine then as written, yeah?

Contributor

Oops, I missed the earlier comment. Yeah, I wasn't implying that what we have is insufficient, just thought I'd clarify where there is schema reuse. I think it's fine as is (or at least, let's wait and see as the translation scheduling policy crystallizes, though I'd be surprised if it becomes a bottleneck).

I will say, maybe this is worth trying out with some massive schemas.

Member Author

No worries. Yeah, for sure - what I read on mobile was "check per batch", so I went off and wrote a pointless benchmark... entirely my bad.

maybe this is worth trying out with some massive schemas

Yeah, I can whip something up at some point.

wait and see...surprised if it becomes a bottleneck

Agreed.
