
[CORE-8530] Handle out-of-sync producers by writing records to past-schema parquet #24955

Merged

Conversation


@oleiman oleiman commented Jan 28, 2025

Builds on #24862. Interesting commits start at e215bfe.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@oleiman oleiman self-assigned this Jan 28, 2025
@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch from 20ddb11 to 8e9d51a on January 28, 2025 03:26
@oleiman oleiman marked this pull request as ready for review January 28, 2025 03:26
@oleiman oleiman marked this pull request as draft January 28, 2025 06:06
@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch 4 times, most recently from b4ffa8f to 0d4f346 on January 30, 2025 08:07
@oleiman oleiman marked this pull request as ready for review January 30, 2025 08:07

@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch 2 times, most recently from 770eb1d to 4940eea on January 30, 2025 18:37
vbotbuildovich commented Jan 30, 2025

CI test results

test results on build#61406
test_id test_kind job_url test_status passed
rptest.tests.archival_test.ArchivalTest.test_all_partitions_leadership_transfer.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61406#0194b8ca-ef8b-4e94-8cdf-dc65e16620d1 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61406#0194b8dd-56f2-4777-945b-eb3271a519fa FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61406#0194b8dd-56f3-4f40-bac0-2cefaaba54ea FLAKY 1/2
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61406#0194b8ca-ef8b-4e94-8cdf-dc65e16620d1 FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/61406#0194b8ca-ef8c-4348-be15-900097961eff FLAKY 1/2
test results on build#61444
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61444#0194bd88-6b8b-459d-948c-f3adbe590209 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61444#0194bd8d-80f3-4231-bd1c-8143122d9f09 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61444#0194bd88-6b8c-43da-97a1-4558b2638ed0 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61444#0194bd8d-80f0-4139-b9b2-91f2b1163b67 FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/61444#0194bd8d-80f0-4139-b9b2-91f2b1163b67 FLAKY 1/2
test results on build#61469
test_id test_kind job_url test_status passed
gtest_raft_rpunit.gtest_raft_rpunit unit https://buildkite.com/redpanda/redpanda/builds/61469#0194bea3-41a9-42b1-9fae-882f35ba3afd FLAKY 1/2
rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/61469#0194bf02-be31-48a6-8d5f-96391010ac62 FLAKY 1/3

@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch from 4940eea to ac6735d on January 31, 2025 16:47
@andrwng andrwng left a comment

The C++ changes could use some testing, though this functionally looks pretty good to me

src/v/iceberg/table_metadata.h (outdated; resolved)
src/v/iceberg/compatibility_utils.cc (outdated; resolved)
Comment on lines +13 to +15
namespace iceberg {
bool schemas_equivalent(const struct_type& source, const struct_type& dest) {
chunked_vector<const nested_field*> source_stk;
Contributor

Could use some simple tests?

Contributor

Maybe same with table metadata and/or catalog_schema_manager

Member Author

Yeah, fair point. Slipped my mind.

Comment on lines 291 to 292
auto source_copy = schema->schema_struct.copy();
auto compat_res = check_schema_compat(dest_type, schema->schema_struct);
Contributor

Not necessarily related to this PR, but this seems like a really easy footgun to hit. If the solution in general is to make a copy of the struct beforehand, should we make check_schema_compat take the source schema as non-const?

Member Author

Yeah, good point. Perhaps it would be clearest to pass by value.
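
(To make the by-value point concrete, here is a minimal sketch with simplified stand-in types -- not the real iceberg::nested_field / struct_type definitions -- and a hypothetical check_schema_compat_sketch. The sketches further down reuse the same stand-ins.)

#include <string>
#include <vector>

// Simplified stand-ins for the real iceberg types; purely illustrative.
struct nested_field {
    int id{};                            // field ID (ignored by the equivalence check below)
    std::string name;
    std::string type;                    // primitive type name, or "struct"
    std::vector<nested_field> children;  // nested fields when type == "struct"
    bool compat_annotation{false};       // stands in for compat metadata written during the check
};
struct struct_type {
    std::vector<nested_field> fields;
};

// Taking `source` by value means annotations land on a private copy, so a
// struct held in cached table metadata is never mutated by the check.
bool check_schema_compat_sketch(const struct_type& dest, struct_type source) {
    for (auto& f : source.fields) {
        f.compat_annotation = true;      // mutates the copy, not the caller's object
    }
    return source.fields.size() == dest.fields.size(); // placeholder compat rule
}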

Checks whether two structs are precisely equivalent[1] using a simultaneous
depth-first traversal.

The use case is for performing schema lookups on cached table metadata by type
rather than by ID.

[1] - Exclusive of IDs but inclusive of order.

Signed-off-by: Oren Leiman <[email protected]>
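
(A rough illustration of the simultaneous depth-first traversal described above, reusing the simplified stand-in types from the earlier sketch; schemas_equivalent_sketch is not the actual implementation.)

#include <cstddef>
#include <utility>
#include <vector>

// Order-sensitive, ID-agnostic equivalence check via an explicit DFS stack.
bool schemas_equivalent_sketch(const struct_type& source, const struct_type& dest) {
    std::vector<std::pair<const std::vector<nested_field>*, const std::vector<nested_field>*>> stk;
    stk.emplace_back(&source.fields, &dest.fields);
    while (!stk.empty()) {
        auto [s, d] = stk.back();
        stk.pop_back();
        if (s->size() != d->size()) {
            return false;
        }
        for (std::size_t i = 0; i < s->size(); ++i) {
            const auto& sf = (*s)[i];
            const auto& df = (*d)[i];
            // Field IDs are ignored; names, order, and types must match exactly.
            if (sf.name != df.name || sf.type != df.type) {
                return false;
            }
            if (!sf.children.empty() || !df.children.empty()) {
                stk.emplace_back(&sf.children, &df.children);
            }
        }
    }
    return true;
}
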
Search for a schema that matches the provided type.

Signed-off-by: Oren Leiman <[email protected]>
For catalog_schema_manager, we can use this to perform a type-wise schema
lookup on cached metadata, resulting in table_info bound to an arbitrary
schema, possibly other than the current table schema.

Also update catalog_schema_manager::get_ids_from_table_meta to try a type-wise
lookup before performing the usual compat check. This way we can short-circuit
a schema update if the desired schema is already present in the table.

Also pass source struct to check_schema_compat by value to avoid polluting
cached table metadata with compat annotations.

Signed-off-by: Oren Leiman <[email protected]>
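
(A sketch of the type-wise lookup and short-circuit described in this commit message, again reusing the stand-in types from the earlier sketches; get_equivalent_schema and resolve_schema_id_sketch here are simplified, hypothetical analogues of the real table_metadata / catalog_schema_manager code.)

#include <optional>
#include <vector>

struct schema {
    int schema_id{};
    struct_type schema_struct;
};

struct table_metadata_sketch {
    std::vector<schema> schemas;
    int current_schema_id{};

    // Type-wise lookup: return the first cached schema whose struct is
    // equivalent to the requested type, regardless of schema ID.
    const schema* get_equivalent_schema(const struct_type& type) const {
        for (const auto& s : schemas) {
            if (schemas_equivalent_sketch(s.schema_struct, type)) {
                return &s;
            }
        }
        return nullptr;
    }
};

// If the desired schema already exists in the table (currently or historically),
// bind to it directly and skip the compat-check / schema-update path.
std::optional<int> resolve_schema_id_sketch(
  const table_metadata_sketch& table, const struct_type& desired_type) {
    if (const auto* match = table.get_equivalent_schema(desired_type)) {
        return match->schema_id; // short-circuit: no schema update needed
    }
    return std::nullopt;         // fall back to the usual compat check / update
}
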
Rather than the current schema ID. By this point we should have ensured that the
record schema exists in the table (either historically or currently). This
change lets us look past the current schema to build a writer for historical
data.

Signed-off-by: Oren Leiman <[email protected]>
@oleiman oleiman force-pushed the dlib/core-8530/write-historical-schema branch from ac6735d to f5a8ad8 on January 31, 2025 23:11
@oleiman oleiman requested a review from andrwng February 1, 2025 04:19
@oleiman oleiman merged commit b174696 into redpanda-data:dev Feb 3, 2025
18 checks passed
@rockwotj rockwotj left a comment

LGTM!

@@ -230,7 +235,10 @@ catalog_schema_manager::get_table_info(
}
const auto& table = load_res.value();

auto cur_schema = table.get_schema(table.current_schema_id);
const auto* cur_schema = desired_type.has_value()
? table.get_equivalent_schema(
Contributor

I wonder if there are perf implications here? Might be good to check what the impact is and whether there's some other mechanism we want here (or does this get cached per schema ID in the translation path? I will look).

Member Author

Yeah, I've been thinking about this. I got the impression from @andrwng that there's some caching in play, but I still need to chase down the details.

In retrospect, I'm not sure whether we can expect table_metadata::schemas to be in sorted (oldest-first) order. It might be slightly better to look up by current_schema_id up front and compare if necessary, since that's the common case.

Contributor

Yeah, there are other options, like using a fingerprint to quickly sort out mismatches. Hopefully it's all behind a cache and this won't matter.

Contributor

Just to clarify, the caching I was referring to is pretty indirect -- it's caching at the level of the record multiplexer, such that each translation of an offset range will have a shared map of parquet writers that's more or less keyed by schema ID. So we will need to do this lookup once per record schema per translation.
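
(Illustrative only: a rough sketch of the kind of per-translation writer map being described, with a stand-in writer type; the real record multiplexer differs.)

#include <map>

struct parquet_writer_stub {}; // stand-in for the real parquet writer

// One of these per translated offset range: writers are reused per schema ID,
// so the schema lookup above runs once per record schema per translation.
class writer_cache_sketch {
public:
    parquet_writer_stub& writer_for(int schema_id) {
        // On first use of a schema ID, the real code would resolve the schema
        // (the lookup discussed in this thread) and build a writer bound to it.
        return writers_.try_emplace(schema_id).first->second;
    }

private:
    std::map<int, parquet_writer_stub> writers_;
};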

Member Author

Right, right. So, assuming that:

  • the steady state of most translators is a stream of records conforming to the corresponding table's current schema
  • the first thing we look up in get_table_info is the current table schema

Then incurring a single additional equivalence check in the common case seems basically fine?

In principle we should only get into the out-of-sync producer state when the schema changes, but I don't have a good sense of how long we might spend there.

Maybe an on-shard cache keyed by some fingerprint is good enough, vs. the additional work of wiring something into the coordinator?
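
(Purely to illustrate the fingerprint idea floated here -- nothing like this is in the PR: an on-shard memo keyed by a structural fingerprint could front the equivalence lookup, again reusing the stand-in types from the earlier sketches.)

#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// boost-style hash_combine: order-sensitive, like the equivalence check.
inline void hash_combine_sketch(std::size_t& seed, std::size_t v) {
    seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Structural fingerprint over names, types, and order; field IDs are ignored,
// mirroring the equivalence check.
std::size_t fingerprint_sketch(const std::vector<nested_field>& fields) {
    std::size_t h = 0;
    for (const auto& f : fields) {
        hash_combine_sketch(h, std::hash<std::string>{}(f.name));
        hash_combine_sketch(h, std::hash<std::string>{}(f.type));
        hash_combine_sketch(h, fingerprint_sketch(f.children));
    }
    return h;
}

// Per-shard memo of fingerprint -> schema ID; on a hit we'd still want a full
// equivalence check to guard against hash collisions.
class schema_id_cache_sketch {
public:
    std::optional<int> find(const struct_type& type) const {
        auto it = ids_.find(fingerprint_sketch(type.fields));
        if (it == ids_.end()) {
            return std::nullopt;
        }
        return it->second;
    }
    void put(const struct_type& type, int schema_id) {
        ids_[fingerprint_sketch(type.fields)] = schema_id;
    }

private:
    std::unordered_map<std::size_t, int> ids_;
};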

Member Author

@andrwng - completely misread your comment 🤦.

once per record schema per translation

should be totally fine then as written, yeah?

Contributor

Oops, I missed the earlier comment. Yeah, I wasn't implying that what we have is insufficient, just thought I'd clarify where there is schema reuse. I think it's fine as is (or at least, let's wait and see as the translation scheduling policy crystallizes, though I'd be surprised if it becomes a bottleneck).

I will say, maybe this is worth trying out with some massive schemas.

Member Author

No worries. Yeah, for sure - what I read on mobile was "check per batch", so I went off and wrote a pointless benchmark... entirely my bad.

maybe this is worth trying out with some massive schemas

Yeah, I can whip something up at some point.

wait and see...surprised if it becomes a bottleneck

Agreed.
