Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read and write snapshot metadata from the correct place #2161

Merged
merged 1 commit into from
Feb 3, 2025

Conversation

poodlewars
Copy link
Collaborator

@poodlewars poodlewars commented Jan 31, 2025

Between v4.5.0 and v5.2.1 we changed to save snapshot metadata in the segment header in a TimeseriesDescriptor field. This was changed in the binary encoding change 516d169 .

Before v4.5.0 we wrote it directly in a UserDefinedMetadata field.

This means that the current code falsely thinks that snapshots written before v4.5.0 have no snapshot metadata.

This PR:

  • Changes readers to check both places for any metadata, so we can read snapshot metadata written before v4.5.0 again
  • Changes writers back to writing directly in a UserDefinedMetadata field

This means that clients with versions between v4.5.0 and v5.2.1 will not see snapshot metadata written by modern versions (they will say it is None) and will need to upgrade.

Since the metadata gets saved in the SegmentHeader in both schemes, it is not possible to write data in a way that is compatible with both early readers and readers between v4.5.0 and v5.2.1. We could make the alternative decision and break readers older than v4.5.0.

We should apply a similar change to arcticc's reading code so it can understand snapshot metadata written by newer versions.

@poodlewars poodlewars force-pushed the list-snaps-metadata-regression branch from 8370f8e to dd1d870 Compare January 31, 2025 17:43
@poodlewars poodlewars marked this pull request as ready for review January 31, 2025 17:50
Copy link
Collaborator

@IvoDD IvoDD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only minor not required comments

old_lib = old_ac.create_library(lib_name)

# Write snapshot metadata using current version
set_config_int("VersionMap.ReloadInterval", 0) # We disable the cache to be able to read the data written from old_venv
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding the set_config_int("VersionMap.ReloadInterval", 0) to the with CurrentVersion? I feel like all compatibility tests would want this. I guess our previous compat tests were not complicated enought to require it.

@@ -117,3 +118,72 @@ def test_pandas_pickling(pandas_v1_venv, s3_ssl_disabled_storage, lib_name):
idx = pd.Index([1, 2, 3], dtype="int64")
expected_df.index = idx
assert_frame_equal(read_df, expected_df)


def test_compat_snapshot_metadata_read_write(pandas_v1_venv, s3_ssl_disabled_storage, lib_name):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice for this test to be able to test not only pandas_v1_venv but to test all versions apart from anything in between 4.5.0 and 5.2.1, so when we add new versions to support in the old_venv fixture we could have them tested as well.

I think we can easily do that if we use old_venv_and_arctic_uri and have somthing like:

adb_version = Version(old_venv.version)
if adb_version >= Version("4.5.0") and adb_version <= Version("5.2.1"):
    pytest.skip(reason="Versions between 4.5.0 and 5.2.1 store snapshot metadata in timeseries descriptor which is incompatible with any other versions")

output.PackFrom(user_meta_proto);
snapshot_agg.set_metadata(std::move(output));

// Bewared: Between v4.5.0 and v5.2.1 we only saved this metadata on the
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo bewared

@poodlewars poodlewars merged commit 958a6ff into master Feb 3, 2025
152 checks passed
@poodlewars poodlewars deleted the list-snaps-metadata-regression branch February 3, 2025 10:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants