-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read and write snapshot metadata from the correct place #2161
Conversation
8370f8e
to
dd1d870
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only minor not required comments
old_lib = old_ac.create_library(lib_name) | ||
|
||
# Write snapshot metadata using current version | ||
set_config_int("VersionMap.ReloadInterval", 0) # We disable the cache to be able to read the data written from old_venv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about adding the set_config_int("VersionMap.ReloadInterval", 0)
to the with CurrentVersion
? I feel like all compatibility tests would want this. I guess our previous compat tests were not complicated enought to require it.
@@ -117,3 +118,72 @@ def test_pandas_pickling(pandas_v1_venv, s3_ssl_disabled_storage, lib_name): | |||
idx = pd.Index([1, 2, 3], dtype="int64") | |||
expected_df.index = idx | |||
assert_frame_equal(read_df, expected_df) | |||
|
|||
|
|||
def test_compat_snapshot_metadata_read_write(pandas_v1_venv, s3_ssl_disabled_storage, lib_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice for this test to be able to test not only pandas_v1_venv
but to test all versions apart from anything in between 4.5.0
and 5.2.1
, so when we add new versions to support in the old_venv
fixture we could have them tested as well.
I think we can easily do that if we use old_venv_and_arctic_uri
and have somthing like:
adb_version = Version(old_venv.version)
if adb_version >= Version("4.5.0") and adb_version <= Version("5.2.1"):
pytest.skip(reason="Versions between 4.5.0 and 5.2.1 store snapshot metadata in timeseries descriptor which is incompatible with any other versions")
output.PackFrom(user_meta_proto); | ||
snapshot_agg.set_metadata(std::move(output)); | ||
|
||
// Bewared: Between v4.5.0 and v5.2.1 we only saved this metadata on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo bewared
Between v4.5.0 and v5.2.1 we changed to save snapshot metadata in the segment header in a
TimeseriesDescriptor
field. This was changed in the binary encoding change 516d169 .Before v4.5.0 we wrote it directly in a
UserDefinedMetadata
field.This means that the current code falsely thinks that snapshots written before
v4.5.0
have no snapshot metadata.This PR:
v4.5.0
againUserDefinedMetadata
fieldThis means that clients with versions between v4.5.0 and v5.2.1 will not see snapshot metadata written by modern versions (they will say it is
None
) and will need to upgrade.Since the metadata gets saved in the
SegmentHeader
in both schemes, it is not possible to write data in a way that is compatible with both early readers and readers betweenv4.5.0
andv5.2.1
. We could make the alternative decision and break readers older thanv4.5.0
.We should apply a similar change to
arcticc
's reading code so it can understand snapshot metadata written by newer versions.