fix: `Dataset` `__eq__` and `set_manager` #512

aaraney · 2024-01-30T18:34:57Z

Reintroduce Dataset's __eq__ method, which was lost during the pydantic refactor. The current behavior is incorrect for several reasons, the most glaring being that two Datasets must have the same manager_uuid field value to compare equal.

`dmod.core` -- `0.13.0`

Additions

Dataset.set_manager method. This replaces setting a manager instance as if it were a property.

Changes

Reintroduce Dataset's __eq__ method, which was lost during the pydantic refactor. The current behavior is incorrect for several reasons, the most glaring being that two Datasets must have the same manager_uuid field value to compare equal.

`dmod.modeldata` -- `0.9.4`

Changes

Use dmod.core.Dataset's set_manager API.
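For readers skimming the description, here is a minimal sketch of the equality problem it refers to. It is not the DMOD code: the standard library's `dataclasses` stands in for pydantic so the snippet has no dependencies (both generate field-by-field equality by default), and the class and field values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class GeneratedEqDataset:
    """Hypothetical stand-in: equality is generated from every field."""
    name: str
    category: str
    manager_uuid: Optional[UUID] = None


@dataclass(eq=False)
class CustomEqDataset:
    """Hypothetical stand-in with a hand-written __eq__ that skips manager_uuid."""
    name: str
    category: str
    manager_uuid: Optional[UUID] = None

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CustomEqDataset):
            return NotImplemented
        return self.name == other.name and self.category == other.category

    def __hash__(self) -> int:
        # Keep hashing consistent with the fields used for equality.
        return hash((self.name, self.category))


# Same data, but each instance was handed out by a different manager instance.
a = GeneratedEqDataset("forcing-1", "FORCING", manager_uuid=uuid4())
b = GeneratedEqDataset("forcing-1", "FORCING", manager_uuid=uuid4())
generated_equal = a == b  # manager_uuid takes part in the comparison

c = CustomEqDataset("forcing-1", "FORCING", manager_uuid=uuid4())
d = CustomEqDataset("forcing-1", "FORCING", manager_uuid=uuid4())
custom_equal = c == d  # manager_uuid is ignored
```

With generated equality the two instances compare unequal only because their manager uuids differ, which is the behavior this PR sets out to fix.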

aaraney · 2024-01-30T18:59:31Z

Pretty sure I missed one place where a setter should be used now.

Edit: found the place and fixed it.

robertbartel

There are a few things I have questions about.

robertbartel · 2024-02-01T19:47:09Z

python/lib/core/dmod/core/dataset.py

+            and self.category == other.category
+            and self.created_on == other.created_on
+            and self.data_domain == other.data_domain
+            and self.dataset_type == other.dataset_type


There are several properties that are not examined here. For some, like manager_uuid, I think this makes sense. For others, like uuid, my first reaction is that it doesn't make sense to omit it. And for still others, like derived_from, I'm not sure. (FWIW, I think those that are optional like uuid should be optionally checked, with a None value not disqualifying from equality.)

Any particular reason why some of these weren't added?

Thanks for bringing up these questions! I had similar questions myself. I went back in the git history to the first point before the pydantic refactor work and copied and pasted the __eq__ implementation from that point. I thought that would at least be a good starting point.

I am not sure whether we should compare uuids. I lean toward yes, but I could hear the argument for no. Our idea of a dataset has slightly changed since pre-pydantic refactor as we now use, more or less, aggregate datasets for everything but forcing data for a simulation. I think that change in thinking is probably why they were not compared in the previous version of the code, when the goal was to find suitable existing datasets that could satisfy a given simulation request. Is that fair, and what are your thoughts?

> Our idea of a dataset has slightly changed since pre-pydantic refactor as we now use, more or less, aggregate datasets for everything but forcing data for a simulation.

I don't think our abstract idea has changed; we just evolved another more practical data format.

Here's what I suggest:

- expires: test ~~both are in the future~~ that neither has an explicit expire time value in the past (i.e., None is allowed)
  - disagreeing about how long data is good for doesn't imply disagreeing on whether you are talking about the same data, as long as everyone agrees the data is still good
- derived_from: optionally test for equality
  - different explicit origins imply different data, but lack of information about origin does not
- derivations: ignore
  - doesn't really say anything about "this" data
- last_updated: optionally test for equality
  - if one side says things have updated more recently than the other, chances are these are not truly talking about the same data
- uuid: ??? ... for now, optionally test for equality, but this needs its own issue for follow-up
  - I thought more about this, and I'm less sure of what to do
  - does it make sense to have a universally unique identifier if it is not universally unique (i.e., if there are ever two non-equal objects with the same value, even if the value is None)?
  - let's not allow this PR to get stuck here, but we might need to do more regarding this attribute to make the design consistent

Is it okay if expires is None on both datasets?

Ah, yes, I should have clarified, but I think that's fine also. None implies the dataset is not temporary. Certainly if both are None, this fits with equality. And I don't think disagreeing on whether the dataset is temporary should necessarily imply looking at different data (again, as long as the expire time hasn't passed yet).

Thanks for clarifying!
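The rules proposed in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: `DatasetSketch`, `optionally_equal`, and `not_expired` are hypothetical names, only a few fields are modeled, `derivations` is left out because the suggestion is to ignore it, and naive `datetime` values are assumed throughout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, TypeVar

T = TypeVar("T")


def optionally_equal(left: Optional[T], right: Optional[T]) -> bool:
    """Compare two values, treating None on either side as 'no information'."""
    return left is None or right is None or left == right


def not_expired(expires: Optional[datetime], now: datetime) -> bool:
    """None means the dataset is not temporary, so it cannot have expired."""
    return expires is None or expires > now


@dataclass(eq=False)
class DatasetSketch:
    """Hypothetical, trimmed-down stand-in for dmod.core's Dataset."""
    name: str
    expires: Optional[datetime] = None
    derived_from: Optional[str] = None
    last_updated: Optional[datetime] = None
    uuid: Optional[str] = None

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, DatasetSketch):
            return NotImplemented
        now = datetime.now()  # naive local time, matching the naive fields
        return (
            self.name == other.name
            # expires: neither side may carry an explicit time in the past
            and not_expired(self.expires, now)
            and not_expired(other.expires, now)
            # the remaining fields only disqualify when both sides disagree
            and optionally_equal(self.derived_from, other.derived_from)
            and optionally_equal(self.last_updated, other.last_updated)
            and optionally_equal(self.uuid, other.uuid)
        )


tomorrow = datetime.now() + timedelta(days=1)
yesterday = datetime.now() - timedelta(days=1)
same = DatasetSketch("a", expires=tomorrow) == DatasetSketch("a")    # True
stale = DatasetSketch("a", expires=yesterday) == DatasetSketch("a")  # False
```

Two consequences of these rules, at least in this sketch: a dataset whose expire time has passed does not equal even itself, and optional comparison is not transitive (a dataset with no uuid can equal two datasets whose uuids differ from each other). The second point is part of why the uuid question may deserve its own follow-up.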

robertbartel · 2024-02-02T15:32:02Z

python/lib/core/dmod/core/dataset.py

+        ----------
+        value: DatasetManager | None
+            The new value for ::attribute:`manager`.
+        """
        self._manager = value if isinstance(value, DatasetManager) else None

        if value is not None:


I think we need to do this to make sure manager_uuid also gets properly unset.

Suggested change

-        if value is not None:
+        if value is None:
+            self.manager_uuid = None
+        else:

Thanks for pointing this out, @robertbartel. Just pushed up a fix. I slightly altered the possible behavior, but I think it's for the best. LMK if you think otherwise.

Hmmm, so what about the case when you are initializing from a deserialized json message that includes manager_uuid? This boils down to is the invariant that an instance's manager and manager_uuid fields are always either None or some valid value? Or can you have a manager_uuid field without a manager being set? I think the answer is for sure the latter.

FWIW the previous behavior pre-pydantic was to leave the manager_uuid field if you set the manager with a None value (or any non-covariant DatasetManager type).

Just pushed up a change to only modify manager_uuid if a DatasetManager type is passed as the value.
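A minimal sketch of the behavior described at this point in the thread, assuming a manager exposes a `uuid` attribute. The classes below are hypothetical stand-ins, not the real `dmod.core` types, and pydantic is left out entirely so the snippet stays self-contained.

```python
from typing import Optional
from uuid import UUID, uuid4


class DatasetManager:
    """Hypothetical minimal manager that only carries a uuid."""

    def __init__(self) -> None:
        self.uuid: UUID = uuid4()


class DatasetSketch:
    """Hypothetical stand-in for Dataset, reduced to manager bookkeeping."""

    def __init__(self, manager_uuid: Optional[UUID] = None) -> None:
        self._manager: Optional[DatasetManager] = None
        # May already be populated, e.g. from a deserialized message,
        # before any manager instance has been attached.
        self.manager_uuid: Optional[UUID] = manager_uuid

    @property
    def manager(self) -> Optional[DatasetManager]:
        return self._manager

    def set_manager(self, value: Optional[DatasetManager]) -> None:
        """Explicit setter method used in place of a property setter."""
        self._manager = value if isinstance(value, DatasetManager) else None
        # Only modify manager_uuid when an actual manager is passed;
        # passing None leaves a previously known uuid untouched.
        if isinstance(value, DatasetManager):
            self.manager_uuid = value.uuid


known_uuid = uuid4()
dataset = DatasetSketch(manager_uuid=known_uuid)  # as if freshly deserialized
dataset.set_manager(None)                         # manager_uuid is preserved
```

The trade-off is the one raised in review: with this behavior, `set_manager(None)` drops the manager reference but the serialized form would still carry the old manager's uuid.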

> Hmmm, so what about the case when you are initializing from a deserialized json message that includes manager_uuid?

Assuming we keep the manager_uuid attribute (I'll come back to this), we need to support this case. The motive for including the manager_uuid field is that we can't serialize the manager, so instead we serialize a way to identify the right manager.

> This boils down to is the invariant that an instance's manager and manager_uuid fields are always either None or some valid value?

Not strictly always. Practically, but not strictly. E.g., not immediately after deserialization.

> FWIW the previous behavior pre-pydantic was to leave the manager_uuid field if you set the manager with a None value (or any non-covariant DatasetManager type).

Right, I thought earlier in the discussion for the PR that this was an oversight. The only reason to support setting the manager to None, beyond when we are first initializing the object, would be to unset the manager. And we aren't properly unsetting the manager if the Dataset, when serialized, still reflects said manager's uuid. For now though, leave this as you have it; I'm starting to think it is more a consequence of a problem in the design than an oversight.

I don't want us to get further caught up here, so I'll open a separate issue, but I think we should examine whether it is really appropriate/necessary/useful to serialize the manager's uuid with a dataset. Manager instance uuids end up being dynamically generated each time, and the service only permits one manager per DatasetType (i.e., there's only ever one OBJECT_STORE type manager at a time, which gets an arbitrary uuid, so if we know a Dataset is OBJECT_STORE, maybe we don't need to and shouldn't keep a manager uuid separately from a manager reference).
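One possible shape for the idea floated above, sketched purely as an illustration and not as a proposal that exists in DMOD: if there is only ever one manager per DatasetType, a dataset's manager could be resolved from its type instead of from a serialized uuid. `ManagerRegistry` and its methods are hypothetical, and `OTHER` is a placeholder member; only `OBJECT_STORE` is named in the discussion.

```python
from enum import Enum
from typing import Dict, Optional


class DatasetType(Enum):
    OBJECT_STORE = "OBJECT_STORE"
    OTHER = "OTHER"  # placeholder member for illustration only


class DatasetManager:
    """Hypothetical manager that knows which dataset type it handles."""

    def __init__(self, dataset_type: DatasetType) -> None:
        self.dataset_type = dataset_type


class ManagerRegistry:
    """Hypothetical registry enforcing one manager per DatasetType."""

    def __init__(self) -> None:
        self._by_type: Dict[DatasetType, DatasetManager] = {}

    def register(self, manager: DatasetManager) -> None:
        if manager.dataset_type in self._by_type:
            raise ValueError(
                f"a manager for {manager.dataset_type.name} is already registered"
            )
        self._by_type[manager.dataset_type] = manager

    def manager_for(self, dataset_type: DatasetType) -> Optional[DatasetManager]:
        # The type alone identifies the manager, so no uuid has to be
        # carried through serialization to find it again.
        return self._by_type.get(dataset_type)


registry = ManagerRegistry()
object_store_manager = DatasetManager(DatasetType.OBJECT_STORE)
registry.register(object_store_manager)
resolved = registry.manager_for(DatasetType.OBJECT_STORE)
```

Whether this is appropriate depends on the one-manager-per-type constraint continuing to hold, which is the kind of question the separate issue is meant to settle.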

aaraney · 2024-02-02T19:04:47Z

@robertbartel, can you give this one quick last glance? I think we are just about ready to merge.

aaraney · 2024-02-02T19:50:29Z

Just rebased to master and force pushed after the merge of #461.

aaraney requested review from christophertubbs and robertbartel January 30, 2024 18:54

aaraney marked this pull request as draft January 30, 2024 18:59

aaraney force-pushed the dataset-manager branch from 00705ec to e59386c Compare January 30, 2024 19:01

aaraney marked this pull request as ready for review January 30, 2024 19:01

aaraney mentioned this pull request Feb 1, 2024

test: account for nested directories #516

Merged

robertbartel reviewed Feb 1, 2024

aaraney force-pushed the dataset-manager branch from 9cba841 to 1294bae Compare February 1, 2024 21:42

aaraney commented Feb 1, 2024

aaraney requested a review from robertbartel February 1, 2024 21:45

aaraney force-pushed the dataset-manager branch from 1294bae to 8ce99ce Compare February 1, 2024 22:48

aaraney requested review from robertbartel and removed request for robertbartel February 1, 2024 22:49

aaraney force-pushed the dataset-manager branch from 8ce99ce to ce2732b Compare February 1, 2024 23:21

robertbartel requested changes Feb 2, 2024

aaraney force-pushed the dataset-manager branch from ce2732b to b6e5e1b Compare February 2, 2024 19:03

aaraney requested a review from robertbartel February 2, 2024 19:04

aaraney added 2 commits February 2, 2024 14:48

feat: overload Dataset eq

4119ada

docs: update manager prop doc string

5f0e54c

aaraney force-pushed the dataset-manager branch from b6e5e1b to 3c9045a Compare February 2, 2024 19:49

aaraney force-pushed the dataset-manager branch from 3c9045a to a3e3abe Compare February 2, 2024 19:58

aaraney added 4 commits February 2, 2024 15:10

feat: add set_manager

9567624

b.c. pydantic does not allow defining custom property setter via a property decorator

fix: use set_manager

db7b6f2

chore: bump ngen.core 0.12.0 -> 0.13.0

ae4f886

chore: bump dmod.modeldata 0.9.3 -> 0.9.4

e800511

aaraney force-pushed the dataset-manager branch from a3e3abe to e800511 Compare February 2, 2024 20:10

robertbartel mentioned this pull request Feb 2, 2024

Determine if Datasets should serialize and separately track manager uuids internally #518

Open

robertbartel approved these changes Feb 2, 2024

robertbartel merged commit ecec45d into NOAA-OWP:master Feb 2, 2024
4 of 8 checks passed

robertbartel added the maas MaaS Workstream label Feb 2, 2024
