-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Dataset
__eq__
and set_manager
#512
Conversation
Pretty sure I missed one place where a setter should be used now. Edit: found the place and fixed it. |
00705ec
to
e59386c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few things I have a few questions about.
and self.category == other.category | ||
and self.created_on == other.created_on | ||
and self.data_domain == other.data_domain | ||
and self.dataset_type == other.dataset_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several properties that are not examined here. For some, like manager_uuid
, I think this makes sense. For others, like uuid
, my first reaction is that it doesn't makes sense to omit it. And for still others, like derived_from
, I'm not sure. (FWIW, I think those that are optional like uuid
should be optionally checked, with a None
value not disqualifying from equality.)
Any particular reason why some of these weren't added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bringing up these questions! I had similar questions myself. I went back in the git history to the first point before the pydantic
refactor work and copied and pasted the __eq__
implementation from that point. I thought that would at least be a good starting point.
I am not sure if we should compare uuid
's or not? I lean toward yes, but I could hear the argument for no. Our idea of a dataset has slightly changed since pre-pydantic
refactor as we now use, more or less, aggregate datasets for everything but forcing data for a simulation. I think that change in thinking was probably why they were not compared in the previous version of the code when the thinking was to determine suitable existing datasets that could be used to satisfy a given simulation request? Is that fair and what are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our idea of a dataset has slightly changed since pre-pydantic refactor as we now use, more or less, aggregate datasets for everything but forcing data for a simulation.
I don't think our abstract idea has change; we just evolved another more practical data format.
Here's what I suggest:
expires
: testboth are in the futurethat neither has an explicit expire time value in the past (i.e.,None
is allowed)- disagreeing about how long data is good for doesn't imply disagreeing on whether your are talking about the same data, as long as everyone agrees the data is still good
derived_from
: optionally test for equality- different explicit origins implies different data, but lack of information about origin does not
derivations
: ignore- doesn't really say anything about "this" data
last_updated
: optionally test for equality- if one side says things have updated more recently than the other, chances are these are not truly talking about the same data
uuid
: ??? ... for now, optionally test for equality, but this needs its own issue for follow-up- I thought more about this, and I'm less sure of what to do
- does it make sense to have a universally unique identifier if it is not universally unique (i.e., if there are ever two non-equal objects with the same value, even if the value is
None
)? - let's not allow this PR to get stuck here, but we might need to do more regarding this attribute to make the design consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it okay if expires
is None on both datasets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, I should have clarified, but I think that's fine also. None
implies the dataset is not temporary. Certainly if both are None
, this fits with equality. And I don't think disagreeing on whether the dataset is temporary should necessarily imply looking at different data (again, as long as the expire time hasn't passed yet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying!
9cba841
to
1294bae
Compare
1294bae
to
8ce99ce
Compare
8ce99ce
to
ce2732b
Compare
python/lib/core/dmod/core/dataset.py
Outdated
---------- | ||
value: DatasetManager | None | ||
The new value for ::attribute:`manager`. | ||
""" | ||
self._manager = value if isinstance(value, DatasetManager) else None | ||
|
||
if value is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to do this to make sure manager_uuid
also gets properly unset.
if value is not None: | |
if value is None: | |
self.manager_uuid = None | |
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out, @robertbartel. Just pushed up a fix. I slightly altered the possible behavior, but I think its for the best. LMK if you think otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, so what about the case when you are initializing from a deserialized json message that includes manager_uuid
? This boils down to is the invariant that an instance's manager
and manager_uuid
fields are always either None
or some valid value? Or can you have a manager_uuid
field without a manager
being set? I think the answer if for sure the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the previous behavior pre-pydantic was to leave the manager_uuid
field if you set the manager with a None
value (or any non-covariant DatasetManager
type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed up a change to only modify manager_uuid
if a DatasetManager
type is passed as the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, so what about the case when you are initializing from a deserialized json message that includes manager_uuid?
Assuming we keep the manager_uuid
attribute (I'll come back to this), we need to support this case. The motive for including the manager_uuid
field is that we can't serialize the manager, so instead we serialize a way to identify the right manager.
This boils down to is the invariant that an instance's manager and manager_uuid fields are always either None or some valid value?
Not strictly always. Practically, but not strictly. E.g., not immediately after deserialization.
FWIW the previous behavior pre-pydantic was to leave the manager_uuid field if you set the manager with a None value (or any non-covariant DatasetManager type).
Right, I thought earlier in the discussion for the PR that this was an oversight. The only reason to support setting the manager to None
, beyond when we are first initializing the object, would be to unset the manager. And we aren't properly unsetting the manager if the Dataset, when serialized, still reflects said manager's uuid. For now though, leave this as you have it; I'm starting to think it more a consequence of a problem in the design than an oversight.
I don't want us to get further caught up here, so I'll open a separate issue, but I think we should examine whether it is really appropriate/necessary/useful to serialize the manager's uuid with a dataset. Manager instance uuids end up being dynamically generated each time, and the service only permits one manager per DatasetType (i.e., there's only ever one OBJECT_STORE
type manager at a time, which gets an arbitrary uuid, so if we know a Dataset is OBJECT_STORE
, maybe we don't need to and shouldn't keep a manager uuid separately from a manager reference).
ce2732b
to
b6e5e1b
Compare
@robertbartel, can you give this one quick last glance? I think we are just about ready to merge. |
b6e5e1b
to
3c9045a
Compare
Just rebased to master and force pushed after the merge of #461. |
3c9045a
to
a3e3abe
Compare
b.c. pydantic does not allow defining custom property setter via a property decorator
a3e3abe
to
e800511
Compare
Reintroduce
Dataset
's__eq__
method. This was lost during the pydantic refactor. The current behavior is incorrect for several reasons. The most glaring beingDataset
's must have the samemanager_uuid
field values.dmod.core
--0.13.0
Additions
Dataset.set_manager
method. This replaces setting a manager instance as if it were a property.Changes
Dataset
's__eq__
method. This was lost during the pydantic refactor. The current behavior is incorrect for several reasons. The most glaring beingDataset
's must have the samemanager_uuid
field values.dmod.modeldata
--0.9.4
Changes
dmod.core.Dataset
'sset_manager
api.