fix: Dataset __eq__ and set_manager #512

Merged
merged 6 commits into from
Feb 2, 2024
2 changes: 1 addition & 1 deletion python/lib/core/dmod/core/_version.py
@@ -1 +1 @@
- __version__ = '0.12.5'
+ __version__ = '0.13.0'
83 changes: 66 additions & 17 deletions python/lib/core/dmod/core/dataset.py
@@ -168,28 +168,77 @@ def _serialize_datetime(self: "Dataset", value: Optional[datetime]) -> Optional[

def __init__(self, manager: DatasetManager = None, **kwargs):
super().__init__(**kwargs)

self._manager = manager if isinstance(manager, DatasetManager) else None

if manager is not None:
# pydantic will not validate this, so we need to check it
if not isinstance(manager.uuid, UUID):
raise ValueError(f"Expected UUID got {type(manager.uuid)}")
self.manager_uuid = manager.uuid
self.set_manager(manager)

def __eq__(self, other):
now = datetime.now()
def cond_eq(a, b):
return a is None or b is None or a == b
return (
isinstance(other, Dataset)
and (
(self.expires is not None and self.expires > now)
or self.expires is None
)
and (
(other.expires is not None and other.expires > now)
or other.expires is None
)
and self.access_location == other.access_location
and self.category == other.category
and self.created_on == other.created_on
and self.data_domain == other.data_domain
and self.dataset_type == other.dataset_type
Contributor

There are several properties that are not examined here. For some, like manager_uuid, I think this makes sense. For others, like uuid, my first reaction is that it doesn't make sense to omit it. And for still others, like derived_from, I'm not sure. (FWIW, I think those that are optional, like uuid, should be optionally checked, with a None value not disqualifying from equality.)

Any particular reason why some of these weren't added?

Member Author

Thanks for bringing up these questions! I had similar questions myself. I went back in the git history to the first point before the pydantic refactor work and copied and pasted the __eq__ implementation from that point. I thought that would at least be a good starting point.

I am not sure whether we should compare uuids. I lean toward yes, but I could hear the argument for no. Our idea of a dataset has slightly changed since pre-pydantic refactor as we now use, more or less, aggregate datasets for everything but forcing data for a simulation. I think that change in thinking is probably why they were not compared in the previous version of the code, when the goal was to determine suitable existing datasets that could be used to satisfy a given simulation request. Is that fair, and what are your thoughts?

Contributor

@robertbartel Feb 1, 2024

Our idea of a dataset has slightly changed since pre-pydantic refactor as we now use, more or less, aggregate datasets for everything but forcing data for a simulation.

I don't think our abstract idea has changed; we just evolved another, more practical data format.

Here's what I suggest:

  • expires: test that neither has an explicit expire time value in the past (i.e., None is allowed)
    • disagreeing about how long data is good for doesn't imply disagreeing on whether you are talking about the same data, as long as everyone agrees the data is still good
  • derived_from: optionally test for equality
    • different explicit origins implies different data, but lack of information about origin does not
  • derivations: ignore
    • doesn't really say anything about "this" data
  • last_updated: optionally test for equality
    • if one side says things have updated more recently than the other, chances are these are not truly talking about the same data
  • uuid: ??? ... for now, optionally test for equality, but this needs its own issue for follow-up
    • I thought more about this, and I'm less sure of what to do
    • does it make sense to have a universally unique identifier if it is not universally unique (i.e., if there are ever two non-equal objects with the same value, even if the value is None)?
    • let's not allow this PR to get stuck here, but we might need to do more regarding this attribute to make the design consistent
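
The suggestions above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `SketchDataset` and `not_expired` are hypothetical names, and the class carries just a handful of the real `Dataset` fields. `cond_eq` mirrors the helper of the same name in the diff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional
from uuid import UUID, uuid4


def cond_eq(a, b) -> bool:
    """Optional equality: a missing value on either side never disqualifies."""
    return a is None or b is None or a == b


def not_expired(expires: Optional[datetime], now: datetime) -> bool:
    """None means "not temporary"; otherwise the time must still be in the future."""
    return expires is None or expires > now


@dataclass(eq=False)
class SketchDataset:
    # Hypothetical stand-in with only a few of Dataset's fields.
    name: str
    expires: Optional[datetime] = None
    derived_from: Optional[str] = None
    last_updated: Optional[datetime] = None
    uuid: Optional[UUID] = None

    def __eq__(self, other):
        now = datetime.now()
        return (
            isinstance(other, SketchDataset)
            # expires: only checked for "not in the past", never compared directly
            and not_expired(self.expires, now)
            and not_expired(other.expires, now)
            # required fields: strict equality
            and self.name == other.name
            # optional fields: compared only when both sides have a value
            and cond_eq(self.derived_from, other.derived_from)
            and cond_eq(self.last_updated, other.last_updated)
            and cond_eq(self.uuid, other.uuid)
        )


now = datetime.now()
a = SketchDataset(name="d", expires=now + timedelta(hours=1), uuid=uuid4())
b = SketchDataset(name="d")                # no uuid, no expiry
c = SketchDataset(name="d", uuid=uuid4())  # a different uuid
print(a == b, b == c, a == c)
```

One consequence worth noting: with optional comparison, `b` equals both `a` and `c` while `a` and `c` are not equal to each other, so this notion of equality is not transitive. That ties into the open question about uuid above.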

Member Author

Is it okay if expires is None on both datasets?

Contributor

Ah, yes, I should have clarified, but I think that's fine also. None implies the dataset is not temporary. Certainly if both are None, this fits with equality. And I don't think disagreeing on whether the dataset is temporary should necessarily imply looking at different data (again, as long as the expire time hasn't passed yet).

Member Author

Thanks for clarifying!

and self.is_read_only == other.is_read_only
and self.name == other.name
and cond_eq(self.derived_from, other.derived_from)
and cond_eq(self.last_updated, other.last_updated)
and cond_eq(self.uuid, other.uuid)
)

@property
def manager(self) -> Optional[DatasetManager]:
"""
The ::class:`DatasetManager` for this instance.

Returns
-------
DatasetManager
The ::class:`DatasetManager` for this instance.
"""
return self._manager

@manager.setter
def manager(self, value: DatasetManager = None):
self._manager = value if isinstance(value, DatasetManager) else None
def set_manager(self, value: Union[DatasetManager, None]):
"""
Sets the ::class:`DatasetManager` and updates the
::attribute:`manager_uuid` property on this instances.
If `None` is passed, the instances
::attribute:`manager_uuid` property is untouched.

Parameters
----------
value: DatasetManager | None
The new value for ::attribute:`manager`.

Raises
-------
TypeError
Raised if non-None value not a DatasetManager
ValueError
Raised if DatasetManager value's uuid property is not a uuid.UUID
"""
if value is None:
self._manager = None
return

# pydantic will not validate this, so we need to check it
if not isinstance(value, DatasetManager):
raise TypeError(f"Expected DatasetManager got {type(value)}")
if not isinstance(value.uuid, UUID):
raise ValueError(f"Expected UUID got {type(value.uuid)}")

if value is not None:
# pydantic will not validate this, so we need to check it
if not isinstance(value.uuid, UUID):
raise ValueError(f"Expected UUID got {type(value.uuid)}")
self.manager_uuid = value.uuid
self._manager = value
aaraney marked this conversation as resolved.
self.manager_uuid = value.uuid

def __hash__(self):
members = [
@@ -795,7 +844,7 @@ def is_managed_dataset(self, dataset: Dataset) -> bool:
return False

if dataset.manager is None and self.uuid == dataset.manager_uuid:
- dataset.manager = self
+ dataset.set_manager(self)

return

2 changes: 1 addition & 1 deletion python/lib/modeldata/dmod/modeldata/_version.py
@@ -1 +1 @@
- __version__ = '0.9.3'
+ __version__ = '0.9.4'
@@ -182,7 +182,7 @@ def reload(self, reload_from: str, serialized_item: Optional[str] = None) -> Dat
dataset = Dataset.factory_init_from_deserialized_json(dataset_json)
if dataset is None:
raise DmodRuntimeError("Unable to reload dataset: could not deserialize a object from the loaded JSON data")
- dataset.manager = self
+ dataset.set_manager(self)
self.datasets[dataset.name] = dataset
return dataset

@@ -583,7 +583,7 @@ def reload(self, reload_from: str, serialized_item: Optional[str] = None) -> Dat
response_data["type"] = list(self.supported_dataset_types)[0].name

dataset = Dataset.factory_init_from_deserialized_json(response_data)
- dataset.manager = self
+ dataset.set_manager(self)
self.datasets[dataset_name] = dataset
return dataset
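
For readers skimming the call-site changes, here is a minimal sketch of the `set_manager` behavior described in the docstring earlier in this diff, assuming simplified stand-in classes. `SketchManager` and `SketchDataset` are hypothetical and omit everything pydantic-related; only the validation order and the "`None` leaves `manager_uuid` untouched" rule are illustrated.

```python
from typing import Optional
from uuid import UUID, uuid4


class SketchManager:
    """Hypothetical stand-in for DatasetManager; it only carries a uuid."""

    def __init__(self, uuid):
        self.uuid = uuid


class SketchDataset:
    """Hypothetical stand-in for Dataset showing only the manager handling."""

    def __init__(self, manager: Optional[SketchManager] = None):
        self._manager: Optional[SketchManager] = None
        self.manager_uuid: Optional[UUID] = None
        self.set_manager(manager)

    @property
    def manager(self) -> Optional[SketchManager]:
        return self._manager

    def set_manager(self, value: Optional[SketchManager]) -> None:
        if value is None:
            # Drop the object reference but keep the recorded manager_uuid.
            self._manager = None
            return
        # Validate before mutating so a failed call changes nothing.
        if not isinstance(value, SketchManager):
            raise TypeError(f"Expected SketchManager got {type(value)}")
        if not isinstance(value.uuid, UUID):
            raise ValueError(f"Expected UUID got {type(value.uuid)}")
        self._manager = value
        self.manager_uuid = value.uuid


manager = SketchManager(uuid4())
dataset = SketchDataset()
dataset.set_manager(manager)   # sets both manager and manager_uuid
dataset.set_manager(None)      # clears manager, keeps manager_uuid
print(dataset.manager, dataset.manager_uuid == manager.uuid)
```

Because the property no longer has a setter, call sites such as `reload` use the explicit `set_manager(self)` call, which keeps `manager` and `manager_uuid` in step.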
