-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-46129: Make collections.query_info a single HTTP call #1074
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1074 +/- ##
==========================================
+ Coverage 89.66% 89.68% +0.01%
==========================================
Files 359 360 +1
Lines 46935 46988 +53
Branches 9652 9653 +1
==========================================
+ Hits 42086 42139 +53
+ Misses 3481 3480 -1
- Partials 1368 1369 +1 ☔ View full report in Codecov by Sentry. |
f42382b
to
cdb1bfe
Compare
DM-45993 was merged, please rebase and add forwarding of new |
Previously collections.query_info would make an HTTP call for each resolved collection. It now does everything in a single call to the server instead.
There is no reason to add a duplicate API for this, because get_info is identical to calling query_info with a single collection name.
The various Butler sub-objects (registry, collections) need to access the registry defaults. Added a small helper class so that they don't need a reference to RemoteButler or each other in order to fetch the defaults.
Now that there is an independent implementation of ButlerCollections for RemoteButler, forward the equivalent Registry methods to it to reduce duplication.
Update RemoteButlerCollections to pass the new optimization parameters added in DM-45993 to the server.
cdb1bfe
to
8c3ca9c
Compare
c2d6947
to
cc654d1
Compare
When attempting to add a test for the summary_datasets parameter to ButlerCollectionInfo for RemoteButler, it turned out that it does not doing anything in the DirectButler version. This was occurring because a caching context causes this parameter to be ignored, and the cache was always enabled in query_info previously. This cache does not have a benefit for the new implementation.
cc654d1
to
701bd48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks for adding a unit test for new query_info
parameters! Maybe we should try to simplify things by passing dataset type names instead of dataset types, but that is up to you.
@@ -796,3 +796,21 @@ def _unpickle_via_factory(factory: Callable, args: Any, kwargs: Any) -> DatasetT | |||
arguments as well as positional arguments. | |||
""" | |||
return factory(*args, **kwargs) | |||
|
|||
|
|||
def get_dataset_type_name(datasetTypeOrName: DatasetType | str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make it a classmethod of DatasetType
to avoid extra imports?
from ...._dataset_ref import DatasetId, DatasetIdGenEnum, DatasetRef, DatasetType | ||
from ...._dataset_type import get_dataset_type_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatasetType
above should also be coming from _dataset_type
.
@@ -280,7 +283,7 @@ def query_info( | |||
include_parents: bool = False, | |||
include_summary: bool = False, | |||
include_doc: bool = False, | |||
summary_datasets: Iterable[DatasetType] | None = None, | |||
summary_datasets: Iterable[DatasetType] | Iterable[str] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we allow Iterable[str]
only to avoid all complications? I think that clients of this method that actually pass summary_datasets
should already have a list of names for other purposes. I guess this also means that datasets manager has to accept Iterable[str]
instead of Iterable[DatasetType]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the same thing, but the convention in the rest of the new query system is that dataset types can be specified as either the name or DatasetType
instance so it makes sense to follow that. It's also common for this parameter to come from the result of a queryDatasetTypes call.
datasetTypeOrName : `DatasetType` | `str` | ||
A DatasetType, or the name of a DatasetType. This union is a common | ||
parameter in many `Butler` methods. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring also needs Returns
.
Previously
RemoteButlerCollections.query_info
would make an HTTP call for each resolved collection. It now does everything in a single call to the server instead.RemoteButlerCollections.get_info
and other collection methods inRemoteButlerRegistry
have also been updated to re-use the new endpoint.Also fixed an issue where the
summary_datasets
parameter toDirectButlerCollections.query_info
was not doing anything.Checklist
doc/changes
configs/old_dimensions