DM-39939: use query-results' grouping when processing iterables of DatasetRefs #863

Merged
TallJimbo merged 2 commits into main from tickets/DM-39939 on Jul 14, 2023

Member

@TallJimbo TallJimbo commented Jul 8, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

codecov bot commented Jul 8, 2023

Codecov Report

Patch coverage: 89.51% and project coverage change: +0.01 🎉

Comparison is base (5b07c4d) 87.90% compared to head (8213990) 87.92%.

❗ Current head 8213990 differs from pull request most recent head a3ec38a. Consider uploading reports for the commit a3ec38a to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #863      +/-   ##
==========================================
+ Coverage   87.90%   87.92%   +0.01%     
==========================================
  Files         273      270       -3     
  Lines       35764    35697      -67     
  Branches     7474     7478       +4     
==========================================
- Hits        31440    31388      -52     
+ Misses       3166     3150      -16     
- Partials     1158     1159       +1     
Impacted Files Coverage Δ
python/lsst/daf/butler/registries/sql.py 85.07% <ø> (+0.02%) ⬆️
python/lsst/daf/butler/core/progress.py 86.99% <79.62%> (+0.57%) ⬆️
...ython/lsst/daf/butler/registry/queries/_results.py 89.94% <80.00%> (+<0.01%) ⬆️
python/lsst/daf/butler/core/datasets/ref.py 84.33% <100.00%> (+0.36%) ⬆️
tests/test_progress.py 100.00% <100.00%> (ø)

... and 29 files with indirect coverage changes


@TallJimbo TallJimbo force-pushed the tickets/DM-39939 branch 2 times, most recently from 45e7f53 to 8213990 Compare July 10, 2023 15:27
Contributor

@andy-slac andy-slac left a comment

Looks good, one suggestion to make it a bit more type-checkable.

Comment on lines 614 to 637
if hasattr(refs, "_iter_by_dataset_type"):
    return refs._iter_by_dataset_type()
Contributor

I'm not super-happy with this dynamic approach, particularly because our favorite mypy cannot type-check this. Would it be possible to add an abstraction (or maybe Protocol with runtime check) so that isinstance can be used instead?

Member Author

I've switched to a runtime-checkable protocol, but I've given it a leading underscore (and documented it as "package private") since I don't want external code to start using this.
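
For readers unfamiliar with the pattern discussed in this thread, here is a minimal sketch of a runtime-checkable protocol replacing the `hasattr` check. It is not the code merged in this PR: the protocol name `_HasIterByDatasetType`, the `GroupedResults` class, and the `supports_grouping` helper are all hypothetical names chosen for illustration, and plain strings stand in for `DatasetRef` objects.

```python
from collections.abc import Iterable, Iterator
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class _HasIterByDatasetType(Protocol):
    """Hypothetical package-private protocol (name assumed, not from the PR).

    The leading underscore signals that external code should not rely on it.
    """

    def _iter_by_dataset_type(self) -> Iterator[tuple[str, Iterable[Any]]]:
        ...


class GroupedResults:
    """Hypothetical results container that already knows its own grouping."""

    def __init__(self, groups: dict[str, list[str]]) -> None:
        self._groups = groups

    def __iter__(self) -> Iterator[str]:
        # Flat iteration over every ref, ignoring the grouping.
        for refs in self._groups.values():
            yield from refs

    def _iter_by_dataset_type(self) -> Iterator[tuple[str, Iterable[str]]]:
        # Yield (dataset type name, refs of that type) pairs.
        yield from self._groups.items()


def supports_grouping(refs: Iterable[Any]) -> bool:
    # isinstance() against a runtime-checkable protocol only checks that the
    # method exists; it does not verify the signature. Unlike hasattr(), it
    # lets a static type checker narrow the type of ``refs`` afterwards.
    return isinstance(refs, _HasIterByDatasetType)
```

With this sketch, `supports_grouping(GroupedResults({"raw": ["r1"]}))` is true, while a plain list of refs fails the check and would take the generic grouping path.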

refs : `~collections.abc.Iterable` [ `DatasetRef` ]
    `DatasetRef` instances to group. If this has a
    ``_iter_by_dataset_type`` method, it will be called with no
    arguments and the result reutrnd.
Contributor

Suggested change
arguments and the result reutrnd.
arguments and the result returned.

        self.assertEqual(MockProgressBar.last.total, 2)

    def test_iter_item_chunks_not_sized(self):
        """Test using `Progress.iter_item_chunks` with an unsized iterable
Contributor

Suggested change
"""Test using `Progress.iter_item_chunks` with an unsized iterable
"""Test using `Progress.iter_item_chunks` with an unsized iterable of

When processing all dataset types in a collection together, this can
represent a huge decrease in memory usage, by querying for and then
processing only one dataset type at a time.
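
The memory argument in the release note can be illustrated with a rough sketch. Everything here is assumed for illustration: `FakeRef` is a stand-in for `DatasetRef`, and `LazyResults` with its `fetch` callback is a hypothetical model of a query-results object, not the butler's actual classes. The fallback must hold every ref at once to group them, whereas the lazy variant fetches one dataset type per step.

```python
from collections import defaultdict
from collections.abc import Callable, Iterable, Iterator
from typing import NamedTuple


class FakeRef(NamedTuple):
    """Hypothetical stand-in for a DatasetRef; only the fields used here."""

    dataset_type: str
    data_id: int


def group_by_dataset_type(refs: Iterable[FakeRef]) -> dict[str, list[FakeRef]]:
    """Fallback for plain iterables: every ref is held in memory at once."""
    grouped: dict[str, list[FakeRef]] = defaultdict(list)
    for ref in refs:
        grouped[ref.dataset_type].append(ref)
    return dict(grouped)


class LazyResults:
    """Hypothetical results object that queries one dataset type per step."""

    def __init__(
        self,
        dataset_types: Iterable[str],
        fetch: Callable[[str], list[FakeRef]],
    ) -> None:
        self._dataset_types = list(dataset_types)
        self._fetch = fetch  # assumed callback that runs the per-type query

    def _iter_by_dataset_type(self) -> Iterator[tuple[str, list[FakeRef]]]:
        for name in self._dataset_types:
            # The query for ``name`` runs only when the caller asks for the
            # next group, so earlier groups can be released before this one
            # is fetched.
            yield name, self._fetch(name)
```

A caller that loops over `_iter_by_dataset_type()` and finishes with each group before advancing never needs more than one dataset type's refs in memory, which is the saving the note describes.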
@TallJimbo TallJimbo merged commit f670e8b into main Jul 14, 2023
@TallJimbo TallJimbo deleted the tickets/DM-39939 branch July 14, 2023 16:53