-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-38850: Make trailedAssociatorTask #173
Conversation
56c586d
to
62f1e4b
Compare
The pull request should have the ticket number in it: https://developer.lsst.io/work/flow.html#make-a-pull-request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you run this? I think it would have failed a linter, at least. There were a lot of typos and leftover print/import statements that were clearly from debugging; please try to have those cleaned up before you send it for review.
There are no tests specific to the new Task itself: please add a file to test the new Task output.
Is this really worth having a separate Task, instead of just adding a config, branch, and method to AssociationTask
? It's just ~3 lines of code; do we expect that we will be significantly expanding functionality compared to what you've implemented here, and are there any other places where we need this as a Task?
43b2a7d
to
d7c4173
Compare
3dedc44
to
2c8f5ae
Compare
Here is the jenkins run using this branch. |
9e17d3b
to
d555112
Compare
@parejkoj Are you happy with the changes? If so I can rebase onto main and merge the changes. |
- ``trailed_dia_sources`` : DIASources that have trailed more | ||
than 0.416 arcseconds/second*exposure_time. (`pandas.DataFrame`) | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No newlines after function docstrings. https://developer.lsst.io/python/numpydoc.html#docstrings-of-methods-and-functions-should-not-be-preceded-or-followed-by-a-blank-line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I fixed in one place but not another. I've fixed them all now.
if len(diaTrailedResult.trailedDiaSources) > 0: | ||
print("Trailed sources cleaned.") | ||
else: | ||
print("No trailed sources to clean.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you removed print statements, but didn't add a log.info: I think we absolutely want to have a log statement about how many sources were removed.
"""A simple implementation of source association task for ap_verify. | ||
""" | ||
|
||
__all__ = ["TrailedSourceFilterTask", "TrailedSourceFilterConfig"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KT has gone the other way on my dev guide suggestion, so lets not touch anything else right now while we sort that out.
result : `lsst.pipe.base.Struct` | ||
Results struct with components. | ||
|
||
- ``"dia_sources"`` : DiaSource table that is free from unwanted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how they should be documented: https://developer.lsst.io/python/numpydoc.html#struct-types
Please file a ticket to go through all of ap_assocation
and fix the Struct
docstrings, if you know some are wrong.
Boolean mask for DIASources which are greater than the | ||
cutoff length. | ||
""" | ||
diffIm_time = diffIm.getInfo().getVisitInfo().getExposureTime() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Difference images come from visits. Given we can take 2 snaps and there is a gap between the snaps, is the relevant time here the exposure time of a snap or the duration of the visit including the gap (which is greater than the exposure time)?
(also, if diffim
is an exposure then the docstring for diffIm
is wrong).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a policy and/or actual code that defines what the "exposure time" of a snap-combined image is? The docs for VisitInfo
just say:
get exposure duration (shutter open time); (sec)
Since we can't yet run CharacterizeImageTask
or CalibrateImageTask
on multi-snap visits, I'm pretty sure that this case is untestable either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment it's defined to be the sum of the exposure time of the snaps. Unlike ObservationInfo, VisitInfo doesn't record the start and end time, only the midpoint (which for two snaps might be a time where no data are being taken).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you marked some comments resolved without making the requested changes (e.g. excessive newlines, only needing the exposure time not the full exposure), and didn't incorporate some others (e.g. referring to the default value in docstrings, making the check_dia_source_trail
method private)? I've flagged some of them with eyes, but please check for others that were missed.
tests/test_association_task.py
Outdated
for test_obj_id, expected_obj_id in zip( | ||
results.matchedDiaSources["diaObjectId"].to_numpy(), | ||
[1, 2, 3, 4]): | ||
self.assertEqual(test_obj_id, expected_obj_id) | ||
for test_obj_id, expected_obj_id in zip( | ||
results.unAssocDiaSources["diaObjectId"].to_numpy(), | ||
[0]): | ||
self.assertEqual(test_obj_id, expected_obj_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you write these like self.assertEqual(results.matchedDiaSources["diaObjectId"], [1,2,3,4]
? Or at worst use np.testing.assert_arrays_equal
? The loops make it hard to follow what is being tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This results in ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
for the arrays, and the original person who wrote the tests I followed likely did it in this way to get around this problem. I looked up ways around this, and this popped up as one of the suggested ways of comparing values in an array for unit testing. The other option seems to be self.assertTrue(np.array_equal(results.matchedDiaSources["diaObjectId"].values, [1,2], equal_nan=True))
, which I've swapped to since it seems a tad more explicit in what its doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use np.testing.assert_arrays_equal
then, instead. That will give explicit information which values mismatch. These cannot be NaN, so there's no need for NaN-safe comparisons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On that note, would it add clarity if I changed the other unit tests to follow this format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would definitely be helpful, yes! Please do it on a separate commit, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swapped to np.testing and also made a separate commit updating the prior unit tests
tests/test_trailedSourceFilter.py
Outdated
|
||
results = trailedSourceFilterTask.run(self.diaSources, self.exposure) | ||
|
||
self.assertEqual(len(results.diaSources), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each of these tests should also have an assert on the contents of the other array returned in the struct (the sources that were filtered). Better to test on which ones were included than just the length: I think this is the first 3 in the first list, and the last two in the second? Similarly in test_run_short_max_trail
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests similar to self.assertTrue(np.array_equal(results.matchedDiaSources["diaObjectId"].values, [1,2], equal_nan=True))
to check that the output arrays are what is expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to np.testing.
1af91a5
to
19a1c29
Compare
Update unit tests in test_association_task.py to use np.testing.assert_array_equal for testing array equality.
41bc3f9
to
20baae2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more small comments, and I unresolved and put eyes on a few others that it looks like you missed. Clean these up, and you're good to go.
diaTrailedResult = self.trailedSourceFilter.run(diaSources, exposure_time) | ||
matchResult = self.associate_sources(diaObjects, diaTrailedResult.diaSources) | ||
|
||
self.log.warning("%i DIASources exceed maxTrailLength, dropping " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a warning? Nothing went wrong, since the intent was to remove sources. log.info
is probably fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw that the other log.info that drops a source was a warning. However, you are right since dropping the source makes it work as intended so it should be a log.info.
assocResults = self.associator.run(diaSourceTable, | ||
loaderResult.diaObjects) | ||
assocResults = self.associator.run(diaSourceTable, loaderResult.diaObjects, | ||
exposure_time=diffIm.getInfo().getVisitInfo().getExposureTime()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the properties:
exposure_time=diffIm.getInfo().getVisitInfo().getExposureTime()) | |
exposure_time=diffIm.visitInfo.exposureTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swapped to using properties.
"""Config class for TrailedSourceFilterTask. | ||
""" | ||
|
||
maxTrailLength = pexConfig.Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I missed this earlier: if we're going with snake_case throughout, we should make the configs also snake_case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've swapped to snake case in just trailedSourceFilter.py and left it camel in association.py
Creates a mask for sources with lengths greater than 0.416 | ||
arcseconds/second multiplied by the exposure time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, don't mention the default value, since it's configurable.
Creates a mask for sources with lengths greater than 0.416 | |
arcseconds/second multiplied by the exposure time. | |
Return a mask of sources with lengths greater than ``config.maxTrailLength`` multiplied by the exposure time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swapped the wording to what you've suggested.
5f3e408
to
a907931
Compare
Plus several edits from review
a907931
to
a5f9b31
Compare
Jenkins run with this passing: https://rubin-ci.slac.stanford.edu/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/354/pipeline |
Make trailedAssociatorTask which filters out trails whose lengths are above 0.416 arcseconds/second in length. The trailed sources are currently dropped once they are filtered out. More complexity for filtering will be added at a later date.