-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-45218: Refactoring diaPipe #237
Conversation
dc15d6a
to
a712fa2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. I have a few requests about organization and commit history that I would like to take another look at after you address these comments.
Also, please check that all dataset type definitions in the docstrings use back-ticks ` instead of single quotes '. Our documentation build system uses the back-ticks to make links and references, so the formatting is important.
python/lsst/ap/association/utils.py
Outdated
diaForcedSources = pd.DataFrame(columns=[ | ||
"diaForcedSourceId", "diaObjectID", "ccdVisitID", "psfFlux", "psfFluxErr", | ||
"ra", "dec", "midpointMjdTai", "band", | ||
]) | ||
diaForcedSources = convertTableToSdmSchema(schema, diaForcedSources, | ||
tableName="DiaForcedSource", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this more carefully, all of this could be reduced to:
diaForcedSources = convertTableToSdmSchema(schema, pd.DataFrame(), tableName="DiaForcedSource")
since that function will add the necessary columns and will delete any additional columns.
objectIds = diaObjectIds[rng.randint(len(diaObjectIds), size=nSources)] | ||
objectIds = diaObjectIds[rng.integers(len(diaObjectIds), size=nSources)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this change to its own commit (it's a real bug fix, not just a refactor). It should be before the refactor commit in the commit history.
@@ -240,6 +268,7 @@ def makeExposure(flipX=False, flipY=False): | |||
exposure.setDetector(detector) | |||
exposure.info.setVisitInfo(visit) | |||
exposure.setFilter(afwImage.FilterLabel(band='g')) | |||
exposure.setPhotoCalib(afwImage.PhotoCalib(1., 0., exposure.getBBox())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, move this change to its own commit (it's a real bug fix, not just a refactor). It should be before the refactor commit in the commit history.
return (associatedDiaSources, createResults.newDiaObjects) | ||
|
||
@timeMethod | ||
def associationCatalogMerge(self, preloadedDiaSources, associatedDiaSources, diaObjects, newDiaObjects): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LSST code style is to name methods in active tense, when possible. Here I would use mergeAssociatedCatalogs
# alertPackager needs correct columns | ||
diaForcedSources = makeEmptyForcedSourceTable(self.schema) | ||
|
||
# Write results to Alert Production Data Base (APDB) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Database
is one word
Previously detected DiaSources, loaded from the APDB. | ||
preloadedDiaForcedSources : `pandas.DataFrame` | ||
Catalog of previously detected forced DiaSources, from the APDB. | ||
associatedDiaSources : 'pandas.DataFrame' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename as above.
preloadedDiaForcedSources : `pandas.DataFrame` | ||
Catalog of previously detected forced DiaSources, from the APDB. | ||
associatedDiaSources : 'pandas.DataFrame' | ||
Associated DiaSources with DiaObjects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
Updated DiaSource catalog with associated diaObjectIds.
tests/test_diaPipe.py
Outdated
self.apdb = daxApdb.Apdb.from_config(apdb_config) | ||
self.schema = readSchemaFromApdb(self.apdb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither self.apdb
or self.schema
are used, so remove them.
tests/test_diaPipe.py
Outdated
def _testRun(self, doPackageAlerts=False, doSolarSystemAssociation=False, doRunForcedMeasurement=False): | ||
"""Test the normal workflow of each ap_pipe step. | ||
""" | ||
config = self._makeDefaultConfig( | ||
config_file=self.config_file.name, | ||
doPackageAlerts=doPackageAlerts, | ||
doSolarSystemAssociation=doSolarSystemAssociation) | ||
doSolarSystemAssociation=doSolarSystemAssociation, | ||
doRunForcedMeasurement=doRunForcedMeasurement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove doRunForcedMeasurement
for now
def makeSolarSystemSources(nSources, diaObjectIds, exposure, rng, randomizeObjects=False): | ||
"""Make a test set of solar system sources. | ||
|
||
Parameters | ||
---------- | ||
nSources : `int` | ||
Number of sources to create. | ||
diaObjectIds : `numpy.ndarray` | ||
Integer Ids of diaobjects to "associate" with the DiaSources. | ||
exposure : `lsst.afw.image.Exposure` | ||
Exposure to create sources over. | ||
randomizeObjects : `bool`, optional | ||
If True, randomly draw from `diaObjectIds` to generate the ids in the | ||
output catalog, otherwise just iterate through them, repeating as | ||
necessary to get nSources objectIds. | ||
|
||
Returns | ||
------- | ||
solarSystemSources : `pandas.DataFrame` | ||
Solar system sources generated across the exposure. | ||
""" | ||
solarSystemSources = makeDiaSources(nSources, diaObjectIds, exposure, rng, randomizeObjects=False) | ||
solarSystemSources["ssObjectId"] = rng.integers(0, high=2**63-1, size=nSources) | ||
solarSystemSources["Err(arcsec)"] = rng.uniform(0.2, 0.4, size=nSources) | ||
|
||
return solarSystemSources | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this function on its own commit
fea6e8a
to
5dfb07c
Compare
5dfb07c
to
94431bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just a couple of minor cleanups this time around.
# columns "ra" and "dec" are required for spatial sharding in Cassandra | ||
diaForcedSources.rename(columns={"coord_ra": "ra", "coord_dec": "dec"}, inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines were removed on main
after you made your branch, and should be removed from the refactor.
@@ -451,11 +451,138 @@ def run(self, | |||
# Accept either legacySolarSystemTable or optional solarSystemObjectTable. | |||
if legacySolarSystemTable is not None and solarSystemObjectTable is None: | |||
solarSystemObjectTable = legacySolarSystemTable | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not your change, but solarSystemObjectTable
should be added to the docstring above (I can't comment on unchanged lines, so I had to put it here). You can just copy what you have in associateDiaSources
@@ -493,6 +620,38 @@ def run(self, | |||
createResults.nNewDiaObjects, | |||
nTotalSsObjects, | |||
nAssociatedSsObjects) | |||
return (associatedDiaSources, createResults.newDiaObjects) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to add a log message just before the return:
self.log.info("%i updated and %i unassociated diaObjects. Creating %i new diaObjects",
assocResults.nUpdatedDiaObjects,
assocResults.nUnassociatedDiaObjects,
createResults.nNewDiaObjects,
)
return diaForcedSources | ||
|
||
@timeMethod | ||
def writeToApdb(self, associatedDiaSources, diaForcedSources, updatedDiaObjects): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-order parameters to match the order supplied to apdb.store
:
def writeToApdb(self, updatedDiaObjects, associatedDiaSources, diaForcedSources):
Make sure to also update the order of the parameters in run
where self.writeToApdb
is called.
|
||
|
||
def makeEmptyForcedSourceTable(schema): | ||
"""Return a dataframe with the correct columns for diaForcedSources table. | ||
|
||
Returns | ||
------- | ||
diaForcedSources : `pandas.DataFrame` | ||
Empty dataframe. | ||
""" | ||
diaForcedSources = convertTableToSdmSchema(schema, pd.DataFrame(), tableName="DiaForcedSource") | ||
return diaForcedSources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: leave this as-is.
e02711f
to
f1c5103
Compare
No description provided.