Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-45218: Refactoring diaPipe #237

Merged
merged 5 commits into from
Sep 20, 2024
Merged

DM-45218: Refactoring diaPipe #237

merged 5 commits into from
Sep 20, 2024

Conversation

abudlong
Copy link
Contributor

No description provided.

Copy link
Contributor

@isullivan isullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. I have a few requests about organization and commit history that I would like to take another look at after you address these comments.

Also, please check that all dataset type definitions in the docstrings use back-ticks ` instead of single quotes '. Our documentation build system uses the back-ticks to make links and references, so the formatting is important.

Comment on lines 343 to 349
diaForcedSources = pd.DataFrame(columns=[
"diaForcedSourceId", "diaObjectID", "ccdVisitID", "psfFlux", "psfFluxErr",
"ra", "dec", "midpointMjdTai", "band",
])
diaForcedSources = convertTableToSdmSchema(schema, diaForcedSources,
tableName="DiaForcedSource",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this more carefully, all of this could be reduced to:
diaForcedSources = convertTableToSdmSchema(schema, pd.DataFrame(), tableName="DiaForcedSource")
since that function will add the necessary columns and will delete any additional columns.

objectIds = diaObjectIds[rng.randint(len(diaObjectIds), size=nSources)]
objectIds = diaObjectIds[rng.integers(len(diaObjectIds), size=nSources)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this change to its own commit (it's a real bug fix, not just a refactor). It should be before the refactor commit in the commit history.

@@ -240,6 +268,7 @@ def makeExposure(flipX=False, flipY=False):
exposure.setDetector(detector)
exposure.info.setVisitInfo(visit)
exposure.setFilter(afwImage.FilterLabel(band='g'))
exposure.setPhotoCalib(afwImage.PhotoCalib(1., 0., exposure.getBBox()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, move this change to its own commit (it's a real bug fix, not just a refactor). It should be before the refactor commit in the commit history.

return (associatedDiaSources, createResults.newDiaObjects)

@timeMethod
def associationCatalogMerge(self, preloadedDiaSources, associatedDiaSources, diaObjects, newDiaObjects):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LSST code style is to name methods in active tense, when possible. Here I would use mergeAssociatedCatalogs

# alertPackager needs correct columns
diaForcedSources = makeEmptyForcedSourceTable(self.schema)

# Write results to Alert Production Data Base (APDB)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Database is one word

Previously detected DiaSources, loaded from the APDB.
preloadedDiaForcedSources : `pandas.DataFrame`
Catalog of previously detected forced DiaSources, from the APDB.
associatedDiaSources : 'pandas.DataFrame'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename as above.

preloadedDiaForcedSources : `pandas.DataFrame`
Catalog of previously detected forced DiaSources, from the APDB.
associatedDiaSources : 'pandas.DataFrame'
Associated DiaSources with DiaObjects.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:
Updated DiaSource catalog with associated diaObjectIds.

Comment on lines 92 to 93
self.apdb = daxApdb.Apdb.from_config(apdb_config)
self.schema = readSchemaFromApdb(self.apdb)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither self.apdb or self.schema are used, so remove them.

Comment on lines 150 to 157
def _testRun(self, doPackageAlerts=False, doSolarSystemAssociation=False, doRunForcedMeasurement=False):
"""Test the normal workflow of each ap_pipe step.
"""
config = self._makeDefaultConfig(
config_file=self.config_file.name,
doPackageAlerts=doPackageAlerts,
doSolarSystemAssociation=doSolarSystemAssociation)
doSolarSystemAssociation=doSolarSystemAssociation,
doRunForcedMeasurement=doRunForcedMeasurement)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove doRunForcedMeasurement for now

Comment on lines +136 to +163
def makeSolarSystemSources(nSources, diaObjectIds, exposure, rng, randomizeObjects=False):
"""Make a test set of solar system sources.

Parameters
----------
nSources : `int`
Number of sources to create.
diaObjectIds : `numpy.ndarray`
Integer Ids of diaobjects to "associate" with the DiaSources.
exposure : `lsst.afw.image.Exposure`
Exposure to create sources over.
randomizeObjects : `bool`, optional
If True, randomly draw from `diaObjectIds` to generate the ids in the
output catalog, otherwise just iterate through them, repeating as
necessary to get nSources objectIds.

Returns
-------
solarSystemSources : `pandas.DataFrame`
Solar system sources generated across the exposure.
"""
solarSystemSources = makeDiaSources(nSources, diaObjectIds, exposure, rng, randomizeObjects=False)
solarSystemSources["ssObjectId"] = rng.integers(0, high=2**63-1, size=nSources)
solarSystemSources["Err(arcsec)"] = rng.uniform(0.2, 0.4, size=nSources)

return solarSystemSources


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this function on its own commit

@abudlong abudlong force-pushed the tickets/DM-45218 branch 3 times, most recently from fea6e8a to 5dfb07c Compare September 19, 2024 20:18
Copy link
Contributor

@isullivan isullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a couple of minor cleanups this time around.

Comment on lines 737 to 738
# columns "ra" and "dec" are required for spatial sharding in Cassandra
diaForcedSources.rename(columns={"coord_ra": "ra", "coord_dec": "dec"}, inplace=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines were removed on main after you made your branch, and should be removed from the refactor.

@@ -451,11 +451,138 @@ def run(self,
# Accept either legacySolarSystemTable or optional solarSystemObjectTable.
if legacySolarSystemTable is not None and solarSystemObjectTable is None:
solarSystemObjectTable = legacySolarSystemTable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not your change, but solarSystemObjectTable should be added to the docstring above (I can't comment on unchanged lines, so I had to put it here). You can just copy what you have in associateDiaSources

@@ -493,6 +620,38 @@ def run(self,
createResults.nNewDiaObjects,
nTotalSsObjects,
nAssociatedSsObjects)
return (associatedDiaSources, createResults.newDiaObjects)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to add a log message just before the return:

        self.log.info("%i updated and %i unassociated diaObjects. Creating %i new diaObjects",
                      assocResults.nUpdatedDiaObjects,
                      assocResults.nUnassociatedDiaObjects,
                      createResults.nNewDiaObjects,
                      )

return diaForcedSources

@timeMethod
def writeToApdb(self, associatedDiaSources, diaForcedSources, updatedDiaObjects):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-order parameters to match the order supplied to apdb.store:

def writeToApdb(self, updatedDiaObjects, associatedDiaSources, diaForcedSources):

Make sure to also update the order of the parameters in run where self.writeToApdb is called.

Comment on lines +333 to +344


def makeEmptyForcedSourceTable(schema):
"""Return a dataframe with the correct columns for diaForcedSources table.

Returns
-------
diaForcedSources : `pandas.DataFrame`
Empty dataframe.
"""
diaForcedSources = convertTableToSdmSchema(schema, pd.DataFrame(), tableName="DiaForcedSource")
return diaForcedSources
Copy link
Contributor

@isullivan isullivan Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: leave this as-is.

@abudlong abudlong merged commit 71bdef3 into main Sep 20, 2024
2 checks passed
@abudlong abudlong deleted the tickets/DM-45218 branch September 20, 2024 22:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants