Merge pull request #9 from LSSTDESC/user/aimalz/renaming
Estimation syntax consistency
sschmidt23 authored Oct 26, 2023
2 parents d2adcd9 + 08fd581 commit c0195c4
Showing 5 changed files with 35 additions and 37 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -25,6 +25,7 @@ share/python-wheels/
.installed.cfg
*.egg
MANIFEST
_version.py

# PyInstaller
# Usually these files are written by a python script from a template
10 changes: 5 additions & 5 deletions README.md
@@ -12,11 +12,11 @@ See https://ui.adsabs.harvard.edu/abs/2018AJ....155....1G/abstract
for more details on the code
Any use of `rail_cmnn` in a paper or report should cite [Graham et al. (2018)](https://ui.adsabs.harvard.edu/abs/2018AJ....155....1G/abstract).

The current version of the code consists of a training stage, `Inform_CMNNPDF`, that computes colors for a set of training data and an estimation stage `CMNNPDF` that calculates the Mahalanobis distance to each training galaxy for each test galaxy. The mean value of this Gaussian PDF can be estimated in one of three ways (see `selection mode` below), and the width is determined by the standard deviation of training galaxy redshifts within the threshold Mahalanobis distance. Future implementation improvements may change the output format to include multiple Gaussians.
The current version of the code consists of a training stage, `CMNNInformer`, that computes colors for a set of training data and an estimation stage `CMNNEstimator` that calculates the Mahalanobis distance to each training galaxy for each test galaxy. The mean value of this Gaussian PDF can be estimated in one of three ways (see `selection mode` below), and the width is determined by the standard deviation of training galaxy redshifts within the threshold Mahalanobis distance. Future implementation improvements may change the output format to include multiple Gaussians.

For the color calculation, there is an option for how to treat the "non-detections" in a band: the default choice is to ignore any colors that contain a non-detect magnitude and adjust the number of degrees of freedom in the Mahalanobis distance accordingly (this is how the CMNN algorithm was originally implemented). Or, if the configuration parameter `nondetect_replace` is set to `True` in `Inform_CMNNPDF`, the non-detected magnitudes will be replaced with the 1-sigma limiting magnitude in each band as supplied by the user via the `mag_limits` configuration parameter (or by the default 1-sigma limits if the user does not supply specific numbers). We have not done any exploration of the relative performance of these two choices, but note that there is not a significant performance difference in terms of runtime between the two methods.
For the color calculation, there is an option for how to treat the "non-detections" in a band: the default choice is to ignore any colors that contain a non-detect magnitude and adjust the number of degrees of freedom in the Mahalanobis distance accordingly (this is how the CMNN algorithm was originally implemented). Or, if the configuration parameter `nondetect_replace` is set to `True` in `CMNNInformer`, the non-detected magnitudes will be replaced with the 1-sigma limiting magnitude in each band as supplied by the user via the `mag_limits` configuration parameter (or by the default 1-sigma limits if the user does not supply specific numbers). We have not done any exploration of the relative performance of these two choices, but note that there is not a significant performance difference in terms of runtime between the two methods.

`Inform_CMNNPDF` takes in a training data set and returns a model file that simply consists of the computed colors and color errors (magnitude errors added in quadrature) for that dataset, the model to be used in the `CMNNPDF` stage. A modification of the original CMNN algorithm, "nondetections" are now replaced by the 1-sigma limiting magnitudes and the non-detect magnitude errors replaced with a value of 1.0. The config parameters that can be set by the user for `Inform_CMNNPDF` are:<br>
`CMNNInformer` takes in a training data set and returns a model file that simply consists of the computed colors and color errors (magnitude errors added in quadrature) for that dataset, the model to be used in the `CMNNEstimator` stage. A modification of the original CMNN algorithm, "nondetections" are now replaced by the 1-sigma limiting magnitudes and the non-detect magnitude errors replaced with a value of 1.0. The config parameters that can be set by the user for `CMNNInformer` are:<br>
- `bands`: list of the band names that should be present in the input data.<br>
- `err_bands`: list of the magnitude error column names that should be present in the input data.<br>
- `redshift_col`: a string giving the name for the redshift column present in the input data.<br>
@@ -25,8 +25,8 @@ For the color calculation, there is an option for how to treat the "non-detectio
- `nondetect_replace`: bool, if set to False (the default) this option ignores colors with non-detected values in the Mahalanobis distance calculation, with a corresponding drop in the degrees of freedom value. If set to True, the method will replace non-detections with the 1-sigma limiting magnitudes specified via `mag_limits` (or default 1-sigma limits if not supplied), and will use all colors in the Mahalanobis distance calculation.
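
The color and color-error computation described for `CMNNInformer` can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in `cmnn.py`: the function names, the use of NaN to flag a non-detection, the choice of adjacent-band colors, and the limiting magnitudes are all hypothetical. What comes from the README is the replacement rule (limiting magnitude for the magnitude, 1.0 for its error) and the quadrature sum of magnitude errors.

```python
import math


def replace_nondetects(mags, mag_errs, mag_limits):
    """Swap NaN magnitudes for the band's limiting magnitude, with error 1.0.

    NaN as the non-detection flag is an assumption made for this sketch.
    """
    new_mags, new_errs = [], []
    for mag, err, limit in zip(mags, mag_errs, mag_limits):
        if math.isnan(mag):
            new_mags.append(limit)
            new_errs.append(1.0)
        else:
            new_mags.append(mag)
            new_errs.append(err)
    return new_mags, new_errs


def compute_colors(mags, mag_errs):
    """Adjacent-band colors and their errors (magnitude errors in quadrature)."""
    colors = [m1 - m2 for m1, m2 in zip(mags[:-1], mags[1:])]
    color_errs = [math.hypot(e1, e2) for e1, e2 in zip(mag_errs[:-1], mag_errs[1:])]
    return colors, color_errs


# One galaxy in three bands; the middle band is a non-detection.
mags = [24.0, float("nan"), 23.0]
mag_errs = [0.3, float("nan"), 0.1]
mag_limits = [27.0, 26.0, 25.0]  # hypothetical 1-sigma limits

mags, mag_errs = replace_nondetects(mags, mag_errs, mag_limits)
colors, color_errs = compute_colors(mags, mag_errs)
print(colors, color_errs)
```

With the default `nondetect_replace=False`, the two colors that involve the middle band would instead be dropped from the distance calculation, and the degrees of freedom reduced to match.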


The parameters that can be set via the `config_params` in `CMNNPDF` are described in brief below:<br>
- `bands`, `err_bands`, `redshift_col`, `mag_limits` are all the same as described above for `Inform_CMNNPDF.`<br>
The parameters that can be set via the `config_params` in `CMNNEstimator` are described in brief below:<br>
- `bands`, `err_bands`, `redshift_col`, `mag_limits` are all the same as described above for `CMNNInformer`.<br>
- `ppf_value`: float, usually 0.68 or 0.95, which sets the value of the PPF used in the Mahalanobis distance calculation.<br>
- `selection_mode`: int, selects how the central value of the Gaussian PDF is calculated in the algorithm, if set to `0` randomly chooses from set within the Mahalanobis distance, if set to `1` chooses the nearest neighbor point, if set to `2` adds a distance weight to the random choice.<br>
- `min_n`: int, the minimum number of training galaxies to use.<br>
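
The three `selection_mode` values can be illustrated with a small stand-alone sketch. It is not the `CMNNEstimator` implementation: the function names are hypothetical, the threshold is passed in as a plain number rather than derived from `ppf_value`, the inverse-distance weights for mode `2` and the population standard deviation for the width are assumptions, and `min_n` handling is left out.

```python
import random
import statistics


def mahalanobis_distances(test_colors, test_color_errs, train_colors):
    """Squared color offsets scaled by the test galaxy's color errors, summed over colors."""
    return [
        sum(((c_train - c_test) / err) ** 2
            for c_train, c_test, err in zip(colors, test_colors, test_color_errs))
        for colors in train_colors
    ]


def select_redshift(dists, train_z, threshold, selection_mode, rng):
    """Return (central redshift, width, number of matches) for one test galaxy."""
    subset = [i for i, d in enumerate(dists) if d <= threshold]
    if not subset:
        return None  # no matches; fallback values are not handled in this sketch
    if selection_mode == 0:    # random member of the matched subset
        pick = rng.choice(subset)
    elif selection_mode == 1:  # nearest neighbor in color space
        pick = min(subset, key=lambda i: dists[i])
    elif selection_mode == 2:  # random, weighted toward small distances (assumed 1/d)
        weights = [1.0 / max(dists[i], 1e-12) for i in subset]
        pick = rng.choices(subset, weights=weights, k=1)[0]
    else:
        raise ValueError(f"unknown selection_mode: {selection_mode}")
    # Width: spread of the training redshifts inside the threshold.
    width = statistics.pstdev([train_z[i] for i in subset])
    return train_z[pick], width, len(subset)


train_colors = [[0.5, 0.2], [0.6, 0.1], [1.5, 0.9]]
train_z = [0.30, 0.35, 1.20]
dists = mahalanobis_distances([0.52, 0.18], [0.05, 0.05], train_colors)

rng = random.Random(0)
nearest = select_redshift(dists, train_z, threshold=6.0, selection_mode=1, rng=rng)
print(nearest)
```

Here the third training galaxy lies far outside the threshold, so only the first two contribute to the width, and mode `1` returns the redshift of the closest one.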
24 changes: 12 additions & 12 deletions examples/CMNN_demo.ipynb
@@ -13,13 +13,13 @@
"\n",
"CMNN stands for color-matched nearest-neighbor, and as this name implies, the method works by finding the Mahalanobis distance between each test galaxy and the training galaxies, and selecting one of those \"nearby\" in color space as the redshift estimate. The algorithm also estimates the \"width\" of the resulting PDF based on the standard deviation of this nearby set and returns a single Gaussian with a mean and width defined as such.<br>\n",
"\n",
"The current version of the code consists of a training stage, `Inform_CMNNPDF`, that computes colors for a set of training data and an estimation stage `CMNNPDF` that calculates the Mahalanobis distance to each training galaxy for each test galaxy and returns a single Gaussian PDF for each galaxy. The mean value of this Gaussian PDF can be estimated in one of three ways (see selection mode below), and the width is determined by the standard deviation of training galaxy redshifts within the threshold Mahalanobis distance. Future implementation improvements may change the output format to include multiple Gaussians.\n",
"The current version of the code consists of a training stage, `CMNNInformer`, that computes colors for a set of training data and an estimation stage `CMNNEstimator` that calculates the Mahalanobis distance to each training galaxy for each test galaxy and returns a single Gaussian PDF for each galaxy. The mean value of this Gaussian PDF can be estimated in one of three ways (see selection mode below), and the width is determined by the standard deviation of training galaxy redshifts within the threshold Mahalanobis distance. Future implementation improvements may change the output format to include multiple Gaussians.\n",
"\n",
"For the color calculation, there is an option for how to treat the \"non-detections\" in a band: the default choice is to ignore any colors that contain a non-detect magnitude and adjust the number of degrees of freedom in the Mahalanobis distance accordingly (this is how the CMNN algorithm was originally implemented). Or, if the configuration parameter `nondetect_replace` is set to `True` in `Inform_CMNNPDF`, the non-detected magnitudes will be replaced with the 1-sigma limiting magnitude in each band as supplied by the user via the `mag_limits` configuration parameter (or by the default 1-sigma limits if the user does not supply specific numbers). We have not done any exploration of the relative performance of these two choices, but note that there is not a significant performance difference in terms of runtime between the two methods.<br>\n",
"For the color calculation, there is an option for how to treat the \"non-detections\" in a band: the default choice is to ignore any colors that contain a non-detect magnitude and adjust the number of degrees of freedom in the Mahalanobis distance accordingly (this is how the CMNN algorithm was originally implemented). Or, if the configuration parameter `nondetect_replace` is set to `True` in `CMNNInformer`, the non-detected magnitudes will be replaced with the 1-sigma limiting magnitude in each band as supplied by the user via the `mag_limits` configuration parameter (or by the default 1-sigma limits if the user does not supply specific numbers). We have not done any exploration of the relative performance of these two choices, but note that there is not a significant performance difference in terms of runtime between the two methods.<br>\n",
"\n",
"In addition to the Gaussian PDF for each test galaxy, two ancillary quantities are stored: `zmode`: the mode of the redshift PDF and `Ncm`, the integer number of \"nearby\" galaxies considered as neighbors for each galaxy.<br>\n",
"\n",
"`Inform_CMNNPDF` takes in a training data set and returns a model file that simply consists of the computed colors and color errors (magnitude errors added in quadrature) for that dataset, the model to be used in the `CMNNPDF` stage. A modification of the original CMNN algorithm, \"nondetections\" are now replaced by the 1-sigma limiting magnitudes and the non-detect magnitude errors replaced with a value of 1.0. The config parameters that can be set by the user for `Inform_CMNNPDF` are:\n",
"`CMNNInformer` takes in a training data set and returns a model file that simply consists of the computed colors and color errors (magnitude errors added in quadrature) for that dataset, the model to be used in the `CMNNEstimator` stage. A modification of the original CMNN algorithm, \"nondetections\" are now replaced by the 1-sigma limiting magnitudes and the non-detect magnitude errors replaced with a value of 1.0. The config parameters that can be set by the user for `CMNNInformer` are:\n",
"\n",
"- bands: list of the band names that should be present in the input data.<br>\n",
"- err_bands: list of the magnitude error column names that should be present in the input data.<br>\n",
@@ -29,9 +29,9 @@
"- nondetect_replace: bool, if set to `False` (the default) this option ignores colors with non-detected values in the Mahalanobis distance calculation, with a corresponding drop in the degrees of freedom value. If set to `True`, the method will replace non-detections with the 1-sigma limiting magnitudes specified via `mag_limits` (or default 1-sigma limits if not supplied), and will use all colors in the Mahalanobis distance calculation.<br>\n",
"\n",
"\n",
"The parameters that can be set via the config_params in `CMNNPDF` are described in brief below:\n",
"The parameters that can be set via the config_params in `CMNNEstimator` are described in brief below:\n",
"\n",
"- bands, err_bands, redshift_col, mag_limits are all the same as described above for Inform_CMNNPDF.\n",
"- bands, err_bands, redshift_col, mag_limits are all the same as described above for CMNNInformer.\n",
"- ppf_value: float, usually 0.68 or 0.95, which sets the value of the PPF used in the Mahalanobis distance calculation.\n",
"- selection_mode: int, selects how the central value of the Gaussian PDF is calculated in the algorithm, if set to **0** randomly chooses from set within the Mahalanobis distance, if set to **1** chooses the nearest neighbor point, if set to **2** adds a distance weight to the random choice.\n",
"- min_n: int, the minimum number of training galaxies to use.\n",
@@ -41,7 +41,7 @@
"- bad_redshift_err: float, in the unlikely case that there are not enough training galaxies, this Gaussian width will be assigned to galaxies.\n",
"\n",
"\n",
"Let's grab some example data, train the model by running the `Inform_CMNNPDF` `inform` method, then calculate a set of photo-z's using `CMNNPDF` `estimate`. Much of the following is copied from the `RAIL_estiation_demo.ipynb` in the RAIL repo, so look at that notebook for general questions on setting up the RAIL infrastructure for estimators."
"Let's grab some example data, train the model by running the `CMNNInformer` `inform` method, then calculate a set of photo-z's using `CMNNEstimator` `estimate`. Much of the following is copied from the `RAIL_estiation_demo.ipynb` in the RAIL repo, so look at that notebook for general questions on setting up the RAIL infrastructure for estimators."
]
},
{
@@ -92,7 +92,7 @@
"metadata": {},
"source": [
"## The code-specific parameters\n",
"As mentioned above, CMNN has particular configuration options that can be set when setting up an instance of our `Inform_CMNNPDF` stage, we'll define those in a dictionary. Any parameters not specifically assigned will take on default values."
"As mentioned above, CMNN has particular configuration options that can be set when setting up an instance of our `CMNNInformer` stage, we'll define those in a dictionary. Any parameters not specifically assigned will take on default values."
]
},
{
@@ -117,8 +117,8 @@
"metadata": {},
"outputs": [],
"source": [
"from rail.estimation.algos.cmnn import Inform_CMNNPDF, CMNNPDF\n",
"pz_train = Inform_CMNNPDF.make_stage(name='inform_CMNN', model='demo_cmnn_model.pkl', **cmnn_dict)"
"from rail.estimation.algos.cmnn import CMNNInformer, CMNNEstimator\n",
"pz_train = CMNNInformer.make_stage(name='inform_CMNN', model='demo_cmnn_model.pkl', **cmnn_dict)"
]
},
{
@@ -174,7 +174,7 @@
"outputs": [],
"source": [
"%%time\n",
"pz = CMNNPDF.make_stage(name='CMNN', hdf5_groupname='photometry',\n",
"pz = CMNNEstimator.make_stage(name='CMNN', hdf5_groupname='photometry',\n",
" model=pz_train.get_handle('model'),\n",
" min_n=20,\n",
" selection_mode=1,\n",
@@ -252,7 +252,7 @@
"metadata": {},
"outputs": [],
"source": [
"pz_rand = CMNNPDF.make_stage(name='CMNN_rand', hdf5_groupname='photometry',\n",
"pz_rand = CMNNEstimator.make_stage(name='CMNN_rand', hdf5_groupname='photometry',\n",
" model=pz_train.get_handle('model'),\n",
" min_n=20,\n",
" selection_mode=0,\n",
@@ -289,7 +289,7 @@
"metadata": {},
"outputs": [],
"source": [
"pz_weight = CMNNPDF.make_stage(name='CMNN_weight', hdf5_groupname='photometry',\n",
"pz_weight = CMNNEstimator.make_stage(name='CMNN_weight', hdf5_groupname='photometry',\n",
" model=pz_train.get_handle('model'),\n",
" min_n=20,\n",
" selection_mode=2,\n",
10 changes: 5 additions & 5 deletions src/rail/estimation/algos/cmnn.py
@@ -30,11 +30,11 @@ def _computecolordata(data, column_names, err_names):
return coldata.T, errdata.T


class Inform_CMNNPDF(CatInformer):
class CMNNInformer(CatInformer):
"""compute colors and color errors for CMNN training set and
store in a model file that will be used by the CMNNPDF stage
store in a model file that will be used by the CMNNEstimator stage
"""
name = 'Inform_CMNNPDF'
name = 'CMNNInformer'
config_options = CatInformer.config_options.copy()
config_options.update(bands=SHARED_PARAMS,
err_bands=SHARED_PARAMS,
@@ -78,7 +78,7 @@ def run(self):
self.add_data('model', self.model)


class CMNNPDF(CatEstimator):
class CMNNEstimator(CatEstimator):
"""Color Matched Nearest Neighbor Estimator
Note that there are several modifications from the original CMNN, mainly that
the original estimator dropped non-detections from the Mahalanobis distance
@@ -103,7 +103,7 @@ class CMNNPDF(CatEstimator):
should only happen if the number of training galaxies is smaller than
min_n, which is unlikely, but is included here for completeness.
"""
name = 'CMNNPDF'
name = 'CMNNEstimator'
config_options = CatEstimator.config_options.copy()
config_options.update(zmin=SHARED_PARAMS,
zmax=SHARED_PARAMS,
27 changes: 12 additions & 15 deletions tests/test_algos.py
@@ -30,13 +30,11 @@ def test_cmnn(out_method, zb_expected):
estim_config_dict["selection_mode"] = out_method
# zb_expected = np.array([0.13, 0.13, 0.13, 0.12, 0.12, 0.13, 0.12, 0.13,
# 0.12, 0.12])
train_algo = cmnn.Inform_CMNNPDF
pz_algo = cmnn.CMNNPDF
results, rerun_results, rerun3_results = one_algo( # pylint: disable=unused-variable
"CMNN", train_algo, pz_algo, train_config_dict, estim_config_dict
)
assert np.isclose(results.ancil["zmode"], zb_expected, atol=0.02).all()
assert np.isclose(results.ancil["zmode"], rerun_results.ancil["zmode"]).all()
train_algo = cmnn.CMNNInformer
pz_algo = cmnn.CMNNEstimator
results, rerun_results, rerun3_results = one_algo("CMNN", train_algo, pz_algo, train_config_dict, estim_config_dict)
assert np.isclose(results.ancil['zmode'], zb_expected, atol=0.02).all()
assert np.isclose(results.ancil['zmode'], rerun_results.ancil['zmode']).all()


def test_cmnn_nondetect_replace():
@@ -48,11 +46,10 @@ def test_cmnn_nondetect_replace():

estim_config_dict["hdf5_groupname"] = "photometry"
estim_config_dict["model"] = "model.tmp"
zb_expected = np.array([0.11, 0.15, 0.14, 0.13, 0.11, 0.13, 0.15, 0.15, 0.11, 0.11])
train_algo = cmnn.Inform_CMNNPDF
pz_algo = cmnn.CMNNPDF
results, rerun_results, rerun3_results = one_algo( # pylint: disable=unused-variable
"CMNN", train_algo, pz_algo, train_config_dict, estim_config_dict
)
assert np.isclose(results.ancil["zmode"], zb_expected, atol=0.02).all()
assert np.isclose(results.ancil["zmode"], rerun_results.ancil["zmode"]).all()
zb_expected = np.array([0.11, 0.15, 0.14, 0.13, 0.11, 0.13, 0.15, 0.15,
0.11, 0.11])
train_algo = cmnn.CMNNInformer
pz_algo = cmnn.CMNNEstimator
results, rerun_results, rerun3_results = one_algo("CMNN", train_algo, pz_algo, train_config_dict, estim_config_dict)
assert np.isclose(results.ancil['zmode'], zb_expected, atol=0.02).all()
assert np.isclose(results.ancil['zmode'], rerun_results.ancil['zmode']).all()
