Merge pull request #9 from LSSTDESC/user/aimalz/renaming

Estimation syntax consistency
LSSTDESC · Oct 26, 2023 · c0195c4
2 parents d2adcd9 + 08fd581

Showing 5 changed files with 35 additions and 37 deletions.
diff --git a/.gitignore b/.gitignore
@@ -25,6 +25,7 @@ share/python-wheels/
 .installed.cfg
 *.egg
 MANIFEST
+_version.py
 
 # PyInstaller
 #  Usually these files are written by a python script from a template

diff --git a/README.md b/README.md
@@ -12,11 +12,11 @@ See https://ui.adsabs.harvard.edu/abs/2018AJ....155....1G/abstract
 for more details on the code
 Any use of `rail_cmnn` in a paper or report should cite [Graham et al. (2018)](https://ui.adsabs.harvard.edu/abs/2018AJ....155....1G/abstract).
 
-The current version of the code consists of a training stage, `Inform_CMNNPDF`, that computes colors for a set of training data and an estimation stage `CMNNPDF` that calculates the Mahalanobis distance to each training galaxy for each test galaxy. The mean value of this Gaussian PDF can be estimated in one of three ways (see `selection mode` below), and the width is determined by the standard deviation of training galaxy redshifts within the threshold Mahalanobis distance.  Future implementation improvements may change the output format to include multiple Gaussians.
+The current version of the code consists of a training stage, `CMNNInformer`, that computes colors for a set of training data and an estimation stage `CMNNEstimator` that calculates the Mahalanobis distance to each training galaxy for each test galaxy. The mean value of this Gaussian PDF can be estimated in one of three ways (see `selection mode` below), and the width is determined by the standard deviation of training galaxy redshifts within the threshold Mahalanobis distance.  Future implementation improvements may change the output format to include multiple Gaussians.
 
-For the color calculation, there is an option for how to treat the "non-detections" in a band: the default choice is to ignore any colors that contain a non-detect magnitude and adjust the number of degrees of freedom in the Mahalanobis distance accordingly (this is how the CMNN algorithm was originally implemented). Or, if the configuration parameter `nondetect_replace` is set to `True` in `Inform_CMNNPDF`, the non-detected magnitudes will be replaced with the 1-sigma limiting magnitude in each band as supplied by the user via the `mag_limits` configuration parameter (or by the default 1-sigma limits if the user does not supply specific numbers). We have not done any exploration of the relative performance of these two choices, but note that there is not a significant performance difference in terms of runtime between the two methods.
+For the color calculation, there is an option for how to treat the "non-detections" in a band: the default choice is to ignore any colors that contain a non-detect magnitude and adjust the number of degrees of freedom in the Mahalanobis distance accordingly (this is how the CMNN algorithm was originally implemented). Or, if the configuration parameter `nondetect_replace` is set to `True` in `CMNNInformer`, the non-detected magnitudes will be replaced with the 1-sigma limiting magnitude in each band as supplied by the user via the `mag_limits` configuration parameter (or by the default 1-sigma limits if the user does not supply specific numbers). We have not done any exploration of the relative performance of these two choices, but note that there is not a significant performance difference in terms of runtime between the two methods.
 
-`Inform_CMNNPDF` takes in a training data set and returns a model file that simply consists of the computed colors and color errors (magnitude errors added in quadrature) for that dataset, the model to be used in the `CMNNPDF` stage. A modification of the original CMNN algorithm, "nondetections" are now replaced by the 1-sigma limiting magnitudes and the non-detect magnitude errors replaced with a value of 1.0.  The config parameters that can be set by the user for `Inform_CMNNPDF` are:<br>
+`CMNNInformer` takes in a training data set and returns a model file that simply consists of the computed colors and color errors (magnitude errors added in quadrature) for that dataset, the model to be used in the `CMNNEstimator` stage. A modification of the original CMNN algorithm, "nondetections" are now replaced by the 1-sigma limiting magnitudes and the non-detect magnitude errors replaced with a value of 1.0.  The config parameters that can be set by the user for `CMNNInformer` are:<br>
 - `bands`: list of the band names that should be present in the input data.<br>
 - `err_bands`: list of the magnitude error column names that should be present in the input data.<br>
 - `redshift_col`: a string giving the name for the redshift column present in the input data.<br>
@@ -25,8 +25,8 @@ For the color calculation, there is an option for how to treat the "non-detectio
 - `nondetect_replace`: bool, if set to False (the default) this option ignores colors with non-detected values in the Mahalanobis distance calculation, with a corresponding drop in the degrees of freedom value. If set to True, the method will replace non-detections with the 1-sigma limiting magnitudes specified via `mag_limits` (or default 1-sigma limits if not supplied), and will use all colors in the Mahalanobis distance calculation.
 
 
-The parameters that can be set via the `config_params` in `CMNNPDF` are described in brief below:<br>
-- `bands`, `err_bands`, `redshift_col`, `mag_limits` are all the same as described above for `Inform_CMNNPDF.`<br>
+The parameters that can be set via the `config_params` in `CMNNEstimator` are described in brief below:<br>
+- `bands`, `err_bands`, `redshift_col`, `mag_limits` are all the same as described above for `CMNNInformer`.<br>
 - `ppf_value`: float, usually 0.68 or 0.95, which sets the value of the PPF used in the Mahalanobis distance calculation.<br>
 - `selection_mode`: int, selects how the central value of the Gaussian PDF is calculated in the algorithm, if set to `0` randomly chooses from set within the Mahalanobis distance, if set to `1` chooses the nearest neighbor point, if set to `2` adds a distance weight to the random choice.<br>
 - `min_n`: int, the minimum number of training galaxies to use.<br>
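
The neighbor selection that the README text above describes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the `rail_cmnn` implementation: the function names, the diagonal (uncorrelated) color errors, the inverse-distance weights for mode 2, the fallback to the nearest `min_n` galaxies, and the use of a population standard deviation are all hypothetical choices made for the example. The threshold is passed in directly here, whereas the text says it is set through `ppf_value`.

```python
import random
import statistics


def mahalanobis_distances(test_colors, test_errs, train_colors):
    """Squared color-space distance from one test galaxy to each training galaxy.

    Assumption: color errors are uncorrelated, so each term is a squared
    color difference divided by the squared test-galaxy color error.
    """
    return [
        sum(((ct - c) / e) ** 2 for ct, c, e in zip(row, test_colors, test_errs))
        for row in train_colors
    ]


def estimate_redshift(dists, train_z, threshold, selection_mode=1, min_n=2, rng=None):
    """Return (central redshift, Gaussian width, number of neighbors used)."""
    rng = rng or random.Random(0)
    # Training-galaxy indices sorted from nearest to farthest in color space.
    order = sorted(range(len(dists)), key=dists.__getitem__)
    near = [i for i in order if dists[i] <= threshold]
    if len(near) < min_n:
        # Assumed fallback: use the min_n nearest training galaxies.
        near = order[:min_n]
    if selection_mode == 0:
        pick = rng.choice(near)  # random member of the nearby set
    elif selection_mode == 1:
        pick = near[0]  # nearest neighbor
    else:
        # Random choice weighted toward small distances (assumed weighting).
        weights = [1.0 / max(dists[i], 1e-12) for i in near]
        pick = rng.choices(near, weights=weights, k=1)[0]
    width = statistics.pstdev([train_z[i] for i in near])
    return train_z[pick], width, len(near)


# Toy data: three training galaxies described by two colors each.
train_colors = [[0.50, 0.30], [0.52, 0.31], [1.20, 0.90]]
train_z = [0.10, 0.12, 0.80]
dists = mahalanobis_distances([0.51, 0.30], [0.05, 0.05], train_colors)
# 2.28 is roughly the 68% point of a chi-square distribution with 2 degrees of freedom.
z_near, width, n_near = estimate_redshift(dists, train_z, threshold=2.28, selection_mode=1)
z_rand, _, _ = estimate_redshift(dists, train_z, threshold=2.28, selection_mode=0)
```

With this toy data only the first two training galaxies fall inside the threshold, so mode `1` returns the redshift of the closest one and the width is the spread of those two redshifts.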

diff --git a/examples/CMNN_demo.ipynb b/examples/CMNN_demo.ipynb
@@ -13,13 +13,13 @@
     "\n",
     "CMNN stands for color-matched nearest-neighbor, and as this name implies, the method works by finding the Mahalanobis distance between each test galaxy and the training galaxies, and selecting one of those \"nearby\" in color space as the redshift estimate.  The algorithm also estimates the \"width\" of the resulting PDF based on the standard deviation of this nearby set and returns a single Gaussian with a mean and width defined as such.<br>\n",
     "\n",
-    "The current version of the code consists of a training stage, `Inform_CMNNPDF`, that computes colors for a set of training data and an estimation stage `CMNNPDF` that calculates the Mahalanobis distance to each training galaxy for each test galaxy and returns a single Gaussian PDF for each galaxy.   The mean value of this Gaussian PDF can be estimated in one of three ways (see selection mode below), and the width is determined by the standard deviation of training galaxy redshifts within the threshold Mahalanobis distance. Future implementation improvements may change the output format to include multiple Gaussians.\n",
+    "The current version of the code consists of a training stage, `CMNNInformer`, that computes colors for a set of training data and an estimation stage `CMNNEstimator` that calculates the Mahalanobis distance to each training galaxy for each test galaxy and returns a single Gaussian PDF for each galaxy.   The mean value of this Gaussian PDF can be estimated in one of three ways (see selection mode below), and the width is determined by the standard deviation of training galaxy redshifts within the threshold Mahalanobis distance. Future implementation improvements may change the output format to include multiple Gaussians.\n",
     "\n",
-    "For the color calculation, there is an option for how to treat the \"non-detections\" in a band: the default choice is to ignore any colors that contain a non-detect magnitude and adjust the number of degrees of freedom in the Mahalanobis distance accordingly (this is how the CMNN algorithm was originally implemented).  Or, if the configuration parameter `nondetect_replace` is set to `True` in `Inform_CMNNPDF`, the non-detected magnitudes will be replaced with the 1-sigma limiting magnitude in each band as supplied by the user via the `mag_limits` configuration parameter (or by the default 1-sigma limits if the user does not supply specific numbers).  We have not done any exploration of the relative performance of these two choices, but note that there is not a significant performance difference in terms of runtime between the two methods.<br>\n",
+    "For the color calculation, there is an option for how to treat the \"non-detections\" in a band: the default choice is to ignore any colors that contain a non-detect magnitude and adjust the number of degrees of freedom in the Mahalanobis distance accordingly (this is how the CMNN algorithm was originally implemented).  Or, if the configuration parameter `nondetect_replace` is set to `True` in `CMNNInformer`, the non-detected magnitudes will be replaced with the 1-sigma limiting magnitude in each band as supplied by the user via the `mag_limits` configuration parameter (or by the default 1-sigma limits if the user does not supply specific numbers).  We have not done any exploration of the relative performance of these two choices, but note that there is not a significant performance difference in terms of runtime between the two methods.<br>\n",
     "\n",
     "In addition to the Gaussian PDF for each test galaxy, two ancillary quantities are stored: `zmode`: the mode of the redshift PDF and `Ncm`, the integer number of \"nearby\" galaxies considered as neighbors for each galaxy.<br>\n",
     "\n",
-    "`Inform_CMNNPDF` takes in a training data set and returns a model file that simply consists of the computed colors and color errors (magnitude errors added in quadrature) for that dataset, the model to be used in the `CMNNPDF` stage. A modification of the original CMNN algorithm, \"nondetections\" are now replaced by the 1-sigma limiting magnitudes and the non-detect magnitude errors replaced with a value of 1.0. The config parameters that can be set by the user for `Inform_CMNNPDF` are:\n",
+    "`CMNNInformer` takes in a training data set and returns a model file that simply consists of the computed colors and color errors (magnitude errors added in quadrature) for that dataset, the model to be used in the `CMNNEstimator` stage. A modification of the original CMNN algorithm, \"nondetections\" are now replaced by the 1-sigma limiting magnitudes and the non-detect magnitude errors replaced with a value of 1.0. The config parameters that can be set by the user for `CMNNInformer` are:\n",
     "\n",
     "- bands: list of the band names that should be present in the input data.<br>\n",
     "- err_bands: list of the magnitude error column names that should be present in the input data.<br>\n",
@@ -29,9 +29,9 @@
     "- nondetect_replace: bool, if set to `False` (the default) this option ignores colors with non-detected values in the Mahalanobis distance calculation, with a corresponding drop in the degrees of freedom value.  If set to `True`, the method will replace non-detections with the 1-sigma limiting magnitudes specified via `mag_limits` (or default 1-sigma limits if not supplied), and will use all colors in the Mahalanobis distance calculation.<br>\n",
     "\n",
     "\n",
-    "The parameters that can be set via the config_params in `CMNNPDF` are described in brief below:\n",
+    "The parameters that can be set via the config_params in `CMNNEstimator` are described in brief below:\n",
     "\n",
-    "- bands, err_bands, redshift_col, mag_limits are all the same as described above for Inform_CMNNPDF.\n",
+    "- bands, err_bands, redshift_col, mag_limits are all the same as described above for CMNNInformer.\n",
     "- ppf_value: float, usually 0.68 or 0.95, which sets the value of the PPF used in the Mahalanobis distance calculation.\n",
     "- selection_mode: int, selects how the central value of the Gaussian PDF is calculated in the algorithm, if set to **0** randomly chooses from set within the Mahalanobis distance, if set to **1** chooses the nearest neighbor point, if set to **2** adds a distance weight to the random choice.\n",
     "- min_n: int, the minimum number of training galaxies to use.\n",
@@ -41,7 +41,7 @@
     "- bad_redshift_err: float, in the unlikely case that there are not enough training galaxies, this Gaussian width will be assigned to galaxies.\n",
     "\n",
     "\n",
-    "Let's grab some example data, train the model by running the `Inform_CMNNPDF` `inform` method, then calculate a set of photo-z's using `CMNNPDF` `estimate`.  Much of the following is copied from the `RAIL_estiation_demo.ipynb` in the RAIL repo, so look at that notebook for general questions on setting up the RAIL infrastructure for estimators."
+    "Let's grab some example data, train the model by running the `CMNNInformer` `inform` method, then calculate a set of photo-z's using `CMNNEstimator` `estimate`.  Much of the following is copied from the `RAIL_estiation_demo.ipynb` in the RAIL repo, so look at that notebook for general questions on setting up the RAIL infrastructure for estimators."
    ]
   },
   {
@@ -92,7 +92,7 @@
    "metadata": {},
    "source": [
     "## The code-specific parameters\n",
-    "As mentioned above, CMNN has particular configuration options that can be set when setting up an instance of our `Inform_CMNNPDF` stage, we'll define those in a dictionary.  Any parameters not specifically assigned will take on default values."
+    "As mentioned above, CMNN has particular configuration options that can be set when setting up an instance of our `CMNNInformer` stage, we'll define those in a dictionary.  Any parameters not specifically assigned will take on default values."
    ]
   },
   {
@@ -117,8 +117,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from rail.estimation.algos.cmnn import Inform_CMNNPDF, CMNNPDF\n",
-    "pz_train = Inform_CMNNPDF.make_stage(name='inform_CMNN', model='demo_cmnn_model.pkl', **cmnn_dict)"
+    "from rail.estimation.algos.cmnn import CMNNInformer, CMNNEstimator\n",
+    "pz_train = CMNNInformer.make_stage(name='inform_CMNN', model='demo_cmnn_model.pkl', **cmnn_dict)"
    ]
   },
   {
@@ -174,7 +174,7 @@
    "outputs": [],
    "source": [
     "%%time\n",
-    "pz = CMNNPDF.make_stage(name='CMNN', hdf5_groupname='photometry',\n",
+    "pz = CMNNEstimator.make_stage(name='CMNN', hdf5_groupname='photometry',\n",
     "                        model=pz_train.get_handle('model'),\n",
     "                        min_n=20,\n",
     "                        selection_mode=1,\n",
@@ -252,7 +252,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "pz_rand = CMNNPDF.make_stage(name='CMNN_rand', hdf5_groupname='photometry',\n",
+    "pz_rand = CMNNEstimator.make_stage(name='CMNN_rand', hdf5_groupname='photometry',\n",
     "                             model=pz_train.get_handle('model'),\n",
     "                             min_n=20,\n",
     "                             selection_mode=0,\n",
@@ -289,7 +289,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "pz_weight = CMNNPDF.make_stage(name='CMNN_weight', hdf5_groupname='photometry',\n",
+    "pz_weight = CMNNEstimator.make_stage(name='CMNN_weight', hdf5_groupname='photometry',\n",
     "                               model=pz_train.get_handle('model'),\n",
     "                               min_n=20,\n",
     "                               selection_mode=2,\n",

diff --git a/src/rail/estimation/algos/cmnn.py b/src/rail/estimation/algos/cmnn.py
@@ -30,11 +30,11 @@ def _computecolordata(data, column_names, err_names):
     return coldata.T, errdata.T
 
 
-class Inform_CMNNPDF(CatInformer):
+class CMNNInformer(CatInformer):
     """compute colors and color errors for CMNN training set and
-       store in a model file that will be used by the CMNNPDF stage
+       store in a model file that will be used by the CMNNEstimator stage
     """
-    name = 'Inform_CMNNPDF'
+    name = 'CMNNInformer'
     config_options = CatInformer.config_options.copy()
     config_options.update(bands=SHARED_PARAMS,
                           err_bands=SHARED_PARAMS,
@@ -78,7 +78,7 @@ def run(self):
         self.add_data('model', self.model)
 
 
-class CMNNPDF(CatEstimator):
+class CMNNEstimator(CatEstimator):
     """Color Matched Nearest Neighbor Estimator
     Note that there are several modifications from the original CMNN, mainly that
     the original estimator dropped non-detections from the Mahalanobis distance
@@ -103,7 +103,7 @@ class CMNNPDF(CatEstimator):
     should only happen if the number of training galaxies is smaller than
     min_n, which is unlikely, but is included here for completeness.
     """
-    name = 'CMNNPDF'
+    name = 'CMNNEstimator'
     config_options = CatEstimator.config_options.copy()
     config_options.update(zmin=SHARED_PARAMS,
                           zmax=SHARED_PARAMS,

diff --git a/tests/test_algos.py b/tests/test_algos.py
@@ -30,13 +30,11 @@ def test_cmnn(out_method, zb_expected):
     estim_config_dict["selection_mode"] = out_method
     # zb_expected = np.array([0.13, 0.13, 0.13, 0.12, 0.12, 0.13, 0.12, 0.13,
     #                         0.12, 0.12])
-    train_algo = cmnn.Inform_CMNNPDF
-    pz_algo = cmnn.CMNNPDF
-    results, rerun_results, rerun3_results = one_algo(  # pylint: disable=unused-variable
-        "CMNN", train_algo, pz_algo, train_config_dict, estim_config_dict
-    )
-    assert np.isclose(results.ancil["zmode"], zb_expected, atol=0.02).all()
-    assert np.isclose(results.ancil["zmode"], rerun_results.ancil["zmode"]).all()
+    train_algo = cmnn.CMNNInformer
+    pz_algo = cmnn.CMNNEstimator
+    results, rerun_results, rerun3_results = one_algo("CMNN", train_algo, pz_algo, train_config_dict, estim_config_dict)
+    assert np.isclose(results.ancil['zmode'], zb_expected, atol=0.02).all()
+    assert np.isclose(results.ancil['zmode'], rerun_results.ancil['zmode']).all()
 
 
 def test_cmnn_nondetect_replace():
@@ -48,11 +46,10 @@ def test_cmnn_nondetect_replace():
 
     estim_config_dict["hdf5_groupname"] = "photometry"
     estim_config_dict["model"] = "model.tmp"
-    zb_expected = np.array([0.11, 0.15, 0.14, 0.13, 0.11, 0.13, 0.15, 0.15, 0.11, 0.11])
-    train_algo = cmnn.Inform_CMNNPDF
-    pz_algo = cmnn.CMNNPDF
-    results, rerun_results, rerun3_results = one_algo(  # pylint: disable=unused-variable
-        "CMNN", train_algo, pz_algo, train_config_dict, estim_config_dict
-    )
-    assert np.isclose(results.ancil["zmode"], zb_expected, atol=0.02).all()
-    assert np.isclose(results.ancil["zmode"], rerun_results.ancil["zmode"]).all()
+    zb_expected = np.array([0.11, 0.15, 0.14, 0.13, 0.11, 0.13, 0.15, 0.15,
+                            0.11, 0.11])
+    train_algo = cmnn.CMNNInformer
+    pz_algo = cmnn.CMNNEstimator
+    results, rerun_results, rerun3_results = one_algo("CMNN", train_algo, pz_algo, train_config_dict, estim_config_dict)
+    assert np.isclose(results.ancil['zmode'], zb_expected, atol=0.02).all()
+    assert np.isclose(results.ancil['zmode'], rerun_results.ancil['zmode']).all()
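
The `test_cmnn_nondetect_replace` test above exercises the `nondetect_replace` option described in the README hunk. As a rough illustration of the two behaviours, the sketch below computes colors for a single galaxy in plain Python. It is not the package's `_computecolordata`: the function name, the `99.0` sentinel for a non-detection, the use of adjacent-band colors, and the limiting magnitudes in the example are all assumptions made for the sketch. Only the replacement error of 1.0 and the quadrature sum of magnitude errors come from the text.

```python
import math

NONDETECT = 99.0  # hypothetical sentinel marking a non-detected magnitude


def colors_with_nondetects(mags, mag_errs, mag_limits, nondetect_replace=False):
    """Adjacent-band colors and color errors for one galaxy.

    Returns (colors, color_errs, dof). Colors that cannot be used are None,
    and dof counts the colors that would enter the distance calculation.
    """
    mags, mag_errs = list(mags), list(mag_errs)
    for i, m in enumerate(mags):
        if math.isnan(m) or m >= NONDETECT:
            if nondetect_replace:
                # Replace with the 1-sigma limiting magnitude and an error of 1.0.
                mags[i] = mag_limits[i]
                mag_errs[i] = 1.0
            else:
                mags[i] = None  # any color using this band is ignored
    colors, errs = [], []
    for i in range(len(mags) - 1):
        if mags[i] is None or mags[i + 1] is None:
            colors.append(None)
            errs.append(None)
        else:
            colors.append(mags[i] - mags[i + 1])
            # Magnitude errors added in quadrature.
            errs.append(math.hypot(mag_errs[i], mag_errs[i + 1]))
    dof = sum(c is not None for c in colors)
    return colors, errs, dof


# One galaxy in four bands, with the third band not detected.
mags = [25.0, 24.5, 99.0, 23.8]
mag_errs = [0.1, 0.1, 99.0, 0.05]
limits = [27.8, 29.0, 29.1, 28.6]  # illustrative values only

colors_drop, errs_drop, dof_drop = colors_with_nondetects(mags, mag_errs, limits)
colors_rep, errs_rep, dof_rep = colors_with_nondetects(
    mags, mag_errs, limits, nondetect_replace=True
)
```

In the default mode the two colors touching the missing band are dropped and one degree of freedom remains. With replacement, all three colors are kept, and the replaced band contributes a large color error that down-weights those colors in the distance.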