Merge pull request #49 from LSSTDESC/user/aimalz/renaming

naming consistency/clarity within src/rail/estimation
LSSTDESC · Jul 15, 2023 · 45ecb4c
2 parents 63eb5e7 + f679441

Showing 17 changed files with 170 additions and 159 deletions.
diff --git a/README.md b/README.md
@@ -1,35 +1,31 @@
-# pz-rail-hub
+# pz-rail
 
-[![Template](https://img.shields.io/badge/Template-LINCC%20Frameworks%20Python%20Project%20Template-brightgreen)](https://lincc-ppt.readthedocs.io/en/latest/)
-[![codecov](https://codecov.io/gh/LSSTDESC/pz-rail-hub/branch/main/graph/badge.svg)](https://codecov.io/gh/LSSTDESC/pz-rail-hub)
 [![PyPI](https://img.shields.io/pypi/v/hub?color=blue&logo=pypi&logoColor=white)](https://pypi.org/project/hub/)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7017551.svg)](https://doi.org/10.5281/zenodo.7017551)
+[![codecov](https://codecov.io/gh/LSSTDESC/pz-rail/branch/main/graph/badge.svg)](https://codecov.io/gh/LSSTDESC/pz-rail)
+[![Template](https://img.shields.io/badge/Template-LINCC%20Frameworks%20Python%20Project%20Template-brightgreen)](https://lincc-ppt.readthedocs.io/en/latest/)
 
-TODO - add more about your project here.
-
-## RAIL: Redshift Assessment Infrastructure Layers
+# RAIL: Redshift Assessment Infrastructure Layers
 
-This package is part of the larger ecosystem of Photometric Redshifts
-in [RAIL](https://github.com/LSSTDESC/RAIL).
+RAIL is a flexible software library providing tools to produce at-scale photometric redshift data products, including uncertainties and summary statistics, and stress-test them under realistically complex systematics.
+A detailed description of RAIL's modular structure is available in the [Overview](https://lsstdescrail.readthedocs.io/en/stable/source/overview.html) on ReadTheDocs.
 
-### Citing RAIL
+RAIL serves as the infrastructure supporting many extragalactic applications of the Legacy Survey of Space and Time (LSST) on the Vera C. Rubin Observatory, including Rubin-wide commissioning activities. 
+RAIL was initiated by the Photometric Redshifts (PZ) Working Group (WG) of the LSST Dark Energy Science Collaboration (DESC) as a result of the lessons learned from the [Data Challenge 1 (DC1) experiment](https://academic.oup.com/mnras/article/499/2/1587/5905416) to enable the PZ WG Deliverables in [the LSST-DESC Science Roadmap (see Sec. 5.18)](https://lsstdesc.org/assets/pdf/docs/DESC_SRM_latest.pdf), aiming to guide the selection and implementation of redshift estimators in DESC analysis pipelines.
+RAIL is developed and maintained by a diverse team comprising DESC Pipeline Scientists (PSs), international in-kind contributors, LSST Interdisciplinary Collaboration for Computing (LINCC) Frameworks software engineers, and other volunteers, but all are welcome to join the team regardless of LSST data rights. 
 
-This code, while public on GitHub, has not yet been released by DESC and is
-still under active development. Our release of v1.0 will be accompanied by a
-journal paper describing the development and validation of RAIL.
+## Installation
 
-If you make use of the ideas or software in RAIL, please cite the repository 
-<https://github.com/LSSTDESC/RAIL>. You are welcome to re-use the code, which
-is open source and available under terms consistent with the MIT license.
+Installation instructions are available under [Installation](https://lsstdescrail.readthedocs.io/en/stable/source/installation.html) on ReadTheDocs.
 
-External contributors and DESC members wishing to use RAIL for non-DESC projects
-should consult with the Photometric Redshifts (PZ) Working Group conveners,
-ideally before the work has started, but definitely before any publication or 
-posting of the work to the arXiv.
+## Contributing
 
-### Citing this package
+The greatest strength of RAIL is its extensibility; those interested in contributing to RAIL should start by consulting the [Contributing guidelines](https://lsstdescrail.readthedocs.io/en/stable/source/contributing.html) on ReadTheDocs.
 
-If you use this package, you should also cite the appropriate papers for each
-code used.  A list of such codes is included in the 
-[Citing RAIL](https://lsstdescrail.readthedocs.io/en/stable/source/citing.html)
-section of the main RAIL Read The Docs page.
+## Citing RAIL
 
+RAIL is open source and may be used according to the terms of its [LICENSE](https://github.com/LSSTDESC/RAIL/blob/main/LICENSE) [(BSD 3-Clause)](https://opensource.org/licenses/BSD-3-Clause).
+If you make use of the ideas or software here in any publication, you must cite this repository <https://github.com/LSSTDESC/RAIL> as "LSST-DESC PZ WG (in prep)" with the [Zenodo DOI](https://doi.org/10.5281/zenodo.7017551).
+Please consider also inviting the developers as co-authors on publications resulting from your use of RAIL by [making an issue](https://github.com/LSSTDESC/rail/issues/new/choose).
+Additionally, several of the codes accessible through the RAIL ecosystem must be cited if used in a publication.
+A convenient list of what to cite may be found under [Citing RAIL](https://lsstdescrail.readthedocs.io/en/stable/source/citing.html) on ReadTheDocs.
diff --git a/docs/source/citing.rst b/docs/source/citing.rst
@@ -27,7 +27,7 @@ The following list provides the necessary references for external codes accessib
 
 | GPz: 
 
-| PZFlowPDF:
+| PZFlowEstimator:
 | J. F. Crenshaw et al (in prep)
 | `Zenodo link <https://zenodo.org/record/6369625#.Ylcpjy-cYW8>`_
 
@@ -38,4 +38,4 @@ The following list provides the necessary references for external codes accessib
 | trainZ:
 | `Schmidt, Malz et al (2020) <https://ui.adsabs.harvard.edu/abs/2020MNRAS.499.1587S/abstract>`_
 
-| varInference: 
+| VarInfStackSummarizer: 
diff --git a/docs/source/contributing.rst b/docs/source/contributing.rst
@@ -83,6 +83,8 @@ Once you are satisfied with your PR, request that other team members review and
 approve it. You could send the request to someone whom you've worked with on the 
 topic, or one of the core maintainers of rail.
 
+**TODO what to call branches goes here**
+
 
 Merge
 -----
@@ -93,6 +95,15 @@ Once the changes in your PR have been approved, these are your next steps:
 2. enter ``closes #[#]`` in the comment field to close the resolved issue
 3. delete your branch using the button on the merged pull request.
 
+If you are making changes that affect multiple repositories, make a branch and PR on each one.
+The PRs should be merged and new releases made in the following order without long delays between steps:
+1. `rail_base`
+2. all per-algorithm repositories in any order
+3. `rail`
+4. `rail_pipelines`
+This will minimize the time when new installations from PyPI could be broken by conflicts.
+
+
 Reviewing a PR
 --------------
 
@@ -118,36 +129,39 @@ Naming conventions
 We follow the `pep8 <https://peps.python.org/pep-0008/#descriptive-naming-styles>`_ 
 recommendations for naming new modules and ``RailStage`` classes within them.
 
+
 Modules
 -------
 
 Modules should use all lowercase, with underscores where it aids the readability
-of the module name. If the module performs only one of p(z) or n(z) calculations,
-it is convenient to include that in the module name.
+of the module name. 
 
-e.g. 
+For example:
 
-*  ``simple_neurnet`` is a module name for algorithms that use simple neural networks from sklearn to compute p(z) or n(z)
-*  ``random_pz`` is an algorithm that computes p(z)
+*  ``skl_neurnet`` is a module name for algorithms that use scikit-learn's simple neural network implementation to estimate p(z)
+*  ``random_gauss`` is a module name for a p(z) estimation algorithm that assigns each galaxy a random Gaussian distribution
+
+It's good for the module name to specify the source of the implementation of a particularly common algorithm, e.g. ``minisom_som`` and ``somoclu_som`` are distinct.
+Note that these names should not be identical to the name of the package the algorithm came from, to avoid introducing namespace collisions for users who have imported the original package as well, i.e. ``pzflow_nf`` is a safer name than ``pzflow``.
 
 
 Stages
 ------
 
-RailStages are python classes and so should use CapWords convention. All rail 
-stages using the same algorithm should use the same short, descriptive prefix, 
-and the suffix is the type of stage.
+RailStages are python classes and so should use the CapWords convention. All 
+rail stages using the same algorithm should use the same short, descriptive 
+prefix, and the suffix is the type of stage.
 
 e.g.
 
-*  ``SimpleNNInformer`` is an informer using a simple neural network
-*  ``SimpleNNEstimator`` is an estimator using a simple neural network
+*  ``KNearNeighInformer`` is an informer using the k-nearest neighbors algorithm
+*  ``KNearNeighEstimator`` is an estimator using the k-nearest neighbors algorithm
 
 Possible suffixes include:
 
-* Summarizer
 * Informer
 * Estimator
+* Summarizer
 * Classifier
 * Creator
 * Degrader
@@ -164,3 +178,4 @@ for those workflows:
 * :ref:`Adding a new Rail Stage` without new dependencies
 * :ref:`Adding a new algorithm` (new engine or package)
 * :ref:`Sharing a Rail Pipeline`
+
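The naming rules added to `contributing.rst` above can be checked mechanically. The following is a minimal sketch, not part of this PR: the helper names `is_valid_module_name` and `split_stage_name` are hypothetical, and the suffix tuple contains only the suffixes visible in the truncated hunk, so the real list may be longer.

```python
import re

# Suffixes visible in the contributing.rst hunk; the hunk is cut off after
# "Degrader", so this tuple is an assumption and may be incomplete.
KNOWN_SUFFIXES = (
    "Informer",
    "Estimator",
    "Summarizer",
    "Classifier",
    "Creator",
    "Degrader",
)

# Modules: all lowercase, with optional single underscores between words.
MODULE_RE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# Stages: CapWords, i.e. a leading capital and no underscores.
CAPWORDS_RE = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def is_valid_module_name(name: str) -> bool:
    """Return True if `name` follows the lowercase_with_underscores rule."""
    return bool(MODULE_RE.match(name))


def split_stage_name(name: str):
    """Split a stage class name into (prefix, suffix).

    Returns None if the name is not CapWords or does not end in a known
    suffix preceded by a non-empty algorithm prefix.
    """
    if not CAPWORDS_RE.match(name):
        return None
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], suffix
    return None
```

Under these assumptions, `split_stage_name("KNearNeighInformer")` yields the shared prefix `KNearNeigh` and the stage type `Informer`, while an old-style name such as `Inform_KNearNeighPDF` is rejected because it contains an underscore.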
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -187,15 +187,15 @@ For Delight you should be able to just do:
 
     pip install pz-rail-delight
 
-However, the particular estimator `Delight` is built with `Cython` and uses `openmp`.  Mac has dropped native support for `openmp`, which will likely cause problems when trying to run the `delightPZ` estimation code in RAIL.  See the notes below for instructions on installing Delight if you wish to use this particular estimator.
+However, the particular estimator `Delight` is built with `Cython` and uses `openmp`.  Mac has dropped native support for `openmp`, which will likely cause problems when trying to run the `DelightEstimator` estimation code in RAIL.  See the notes below for instructions on installing Delight if you wish to use this particular estimator.
 
-If you are installing RAIL on a Mac, as noted above the `delightPZ` estimator requires that your machine's `gcc` be set up to work with `openmp`. If you are installing on a Mac and do not plan on using `delightPZ`, then you can simply install RAIL with `pip install .[base]` rather than `pip install .[all]`, which will skip the Delight package.  If you are on a Mac and *do* expect to run `delightPZ`, then follow the instructions `here <https://github.com/LSSTDESC/Delight/blob/master/Mac_installation.md>`_ to install Delight before running `pip install .[all]`.
+If you are installing RAIL on a Mac, as noted above the `DelightEstimator` estimator requires that your machine's `gcc` be set up to work with `openmp`. If you are installing on a Mac and do not plan on using `DelightEstimator`, then you can simply install RAIL with `pip install .[base]` rather than `pip install .[all]`, which will skip the Delight package.  If you are on a Mac and *do* expect to run `DelightEstimator`, then follow the instructions `here <https://github.com/LSSTDESC/Delight/blob/master/Mac_installation.md>`_ to install Delight before running `pip install .[all]`.
 
 
-Installing FZBoost
+Installing FlexZBoost
 ------------------
 
-For FZBoost, you should be able to just do
+For FlexZBoost, you should be able to just do
 
 .. code-block:: bash
 
@@ -229,7 +229,7 @@ Using GPU-optimization for pzflow
 Note that the Creation Module depends on pzflow, which has an optional GPU-compatible installation.
 For instructions, see the `pzflow Github repo <https://github.com/jfcrenshaw/pzflow/>`_.
 
-On some systems that are slightly out of date, e.g. an older version of python's `setuptools`, there can be some problems installing packages hosted on GitHub rather than PyPi.  We recommend that you update your system; however, some users have still reported problems with installation of subpackages necessary for `FZBoost` and `bpz_lite`.  If this occurs, try the following procedure:
+On some systems that are slightly out of date, e.g. an older version of python's `setuptools`, there can be some problems installing packages hosted on GitHub rather than PyPi.  We recommend that you update your system; however, some users have still reported problems with installation of subpackages necessary for `flexzboost` and `bpz_lite`.  If this occurs, try the following procedure:
 
 Once you have installed RAIL, you can import the package (via `import rail`) in any of your scripts and notebooks.
 For examples demonstrating how to use the different pieces, see the notebooks in the `examples/` directory.

diff --git a/docs/source/overview.rst b/docs/source/overview.rst
@@ -72,8 +72,8 @@ Methods that estimate per-galaxy PDFs directly from photometry are referred to a
 Individual estimation and summarization codes are "wrapped" as RAIL stages so that they can be run in a controlled way.  
 
 **base design**: 
-Estimators for several popular codes `BPZ_lite` (a slimmed down version of the popular template-based BPZ code), `FlexZBoost`, and delight `Delight` are included in rail/estimation, as are an estimator `PZFlowPDF` that uses the same normalizing flow employed in the creation module, and `KNearNeighPDF` for a simple color-based nearest neighbor estimator.  
-The pathological `trainZ` estimator is also implemented.  
+Estimators for several popular codes `BPZliteEstimator` (a slimmed down version of the popular template-based BPZ code), `FlexZBoostEstimator`, and `DelightEstimator` are included in rail/estimation, as are an estimator `PZFlowEstimator` that uses the same normalizing flow employed in the creation module, and `KNearNeighEstimator` for a simple color-based nearest neighbor estimator.  
+The pathological `TrainZEstimator` estimator is also implemented.  
 Several very basic summarizers such as a histogram of point source estimates, the naive "stacking"/summing of PDFs, and a variational inference-based summarizer are also included in RAIL.
 
 **Usage**: 

diff --git a/examples/core_examples/FileIO_DataStore.ipynb b/examples/core_examples/FileIO_DataStore.ipynb
@@ -221,7 +221,7 @@
    "source": [
     "# Using the data in a pipeline stage: photo-z estimation example\n",
     "\n",
-    "Now that we have our data in place, we can use it in a RAIL stage.  As an example, we'll estimate photo-z's for our data.  Let's train the `KNearNeighPDF` algorithm with our train_data, and then estimate photo-z's for the test_data.  We need to make the RAIL stages for each of these steps, first we need to train/inform our nearest neighbor algorithm with the train_data:"
+    "Now that we have our data in place, we can use it in a RAIL stage.  As an example, we'll estimate photo-z's for our data.  Let's train the `KNearNeighEstimator` algorithm with our train_data, and then estimate photo-z's for the test_data.  We need to make the RAIL stages for each of these steps, first we need to train/inform our nearest neighbor algorithm with the train_data:"
    ]
   },
   {
@@ -230,7 +230,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from rail.estimation.algos.knnpz import Inform_KNearNeighPDF, KNearNeighPDF"
+    "from rail.estimation.algos.k_nearneigh import KNearNeighInformer, KNearNeighEstimator"
    ]
   },
   {
@@ -239,7 +239,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "inform_knn = Inform_KNearNeighPDF.make_stage(name='inform_knn', input='train_data', \n",
+    "inform_knn = KNearNeighInformer.make_stage(name='inform_knn', input='train_data', \n",
     "                                            nondetect_val=99.0, model='knnpz.pkl',\n",
     "                                            hdf5_groupname='')\n"
    ]
@@ -268,7 +268,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "estimate_knn = KNearNeighPDF.make_stage(name='estimate_knn', hdf5_groupname='photometry', nondetect_val=99.0,\n",
+    "estimate_knn = KNearNeighEstimator.make_stage(name='estimate_knn', hdf5_groupname='photometry', nondetect_val=99.0,\n",
     "                                        model='knnpz.pkl', output=\"KNNPZ_estimates.hdf5\")"
    ]
   },

diff --git a/examples/estimation_examples/NZDir.ipynb b/examples/estimation_examples/NZDir.ipynb
@@ -37,7 +37,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from rail.estimation.algos.NZDir import NZDir, Inform_NZDir\n",
+    "from rail.estimation.algos.nz_dir import NZDirSummarizer, NZDirInformer\n",
     "from rail.core.data import TableHandle\n",
     "from rail.core.stage import RailStage"
    ]
@@ -161,7 +161,7 @@
    "id": "f65d4835-2ff6-4206-b017-cdf1d7cad828",
    "metadata": {},
    "source": [
-    "Now, let's set up or estimator, first creating a stage for the informer.  We define any input variables in a dictionary and then use that with `make_stage` to create an instance of our NZDir summarizer.  We'll create a histogram of 25 bins, using 5 nearest neighbors to define our specz neighborhood, and above we defined our bin column as \"bin\":"
+    "Now, let's set up or estimator, first creating a stage for the informer.  We define any input variables in a dictionary and then use that with `make_stage` to create an instance of our NZDirSummarizer.  We'll create a histogram of 25 bins, using 5 nearest neighbors to define our specz neighborhood, and above we defined our bin column as \"bin\":"
    ]
   },
   {
@@ -171,7 +171,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "train_nzdir = Inform_NZDir.make_stage(name='train_nzdir', n_neigh=5,\n",
+    "train_nzdir = NZDirInformer.make_stage(name='train_nzdir', n_neigh=5,\n",
     "                                      szweightcol='weight', model=\"NZDir_model.pkl\")"
    ]
   },
@@ -225,7 +225,7 @@
     "binnames = ['low', 'mid', 'hi']\n",
     "bin_datasets = [low_bin, mid_bin, hi_bin]\n",
     "for bin, indata in zip(binnames, bin_datasets):\n",
-    "    nzsumm = NZDir.make_stage(name=f'nzsumm_{bin}', **summdict)\n",
+    "    nzsumm = NZDirSummarizer.make_stage(name=f'nzsumm_{bin}', **summdict)\n",
     "    bin_ens[f'{bin}'] = nzsumm.estimate(indata)"
    ]
   },
@@ -381,7 +381,7 @@
    "source": [
     "xinformdict = dict(n_neigh=5, bincol=\"bin\", szweightcol='weight',\n",
     "                   model=\"NZDir_model_incompl.pkl\", hdf5_groupname='')\n",
-    "newsumm_inform = Inform_NZDir.make_stage(name='newsumm_inform', **xinformdict)"
+    "newsumm_inform = NZDirInformer.make_stage(name='newsumm_inform', **xinformdict)"
    ]
   },
   {
@@ -416,7 +416,7 @@
     "binnames = ['low', 'mid', 'hi']\n",
     "bin_datasets = [low_bin, mid_bin, hi_bin]\n",
     "for bin, indata in zip(binnames, bin_datasets):\n",
-    "    nzsumm = NZDir.make_stage(name=f'nzsumm_{bin}', **xestimatedict)\n",
+    "    nzsumm = NZDirSummarizer.make_stage(name=f'nzsumm_{bin}', **xestimatedict)\n",
     "    new_ens[f'{bin}'] = nzsumm.estimate(indata)"
    ]
   },