Models and data documentation

deepskies · Jun 21, 2024 · 3db2b86
1 parent 7e57616
Showing 15 changed files with 320 additions and 63 deletions.
diff --git a/docs/source/API/client.rst b/docs/source/API/client.rst
diff --git a/docs/source/API/utils.rst b/docs/source/API/utils.rst
diff --git a/docs/source/client.rst b/docs/source/client.rst
@@ -0,0 +1,35 @@
+Client 
+========
+
+.. note:: 
+    When running the client, you can supply **either** the configuration yaml file, or the CLI arguments. 
+    You do not need to supply both. 
+
+Run ``diagnose -h`` at any time to view the CLI usage.
+Specific argument descriptions and explanations can be found on the :ref:`configuration` page. 
+
+.. code-block:: bash
+    
+    usage: diagnose [-h] [--config CONFIG] [--model_path MODEL_PATH] [--model_engine {SBIModel}] [--data_path DATA_PATH] [--data_engine {H5Data,PickleData}]
+                    [--simulator SIMULATOR] [--out_dir OUT_DIR] [--metrics [{CoverageFraction,AllSBC,LC2ST}]]
+                    [--plots [{CDFRanks,CoverageFraction,Ranks,TARP,LC2ST,PPC}]]
+
+    options:
+    -h, --help            show this help message and exit
+    --config CONFIG, -c CONFIG
+                            .yaml file with all arguments to run.
+    --model_path MODEL_PATH, -m MODEL_PATH
+                            String path to a model. Must be compatible with your model_engine choice.
+    --model_engine {SBIModel}, -e {SBIModel}
+                            Way to load your model. See each module's documentation page for requirements and specifications.
+    --data_path DATA_PATH, -d DATA_PATH
+                            String path to data. Must be compatible with data_engine choice.
+    --data_engine {H5Data,PickleData}, -g {H5Data,PickleData}
+                            Way to load your data. See each module's documentation page for requirements and specifications.
+    --simulator SIMULATOR, -s SIMULATOR
+                            String name of the simulator to use with generative metrics and plots. Must be pre-registered with the `utils.register_simulator` method.
+    --out_dir OUT_DIR     Where the results will be saved. Path need not exist, it will be created.
+    --metrics [{CoverageFraction,AllSBC,LC2ST}]
+                            List of metrics to run. To not run any, supply `--metrics `
+    --plots [{CDFRanks,CoverageFraction,Ranks,TARP,LC2ST,PPC}]
+                            List of plots to run. To not run any, supply `--plots `
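
The `--metrics` and `--plots` options are declared with `nargs="?"` (see `client.py` further down). A minimal sketch of what the standard library `argparse` returns in each case, assuming a stand-in parser: `METRIC_CHOICES` and `build_parser` are hypothetical names, and only the choice names come from the usage text above. How the client treats these values afterwards is not shown in this diff.

```python
from argparse import ArgumentParser

# Hypothetical stand-in for the metric registry; only the names come from the usage text.
METRIC_CHOICES = ["CoverageFraction", "AllSBC", "LC2ST"]


def build_parser() -> ArgumentParser:
    """Build a parser with a single option declared the same way as --metrics."""
    parser = ArgumentParser(prog="diagnose")
    parser.add_argument(
        "--metrics",
        nargs="?",
        default=list(METRIC_CHOICES),
        choices=METRIC_CHOICES,
    )
    return parser


parser = build_parser()

# Flag omitted entirely: argparse falls back to the default list.
omitted = parser.parse_args([]).metrics

# Bare flag with no value: argparse stores `const`, which is None unless set.
bare = parser.parse_args(["--metrics"]).metrics

# Flag with one value: argparse stores a single string, not a list.
single = parser.parse_args(["--metrics", "LC2ST"]).metrics
```

With `nargs="?"` at most one value is accepted per flag, so code that consumes `args.metrics` would need to handle a list, a string, and `None`.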
diff --git a/docs/source/configuration.rst b/docs/source/configuration.rst
@@ -1,3 +1,5 @@
+.. _configuration:
+
 Configuration 
 ===============
 
@@ -23,7 +25,7 @@ but it can be specified to quickly access variables to avoid re-writing initiali
 
 .. code-block:: python 
 
-    from DeepDiagnostics.utils.configuration import Config 
+    from deepdiagnostics.utils.configuration import Config 
 
 
     Config("path/to/your/config.yaml")
@@ -54,7 +56,7 @@ Configuration Description
 
     :param model_path: Path to stored model. Required. 
 
-    :param model_engine: Loading method to use. Choose from methods listed in :ref:`plots<plots>`
+    :param model_engine: Loading method to use. Choose from methods listed in :ref:`models`.
 
 .. code-block:: yaml 
 
@@ -66,11 +68,11 @@ Configuration Description
 
     :param data_path: Path to stored data. Required.
 
-    :param data_engine: Loading method to use. Choose from methods listed in  :ref:`plots<plots>`
+    :param data_engine: Loading method to use. Choose from methods listed in :ref:`data`.
 
     :param simulator: String name of the simulator. Must be pre-registered.
 
-    :param prior: Prior distribution used in training. Used if "prior" is not included in the passed data. Choose from []
+    :param prior: Prior distribution used in training. Used if "prior" is not included in the passed data. 
 
     :param prior_kwargs: kwargs to use with the initialization of the prior
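
The `prior` and `prior_kwargs` parameters name a numpy random distribution and its arguments. A minimal sketch of how such a pair could be turned into a sampler, assuming numpy's `Generator` API: `make_prior` and `ALLOWED` are hypothetical and are not the library's `load_prior`. The call convention `prior(size=(n_samples, n_dims))` mirrors `sample_prior` in `data.py`.

```python
from functools import partial
from typing import Any, Callable, Optional

import numpy as np

# Hypothetical whitelist, spelled with numpy's own Generator method names.
ALLOWED = ("normal", "poisson", "uniform", "gamma", "beta", "binomial")


def make_prior(
    name: str, prior_kwargs: Optional[dict[str, Any]] = None, seed: int = 0
) -> Callable:
    """Hypothetical helper: bind a numpy Generator distribution to its kwargs."""
    if name not in ALLOWED:
        raise NotImplementedError(f"{name} is not one of {ALLOWED}")
    rng = np.random.default_rng(seed)
    # The returned callable still accepts `size`, so it can be sampled later.
    return partial(getattr(rng, name), **(prior_kwargs or {}))


prior = make_prior("normal", {"loc": 0.0, "scale": 1.0})
samples = prior(size=(5, 2))  # 5 draws of a 2-dimensional theta
```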
 

diff --git a/docs/source/API/data.rst b/docs/source/data.rst
@@ -1,3 +1,5 @@
+.. _data:
+
 Data 
 ======
 
@@ -8,4 +10,7 @@ Data
     :members:
 
 .. autoclass:: data.PickleData
-    :members:
+    :members:
+
+.. autoclass:: data.simulator.Simulator
+    :members: 
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -14,10 +14,9 @@ Welcome to DeepDiagnostics's documentation!
    configuration
    plots
    metrics
-   API/client
-   API/utils 
-   API/data
-   API/models 
+   client
+   data
+   models 
 
 Indices and tables
 ==================

diff --git a/docs/source/API/models.rst b/docs/source/models.rst
@@ -1,3 +1,5 @@
+.. _models:
+
 Models 
 ========
 

diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
@@ -4,7 +4,7 @@ Quickstart
 Notebook Example 
 -----------------
 
-`An example notebook can be found here for an interactive walkthrough. <https://github.com/deepskies/DeepDiagnostics/blob/main/notebooks/example.ipynb>`_. 
+`An example notebook can be found here for an interactive walkthrough <https://github.com/deepskies/DeepDiagnostics/blob/main/notebooks/example.ipynb>`_. 
 
 Installation 
 --------------
@@ -27,7 +27,7 @@ Installation
 Configuration 
 --------------
 
-Description of the configuration file, including defaults, can be found in :ref:`configuration<configuration>`
+Description of the configuration file, including defaults, can be found in :ref:`configuration`.
 
 Pipeline 
 ---------
@@ -75,7 +75,7 @@ All plots and metrics can be found in :ref:`plots<plots>` and :ref:`metrics<metr
 
 
 Custom Simulations
----
+-------------------
 
 To use generative model diagnostics, a simulator has to be included. 
 This is done by `registering` your simulation with a name and a class associated. 
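
A minimal sketch of the name-to-class registration idea described above. Everything here is hypothetical (`_REGISTRY`, `register`, `load`, `LinearSimulator`) and is not the library's `register_simulator` implementation; the only grounded parts are the two simulator methods that `data.py` calls, `generate_context(n_samples)` and `simulate(theta, context)`.

```python
import numpy as np

# Hypothetical in-memory registry; the library's own storage is not shown in this diff.
_REGISTRY: dict[str, type] = {}


def register(name: str, simulator_cls: type) -> None:
    """Associate a simulator class with a string name."""
    _REGISTRY[name] = simulator_cls


def load(name: str):
    """Instantiate the simulator registered under `name`."""
    if name not in _REGISTRY:
        raise KeyError(f"No simulator registered under {name!r}")
    return _REGISTRY[name]()


class LinearSimulator:
    """Toy simulator: y = slope * x + intercept."""

    def generate_context(self, n_samples: int) -> np.ndarray:
        # Evenly spaced x values on [0, 1].
        return np.linspace(0.0, 1.0, n_samples)

    def simulate(self, theta: np.ndarray, context: np.ndarray) -> np.ndarray:
        # theta has shape (n, 2): one (slope, intercept) pair per row.
        return theta[:, 0:1] * context[None, :] + theta[:, 1:2]


register("linear", LinearSimulator)
sim = load("linear")
x = sim.generate_context(4)
y = sim.simulate(np.array([[2.0, 1.0]]), x)  # shape (1, 4)
```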

diff --git a/src/deepdiagnostics/client/client.py b/src/deepdiagnostics/client/client.py
@@ -12,35 +12,45 @@
 
 def parser():
     parser = ArgumentParser()
-    parser.add_argument("--config", "-c", default=None)
+    parser.add_argument("--config", "-c", default=None, help=".yaml file with all arguments to run.")
 
     # Model
-    parser.add_argument("--model_path", "-m", default=None)
+    parser.add_argument("--model_path", "-m", default=None, help="String path to a model. Must be compatible with your model_engine choice.")
     parser.add_argument(
         "--model_engine",
         "-e",
         default=Defaults["model"]["model_engine"],
         choices=ModelModules.keys(),
+        help="Way to load your model. See each module's documentation page for requirements and specifications."
     )
 
     # Data
-    parser.add_argument("--data_path", "-d", default=None)
+    parser.add_argument("--data_path", "-d", default=None, help="String path to data. Must be compatible with data_engine choice.")
     parser.add_argument(
         "--data_engine",
         "-g",
         default=Defaults["data"]["data_engine"],
         choices=DataModules.keys(),
+        help="Way to load your data. See each module's documentation page for requirements and specifications."
     )
-    parser.add_argument("--simulator", "-s", default=None)
+    parser.add_argument(
+        "--simulator", "-s", 
+        default=None, 
+        help='String name of the simulator to use with generative metrics and plots. Must be pre-registered with the `utils.register_simulator` method.')
     # Common
-    parser.add_argument("--out_dir", default=Defaults["common"]["out_dir"])
+    parser.add_argument(
+        "--out_dir", 
+        default=Defaults["common"]["out_dir"], 
+        help="Where the results will be saved. Path need not exist, it will be created."
+    )
 
     # List of metrics (cannot supply specific kwargs)
     parser.add_argument(
         "--metrics",
         nargs="?",
         default=list(Defaults["metrics"].keys()),
         choices=Metrics.keys(),
+        help="List of metrics to run. To not run any, supply `--metrics `"
     )
 
     # List of plots
@@ -49,6 +59,8 @@ def parser():
         nargs="?",
         default=list(Defaults["plots"].keys()),
         choices=Plots.keys(),
+        help="List of plots to run. To not run any, supply `--plots `"
+
     )
 
     args = parser.parse_args()

diff --git a/src/deepdiagnostics/data/data.py b/src/deepdiagnostics/data/data.py
@@ -1,10 +1,27 @@
-from typing import Optional
+from typing import Any, Optional, Sequence, Union
 import numpy as np
 
 from deepdiagnostics.utils.config import get_item
 from deepdiagnostics.utils.register import load_simulator
 
 class Data:
+    """
+        Load stored data to use in diagnostics
+
+        Args:
+            path (str): path to the data file.
+            simulator_name (str): Name of the registered simulator. If your simulator is not registered with utils.register_simulator, it will produce an error here. 
+            simulator_kwargs (dict, optional): Any additional kwargs used to set up your simulator. Defaults to None.
+            prior (str, optional): If the prior is not given in the data, use a numpy random distribution. Specified by name. Choose from: {
+                "normal"
+                "poisson"
+                "uniform"
+                "gamma"
+                "beta"
+                "binominal"}. Defaults to None.
+            prior_kwargs (dict, optional): kwargs for the numpy prior. `View this page for a description <https://numpy.org/doc/stable/reference/random/generator.html#distributions>`_. Defaults to None.
+            simulation_dimensions (Optional[int], optional): 1 or 2. 1->output of the simulator has one dimension, 2->output has two dimensions (is an image). Defaults to None.
+    """
     def __init__(
         self,
         path: str,
@@ -23,7 +40,13 @@ def __init__(
         self.n_dims = self.get_theta_true().shape[1]
         self.simulator_dimensions = simulation_dimensions if simulation_dimensions is not None else get_item("data", "simulator_dimensions", raise_exception=False)
 
-    def get_simulator_output_shape(self): 
+    def get_simulator_output_shape(self) -> tuple[int, ...]: 
+        """
+        Run a single sample of the simulator to verify the out-shape. 
+
+        Returns:
+             tuple[int, ...]: Output shape of a single sample of the simulator. 
+        """
         context_shape = self.true_context().shape
         sim_out = self.simulator(theta=self.get_theta_true()[0:1, :], n_samples=context_shape[-1])
         return sim_out.shape
@@ -32,16 +55,47 @@ def _load(self, path: str):
         raise NotImplementedError
 
     def true_context(self):
+        """
+        True data x values, if supplied by the data method. 
+        """
         # From Data
         raise NotImplementedError
 
-    def true_simulator_outcome(self):
+    def true_simulator_outcome(self) -> np.ndarray:
+        """
+        Run the simulator on all true theta and true x values. 
+
+        Returns:
+            np.ndarray: array of (n samples, simulator shape) showing output of the simulator on all true samples in data.
+        """
         return self.simulator(self.get_theta_true(), self.true_context())
 
-    def sample_prior(self, n_samples: int):
+    def sample_prior(self, n_samples: int) -> np.ndarray:
+        """
+        Draw samples from the prior.
+
+        Args:
+            n_samples (int): Number of samples to draw
+
+        Returns:
+            np.ndarray: Prior samples of shape (n_samples, n_dims).
+        """
         return self.prior_dist(size=(n_samples, self.n_dims))
 
-    def simulator_outcome(self, theta, condition_context=None, n_samples=None):
+    def simulator_outcome(self, theta:np.ndarray, condition_context:np.ndarray=None, n_samples:int=None):
+        """Run the simulator on the supplied theta values.
+
+        Args:
+            theta (np.ndarray): Theta value of shape (n_samples, theta_dimensions)
+            condition_context (np.ndarray, optional): If x values for theta are known, use them. Defaults to None.
+            n_samples (int, optional): If x values are not known for theta, draw them randomly. Defaults to None.
+
+        Raises:
+            ValueError: If neither n_samples nor condition_context is supplied. 
+
+        Returns:
+            np.ndarray: Simulator output of shape (n samples, simulator_dimensions)
+        """
         if condition_context is None:
             if n_samples is None:
                 raise ValueError(
@@ -51,16 +105,39 @@ def simulator_outcome(self, theta, condition_context=None, n_samples=None):
         else:
             return self.simulator.simulate(theta, condition_context)
 
-    def simulated_context(self, n_samples):
+    def simulated_context(self, n_samples:int) -> np.ndarray:
+        """
+        Call the simulator's `generate_context` method. 
+
+        Args:
+            n_samples (int): Number of samples to draw. 
+
+        Returns:
+            np.ndarray: context (x values), as defined by the simulator. 
+        """
         return self.simulator.generate_context(n_samples)
 
-    def get_theta_true(self):
+    def get_theta_true(self) -> Union[Any, float, int, np.ndarray]:
+        """
+        Look for the true theta given by data. If supplied in the method, use that; otherwise look in the configuration file. 
+        If neither is supplied, the configuration lookup raises an exception.
+
+        Returns:
+            Any: Theta value selected by the search. 
+        """
         if hasattr(self, "theta_true"):
             return self.theta_true
         else:
             return get_item("data", "theta_true", raise_exception=True)
 
-    def get_sigma_true(self):
+    def get_sigma_true(self) -> Union[Any, float, int, np.ndarray]:
+        """
+        Look for the true sigma of data. If supplied in the method, use that; otherwise look in the configuration file. 
+        If neither is supplied, return 1. 
+
+        Returns:
+            Any: Sigma value selected by the search. 
+        """
         if hasattr(self, "sigma_true"):
             return self.sigma_true()
         else:
@@ -72,7 +149,24 @@ def save(self, data, path: str):
     def read_prior(self):
         raise NotImplementedError
 
-    def load_prior(self, prior, prior_kwargs):
+    def load_prior(self, prior:str, prior_kwargs:dict[str, Any]) -> callable:
+        """
+        Load the prior. 
+        Either try to get it from data (if it has been implemented for the type of data), 
+        or use numpy to initialize a random distribution using the prior argument. 
+
+        Args:
+            prior (str): Name of prior. 
+            prior_kwargs (dict[str, Any]): kwargs for initializing the prior. 
+
+        Raises:
+            NotImplementedError: The selected prior is not included.
+            RuntimeError: The selected prior is missing arguments to initialize. 
+
+        Returns:
+            callable: Prior that can be sampled from by calling it with a size, e.g. prior(size=(n_samples, n_dims))
+        """
+
         if prior is None:
             prior = get_item("data", "prior", raise_exception=False)
         try: