Sasquatch documentation updates
afausti committed May 30, 2024
1 parent 4a3b2c5 commit 3a7f3fc
Showing 13 changed files with 339 additions and 93 deletions.
Binary file modified .coverage
6 changes: 0 additions & 6 deletions docs/_rst_epilog.rst

This file was deleted.

2 changes: 2 additions & 0 deletions docs/conf.py
@@ -1 +1,3 @@
from documenteer.conf.guide import * # noqa

exclude_patterns = ["**.ipynb"]
1 change: 0 additions & 1 deletion docs/documenteer.toml
@@ -6,7 +6,6 @@ copyright = "2024 Association of Universities for Research in Astronomy, Inc. (A
package = "sasquatch"

[sphinx]
rst_epilog_file = "_rst_epilog.rst"
disable_primary_sidebars = [
"index",
"changelog",
12 changes: 6 additions & 6 deletions docs/user-guide/analysistools.rst
@@ -1,14 +1,14 @@
.. _analysis-tools:

########
Overview
########
######################
Analysis Tools metrics
######################

The `Analysis Tools`_ package is used to create QA metrics from the `LSST Pipelines`_ outputs.
The `Analysis Tools`_ package is used to create science performance metrics from the `LSST Pipelines`_ outputs.

Currently, the Analysis Tools metrics are dispatched to the ``usdfdev_efd`` Sasquatch environment under the ``lsst.dm`` namespace.
Currently, the Analysis Tools metrics are dispatched to the ``usdfdev_efd`` environment under the ``lsst.dm`` namespace.

The EFD Python client can be used to query these metrics.
The :ref:`EFD Python client <efdclient>` can be used to query them.

For example, to get the list of analysis tools in the ``lsst.dm`` namespace, you can use:

163 changes: 137 additions & 26 deletions docs/user-guide/avro.rst
@@ -1,25 +1,13 @@
.. _avro:

#########################
Avro and schema evolution
#########################
############
Avro schemas
############

Sasquatch uses the Avro format.
An advantage of Avro is that it has a schema that comes with the data and supports schema evolution.
Sasquatch uses Avro as its serialization format.
An advantage of Avro is that it provides a schema and supports schema evolution with the help of the `Confluent Schema Registry`_.

Sasquatch uses the `Confluent Schema Registry`_ to ensure schemas can evolve safely.
In Sasquatch, schema changes must be *forward-compatible* so that consumers of Sasquatch won't break.
That includes Kafka consumers, InfluxDB queries, and even Chronograf dashboards.

Forward compatibility means that data produced with a new schema can be read by consumers using the previous schema.
An example of a forward-compatible schema change is adding a new field.
Removing or renaming an existing field are non forward-compatible schema changes.

Read more about forward compatibility in the `Confluent Schema Registry`_ documentation.

.. _Confluent Schema Registry: https://docs.confluent.io/platform/current/schema-registry/fundamentals/avro.html#forward-compatibility

For example, assume the ``skyFluxMetric`` metric with the following payload:
For example, assume a metric named ``skyFluxMetric`` with the following data:

.. code:: json
@@ -31,8 +19,135 @@ For example, assume the ``skyFluxMetric`` metric with the following payload:

      "stdevSky": 2328.906118708811
   }

A simple Avro schema for this metric would look like this:

.. code:: json

   {
     "namespace": "lsst.example",
     "name": "skyFluxMetric",
     "type": "record",
     "fields": [
       {
         "name": "timestamp",
         "type": "long"
       },
       {
         "name": "band",
         "type": "string"
       },
       {
         "name": "instrument",
         "type": "string"
       },
       {
         "name": "meanSky",
         "type": "float"
       },
       {
         "name": "stdevSky",
         "type": "float"
       }
     ]
   }

The :ref:`namespace <namespaces>` and the metric name are used to create the fully qualified metric name in Sasquatch, ``lsst.example.skyFluxMetric``, which also corresponds to the Kafka topic name.
The schema is stored in the Schema Registry and is used to validate the data sent to Sasquatch.

Read more about Avro schemas and types in the `Avro specification`_.
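As a quick illustration (plain Python, not a Sasquatch API), the fully qualified metric name, and hence the Kafka topic name, can be derived from the schema's ``namespace`` and ``name`` keys:

```python
import json

# Avro schema for the metric, abbreviated to the keys relevant here.
schema = json.loads("""
{
  "namespace": "lsst.example",
  "name": "skyFluxMetric",
  "type": "record",
  "fields": []
}
""")

# The fully qualified metric name doubles as the Kafka topic name.
topic_name = f"{schema['namespace']}.{schema['name']}"
print(topic_name)  # lsst.example.skyFluxMetric
```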

Adding units and description
============================

Adding units and descriptions is recommended; they are used by the EFD client ``.get_schema()`` method to display this information and help users understand the data.

The convention in Sasquatch is to add the ``units`` and ``description`` keys to every field of the schema, using `astropy units`_ whenever possible.

.. code:: json

   {
     "namespace": "lsst.example",
     "name": "skyFluxMetric",
     "type": "record",
     "fields": [
       {
         "name": "timestamp",
         "type": "long",
         "units": "ms",
         "description": "The time the metric was measured in milliseconds since the Unix epoch."
       },
       {
         "name": "band",
         "type": "string",
         "units": "unitless",
         "description": "The observation band associated with this metric."
       },
       {
         "name": "instrument",
         "type": "string",
         "units": "unitless",
         "description": "The name of the instrument associated with this metric."
       },
       {
         "name": "meanSky",
         "type": "float",
         "units": "adu",
         "description": "The mean sky flux in ADU."
       },
       {
         "name": "stdevSky",
         "type": "float",
         "units": "adu",
         "description": "The standard deviation of the sky flux in ADU."
       }
     ]
   }

.. _astropy units: https://docs.astropy.org/en/stable/units/
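To sketch how this metadata becomes useful downstream, the snippet below builds a field-metadata table from a schema following the convention (plain Python with hypothetical example data; the helper is illustrative and not the EFD client implementation):

```python
import json

# Fields carrying the Sasquatch ``units``/``description`` convention
# (abbreviated, hypothetical example data).
fields = json.loads("""
[
  {"name": "meanSky", "type": "float", "units": "adu",
   "description": "The mean sky flux in ADU."},
  {"name": "stdevSky", "type": "float", "units": "adu",
   "description": "The standard deviation of the sky flux in ADU."}
]
""")

# Build a {field name: (units, description)} table, similar in spirit to
# the information that ``get_schema()`` surfaces for users.
metadata = {f["name"]: (f["units"], f["description"]) for f in fields}
print(metadata["meanSky"][0])  # adu
```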

Optional fields
===============

In some situations, you don’t have values for all the fields defined in the schema.
In this case, you can mark the field as optional and provide a default value.
Sasquatch uses the Avro null value for nullable fields, and the schema for a nullable field uses the `Union`_ type:

.. code:: json

   {"name": "meanSky", "type": ["null", "float"], "default": null}

Note that because of the union type, when sending data to Sasquatch this will not work:

.. code:: json

   {"meanSky": 2328.906}

Instead, you must do this:

.. code:: json

   {"meanSky": {"float": 2328.906}}

.. _Union: https://avro.apache.org/docs/1.11.1/specification/#unions
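The difference between the two encodings can be sketched with stdlib ``json`` alone (the helper below is illustrative, not part of any Sasquatch API):

```python
import json

def encode_nullable_float(value):
    """Encode a nullable float for the Avro JSON encoding of a union.

    Non-null values are wrapped in an object keyed by the branch type
    name; null is encoded as-is.
    """
    return None if value is None else {"float": value}

record = {"meanSky": encode_nullable_float(2328.906)}
print(json.dumps(record))  # {"meanSky": {"float": 2328.906}}

missing = {"meanSky": encode_nullable_float(None)}
print(json.dumps(missing))  # {"meanSky": null}
```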

Schema evolution
================

In Sasquatch, schema changes must be *forward-compatible* so that consumers won't break.
Sasquatch consumers include Kafka consumers, any application that queries InfluxDB, and Chronograf dashboards.

Forward compatibility means that data produced with a new schema can be read by consumers using the previous schema.
An example of a forward-compatible schema change is adding a new field to the schema.
Removing or renaming an existing field is an example of a non-forward-compatible schema change.

Read more about forward compatibility in the `Confluent Schema Registry`_ documentation.

.. _Confluent Schema Registry: https://docs.confluent.io/platform/current/schema-registry/fundamentals/avro.html#forward-compatibility


Suppose there's a dashboard in Chronograf with a chart that displays a time series of ``meanSky`` and ``stdevSky`` values grouped by ``band``.
Thus the ``timestamp``, ``band``, ``meanSky`` and ``stdevSky`` fields are required in the metric record for the dashboard to work.
The ``timestamp``, ``band``, ``meanSky`` and ``stdevSky`` fields are always required for that chart to work.
The following Avro schema will ensure these fields are always present:

.. code:: json
@@ -65,7 +180,7 @@ The following Avro schema will ensure these fields are always present:
]
}
Suppose you want to add a table linked to the previous chart in the dashboard to display the visit ID associated with this metric.
Now, suppose you want to add a table linked to the previous chart to display the visit ID associated with this metric.
Adding the ``visit`` field to the schema is a *forward-compatible* change, so that's allowed:

.. code:: json
@@ -102,12 +217,8 @@ Adding the ``visit`` field to the schema is a *forward-compatible* change, so th
]
}
New messages sent to Sasquatch now require the ``visit`` field and a new version of the dashboard that uses the ``visit`` information can be implemented.
Because this is a forward-compatible schema change, previous dashboard versions won't break since they don't use the ``visit`` field.

In Sasquatch, a metric (or a telemetry topic) corresponds to a Kafka topic.
The metric :ref:`namespace <namespaces>` is specified in the Avro schema, and the metric full qualified name in this example is ``lsst.example.skyFluxMetric``.
New messages sent to Sasquatch now require the ``visit`` field and a new query that uses the ``visit`` information can be implemented.
Because this is a forward-compatible schema change, existing queries won't break since they don't use the ``visit`` field.
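The forward-compatibility guarantee can be illustrated with a small stand-alone sketch (plain Python with illustrative values, not real consumer code): a consumer built against the old field set keeps working on records produced with the new schema, because it simply ignores fields it does not know about.

```python
# Fields the existing consumer (e.g. the original chart query) was built against.
OLD_FIELDS = ["timestamp", "band", "meanSky", "stdevSky"]

def read_with_old_schema(record):
    """Project a record onto the old field set, ignoring newer fields."""
    return {name: record[name] for name in OLD_FIELDS}

# A record produced under the new schema, which added ``visit``.
new_record = {
    "timestamp": 1714521600000,
    "band": "y",
    "meanSky": -213.75,
    "stdevSky": 2328.906,
    "visit": 1234567,
}

old_view = read_with_old_schema(new_record)
print(sorted(old_view))  # ['band', 'meanSky', 'stdevSky', 'timestamp']
```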

Read more about Avro schemas and types in the `Avro specification`_.

.. _Avro specification: https://avro.apache.org/docs/1.11.1/specification/
19 changes: 13 additions & 6 deletions docs/user-guide/efdclient.rst
@@ -5,9 +5,12 @@
The EFD Python client
#####################

The EFD Python client provides convenience methods for accessing EFD data.
The EFD client is built on top of the `aioinflux`_ library and provides a high-level API to interact with the EFD.

For example, at USDF you can instantiate the EFD client using:
The EFD client is designed to be used in the RSP Notebook Aspect.
For services that need to access the EFD, see how to query the :ref:`InfluxDB API <influxdbapi>` directly.

For example, from a notebook running at the `USDF RSP`_ you can instantiate the EFD client using:

.. code::
@@ -16,12 +19,14 @@ For example, at USDF you can instantiate the EFD client using:
await client.get_topics()
where ``usdf_efd`` is an alias to the :ref:`environment <environments>`.
It helps to discover the InfluxDB API URL and the credentials to connect to Sasquatch.
``usdf_efd`` is an alias for the InfluxDB instance at USDF. From this alias, the EFD client discovers the InfluxDB URL, database, and credentials needed to connect to that environment.

If you are using the EFD client in another environment, find the corresponding alias on the :ref:`environments <environments>` page.

Read more about the methods available in the `EFD client documentation`_.
Learn more about the methods available in the `documentation`_.

.. _EFD client documentation: https://efd-client.lsst.io
.. _documentation: https://efd-client.lsst.io

InfluxQL
--------
@@ -72,5 +77,7 @@ Example notebooks

Learn how to return chunked responses with the EFD client.

.. _aioinflux: https://aioinflux.readthedocs.io/
.. _USDF RSP: https://usdf-rsp.slac.stanford.edu/
.. _single vs. double quotes: https://www.influxdata.com/blog/tldr-influxdb-tech-tips-july-21-2016/
.. _InfluxQL documentation: https://docs.influxdata.com/influxdb/v1.8/query_language/explore-data/
12 changes: 5 additions & 7 deletions docs/user-guide/index.rst
@@ -5,16 +5,14 @@ User guide
##########

.. toctree::
:caption: Observatory telemetry (EFD)
:caption: Accessing data

Overview <observatorytelemetry>
Observatory telemetry (EFD) <observatorytelemetry>
The EFD Python client <efdclient>
Working with timestamps <timestamps>
Analysis Tools metrics <analysistools>

.. toctree::
:caption: Analysis Tools metrics

Overview <analysistools>
The InfluxDB API <influxdbapi>

.. toctree::
:caption: Data exploration and visualization with Chronograf
@@ -37,7 +35,7 @@

Overview <sendingdata>
Namespaces <namespaces>
Avro and Schema Evolution <avro>
Avro schemas <avro>
Kafka REST Proxy <restproxy>
Kafdrop <kafdrop>
