Sasquatch documentation updates
afausti committed May 30, 2024
1 parent 4a3b2c5 commit 3a7f3fc
Showing 13 changed files with 339 additions and 93 deletions.
Binary file modified .coverage
6 changes: 0 additions & 6 deletions docs/_rst_epilog.rst

This file was deleted.

2 changes: 2 additions & 0 deletions docs/conf.py
@@ -1 +1,3 @@
from documenteer.conf.guide import * # noqa

exclude_patterns = ["**.ipynb"]
1 change: 0 additions & 1 deletion docs/documenteer.toml
@@ -6,7 +6,6 @@ copyright = "2024 Association of Universities for Research in Astronomy, Inc. (A
package = "sasquatch"

[sphinx]
rst_epilog_file = "_rst_epilog.rst"
disable_primary_sidebars = [
"index",
"changelog",
12 changes: 6 additions & 6 deletions docs/user-guide/analysistools.rst
@@ -1,14 +1,14 @@
.. _analysis-tools:

########
Overview
########
######################
Analysis Tools metrics
######################

The `Analysis Tools`_ package is used to create QA metrics from the `LSST Pipelines`_ outputs.
The `Analysis Tools`_ package is used to create science performance metrics from the `LSST Pipelines`_ outputs.

Currently, the Analysis Tools metrics are dispatched to the ``usdfdev_efd`` Sasquatch environment under the ``lsst.dm`` namespace.
Currently, the Analysis Tools metrics are dispatched to the ``usdfdev_efd`` environment under the ``lsst.dm`` namespace.

The EFD Python client can be used to query these metrics.
The :ref:`EFD Python client <efdclient>` can be used to query them.

For example, to get the list of analysis tools in the ``lsst.dm`` namespace, you can use:

163 changes: 137 additions & 26 deletions docs/user-guide/avro.rst
@@ -1,25 +1,13 @@
.. _avro:

#########################
Avro and schema evolution
#########################
############
Avro schemas
############

Sasquatch uses the Avro format.
An advantage of Avro is that it has a schema that comes with the data and supports schema evolution.
Sasquatch uses Avro as its serialization format.
An advantage of Avro is that it provides a schema and supports schema evolution with the help of the `Confluent Schema Registry`_.

Sasquatch uses the `Confluent Schema Registry`_ to ensure schemas can evolve safely.
In Sasquatch, schema changes must be *forward-compatible* so that consumers of Sasquatch won't break.
That includes Kafka consumers, InfluxDB queries, and even Chronograf dashboards.

Forward compatibility means that data produced with a new schema can be read by consumers using the previous schema.
An example of a forward-compatible schema change is adding a new field.
Removing or renaming an existing field are non forward-compatible schema changes.

Read more about forward compatibility in the `Confluent Schema Registry`_ documentation.

.. _Confluent Schema Registry: https://docs.confluent.io/platform/current/schema-registry/fundamentals/avro.html#forward-compatibility

For example, assume the ``skyFluxMetric`` metric with the following payload:
For example, assume a metric named ``skyFluxMetric`` with the following data:

.. code:: json
@@ -31,8 +19,135 @@ For example, assume the ``skyFluxMetric`` metric with the following payload:

      "stdevSky": 2328.906118708811
   }

A simple Avro schema for this metric would look like this:

.. code:: json

   {
     "namespace": "lsst.example",
     "name": "skyFluxMetric",
     "type": "record",
     "fields": [
       {
         "name": "timestamp",
         "type": "long"
       },
       {
         "name": "band",
         "type": "string"
       },
       {
         "name": "instrument",
         "type": "string"
       },
       {
         "name": "meanSky",
         "type": "float"
       },
       {
         "name": "stdevSky",
         "type": "float"
       }
     ]
   }

The :ref:`namespace <namespaces>` and the metric name are used to create the fully qualified metric name in Sasquatch, ``lsst.example.skyFluxMetric``, which also corresponds to the Kafka topic name.
The schema is stored in the Schema Registry and is used to validate the data sent to Sasquatch.

Read more about Avro schemas and types in the `Avro specification`_.
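As a quick illustration (plain Python, not a Sasquatch API), the fully qualified metric name, and hence the Kafka topic name, can be derived from the schema's ``namespace`` and ``name`` keys:

```python
import json

# Avro schema for the metric, abbreviated to the keys relevant here.
schema = json.loads("""
{
  "namespace": "lsst.example",
  "name": "skyFluxMetric",
  "type": "record",
  "fields": []
}
""")

# The fully qualified metric name doubles as the Kafka topic name.
topic_name = f"{schema['namespace']}.{schema['name']}"
print(topic_name)  # lsst.example.skyFluxMetric
```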

Adding units and description
============================

Adding units and descriptions is recommended; they are used by the EFD client ``.get_schema()`` method to display this information and help users understand the data.

The convention in Sasquatch is to add the ``units`` and ``description`` keys to every field of the schema, using `astropy units`_ whenever possible.

.. code:: json

   {
     "namespace": "lsst.example",
     "name": "skyFluxMetric",
     "type": "record",
     "fields": [
       {
         "name": "timestamp",
         "type": "long",
         "units": "ms",
         "description": "The time the metric was measured in milliseconds since the Unix epoch."
       },
       {
         "name": "band",
         "type": "string",
         "units": "unitless",
         "description": "The observation band associated with this metric."
       },
       {
         "name": "instrument",
         "type": "string",
         "units": "unitless",
         "description": "The name of the instrument associated with this metric."
       },
       {
         "name": "meanSky",
         "type": "float",
         "units": "adu",
         "description": "The mean sky flux in ADU."
       },
       {
         "name": "stdevSky",
         "type": "float",
         "units": "adu",
         "description": "The standard deviation of the sky flux in ADU."
       }
     ]
   }

.. _astropy units: https://docs.astropy.org/en/stable/units/
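To sketch how this metadata becomes useful downstream, the snippet below builds a field-metadata table from a schema following the convention (plain Python with hypothetical example data; the helper is illustrative and not the EFD client implementation):

```python
import json

# Fields carrying the Sasquatch ``units``/``description`` convention
# (abbreviated, hypothetical example data).
fields = json.loads("""
[
  {"name": "meanSky", "type": "float", "units": "adu",
   "description": "The mean sky flux in ADU."},
  {"name": "stdevSky", "type": "float", "units": "adu",
   "description": "The standard deviation of the sky flux in ADU."}
]
""")

# Build a {field name: (units, description)} table, similar in spirit to
# the information that ``get_schema()`` surfaces for users.
metadata = {f["name"]: (f["units"], f["description"]) for f in fields}
print(metadata["meanSky"][0])  # adu
```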

Optional fields
===============

In some situations, you don’t have values for all the fields defined in the schema.
In this case, you can mark the field as optional and provide a default value.
Sasquatch uses the Avro null value for nullable fields, and the schema for a nullable field uses the `Union`_ type:

.. code:: json

   {"name": "meanSky", "type": ["null", "float"], "default": null}

Note that because of the union type, when sending data to Sasquatch this will not work:

.. code:: json

   {"meanSky": 2328.906}

Instead, you must do this:

.. code:: json

   {"meanSky": {"float": 2328.906}}

.. _Union: https://avro.apache.org/docs/1.11.1/specification/#unions
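The difference between the two encodings can be sketched with stdlib ``json`` alone (the helper below is illustrative, not part of any Sasquatch API):

```python
import json

def encode_nullable_float(value):
    """Encode a nullable float for the Avro JSON encoding of a union.

    Non-null values are wrapped in an object keyed by the branch type
    name; null is encoded as-is.
    """
    return None if value is None else {"float": value}

record = {"meanSky": encode_nullable_float(2328.906)}
print(json.dumps(record))  # {"meanSky": {"float": 2328.906}}

missing = {"meanSky": encode_nullable_float(None)}
print(json.dumps(missing))  # {"meanSky": null}
```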

Schema evolution
================

In Sasquatch, schema changes must be *forward-compatible* so that consumers won't break.
Sasquatch consumers include Kafka consumers, any application that queries InfluxDB, and Chronograf dashboards.

Forward compatibility means that data produced with a new schema can be read by consumers using the previous schema.
An example of a forward-compatible schema change is adding a new field to the schema.
Removing or renaming an existing field is an example of a non-forward-compatible schema change.

Read more about forward compatibility in the `Confluent Schema Registry`_ documentation.

.. _Confluent Schema Registry: https://docs.confluent.io/platform/current/schema-registry/fundamentals/avro.html#forward-compatibility


Suppose there's a dashboard in Chronograf with a chart that displays a time series of ``meanSky`` and ``stdevSky`` values grouped by ``band``.
Thus the ``timestamp``, ``band``, ``meanSky`` and ``stdevSky`` fields are required in the metric record for the dashboard to work.
The ``timestamp``, ``band``, ``meanSky`` and ``stdevSky`` fields are always required for that chart to work.
The following Avro schema will ensure these fields are always present:

.. code:: json
@@ -65,7 +180,7 @@ The following Avro schema will ensure these fields are always present:
]
}
Suppose you want to add a table linked to the previous chart in the dashboard to display the visit ID associated with this metric.
Now, suppose you want to add a table linked to the previous chart to display the visit ID associated with this metric.
Adding the ``visit`` field to the schema is a *forward-compatible* change, so that's allowed:

.. code:: json
@@ -102,12 +217,8 @@ Adding the ``visit`` field to the schema is a *forward-compatible* change, so th
]
}
New messages sent to Sasquatch now require the ``visit`` field and a new version of the dashboard that uses the ``visit`` information can be implemented.
Because this is a forward-compatible schema change, previous dashboard versions won't break since they don't use the ``visit`` field.

In Sasquatch, a metric (or a telemetry topic) corresponds to a Kafka topic.
The metric :ref:`namespace <namespaces>` is specified in the Avro schema, and the metric full qualified name in this example is ``lsst.example.skyFluxMetric``.
New messages sent to Sasquatch now require the ``visit`` field and a new query that uses the ``visit`` information can be implemented.
Because this is a forward-compatible schema change, existing queries won't break since they don't use the ``visit`` field.
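The forward-compatibility guarantee can be illustrated with a small stand-alone sketch (plain Python with illustrative values, not real consumer code): a consumer built against the old field set keeps working on records produced with the new schema, because it simply ignores fields it does not know about.

```python
# Fields the existing consumer (e.g. the original chart query) was built against.
OLD_FIELDS = ["timestamp", "band", "meanSky", "stdevSky"]

def read_with_old_schema(record):
    """Project a record onto the old field set, ignoring newer fields."""
    return {name: record[name] for name in OLD_FIELDS}

# A record produced under the new schema, which added ``visit``.
new_record = {
    "timestamp": 1714521600000,
    "band": "y",
    "meanSky": -213.75,
    "stdevSky": 2328.906,
    "visit": 1234567,
}

old_view = read_with_old_schema(new_record)
print(sorted(old_view))  # ['band', 'meanSky', 'stdevSky', 'timestamp']
```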

Read more about Avro schemas and types in the `Avro specification`_.

.. _Avro specification: https://avro.apache.org/docs/1.11.1/specification/
19 changes: 13 additions & 6 deletions docs/user-guide/efdclient.rst
@@ -5,9 +5,12 @@
The EFD Python client
#####################

The EFD Python client provides convenience methods for accessing EFD data.
The EFD client is built on top of the `aioinflux`_ library and provides a high-level API to interact with the EFD.

For example, at USDF you can instantiate the EFD client using:
The EFD client is designed to be used in the RSP Notebook Aspect.
For services that need to access the EFD, see how to query the :ref:`InfluxDB API <influxdbapi>` directly.

For example, from a notebook running at the `USDF RSP`_ you can instantiate the EFD client using:

.. code::
@@ -16,12 +19,14 @@ For example, at USDF you can instantiate the EFD client using:
await client.get_topics()
where ``usdf_efd`` is an alias to the :ref:`environment <environments>`.
It helps to discover the InfluxDB API URL and the credentials to connect to Sasquatch.
``usdf_efd`` is an alias for the InfluxDB instance at USDF. From this alias, the EFD client discovers the InfluxDB URL, database, and credentials needed to connect to that environment.

If you are using the EFD client in another environment, find the corresponding alias on the :ref:`environments <environments>` page.

Read more about the methods available in the `EFD client documentation`_.
Learn more about the methods available in the `documentation`_.

.. _EFD client documentation: https://efd-client.lsst.io
.. _documentation: https://efd-client.lsst.io

InfluxQL
--------
@@ -72,5 +77,7 @@ Example notebooks

Learn how to return chunked responses with the EFD client.

.. _aioinflux: https://aioinflux.readthedocs.io/
.. _USDF RSP: https://usdf-rsp.slac.stanford.edu/
.. _single vs. double quotes: https://www.influxdata.com/blog/tldr-influxdb-tech-tips-july-21-2016/
.. _InfluxQL documentation: https://docs.influxdata.com/influxdb/v1.8/query_language/explore-data/
12 changes: 5 additions & 7 deletions docs/user-guide/index.rst
@@ -5,16 +5,14 @@ User guide
##########

.. toctree::
:caption: Observatory telemetry (EFD)
:caption: Accessing data

Overview <observatorytelemetry>
Observatory telemetry (EFD) <observatorytelemetry>
The EFD Python client <efdclient>
Working with timestamps <timestamps>
Analysis Tools metrics <analysistools>

.. toctree::
:caption: Analysis Tools metrics

Overview <analysistools>
The InfluxDB API <influxdbapi>

.. toctree::
:caption: Data exploration and visualization with Chronograf
@@ -37,7 +35,7 @@

Overview <sendingdata>
Namespaces <namespaces>
Avro and Schema Evolution <avro>
Avro schemas <avro>
Kafka REST Proxy <restproxy>
Kafdrop <kafdrop>
