diff --git a/docs/_static/bootstrap_forwarding_rule.png b/docs/_static/bootstrap_forwarding_rule.png
new file mode 100644
index 0000000..a332840
Binary files /dev/null and b/docs/_static/bootstrap_forwarding_rule.png differ
diff --git a/docs/_static/forwarding_rule_details.png b/docs/_static/forwarding_rule_details.png
new file mode 100644
index 0000000..e449994
Binary files /dev/null and b/docs/_static/forwarding_rule_details.png differ
diff --git a/docs/_static/gcp_ip_addresses.png b/docs/_static/gcp_ip_addresses.png
new file mode 100644
index 0000000..3f77448
Binary files /dev/null and b/docs/_static/gcp_ip_addresses.png differ
diff --git a/docs/_static/promote_ip_address.png b/docs/_static/promote_ip_address.png
new file mode 100644
index 0000000..049fc08
Binary files /dev/null and b/docs/_static/promote_ip_address.png differ
diff --git a/docs/developer-guide/index.rst b/docs/developer-guide/index.rst
index 6547c88..8f74383 100644
--- a/docs/developer-guide/index.rst
+++ b/docs/developer-guide/index.rst
@@ -18,6 +18,7 @@ A Sasquatch developer is responsible for maintaining the Sasquatch components an
    kafka-shutdown
    broker-migration
    connectors
+   new-environment
 
 .. toctree::
    :caption: Troubleshooting
diff --git a/docs/developer-guide/new-environment.rst b/docs/developer-guide/new-environment.rst
new file mode 100644
index 0000000..42cff72
--- /dev/null
+++ b/docs/developer-guide/new-environment.rst
@@ -0,0 +1,141 @@
+################################
+Deploying into a new environment
+################################
+
+Deploying Sasquatch into a new environment requires multiple ArgoCD syncs, with some manual information gathering and updating in between.
+
+
+Enable Sasquatch in Phalanx
+===========================
+
+#. Cut a `Phalanx`_ development branch.
+#. Ensure the ``strimzi`` and ``strimzi-access-operator`` Phalanx applications are enabled and synced in the new environment by adding them to the :samp:`environments/values-{environment}.yaml` file, and adding a blank :samp:`values-{environment}.yaml` file to their ``applications/`` directories (see the sketch after this list).
+   `These docs `_ can help you enable them from your development branch.
+#. Enable the ``sasquatch`` app in the environment.
+   For the :samp:`applications/sasquatch/values-{environment}.yaml` file, copy one from an existing environment that has the same enabled services that you want in the new environment.
+   Change all of the environment references to the new environment, and change or add anything else you need for the new environment.
+#. Comment out any ``loadBalancerIP`` entries in the :samp:`applications/sasquatch/values-{environment}.yaml` file.
+   We'll fill these in later.
+#. In the new environment's ArgoCD, point the ``sasquatch`` app at your Phalanx development branch, and sync it.
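+
+For the :samp:`environments/values-{environment}.yaml` edits above, the applications are enabled with boolean flags.
+A minimal sketch, assuming the usual Phalanx layout (follow the existing entries in the file):
+
+.. code:: yaml
+
+   # Sketch only: merge these flags into the existing applications map.
+   applications:
+     strimzi: true
+     strimzi-access-operator: true
+     sasquatch: true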
+
+This first sync will not be successful.
+The `cert-manager`_ ``Certificate`` resource will be stuck in a progressing state until we update some values and provision some DNS.
+
+.. _Phalanx: https://phalanx.lsst.io
+.. _cert-manager: https://cert-manager.io/
+
+Gather IP addresses and update Phalanx config
+=============================================
+
+.. note::
+
+   The public IP address gathering and modification described here only applies to environments deployed on `GCP`_.
+   This process will be different for other types of environments.
+
+#. Get the broker IDs, which are the node IDs of the Kafka brokers.
+   In this example, the broker IDs are ``0``, ``1``, and ``2``:
+
+   .. code::
+
+      ❯ kubectl get kafkanodepool -n sasquatch
+      NAME         DESIRED REPLICAS   ROLES            NODEIDS
+      controller   3                  ["controller"]   [3,4,5]
+      kafka        3                  ["broker"]       [0,1,2]
+
+#. A GCP public IP address will be provisioned for each of these broker nodes.
+   Another IP address will be provisioned for the external `kafka bootstrap servers`_ endpoint.
+   You can see all of the provisioned IP addresses in your GCP project here: :samp:`https://console.cloud.google.com/networking/addresses/list?authuser=1&hl=en&project={project name}`:
+
+   .. figure:: /_static/gcp_ip_addresses.png
+      :name: GCP IP addresses
+
+#. One by one, click on the ``Forwarding rule`` links in each row until you find the ones annotated with :samp:`\{"kubernetes.io/service-name":"sasquatch/sasquatch-kafka-{broker node id}"\}` for each broker node.
+   Note the IP address and node number.
+
+   .. figure:: /_static/forwarding_rule_details.png
+      :name: Forwarding rule details
+
+#. Find and note the IP address that is annotated with ``{"kubernetes.io/service-name":"sasquatch/sasquatch-kafka-external-bootstrap"}``:
+
+   .. figure:: /_static/bootstrap_forwarding_rule.png
+      :name: Bootstrap forwarding rule
+
+#. Promote all of these IP addresses to GCP static IP addresses by choosing the option in the three-vertical-dots menu for each IP address (you may have to scroll horizontally).
+   This ensures that we won't lose these IP addresses and have to update DNS later:
+
+   .. figure:: /_static/promote_ip_address.png
+      :name: Promote IP address
+
+#. Update the :samp:`applications/sasquatch/values-{environment}.yaml` ``strimzi-kafka.kafka`` config with ``loadBalancerIP`` and ``host`` entries that correspond to the node IDs that you found.
+   Here is an example from ``idfint``.
+   Note that the broker node IDs are in the ``broker`` entries, and that the ``host`` entries have numbers in them that match those IDs.
+
+   .. code:: yaml
+
+      strimzi-kafka:
+        kafka:
+          externalListener:
+            tls:
+              enabled: true
+            bootstrap:
+              loadBalancerIP: "35.188.187.82"
+              host: sasquatch-int-kafka-bootstrap.lsst.cloud
+
+            brokers:
+              - broker: 0
+                loadBalancerIP: "34.171.69.125"
+                host: sasquatch-int-kafka-0.lsst.cloud
+              - broker: 1
+                loadBalancerIP: "34.72.50.204"
+                host: sasquatch-int-kafka-1.lsst.cloud
+              - broker: 2
+                loadBalancerIP: "34.173.225.150"
+                host: sasquatch-int-kafka-2.lsst.cloud
+
+#. Push these changes to your Phalanx branch and sync ``sasquatch`` in ArgoCD (see the check below).
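+
+After the sync, you can spot-check that the Kafka ``LoadBalancer`` services picked up the addresses you configured.
+A hypothetical check (the service names follow the forwarding rule annotations above, and the ``EXTERNAL-IP`` column should show your static addresses):
+
+.. code::
+
+   ❯ kubectl get svc -n sasquatch sasquatch-kafka-external-bootstrap sasquatch-kafka-0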
+
+.. _GCP: https://cloud.google.com
+.. _kafka bootstrap servers: https://kafka.apache.org/documentation/#producerconfigs_bootstrap.servers
+
+Provision DNS for TLS certificate
+=================================
+
+#. Provision ``CNAME`` records (probably in AWS Route53) for `LetsEncrypt`_ verification for each of the ``host`` entries in the ``strimzi-kafka.kafka`` values.
+   Continuing with the ``idfint`` example:
+
+   .. code:: text
+
+      _acme-challenge.sasquatch-int-kafka-0.lsst.cloud (_acme-challenge.tls.lsst.cloud)
+      _acme-challenge.sasquatch-int-kafka-1.lsst.cloud (_acme-challenge.tls.lsst.cloud)
+      _acme-challenge.sasquatch-int-kafka-2.lsst.cloud (_acme-challenge.tls.lsst.cloud)
+      _acme-challenge.sasquatch-int-kafka-bootstrap.lsst.cloud (_acme-challenge.tls.lsst.cloud)
+
+#. Provision ``A`` records for each of the ``host`` entries with their matching IP address values:
+
+   .. code:: text
+
+      sasquatch-int-kafka-0.lsst.cloud (34.171.69.125)
+      sasquatch-int-kafka-1.lsst.cloud (34.72.50.204)
+      sasquatch-int-kafka-2.lsst.cloud (34.173.225.150)
+      sasquatch-int-kafka-bootstrap.lsst.cloud (35.188.187.82)
+
+#. Wait for the ``Certificate`` Kubernetes resource to provision in ArgoCD.
+   This might take several minutes.
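+
+While you wait, you can confirm that the new records resolve.
+A quick check with ``dig``, using the ``idfint`` values above:
+
+.. code::
+
+   ❯ dig +short sasquatch-int-kafka-bootstrap.lsst.cloud
+   35.188.187.82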
+
+.. _LetsEncrypt: https://letsencrypt.org
+
+Configure Gafaelfawr OIDC authentication
+========================================
+
+Sasquatch assumes that Chronograf will use OIDC authentication.
+Follow `these instructions `_ to set it up.
+
+.. warning::
+
+   This requires a Gafaelfawr restart.
+   It could also affect all of the apps in an environment if done incorrectly.
+   If your new environment is a production environment, you should probably wait for a maintenance window to do this step!
+
+Merge your Phalanx branch!
+==========================
+
+If all is well, of course.
diff --git a/docs/environments.rst b/docs/environments.rst
index 415f45d..42e982f 100644
--- a/docs/environments.rst
+++ b/docs/environments.rst
@@ -17,6 +17,12 @@ The table below summarizes the Sasquatch environments and their main entry point
 +---------------------------+---------------------------------------------------+-----------------------------------+----------------+
 | :ref:`USDF dev <usdfdev>` | https://usdf-rsp-dev.slac.stanford.edu/chronograf | ``usdfdev_efd``                   | Not required   |
 +---------------------------+---------------------------------------------------+-----------------------------------+----------------+
+| :ref:`IDF <idf>`          | https://data.lsst.cloud/chronograf                | (not available)                   | Not required   |
++---------------------------+---------------------------------------------------+-----------------------------------+----------------+
+| :ref:`IDF int <idfint>`   | https://data-int.lsst.cloud/chronograf            | (not available)                   | Not required   |
++---------------------------+---------------------------------------------------+-----------------------------------+----------------+
+| :ref:`IDF dev <idfdev>`   | https://data-dev.lsst.cloud/chronograf            | ``idfdev_efd``                    | Not required   |
++---------------------------+---------------------------------------------------+-----------------------------------+----------------+
 | :ref:`TTS <tts>`          | https://tucson-teststand.lsst.codes/chronograf    | ``tucson_teststand_efd``          | NOIRLab VPN    |
 +---------------------------+---------------------------------------------------+-----------------------------------+----------------+
 | :ref:`BTS <bts>`          | https://base-lsp.lsst.codes/chronograf            | ``base_efd``                      | Chile VPN      |
 +---------------------------+---------------------------------------------------+-----------------------------------+----------------+
@@ -75,6 +81,58 @@ Intended audience: Project staff.
 - Schema Registry: ``http://sasquatch-schema-registry.sasquatch:8081`` (cluster internal)
 - Kafka REST proxy API: ``https://usdf-rsp-dev.slac.stanford.edu/sasquatch-rest-proxy``
 
+.. _idf:
+
+IDF
+---
+
+Sasquatch production environment for the community science platform in Google Cloud.
+This instance is mainly used for :ref:`application metrics <appmetrics>`.
+
+Intended audience: Project staff.
+
+- Chronograf: ``https://data.lsst.cloud/chronograf``
+- InfluxDB HTTP API: ``https://data.lsst.cloud/influxdb``
+- Kafdrop UI: ``https://data.lsst.cloud/kafdrop``
+- Kafka bootstrap server: ``sasquatch-kafka-bootstrap.lsst.cloud:9094``
+- Schema Registry: ``http://sasquatch-schema-registry.sasquatch:8081`` (cluster internal)
+- Kafka REST proxy API: (not available)
+
+.. _idfint:
+
+IDF int
+-------
+
+Sasquatch integration environment for the community science platform in Google Cloud.
+This instance is used for testing.
+There is no direct EFD integration.
+
+Intended audience: Project staff.
+
+- Chronograf: ``https://data-int.lsst.cloud/chronograf``
+- InfluxDB HTTP API: ``https://data-int.lsst.cloud/influxdb``
+- Kafdrop UI: ``https://data-int.lsst.cloud/kafdrop``
+- Kafka bootstrap server: ``sasquatch-int-kafka-bootstrap.lsst.cloud:9094``
+- Schema Registry: ``http://sasquatch-schema-registry.sasquatch:8081`` (cluster internal)
+- Kafka REST proxy API: ``https://data-int.lsst.cloud/sasquatch-rest-proxy``
+
+.. _idfdev:
+
+IDF dev
+-------
+
+Sasquatch development environment for the community science platform in Google Cloud.
+This instance is used for testing.
+
+Intended audience: Project staff.
+
+- Chronograf: ``https://data-dev.lsst.cloud/chronograf``
+- InfluxDB HTTP API: ``https://data-dev.lsst.cloud/influxdb``
+- Kafdrop UI: ``https://data-dev.lsst.cloud/kafdrop``
+- Kafka bootstrap server: ``sasquatch-dev-kafka-bootstrap.lsst.cloud:9094``
+- Schema Registry: ``http://sasquatch-schema-registry.sasquatch:8081`` (cluster internal)
+- Kafka REST proxy API: ``https://data-dev.lsst.cloud/sasquatch-rest-proxy``
+
 .. _tts:
 
 Tucson Test Stand (TTS)
diff --git a/docs/user-guide/app-metrics.rst b/docs/user-guide/app-metrics.rst
index 282ffd8..4479523 100644
--- a/docs/user-guide/app-metrics.rst
+++ b/docs/user-guide/app-metrics.rst
@@ -1,9 +1,11 @@
+.. _appmetrics:
+
 ===================
 Application metrics
 ===================
 
 Applications can use Sasquatch infrastructure to publish metrics events to `InfluxDB`_ via `Kafka`_.
-Setting certain Sasquatch values in Phalanx will create Kafka user and topic, and configure a Telegraf consumer to put messages from that topic into the ``telegraf-kafka-app-metrics-consumer`` database in the Sasquatch InfluxDB instance.
+Setting certain Sasquatch values in Phalanx will create a Kafka user and topic, and configure a Telegraf consumer to put messages from that topic into the ``lsst.square.metrics`` database in the Sasquatch InfluxDB instance.
 
 The messages are expected to be in :ref:`Avro <avro>` format, and schemas are expected to be in the `Schema Registry`_ for any messages that are encoded with a schema ID.
 
diff --git a/docs/user-guide/directconnection.rst b/docs/user-guide/directconnection.rst
index e4d6bad..f74d546 100644
--- a/docs/user-guide/directconnection.rst
+++ b/docs/user-guide/directconnection.rst
@@ -15,11 +15,17 @@ This guide describes the the most secure and straightforward option, assuming th
 Generating Kafka credentials
 ============================
 
+.. note::
+
+   The ``strimzi-access-operator`` `Phalanx`_ app must be enabled.
+   It provides the ``KafkaAccess`` CRD that is used in this guide.
+
 You can generate Kafka credentials by creating a couple of `Strimzi`_ resources:
 
 * A `KafkaUser`_ resource, in the ``sasquatch`` namespace, to configure a user in the Kafka cluster and provision a Kubernetes Secret with that user's credentials
 * A `KafkaAccess`_ resource, in your app's namespace, to make those credentials and other Kafka connection information available to your app
 
+.. _Phalanx: https://phalanx.lsst.io
 .. _Strimzi: https://strimzi.io
 .. _KafkaUser: https://strimzi.io/docs/operators/latest/configuring.html#type-KafkaUser-reference
 .. _KafkaAccess: https://github.com/strimzi/kafka-access-operator
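+
+For reference, here is a minimal sketch of the two resources together.
+The names, the app namespace, and the authentication type are placeholders to adapt; the ``KafkaAccess`` shape follows the `KafkaAccess`_ operator's README.
+
+.. code:: yaml
+
+   # Sketch only: adjust names, namespaces, and authentication for your app.
+   apiVersion: kafka.strimzi.io/v1beta2
+   kind: KafkaUser
+   metadata:
+     name: myapp                      # hypothetical user name
+     namespace: sasquatch             # KafkaUser lives in the sasquatch namespace
+     labels:
+       strimzi.io/cluster: sasquatch  # ties the user to the sasquatch Kafka cluster
+   spec:
+     authentication:
+       type: tls
+   ---
+   apiVersion: access.strimzi.io/v1alpha1
+   kind: KafkaAccess
+   metadata:
+     name: myapp-kafka
+     namespace: myapp                 # KafkaAccess lives in your app's namespace
+   spec:
+     kafka:
+       name: sasquatch
+       namespace: sasquatch
+     user:
+       kind: KafkaUser
+       apiGroup: kafka.strimzi.io
+       name: myapp
+       namespace: sasquatch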