From 47f53c7141922e64ca2ed79311c590c9d97529df Mon Sep 17 00:00:00 2001
From: Nicholas Kumia <85196563+nickumia-reisys@users.noreply.github.com>
Date: Thu, 9 Mar 2023 10:17:18 -0500
Subject: [PATCH 1/2] new: initial pass at tabular checks
---
.github/ISSUE_TEMPLATE/o-and-m.md | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
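Reviewer note: the table added below has 12 columns (Task, ten weekday columns spanning two weeks, and Weekly/Monthly). Every row is one long line, so a missing or extra `|` is easy to overlook in review. What follows is a minimal sketch of a cell-count check. It assumes that each table row starts and ends with `|` and that no cell contains an escaped `\|`. `check_table_columns` is a hypothetical helper and is not part of this repository or its CI.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): report markdown table rows
# whose cell count differs from the table's header row.
# Assumptions: every table row starts with '|' and ends with '|', and no
# cell contains an escaped pipe ('\|').
check_table_columns() {
  awk -F'|' '
    BEGIN { expected = 0; bad = 0 }
    FNR == 1 { expected = 0 }          # start fresh for each input file
    /^\|/ {
      cells = NF - 2                   # the outer pipes add two empty fields
      if (expected == 0) {
        expected = cells               # first row of a table is its header
      } else if (cells != expected) {
        printf "line %d: %d cells, expected %d\n", FNR, cells, expected
        bad = 1
      }
      next
    }
    { expected = 0 }                   # any non-table line ends the current table
    END { exit bad }
  ' "$@"
}

# Usage, from the repository root after applying this series:
#   check_table_columns .github/ISSUE_TEMPLATE/o-and-m.md
```

The helper prints nothing and exits 0 when every row matches the header. Otherwise it prints one line per mismatched row and exits 1, so it could be wired into a pre-commit hook if that ever seemed worthwhile.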
diff --git a/.github/ISSUE_TEMPLATE/o-and-m.md b/.github/ISSUE_TEMPLATE/o-and-m.md
index 516311cd4..76c6993f8 100644
--- a/.github/ISSUE_TEMPLATE/o-and-m.md
+++ b/.github/ISSUE_TEMPLATE/o-and-m.md
@@ -60,11 +60,18 @@ $ cf env catalog-web | grep solr -C 2 | grep "uri\|solr_follower_individual_urls
## Acceptance criteria
You are responsible for all [O&M responsibilities](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities) this week. We've highlighted a few so they're not forgotten.
-- [ ] [Audit log updated](https://docs.google.com/spreadsheets/d/1z6lqmyNxC7s5MiTt9f6vT41IS2DLLJl4HwEqXvvft40/edit) for [AU-6 Log auditing](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#au-6-log-auditing) (**Friday**).
-- [ ] Any [New Relic alerts](https://alerts.newrelic.com/accounts/1601367/incidents) have been addressed or GH issues created.
-- [ ] Weekly [Duplicate check](https://github.com/GSA/data.gov/wiki/Operation-and-Maintenance-Responsibilities#duplicate-check) has been done, and any pertinent issues created.
-- [ ] Weekly [Nessus scan](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#nessus-host-scan-report-from-isso) has been triaged.
-- [ ] Weekly [Snyk scan](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#automated-dependency-updates-ad-hoc-github-prs) is complete.
+| Task | Friday | Monday | Tuesday | Wednesday | Thursday | Friday | Monday | Tuesday | Wednesday | Thursday | Weekly/Monthly |
+|---------------------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
+| Check Deployments | | | | | | | | | | | ➖ |
+| Check Restarts | | | | | | | | | | | ➖ |
+| Check [Snyk Scans](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#automated-dependency-updates-ad-hoc-github-prs) | | | | | | | | | | | ➖ |
+| Check Catalog Auto Tasks | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs |- [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs |- [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | ➖ |
+| Check Harvesting Emails | | | | | | | | | | | ➖ |
+| [New Relic Alerts](https://alerts.newrelic.com/accounts/1601367/incidents) Triaged | | | | | | | | | | | ➖ |
+| Check Catalog Solr | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | |
+| [Audit Log](https://docs.google.com/spreadsheets/d/1z6lqmyNxC7s5MiTt9f6vT41IS2DLLJl4HwEqXvvft40/edit) [*AU-6*](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#au-6-log-auditing) | | ➖ | ➖ | ➖ | ➖ | | ➖ | ➖ | ➖ | ➖ | ➖ |
+| [Catalog Dupe Check](https://github.com/GSA/data.gov/wiki/Operation-and-Maintenance-Responsibilities#duplicate-check) | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | |
+| [Invicti Scan](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#netsparker-compliance-scan-report-from-isso) | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | |
+
- [ ] Weekly [resources.data.gov link scan](https://app.circleci.com/pipelines/github/GSA/resources.data.gov?branch=main)
-- [ ] If received, the monthly [Netsparker scan](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#netsparker-compliance-scan-report-from-isso) has been triaged.
- [ ] Finishing the shift: Log the [number of alerts](https://docs.google.com/spreadsheets/d/1u1hSUAQW6FWzphog122stfB6MB9Wiq0NROT3PeicRoM/edit#gid=939071144)
From 67ade4abc0a4bc0d8d420d722ef25a124e6305d5 Mon Sep 17 00:00:00 2001
From: Nicholas Kumia <85196563+nickumia-reisys@users.noreply.github.com>
Date: Thu, 9 Mar 2023 10:40:57 -0500
Subject: [PATCH 2/2] new: cleanup the rest of the issue template
---
.github/ISSUE_TEMPLATE/o-and-m.md | 52 +++----------------------------
1 file changed, 4 insertions(+), 48 deletions(-)
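Reviewer note: this patch moves the daily and weekly routine prose to the wiki and leaves the table as the checklist, so the ➖ markers now carry the schedule. What follows is a minimal sketch that lists the columns in which a given row is due. It assumes that ➖ means "not due in this column" and that any other cell is one to tick off. `scheduled_days` is a hypothetical helper and is not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): print the columns in which a
# task row of the O&M table is NOT marked with ➖, as "position:header".
# Assumptions: ➖ means "not due in this column"; the first '|' row of the
# table is the header; the task name is matched as a substring of the first
# cell and contains no backslashes (awk -v would interpret them).
scheduled_days() {
  local task=$1
  shift
  awk -F'|' -v task="$task" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    /^\|/ {
      if (!in_table) {                 # header row: remember the column names
        for (i = 2; i < NF; i++) col[i] = trim($i)
        in_table = 1
        next
      }
      if (index($2, task) > 0) {
        # Fields 3..NF-1 are the day columns followed by Weekly/Monthly.
        for (i = 3; i < NF; i++)
          if (trim($i) != "➖") printf "%d:%s\n", i - 2, col[i]
      }
      next
    }
    { in_table = 0 }                   # a non-table line ends the current table
  ' "$@"
}

# Usage: scheduled_days 'Audit Log' .github/ISSUE_TEMPLATE/o-and-m.md
```

For the Audit Log row added in this series, the helper would print `1:Friday` and `6:Friday`. Those are the two Friday columns, which matches the "(**Friday**)" note in the checklist item that patch 1 removed.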
diff --git a/.github/ISSUE_TEMPLATE/o-and-m.md b/.github/ISSUE_TEMPLATE/o-and-m.md
index 76c6993f8..58d42ce3d 100644
--- a/.github/ISSUE_TEMPLATE/o-and-m.md
+++ b/.github/ISSUE_TEMPLATE/o-and-m.md
@@ -7,55 +7,10 @@ assignees: ''
---
As part of day-to-day operation of Data.gov, there are many [Operation and Maintenance (O&M) responsibilities](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities). Instead of having the entire team watching notifications and risking some notifications slipping through the cracks, we have created an [O&M Triage role](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#om-triage-rotation). One person on the team is assigned the Triage role which rotates each sprint. _This is not meant to be a 24/7 responsibility, only East Coast business hours. If you are unavailable, please note when you will be unavailable in Slack and ask for someone to take on the role for that time._
-## Routine Tasks
-These repositories will automatically create failure tickets, so no need to check the Actions
- - [Inventory Restart Action](https://github.com/GSA/inventory-app/actions/workflows/restart.yml)
- - [Inventory deploy Action](https://github.com/GSA/inventory-app/actions/workflows/deploy.yml)
- - [Catalog Restart Action](https://github.com/GSA/catalog.data.gov/actions/workflows/restart.yml)
- - [Catalog Deploy Action](https://github.com/GSA/catalog.data.gov/actions/workflows/publish.yml)
- - [Check Stuck Harvest Jobs](https://github.com/GSA/catalog.data.gov/actions/workflows/check-stuck-harvest-jobs.yml)
-
-### Snyk Scans
-For Catalog and Inventory, snyk will create PR's if a dependency needs to be updated.
- - [Inventory Snyk Scan](https://github.com/GSA/inventory-app/actions/workflows/snyk.yml)
- - [Catalog Snyk Scan](https://github.com/GSA/catalog.data.gov/actions/workflows/snyk.yml)
-
-If either of these actions failed and a PR was created, review and approve/triage it as needed
-
-If either of these actions failed and a PR was not created, an unfixable vulnerability was found, check the Snyk UI Console to triage the vulnerability.
-
-## Daily Routine
-
-### GH Actions
-Check Action tabs for each _active_ repositories, as these will not create issues automatically on failure
- - [Catalog DB-Solr-Sync Action](https://github.com/GSA/catalog.data.gov/actions/workflows/db-solr-sync-automated.yml) The actions should finish in minutes. Examine the amount of datasets affected if it takes long to finish.
- - [Tracking Update Action](https://github.com/GSA/catalog.data.gov/actions/workflows/tracking-update.yml) The action should take 1 - 2 hours to finish on prod. Examine the amount of datasets affected or Solr index speed if the time is way off.
-
-### Miscs
-- Verify harvesting jobs are running, go through Error reports to catch unusual errors that need attention [[Wiki doc](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#harvest-job-report-daily-email-report)]
+## Miscellaneous
- Watch for user email requests
-- Triage DMARC Report from Google (daily) sent to datagovhelp@gsa.gov (only for catalog in prod).
- Watch in [#datagov-alerts](https://gsa-tts.slack.com/archives/C4RGAM1Q8) and [Vulnerable dependency notifications (daily email reports)](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#vulnerable-dependency-notifications-daily-email-reports) for critical alerts.
-## Weekly Routine
-### Solr
-- Verify each Solr Leader/Followers are functional
-
-Use this command to find Solr URLs and credentials in the `prod` space.
-
-```
-$ cf t -s prod
-$ cf env catalog-web | grep solr -C 2 | grep "uri\|solr_follower_individual_urls\|password\|username"
-```
-
-- Verify their Start time is in sync with Solr Memory Alert history at path `/solr/#/`
-- Verify each follower stays with Solr leader at path `/solr/#/ckan/core-overview`
-- Verify each Solr is responsive by running a few queries at `/solr/#/ckan/query`
-- Inspect each Solr's logging for abnormal errors at `/solr/#/~logging`
-
-- Examine the Solr Memory Utilization Graph to catch any abnormal incidences.
-
-- Log in to `tts-jump` AWS account with role `SSBDev@ssb-production`, go to custom [SolrAlarm dashboard](https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#dashboards:name=CatalogProdSolr;start=PT72H) to see the graph for the past 72 hours. There should not be any Solr instance that has MemoryUtilization go above 90% threshold without getting restarted. Each Solr should not restart too often (more than a few times a week)
## Acceptance criteria
You are responsible for all [O&M responsibilities](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities) this week. We've highlighted a few so they're not forgotten.
@@ -66,9 +21,10 @@ You are responsible for all [O&M responsibilities](https://github.com/gsa/data.g
| Check Restarts | | | | | | | | | | | ➖ |
| Check [Snyk Scans](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#automated-dependency-updates-ad-hoc-github-prs) | | | | | | | | | | | ➖ |
| Check Catalog Auto Tasks | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs |- [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs |- [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | - [ ] DB-Solr Sync<br>- [ ] Tracking Update<br>- [ ] Stuck Jobs | ➖ |
-| Check Harvesting Emails | | | | | | | | | | | ➖ |
+| Check [Harvesting Emails](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#harvest-job-report-daily-email-report) | | | | | | | | | | | ➖ |
| [New Relic Alerts](https://alerts.newrelic.com/accounts/1601367/incidents) Triaged | | | | | | | | | | | ➖ |
-| Check Catalog Solr | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | |
+| Triage DMARC Report from Google | | | | | | | | | | | ➖ |
+| Check [Catalog Solr](https://github.com/GSA/data.gov/wiki/Operation-and-Maintenance-Responsibilities#solr) | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | |
| [Audit Log](https://docs.google.com/spreadsheets/d/1z6lqmyNxC7s5MiTt9f6vT41IS2DLLJl4HwEqXvvft40/edit) [*AU-6*](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#au-6-log-auditing) | | ➖ | ➖ | ➖ | ➖ | | ➖ | ➖ | ➖ | ➖ | ➖ |
| [Catalog Dupe Check](https://github.com/GSA/data.gov/wiki/Operation-and-Maintenance-Responsibilities#duplicate-check) | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | |
| [Invicti Scan](https://github.com/gsa/data.gov/wiki/Operation-and-Maintenance-Responsibilities#netsparker-compliance-scan-report-from-isso) | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | ➖ | |