O+M 2023-06-30 #4366

Closed
10 tasks done
hkdctol opened this issue Jun 23, 2023 · 5 comments
Labels: O&M (Operations and maintenance tasks for the Data.gov platform)

Comments

@hkdctol
Contributor

hkdctol commented Jun 23, 2023

As part of the day-to-day operation of Data.gov, there are many Operations and Maintenance (O&M) responsibilities. Instead of having the entire team watch notifications and risk some slipping through the cracks, we have created an O&M Triage role. One person on the team is assigned the Triage role, which rotates each sprint. This is not meant to be a 24/7 responsibility; it covers East Coast business hours only. If you will be unavailable, please note when in Slack and ask someone to take on the role for that time.

Miscs

Acceptance criteria

You are responsible for all O&M responsibilities this week. We've highlighted a few so they're not forgotten. You can copy each checklist into your daily report.

Daily Checklist

Check Production State/Actions

Note: Catalog Auto Tasks
You will need to update the chart values manually. Click the Action link in each issue, grab the values from the monitor task output, and check the runtime.

Weekly Checklist

@hkdctol
Contributor Author

hkdctol commented Jun 23, 2023

@hkdctol hkdctol moved this to 📟 Sprint Backlog [7] in data.gov team board Jun 23, 2023
@FuhuXia FuhuXia moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Jun 26, 2023
@FuhuXia
Member

FuhuXia commented Jun 27, 2023

GSA/catalog.data.gov#973

The ckan harvester run command failed once while processing a harvest source:

...
2023-06-27T08:14:16.66+0000 [APP/TASK/ckanharvesterrun-5387335272-1/0] OUT 2023-06-27 08:14:16,659 INFO  [ckanext.datagovcatalog.harvester.notifications] Adding extra recipients for source 38f3722d-48ce-4dbf-abde-941b707e5ad5
2023-06-27T08:14:18.64+0000 [APP/TASK/ckanharvesterrun-5387335272-1/0] OUT 2023-06-27 08:14:18,645 INFO  [ckanext.geodatagov.plugin] Added FQ to collection_package_id
2023-06-27T08:14:19.19+0000 [APP/TASK/ckanharvesterrun-5387335272-1/0] OUT Exit status 1

Reharvest the source 38f3722d-48ce-4dbf-abde-941b707e5ad5 and watch the output to make sure the error does not repeat.

===============
[UPDATE]
Reharvested the same source manually. No error.

2023-06-27T10:13:40.87-0400 [APP/TASK/ckanharvesterrun-5390812886-1/0] OUT 2023-06-27 14:13:40,872 INFO  [ckanext.geodatagov.plugin] Added FQ to collection_package_id
2023-06-27T10:13:41.37-0400 [APP/TASK/ckanharvesterrun-5390812886-1/0] OUT 2023-06-27 14:13:41,374 INFO  [ckanext.datagovcatalog.harvester.notifications] Extra recipients for source found: [{'name': '████@neh.gov', 'email': '████@neh.gov'}]
...
2023-06-27T10:13:46.21-0400 [APP/TASK/ckanharvesterrun-5390812886-1/0] OUT 2023-06-27 14:13:46,210 INFO  [ckan.lib.mailer] Sent email to ████@neh.gov
2023-06-27T10:13:46.21-0400 [APP/TASK/ckanharvesterrun-5390812886-1/0] OUT 2023-06-27 14:13:46,212 DEBUG [ckanext.harvest.logic.action.update] No jobs to send to the gather queue
2023-06-27T10:13:47.56-0400 [APP/TASK/ckanharvesterrun-5390812886-1/0] OUT Exit status 0
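
A check like this could be scripted rather than done by eye. The following is a minimal sketch, assuming the task output has been saved as plain text in the same format as the log lines above; the function name `failed_tasks` is hypothetical and not part of any Data.gov tooling.

```python
import re

# Matches the task name and exit code in lines such as
# "... [APP/TASK/ckanharvesterrun-5387335272-1/0] OUT Exit status 1"
EXIT_RE = re.compile(
    r"\[APP/TASK/(?P<task>[^/\]]+)/\d+\]\s+OUT\s+Exit status (?P<code>\d+)"
)


def failed_tasks(log_text):
    """Return {task_name: exit_code} for every task that exited non-zero.

    Hypothetical helper: it only looks at "Exit status N" lines and
    ignores everything else in the log.
    """
    failures = {}
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match and int(match.group("code")) != 0:
            failures[match.group("task")] = int(match.group("code"))
    return failures
```

Run against the two runs quoted in this comment, it would report only the first task (exit status 1) and skip the manual reharvest (exit status 0).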

@FuhuXia
Member

FuhuXia commented Jun 27, 2023

https://github.com/GSA/catalog.data.gov/actions/runs/5387335272/jobs/9778522484

...
2023-06-27T08:14:02.01+0000 [APP/TASK/ckanharvesterrun-5387335272-1/0] OUT 2023-06-27 08:14:02,009 INFO  [ckanext.datagovcatalog.harvester.notifications] Extra recipients for source found: []
2023-06-27T08:14:02.38+0000 [APP/TASK/ckanharvesterrun-5387335272-1/0] OUT 2023-06-27 08:14:02,381 INFO  [ckan.lib.mailer] Sent email to █████@gsa.gov
2023-06-27T08:14:02.72+0000 [APP/TASK/ckanharvesterrun-5387335272-1/0] OUT 2023-06-27 08:14:02,725 INFO  [ckan.lib.mailer] Sent email to █████@gsa.gov
...

The logs show we are sending harvest emails to members who have been off-boarded. Examining the system to see whether this is an off-boarding issue or a code bug.

=================
[UPDATE]
It is a code bug. #4368 created.
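
For future triage, unexpected recipients could be flagged directly from the harvester log. This is a rough sketch only: it assumes the `[ckan.lib.mailer] Sent email to ...` line format shown above, and the function name `unexpected_recipients` and the list of active addresses are hypothetical inputs you would supply yourself.

```python
import re

# Matches mailer lines such as
# "... INFO  [ckan.lib.mailer] Sent email to someone@example.gov"
SENT_RE = re.compile(r"\[ckan\.lib\.mailer\] Sent email to (?P<addr>\S+)")


def unexpected_recipients(log_text, active_addresses):
    """Return addresses that were emailed but are not in active_addresses.

    Hypothetical helper: comparison is case-insensitive, and each
    address is reported once, in order of first appearance.
    """
    active = {addr.lower() for addr in active_addresses}
    flagged = []
    for line in log_text.splitlines():
        match = SENT_RE.search(line)
        if not match:
            continue
        addr = match.group("addr").lower()
        if addr not in active and addr not in flagged:
            flagged.append(addr)
    return flagged
```

An empty result would mean every logged recipient is on the supplied list; it does not by itself distinguish an off-boarding gap from a code bug.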

@FuhuXia
Member

FuhuXia commented Jun 27, 2023

For the first time in the past few months, our DB and Solr are fully in sync after daily harvesting.
https://github.com/GSA/catalog.data.gov/actions/runs/5385352045/jobs/9774274968

2023-06-27 03:35:39,795 INFO  [ckanext.geodatagov] total 373590 solr indexed_package
2023-06-27 03:35:40,218 INFO  [ckanext.geodatagov] 0 packages need to be removed from Solr
2023-06-27 03:35:40,218 INFO  [ckanext.geodatagov] 0 packages need to be updated/added to Solr
2023-06-27 03:35:40,218 INFO  [ckanext.geodatagov] 0 packages without harvest_object need to be mannually deleted
Exit status 0
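
The "in sync" reading of this output can be expressed as a small check. The sketch below assumes the summary lines keep the wording shown in the log above; the names `sync_summary` and `in_sync` are hypothetical and do not come from ckanext-geodatagov.

```python
import re

# One pattern per summary line in the sync job output shown above.
PATTERNS = {
    "to_remove": re.compile(r"(\d+) packages need to be removed from Solr"),
    "to_update": re.compile(r"(\d+) packages need to be updated/added to Solr"),
    "orphans": re.compile(r"(\d+) packages without harvest_object"),
}


def sync_summary(log_text):
    """Extract the three counters from the sync job log (hypothetical helper)."""
    counts = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            counts[name] = int(match.group(1))
    return counts


def in_sync(log_text):
    """True only if all three counters are present and equal to zero."""
    counts = sync_summary(log_text)
    return len(counts) == len(PATTERNS) and all(v == 0 for v in counts.values())
```

Requiring all three counters to be present means a truncated or failed run is not mistaken for a clean one.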

@hkdctol hkdctol added the O&M Operations and maintenance tasks for the Data.gov platform label Jul 3, 2023
@FuhuXia
Member

FuhuXia commented Jul 5, 2023

Inventory Solr went down, affecting staging and production. After we deliberately restarted development Solr, development inventory went down too. A possible cause is a system change on the AWS side. Resolved a few hours later. Details are in the Slack discussion.

@FuhuXia FuhuXia closed this as completed Jul 5, 2023
@github-project-automation github-project-automation bot moved this from 🏗 In Progress [8] to ✔ Done in data.gov team board Jul 5, 2023
@hkdctol hkdctol moved this from ✔ Done to 🗄 Closed in data.gov team board Jul 6, 2023