Improve Monitoring #4619
18 tasks
Closing in favor of #4741. All tasks here have been completed, except for increased Slack integration and adding API calls to New Relic. Those have been carried forward.
At a glance
We want to know about any issue as early as possible. It is also important to understand how the system is being used and to track overall performance.
Acceptance Criteria
Anyone can tell at a glance whether the system is working.
We can identify changes in performance and usage.
Considerations
We already have a New Relic instance, so this may be a good source.
Key metrics for any API/endpoint:
Volume (number of calls)
Errors (number of errors)
Latency (how long the calls are taking)
Availability (usually a combination of error rate and uptime, ideally expressed as a percentage, e.g. 99.99%)
We had a database job that was silently failing. Be sure to include this and other async GitHub jobs.
This task can easily balloon. I would suggest picking a few key endpoints (verify UEI, submit audit, database backup) to demonstrate how we can expand it to additional endpoints. We should then document what was done, how it was done, and which endpoints we should tackle next.
Aggregate the above into a single metric that makes it easy to see “is it healthy?”
The API is currently lacking New Relic metrics.
Terraform (TF) allows configuration of New Relic dashboards. Ideally all of this would be in code.
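As a rough illustration of the considerations above, the sketch below computes the four key metrics for one endpoint and folds them, together with the age of the last successful async job, into a single "is it healthy?" answer. It is only a sketch under stated assumptions: the `Call` record, the function names, the choice of 5xx responses as "errors", the p95 latency measure, and every threshold are hypothetical and do not come from this project or from New Relic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Call:
    """One recorded API call (hypothetical shape, for illustration only)."""
    endpoint: str
    status: int          # HTTP status code
    duration_ms: float   # how long the call took


def summarize(calls):
    """Return volume, errors, p95 latency, and availability for the calls."""
    volume = len(calls)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for c in calls if c.status >= 500)
    if volume == 0:
        return {"volume": 0, "errors": 0, "p95_ms": None, "availability": None}
    durations = sorted(c.duration_ms for c in calls)
    # Nearest-rank 95th percentile, using integer math to avoid float rounding.
    rank = (95 * volume + 99) // 100
    return {
        "volume": volume,
        "errors": errors,
        "p95_ms": durations[rank - 1],
        "availability": 100.0 * (volume - errors) / volume,
    }


def is_healthy(summary, last_job_success, now,
               min_availability=99.9,             # assumed threshold
               max_p95_ms=2000.0,                 # assumed threshold
               max_job_age=timedelta(hours=24)):  # assumed threshold
    """Collapse the metrics into one boolean.

    A stale async job (for example a database backup that stopped running)
    makes the system unhealthy even when the API itself looks fine, which
    is how a silently failing job would surface here.
    """
    if now - last_job_success > max_job_age:
        return False
    # Assumption: no traffic in the window is not treated as a failure.
    if summary["volume"] == 0:
        return True
    return (summary["availability"] >= min_availability
            and summary["p95_ms"] <= max_p95_ms)


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    calls = [Call("/submit-audit", 200, 100.0) for _ in range(99)]
    calls.append(Call("/submit-audit", 500, 5000.0))
    summary = summarize(calls)
    print(summary)
    print(is_healthy(summary, now - timedelta(hours=1), now))
```

In practice these numbers would more likely come from New Relic queries than from in-process records, and the thresholds would be tuned per endpoint. The point is that one boolean can be derived from volume, errors, latency, availability, and job freshness.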