How do we occasionally test deployed instances and report errors? #52
Comments
We have implemented most of these items and are in the final step of setting up email alerts.

Service Health Checks: Most of the services are now configured with health checks that monitor their operational status. The checks probe each service at 5-second intervals to verify that it is functioning as expected. If a service becomes unhealthy, Copilot triggers an automatic restart to minimize downtime and restore service.

Autoscaling Setup: The test witness service is now configured with autoscaling, which lets it scale dynamically between a minimum of 1 and a maximum of 2 tasks. Scaling is triggered by CPU and memory usage thresholds, so the service scales up automatically under increased load and scales back down when the load decreases.

CloudWatch Monitoring/Alarms: Besides health checks and autoscaling, we are using AWS CloudWatch to monitor key performance metrics such as CPU and memory usage. A CloudWatch dashboard has been set up for the test witness service, and alarms are configured to trigger when those metrics cross their thresholds, which will help us manage performance.

Automated Alerts: The final step is setting up automated alerts that notify us by email when an alarm is activated, so that we can identify and address potential service disruptions or performance issues.
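For the remaining email-alert step, here is a minimal sketch of how a CloudWatch alarm could be wired to an email notification with boto3. It is an illustration under assumptions, not the configuration actually deployed: the alarm name, the 80% threshold, the 60-second period, and the three evaluation periods are all hypothetical placeholders, as are the cluster, service, and email values a caller would pass in. The metric namespace `AWS/ECS` and the `ClusterName`/`ServiceName` dimensions are the standard ones for ECS services.

```python
from typing import Any


def build_cpu_alarm_params(cluster: str, service: str, topic_arn: str,
                           threshold_percent: float = 80.0) -> dict[str, Any]:
    """Build keyword arguments for CloudWatch's put_metric_alarm.

    The threshold, period, and evaluation count are illustrative
    assumptions, not the values used in the real deployment.
    """
    return {
        "AlarmName": f"{service}-cpu-high",  # hypothetical naming scheme
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,              # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,    # consecutive breaching datapoints (assumed)
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic notified when the alarm fires
    }


def create_email_alarm(cluster: str, service: str, email: str, region: str) -> str:
    """Create an SNS topic with an email subscriber and attach a CPU alarm to it."""
    import boto3  # imported lazily so the builder above works without AWS access

    sns = boto3.client("sns", region_name=region)
    topic_arn = sns.create_topic(Name=f"{service}-alerts")["TopicArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_cpu_alarm_params(cluster, service, topic_arn))
    return topic_arn
```

Note that an SNS email subscription only starts delivering after the recipient confirms it from the confirmation message, so the first alert may be missed if that step is skipped. A memory alarm could follow the same pattern with `MetricName` set to `MemoryUtilization`.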
@ronakseth96 thank you for the synopsis! Can you create the necessary follow-on issues and make sure they are in the reg-pilot project?
Updates on service autoscaling, monitoring, and alerts:
For instance, our witnesses, API, and verifier are deployed to dev and test. But do we test them daily and automatically to determine whether they are healthy?
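One way such a daily automated check might look is a small script, run from a scheduler such as cron or a scheduled CI job, that probes each deployed service and reports the ones that fail. The sketch below is only an illustration: the service names mirror the ones mentioned above, but the URLs and the `/health` path are hypothetical placeholders, and it assumes each service answers a healthy probe with HTTP 200.

```python
import sys
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical endpoints; replace with the real dev/test health URLs.
SERVICES = {
    "witness": "https://witness.dev.example.com/health",
    "api": "https://api.dev.example.com/health",
    "verifier": "https://verifier.dev.example.com/health",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, and similar


def check_services(services: dict[str, str],
                   fetch: Callable[[str], int] = http_status) -> dict[str, bool]:
    """Map each service name to True when its health endpoint returns 200."""
    return {name: fetch(url) == 200 for name, url in services.items()}


def failing(results: dict[str, bool]) -> list[str]:
    """Return the sorted names of services that did not pass the check."""
    return sorted(name for name, ok in results.items() if not ok)


def main() -> None:
    """Probe every service and exit non-zero if any is unhealthy."""
    bad = failing(check_services(SERVICES))
    if bad:
        print("Unhealthy services: " + ", ".join(bad))
        sys.exit(1)
    print("All services healthy")
```

Calling `main()` from the scheduled job makes the job fail whenever a service is down, so the scheduler's own failure notifications can serve as the error report. The `fetch` parameter exists so the check can be exercised without network access.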