
Troubleshooting

Liz Krznarich edited this page Nov 3, 2023 · 4 revisions

ROR services are very stable and rarely experience issues or downtime.

Common issues

| Issue | Common causes | Action(s) |
|---|---|---|
| ROR API is down | Elasticsearch at 100% CPU usage | Typically nothing; the app will recover on its own when traffic subsides. If it does not recover, force a restart with a deployment to ror-api or (in an emergency) an AWS CLI request like `aws ecs update-service --force-new-deployment --cluster CLUSTER_NAME --service SERVICE_NAME`. If this happens repeatedly due to traffic from specific IPs, those IPs can be blocked by adding them to the blacklist_ips_prod variable in Terraform Cloud and triggering a manual Terraform run. |
| ror-site won’t deploy | Dependency issues | Review the Actions log and check the dependencies pulled in during the Actions run. Trigger the deployment again if needed. |
| No app logs from ECS containers (not really an issue itself, but makes it hard to troubleshoot) | Nginx logs are not being forwarded (bug in Phusion Passenger: https://github.com/phusion/passenger-docker/issues/224) | SSH to the container (see below) and restart nginx-log-forwarder. |
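
The emergency restart request for a down ROR API can be wrapped in a small helper so the cluster and service names are filled in consistently. This is only a minimal sketch: the function names and the dry-run default are hypothetical and not part of any ROR tooling, and the cluster and service values remain placeholders to be replaced with the real names. The only AWS CLI flags used are the ones quoted in the action for that issue.

```python
import shlex
import subprocess
from typing import List


def force_new_deployment_cmd(cluster: str, service: str) -> List[str]:
    """Build the AWS CLI argument list that forces a new ECS deployment."""
    if not cluster or not service:
        raise ValueError("both cluster and service names are required")
    return [
        "aws", "ecs", "update-service",
        "--force-new-deployment",
        "--cluster", cluster,
        "--service", service,
    ]


def force_new_deployment(cluster: str, service: str, dry_run: bool = True) -> str:
    """Return the shell-quoted command; run it only when dry_run is False."""
    cmd = force_new_deployment_cmd(cluster, service)
    if not dry_run:
        # Requires the AWS CLI to be installed and credentials to be configured.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Placeholders from the table above; substitute the real names before use.
    print(force_new_deployment("CLUSTER_NAME", "SERVICE_NAME"))
```

Defaulting to a dry run means the command is printed for review first; pass `dry_run=False` only once the names have been checked.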

Reverting/overwriting the Elasticsearch index

If there are issues with the data release process, it's possible to delete and recreate the Elasticsearch index from a data dump.

  1. SSH to the running ECS container (see the Bastion host entry in 1Password).

  2. Run the setup command, passing the filename of the data dump you want to index (without the file extension). The file must exist in ror-data.

     python manage.py setup v1.0-2022-03-17-ror-data
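
Because the setup command expects the dump name without its file extension, a small helper can normalize whatever filename was copied from ror-data before the command is run. The sketch below is an illustration only: the helper names are hypothetical, and it assumes the dump files carry a `.zip` extension, which should be checked against the actual files in ror-data.

```python
import shlex


def dump_name(filename: str) -> str:
    """Strip any directory prefix and an assumed .zip extension from a dump filename."""
    name = filename.strip().rsplit("/", 1)[-1]
    # Only a literal ".zip" suffix is removed; the dots inside the version
    # string (e.g. "v1.0") must be left alone.
    if name.lower().endswith(".zip"):
        name = name[: -len(".zip")]
    if not name:
        raise ValueError("empty data dump name")
    return name


def setup_command(filename: str) -> str:
    """Return the manage.py setup invocation for the given data dump."""
    return shlex.join(["python", "manage.py", "setup", dump_name(filename)])


if __name__ == "__main__":
    print(setup_command("v1.0-2022-03-17-ror-data.zip"))
```

An explicit suffix check is used instead of a generic "strip the extension" call because release names such as `v1.0-2022-03-17-ror-data` contain dots that a generic extension splitter would misread.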
    