The following metrics are important indicators of the health of the Amazon S3 Find and Forget Solution:
AWS/SQS - ApproximateNumberOfMessagesVisible
for the Object Deletion Queue DLQ. Any value > 0 for this metric indicates that 1 or more objects could not be processed during a deletion job. The job which triggered the message(s) to be put in the queue will have a status of COMPLETED_WITH_ERRORS and theObjectUpdateFailed
event(s) will contain further debugging information.AWS/SQS - ApproximateNumberOfMessagesVisible
for the Events DLQ. Any value > 0 for this metrics indicates that 1 or more Job Events could not be processed.AWS/Athena - ProcessedBytes/TotalExecutionTime
. If the average processed bytes and/or total execution time per query is rising, it may be indicative of the average partition size also growing in size. This is not an issue per se, however if partitions grow too large (or your dataset is unpartitioned), you may eventually encounter Athena errors.AWS/States - ExecutionsFailed
. State machine executions failing indicates that the Amazon S3 Find and Forget solution is misconfigured error. To resolve this, find the State Machine execution which failed and investigate the cause of the failure.AWS/States - ExecutionsTimedOut
. State machine timeouts indicate that Amazon S3 Find and Forget is unable to complete a job before Step Functions kills the execution due to it exceeding the allowed execution time limit. See Troubleshooting for more details.
If required, you can create CloudWatch Alarms for any of the aforementioned metrics to be notified of potential solution misconfiguration.
All standard metrics for the services used by the Amazon S3 Find and Forget Solution are available. For detailed information about the metrics and logging for a given service, view the relevant Monitoring docs for that service. The key services used by the solution:
- Lambda Metrics / Lambda Logging
- ECS Metrics / ECS Logging 1
- Athena Metrics 2
- Step Functions Metrics
- SQS Metrics
- DynamoDB Metrics
- S3 Metrics
1 CloudWatch Container Insights can be be enabled when deploying the
solution by setting EnableContainerInsights
to true
. Using Container
Insights will incur additional charges. It is disabled by default.
2 To obtain Athena metrics, you will need to enable metrics for the
workgroup you are using to execute the queries as described in the Athena
docs. By default the solution uses the primary workgroup,
however you can change this when deploying the stack using the AthenaWorkGroup
parameter