Intermittent Lambda Timeout #1259
Comments
Are there any error messages in the CloudWatch Logs?
All logs show a …
Also potentially worth noting that some objects are stuck in an IN PROGRESS state.
This timeline of metrics illustrates the problem well, I think. I began with small numbers of concurrent uploads (75 every 5 minutes) and saw no issue. I then increased the number of concurrent uploads, first to 150 and then to 300. As you can see, there were still no durations anywhere near long enough to cause a timeout, and 100% of the executions were successful. Once I bumped the number up to 1,000, however, the problem resurfaced. You can see the execution duration balloon to the maximum configured for the Lambda, errors begin to surface in bulk, the 3 retries being attempted (unsuccessfully), and finally the majority of events being dropped. You can also see that I'm maxing out at a concurrency of ~800. I don't have any other Lambdas in this account that fire frequently enough to push me over the 1,000 concurrent executions limit. Hopefully this additional context is helpful.
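For anyone trying to reproduce this kind of timeline, here is a minimal sketch of pulling the same per-function Lambda metrics with the AWS SDK for JavaScript v3. The function name and the one-hour window are placeholders, not values from this thread:

```ts
import {
  CloudWatchClient,
  GetMetricStatisticsCommand,
} from "@aws-sdk/client-cloudwatch";

const cw = new CloudWatchClient({});

// Pull one AWS/Lambda metric for a single function over the last hour, in 5-minute buckets.
async function lambdaMetric(metricName: string, statistic: "Maximum" | "Sum") {
  const end = new Date();
  const start = new Date(end.getTime() - 60 * 60 * 1000);
  const res = await cw.send(
    new GetMetricStatisticsCommand({
      Namespace: "AWS/Lambda",
      MetricName: metricName,
      // Placeholder: substitute the scan function's real name.
      Dimensions: [{ Name: "FunctionName", Value: "my-scan-function" }],
      StartTime: start,
      EndTime: end,
      Period: 300,
      Statistics: [statistic],
    })
  );
  return res.Datapoints ?? [];
}

async function main() {
  console.log("Duration (max ms):", await lambdaMetric("Duration", "Maximum"));
  console.log("Errors (sum):", await lambdaMetric("Errors", "Sum"));
  console.log("Throttles (sum):", await lambdaMetric("Throttles", "Sum"));
}

main().catch(console.error);
```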
From the perspective of the Lambda application code, a cold start means that the ClamAV and freshclam processes both need to start up. Both of those consume CPU/memory. In addition, freshclam will need to update the local virus database from the EFS filesystem. As to why the cold starts are happening so frequently, that would need to be answered by AWS Support.
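One way to quantify how often cold starts are actually occurring is a CloudWatch Logs Insights query over the scan function's log group, counting REPORT lines that carry an init duration. A rough sketch with the AWS SDK for JavaScript v3 follows; the log group name is a placeholder:

```ts
import {
  CloudWatchLogsClient,
  StartQueryCommand,
  GetQueryResultsCommand,
} from "@aws-sdk/client-cloudwatch-logs";

const logs = new CloudWatchLogsClient({});

async function coldStartStats() {
  const end = Math.floor(Date.now() / 1000);
  const start = end - 3600; // look back one hour

  const { queryId } = await logs.send(
    new StartQueryCommand({
      // Placeholder: use the scan function's actual log group.
      logGroupName: "/aws/lambda/my-scan-function",
      startTime: start,
      endTime: end,
      // @initDuration only appears on REPORT lines emitted by cold starts.
      queryString: `
        filter @type = "REPORT"
        | stats count(*) as invocations,
                count(@initDuration) as coldStarts,
                max(@duration) as maxDurationMs
      `,
    })
  );

  // Poll until the query completes (simplified; real code should bound retries).
  let results = await logs.send(new GetQueryResultsCommand({ queryId }));
  while (results.status === "Running" || results.status === "Scheduled") {
    await new Promise((r) => setTimeout(r, 1000));
    results = await logs.send(new GetQueryResultsCommand({ queryId }));
  }
  console.log(results.results);
}

coldStartStats().catch(console.error);
```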
Is it possible to increase the verbosity of the logs? Also, have you seen certain CPU/memory configurations function better than others? I know that 10,240 MB of memory is the default, but is that recommended? I currently have 3,008 MB provisioned.
What specifically are you looking for? It's possible to add log statements as the scanning Lambda function enters each "phase" of the process. The Lambda does log when a specific process throws an error, but timeouts wouldn't be logged in CloudWatch Logs because the execution simply halts.
I'd like to see if I can narrow down where the execution is timing out. Adding log statements at each phase sounds like it would help achieve this. Could you help me understand how to enable this?
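Purely as an illustration of the kind of phase logging being discussed, here is a generic Node.js handler sketch. It is not the construct's actual scan handler (which lives in this repo); the phase names and the commented-out helper calls are hypothetical:

```ts
// Illustrative only: a generic pattern for logging each phase of a scan handler.
import type { S3Event } from "aws-lambda";

function logPhase(phase: string, detail: Record<string, unknown> = {}) {
  // Structured log lines make it easy to filter by phase in CloudWatch Logs Insights.
  console.log(JSON.stringify({ phase, timestamp: new Date().toISOString(), ...detail }));
}

export const handler = async (event: S3Event): Promise<void> => {
  logPhase("received_event", { records: event.Records.length });

  logPhase("update_defs_start");
  // await updateVirusDefinitions();            // hypothetical: refresh defs from EFS
  logPhase("update_defs_done");

  logPhase("download_object_start");
  // const path = await downloadObject(event);  // hypothetical: copy the S3 object locally
  logPhase("download_object_done");

  logPhase("scan_start");
  // const result = await runClamScan(path);    // hypothetical: invoke the ClamAV scan
  logPhase("scan_done");

  logPhase("tagging_start");
  // await tagObject(event, result);            // hypothetical: write the scan-status tag
  logPhase("tagging_done");
};
```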
This is going to be use-case specific (file type, file size, scan frequency). I haven't performed testing that would provide any real guidance. I have personally seen that larger configurations result in faster executions, both for cold starts and overall.
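For experimenting with different memory sizes without forking the repo, one option is a CDK Aspect that overrides memorySize on the Lambda functions the construct creates. This is only a sketch; the construct instantiation in the comment is assumed, so check the cdk-serverless-clamscan README for the real prop names before using it:

```ts
import { Aspects, IAspect } from "aws-cdk-lib";
import { CfnFunction } from "aws-cdk-lib/aws-lambda";
import { IConstruct } from "constructs";

// A CDK Aspect that overrides memorySize on every Lambda function beneath the
// construct it is applied to. Note: this touches all functions the construct
// creates, not only the scanner, so treat it as a blunt experiment knob.
export class BumpLambdaMemory implements IAspect {
  constructor(private readonly memorySizeMb: number) {}

  public visit(node: IConstruct): void {
    if (node instanceof CfnFunction) {
      node.memorySize = this.memorySizeMb;
    }
  }
}

// Hypothetical usage inside a Stack constructor (construct props omitted):
//   const sc = new ServerlessClamscan(this, "Clamscan", { /* ... */ });
//   Aspects.of(sc).add(new BumpLambdaMemory(10240)); // 10240 MB is illustrative
```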
Amazing, thank you for that. Is it possible to pull these changes into my project without a new version of the cdk-serverless-clamscan package?
You would have to manually edit your local files to reflect the changes. I cannot guarantee that manual changes like this won't result in a failed deployment.
Apologies if I'm misunderstanding your recommendation, but I don't host the files of this repo locally. I import them into my node project via the cdk-serverless-clamscan package and instantiate a ServerlessClamscan object in my CDK code. Would I have to host my own custom fork of this repo in order to achieve enhanced logging?
Also, FWIW, I tried increasing the Lambda memory to 10,240 MB and am still facing the same issue. I also validated that the files being sent to the construct have sizes on the order of KBs, which I've been able to handle in the past.
I recommend that you update to the latest version. If you don't want to (not recommended), you would need to directly modify the files in your …
I have had the CDK stack running in production for roughly 3 months without issue. Starting earlier this week, we are seeing an issue where the Lambdas time out after the 10-minute max. S3 objects are stuck with a scan-status tag of IN PROGRESS, or sometimes the scan hangs before they are even tagged. I saw this issue describing a similar problem, but updating the cdk-serverless-clamscan package to 2.6.230 did not seem to resolve the issue. I would appreciate any guidance you have here. I'm happy to provide any additional detail.
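For triage, here is a small sketch (AWS SDK for JavaScript v3; the bucket name is a placeholder) that lists objects whose scan-status tag is still IN PROGRESS:

```ts
import {
  S3Client,
  ListObjectsV2Command,
  GetObjectTaggingCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Walk a bucket and collect keys whose scan-status tag is still "IN PROGRESS".
async function findStuckObjects(bucket: string): Promise<string[]> {
  let token: string | undefined;
  const stuck: string[] = [];

  do {
    const page = await s3.send(
      new ListObjectsV2Command({ Bucket: bucket, ContinuationToken: token })
    );
    for (const obj of page.Contents ?? []) {
      if (!obj.Key) continue;
      const tags = await s3.send(
        new GetObjectTaggingCommand({ Bucket: bucket, Key: obj.Key })
      );
      const status = tags.TagSet?.find((t) => t.Key === "scan-status")?.Value;
      if (status === "IN PROGRESS") stuck.push(obj.Key);
    }
    token = page.NextContinuationToken;
  } while (token);

  return stuck;
}

// Placeholder bucket name.
findStuckObjects("my-upload-bucket")
  .then((keys) => console.log(keys))
  .catch(console.error);
```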