Canvas - issues sending emails - task runners crashing with "out of memory" errors when there is plenty of free memory #1979

JedMeister · 2024-08-12T23:45:53Z

Update: The proposed "fix" is commented out as it does not seem to make any difference?! My colleague assures me that it was working (at least in part), however my testing suggests that it makes no difference.

Our latest v18.x Canvas has a number of known issues that we're working to resolve. AFAICT the issues we are investigating are all related to the background task runners crashing.

The issues that have been reproduced related to background task runners crashing are:

emails not sending
unable to upload user profile avatar
unable to apply custom/updated themes

Other issues that have not been directly confirmed but appear to be related are:

As some background on the apparent cause of the issue: when any action is triggered in Canvas (e.g. sending an email, uploading a file and most changes made in the UI), the action is added to a background queue. When operating correctly, a background service starts a task runner process to action the next job on the background job queue.

In our current Canvas release, the background service is running ok, but the individual task runners are crashing and not completing their tasks, leaving the jobs in the queue. The task runners die with an error message to the effect of "out of memory" - when there is plenty of free system memory.

We thought that we had developed a solution which seemed to resolve the issues that had been confirmed, e.g. emails started being sent. However, after testing the "fix" on multiple servers over numerous reboots, it became clear that it just changed the nature of the front end error(s) and reduced the incidence of the task runner crashes. It didn't actually stop them occurring altogether. Intermittent task runner crashes (with the same memory error message) were still occurring.

The "fix" we developed/discovered was applying an undocumented DB migration. As noted it appears to not be a complete fix, but it can be applied (as root) with the following commands:
systemctl stop canvas_init
systemctl stop apache2
cd /var/www/canvas
RAILS_ENV=production bundle exec rake switchman_inst_jobs:install:migrations
systemctl start canvas_init
systemctl start apache2

The issue (at least after the "fix" has been applied) is intermittent for at least some cases and appears to be some sort of race condition. Unfortunately because the issue is intermittent and the only specific error message I've seen seems to be a red herring, it's particularly difficult to isolate the cause.

I have asked one of my colleagues to investigate the issue further, but so far we have had no progress. I plan to rebuild our Canvas server from scratch and carefully document the issue on a fresh server ASAP. After confirming that it is nothing we're overlooking on our end, I will lodge a bug report upstream.

JedMeister · 2024-08-19T07:40:25Z

I am still having issues sending emails, but I have got the background task runners running reliably. The issue is that the delayed job runners were running out of memory and crashing.

To resolve that, edit the /var/www/canvas/config/delayed_jobs.yml config file and update the value for worker_max_memory_usage to 1073741824 (the default is 536870912). Be careful not to change the leading spaces on that (or any other) line, as YAML files are whitespace-sensitive.

Once you're done, the updated line should look like this:

  worker_max_memory_usage: 1073741824
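
The same edit can be scripted. This is only a minimal sketch, assuming GNU sed; the function name set_worker_max_memory is made up for illustration and is not part of Canvas. It rewrites the value on every line containing the worker_max_memory_usage key and leaves the leading whitespace untouched.

```shell
#!/bin/sh
# Hypothetical helper (not part of Canvas): set worker_max_memory_usage in a
# delayed_jobs.yml-style file while preserving each line's indentation.
set_worker_max_memory() {
    file=$1
    bytes=$2
    cp "$file" "$file.bak"    # keep a backup of the original file
    # \1 re-emits the captured indentation and key, so only the value changes.
    sed -i -E "s/^([[:space:]]*worker_max_memory_usage:).*/\1 $bytes/" "$file"
}

# 1 GiB in bytes; the default of 536870912 is 512 MiB.
new_limit=$((1024 * 1024 * 1024))
echo "$new_limit"

# Usage, as root:
# set_worker_max_memory /var/www/canvas/config/delayed_jobs.yml "$new_limit"
```

The backup copy makes it easy to compare the result against the original if the YAML indentation needs checking afterwards.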

Then restart the service (it's likely not required to restart Apache, but it's best to do so anyway after any Canvas config changes):

systemctl restart canvas_init apache2

Unfortunately the emails still don't seem to be sending though?! :( I'll continue on this tomorrow...

JedMeister · 2024-08-20T05:29:09Z

FWIW I have confirmed that the host can send emails successfully. Canvas itself is not sending the emails.
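
For anyone wanting to repeat the host-level check, one possible approach is to hand a minimal message to the local MTA. This is a sketch only: it assumes a sendmail-compatible command is installed on the host, and both the build_test_message name and the example address are placeholders.

```shell
# Hypothetical helper: print a minimal test message for the given recipient.
build_test_message() {
    printf 'To: %s\nSubject: Test from Canvas host\n\nTest message.\n' "$1"
}

# Usage, assuming a sendmail-compatible MTA is installed; -t tells it to take
# the recipients from the message headers:
# build_test_message admin@example.com | sendmail -t
```

If this message arrives but mail triggered from the Canvas UI does not, that points at Canvas rather than the host's mail setup.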

The background tasks are running - and being successfully processed. There are no errors noted in the delayed_jobs log nor in the UI error log. No email jobs are showing in the jobs queue.
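
A quick way to double-check a log for the memory error is a case-insensitive search. This is a minimal sketch: the function name count_oom_lines is illustrative, the log path is left to the caller because it is not given in this thread, and the search pattern is a guess since the exact wording of the error is not quoted here.

```shell
# Hypothetical helper: count lines mentioning "out of memory" in a log file.
count_oom_lines() {
    # grep -c prints the number of matching lines; -i ignores case.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # function's exit status at 0 in that case.
    grep -ci 'out of memory' "$1" || true
}

# Usage (replace the argument with the path to the delayed_jobs log):
# count_oom_lines /path/to/delayed_job.log
```

A count of 0 after the config change is consistent with the task runners no longer hitting the memory limit.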

JedMeister added bug canvas labels Aug 12, 2024

This was referenced Aug 12, 2024

Unable to import QTI quizzes (after installing the QTI import tool) #1978

Open

Canvas LTI keys are missing for LTI 1.3 #1977

Open

JedMeister linked a pull request Aug 30, 2024 that will close this issue

deferred v18.x release - to apply in v19.0? turnkeylinux-apps/canvas#35

Open

JedMeister linked a pull request Sep 12, 2024 that will close this issue

Canvas v18.0->v18.1 patch & 18.x bt-bugfix-single script turnkeylinux/buildtasks#89

Open
