I understand that AWX is open source software provided for free and that I might not receive a timely response.
Feature type
Enhancement to Existing Feature
Feature Summary
On several models, we store information about a relationship to a foreign key that is frequently updated, and potentially risky for deadlocks. By saving this foreign key as a field on the model, we are optimizing read performance for information that is probably more frequently written than it is read.
That is, we are optimizing for read performance when we probably ought to be optimizing for write performance.
But that starts to get into schedules and I think requires further thinking.
For last_job_host_summary, last_job, current_job, and last_job_failed I propose we are doing more harm than good by storing these on the model. This is because job templates can have many jobs launched from them, and hosts can have many jobs running against them.
This exposes us to deadlocks when multiple processes attempt to update the same attribute on a single host or job template. I have personally observed this when testing with high volumes of small jobs against a few hosts. Increasing the number of callback receivers increases the likelihood of running into this.
Proposed alternative:
These values can be quickly calculated in the serializer or as a property on the job. The query would be quite cheap, something along the lines of (pseudocode)
One way to force this is to run many (hundreds or thousands of) small jobs from a single job template against a single host, with concurrent jobs allowed and a playbook that has just one task. Increasing the number of callback receiver workers exacerbates the problem.
Current results
Deadlocks
Suggested feature result
Calculate these values on read
Should not show in the list view as that would be expensive (only show in detail view)
Additional information
No response
I am trying to muck with these fields already in #13553 for performance reasons, and have asked around before about getting performance validation of this to clear it for merge.
The issue that motivated that change was about deadlocks, but in that case it was the task manager acting on an approval job and deadlocking deterministically, not as a race condition. The full reason was not identified, but I believe it was a state inconsistency in the task manager inside of a single run, probably because multiple approval jobs were changing state at the same time. What was so frustrating was that the model .save code was doing a tremendous number of round-trips to the database to load objects which should already be present. Programmers do this defensively, out of distrust of the current model, not realizing that it can later lead to even more insidious problems: changes which should be no-ops are not identified correctly, and transactional conflicts arise at the database layer.
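To illustrate the point about no-op changes, here is a minimal sketch of a write that skips itself when the stored value already matches, without first reloading the row. It uses the standard-library sqlite3 module as a stand-in for the real database, and the table name template, the column last_job_id, and the helper set_last_job are all hypothetical; this is not the AWX .save code.

```python
import sqlite3


def make_template_db():
    """Create an in-memory database with a hypothetical template table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE template (id INTEGER PRIMARY KEY, last_job_id INTEGER)"
    )
    return conn


def set_last_job(conn, template_id, job_id):
    """Write last_job_id only if it differs from the stored value.

    Returns True when a row was actually modified, False for a no-op
    (value unchanged, or no such template). The comparison happens in
    the UPDATE itself, so no extra round-trip is needed to load the row.
    """
    cur = conn.execute(
        # IS NOT treats NULL as an ordinary value, so NULL -> 10 is a change.
        "UPDATE template SET last_job_id = ? "
        "WHERE id = ? AND last_job_id IS NOT ?",
        (job_id, template_id, job_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

With this shape, a second call carrying the same job id touches no rows, so it has nothing to conflict over at the database layer.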
These values can be quickly calculated in the serializer or as a property on the job
While I agree something should be done, this is a non-starter because of the consequences for serializer performance. The challenge isn't so much Django as the django-polymorphic library we use. The query you give is more or less impossible to prefetch: we have ~6 types, and we would hit all of those tables to prefetch (even if the notorious jazzband/django-polymorphic#198 had a clean resolution). With on the order of a dozen similar fields, the benefit would be marginal or negative compared to no optimization.
I would love to grant our API the freedom to remove fields like these, but there is a strong user experience case to be made for a field like last_job, as the user needs to be able to see the current status of templates in the list. More importantly, we have failed to recapture performance gains from things the UI has already removed, specifically summary_fields.recent_jobs.
The branch I had here was specifically put together to combine multiple non-interference improvements for model updates. I believe this is addressing the concerns you cite, but it is not attempting to make any change to the existing API contract, instead doing only purely internal changes:
Please confirm the following
Feature type
Enhancement to Existing Feature
Feature Summary
On several models, we store information about a relationship to a foreign key that is frequently updated, and potentially risky for deadlocks. By saving this foreign key as a field on the model, we are optimizing read performance for information that is probably more frequently written than it is read.
That is, we are optimizing for read performance when we probably ought to be optimizing for write performance.
To be specific:
awx/awx/main/models/inventory.py
Lines 554 to 562 in fb8fadc
awx/awx/main/models/unified_jobs.py
Lines 126 to 137 in fb8fadc
awx/awx/main/models/unified_jobs.py
Lines 118 to 125 in fb8fadc
There is also:
awx/awx/main/models/unified_jobs.py
Lines 138 to 151 in fb8fadc
For last_job_host_summary, last_job, current_job, and last_job_failed I propose we are doing more harm than good by storing these on the model. This is because job templates can have many jobs launched from them, and hosts can have many jobs running against them.
This exposes us to deadlocks when multiple processes attempt to update the same attribute on a single host or job template. I have personally observed this when testing with high volumes of small jobs against a few hosts. Increasing the number of callback receivers increases the likelihood of running into this.
Problem areas:
awx/awx/main/models/events.py
Lines 588 to 601 in fb8fadc
awx/awx/main/signals.py
Lines 286 to 306 in fb8fadc
awx/awx/main/models/unified_jobs.py
Lines 815 to 826 in fb8fadc
Proposed alternative:
These values can be quickly calculated in the serializer or as a property on the job. The query would be quite cheap, something along the lines of (pseudocode)
This uses https://docs.djangoproject.com/en/dev/ref/models/querysets/#latest, but there are equivalent ways to do it if that's not available in our version of Django.
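The original pseudocode is not reproduced here. As a rough illustration of the "calculate on read" idea, the sketch below derives last_job, current_job, and last_job_failed from a jobs table at read time instead of storing them on the template. It uses the standard-library sqlite3 module rather than the Django ORM, and the table layout, column names, index, and function names are all assumptions made for the example, not the AWX schema.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical job table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE job (
            id INTEGER PRIMARY KEY,
            template_id INTEGER NOT NULL,
            status TEXT NOT NULL,
            finished TEXT  -- ISO 8601 timestamp, NULL while still running
        );
        -- Index so the "latest job for a template" lookup stays cheap.
        CREATE INDEX job_template_finished ON job (template_id, finished);
        """
    )
    return conn


def last_job(conn, template_id):
    """Return (id, status) of the most recently finished job, or None."""
    return conn.execute(
        "SELECT id, status FROM job "
        "WHERE template_id = ? AND finished IS NOT NULL "
        "ORDER BY finished DESC, id DESC LIMIT 1",
        (template_id,),
    ).fetchone()


def current_job(conn, template_id):
    """Return (id, status) of the newest unfinished job, or None."""
    return conn.execute(
        "SELECT id, status FROM job "
        "WHERE template_id = ? AND finished IS NULL "
        "ORDER BY id DESC LIMIT 1",
        (template_id,),
    ).fetchone()


def last_job_failed(conn, template_id):
    """Derive the failed flag from the last finished job at read time."""
    row = last_job(conn, template_id)
    return row is not None and row[1] == "failed"
```

Because nothing is written back to the template row when a job finishes, concurrent jobs from the same template no longer compete to update a shared attribute; the cost moves to one indexed lookup per read.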
Select the relevant components
Steps to reproduce
One way to force this is to run many (hundreds or thousands of) small jobs from a single job template against a single host, with concurrent jobs allowed and a playbook that has just one task. Increasing the number of callback receiver workers exacerbates the problem.
Current results
Deadlocks
Suggested feature result
Calculate these values on read
Should not show in the list view as that would be expensive (only show in detail view)
Additional information
No response