Delete unwanted audits #9839
Conversation
I know this isn't ready for review yet, but I had some ideas about how we can delete these records performantly:
```ruby
Audited::Audit
  .where('created_at >= ? AND created_at <= ?', DateTime.new(2024, 9, 3, 11), DateTime.new(2024, 9, 3, 20))
  .where(user_type: nil, user_id: nil)
  .where(action: :create)
  .where(username: '(Automated process)')
  .where(associated_type: 'ApplicationChoice')
  .where(auditable_type: ['ApplicationExperience', 'ApplicationWorkHistoryBreak'])
  .in_batches(of: 3_000, &:delete_all) # SQL-deletes 3,000 rows at a time; in_batches already walks the table in primary-key order
```
```ruby
def relation
  Audited::Audit
    .where('created_at > ? and created_at < ?', DateTime.new(2024, 9, 3, 11), DateTime.new(2024, 9, 3, 20))
    .where(user_type: nil, user_id: nil)
```
It might be worth putting the EXPLAIN for this query in the PR description, or here.
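For reference, a quick way to grab the plan from a Rails console is ActiveRecord's built-in `explain` (a sketch; `relation` is the scope above, and the `:analyze` option assumes Rails 7.1+):

```ruby
# Prints PostgreSQL's EXPLAIN output for the relation without deleting anything.
puts relation.explain

# On Rails 7.1+ you can ask for actual timings too:
# puts relation.explain(:analyze)
```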
Worth getting a few approvals for this one. Looks performant, and it's not touching any other records 👌
This looks good and it sounds like you've done some realistic testing. Nice work!
spec/services/data_migrations/delete_work_history_audits_spec.rb
Just a suggestion on the spec variable names; apart from that it looks solid.
```ruby
def change
  relation.in_batches(of: BATCH_SIZE) do |batch|
    DeleteAuditsWorker.perform_async(batch.ids)
  end
end
```
I'll put my opinion here, but feel free to decide whether or not to go forward with it. It's just me being over-cautious.
When you deleted the millions of records locally, how long did it take?
While the deletion was running, did you try using local Apply to check that the DB wasn't busy?
My main concern is deleting the 3 million in one go.
I'd recommend using Batch Delivery to stagger the deletions over an hour (or more?). Processing the deletions gradually over time would be the safer option.
The reason is that deleting millions of records in one go (all the workers processing at the same time) can still result in locking issues on the audited_audit table, which could block other queries that need to access these records while the deletion is in progress.
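For illustration, a minimal sketch of the staggering idea using Sidekiq's `perform_in` (the method and constant names come from this PR; the 2-second gap is an arbitrary example, not a tested value):

```ruby
def change
  delay = 0
  relation.in_batches(of: BATCH_SIZE) do |batch|
    # Schedule each batch instead of enqueueing it immediately: ~2,000
    # batches spaced 2 seconds apart spreads the deletes over roughly an hour.
    DeleteAuditsWorker.perform_in(delay.seconds, batch.ids)
    delay += 2
  end
end
```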
Another thing: how much DB disk space did the deletion take while it was happening?
PostgreSQL uses MVCC (Multi-Version Concurrency Control), which means that when a row is deleted it isn't immediately removed from disk. Instead it's marked as "dead" and made invisible to future transactions; the actual space is reclaimed later through a process called vacuuming.
So disk usage could skyrocket if we delete millions of rows.
And, similar to table bloat, the indexes will accumulate dead entries, which also need to be cleaned up by vacuuming.
If we have enough DB disk space we shouldn't be worried, but it would be nice to know how much the DB disk usage percentage increases, to avoid surprises.
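As an illustration, something like this (run from a Rails console; it assumes the audited gem's default `audits` table name) would show the dead tuples piling up while the workers run:

```ruby
# pg_stat_user_tables tracks live vs. dead tuples and the last (auto)vacuum run.
ActiveRecord::Base.connection.select_all(<<~SQL).to_a
  SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
  FROM pg_stat_user_tables
  WHERE relname = 'audits'
SQL
```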
Last thing: it's probably on your radar already, but let's run this large delete operation during off-peak times to minimise any impact.
I did try Batch Delivery, and it looked like it did a count over all 3 million records we want to target (to spread them out, I think) and loaded every column of the audit table. The PR only loads the id and doesn't do the count, so I thought those extra steps weren't necessary.
I don't know how much DB disk space we have; I'll try to find out.
@dcyoung-dev what do you think? Should we spread the workers out a bit?
I'm still happy with this
When we ran the big touch on the 3rd of September 2024, we created about 3 million audits that are not necessary. This commit deletes those audits. We only intend to delete audits for ApplicationExperience and ApplicationWorkHistoryBreak that are associated with ApplicationChoice; these audits don't have a user attached to them and have the username '(Automated process)'.
Context
By running this data migration #9741 we created a lot of audit records, almost 3 million more than we should have.
Because of these audits, we think the /provider/activity page is now unusable: there are too many records for the page's query to go over.
This PR tries to remove those 3 million audits. It gets all the audits within the interval the data migration ran and deletes them. We only intend to delete audits for ApplicationExperience and ApplicationWorkHistoryBreak that are associated with ApplicationChoice; these audits don't have a user attached to them and have the username '(Automated process)'.
The EXPLAIN for the query that loops through the audits and creates the workers is in production Blazer: https://www.apply-for-teacher-training.service.gov.uk/support/blazer/queries/983-delete-audits-query-explain
Changes proposed in this pull request
Data migration
Guidance to review
I've run this migration locally with over 3 million records to enqueue, against a DB of over 60 million audits; the migration managed to enqueue all the workers, and the workers were fast at doing their job.
This will probably create over 2,000 jobs on the low_priority queue, but the jobs should finish quite quickly since we are just deleting records via SQL, not Active Record.
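For context, a hedged sketch of what DeleteAuditsWorker might look like, based on the names used in this PR (the actual implementation may differ):

```ruby
class DeleteAuditsWorker
  include Sidekiq::Worker
  sidekiq_options queue: :low_priority

  def perform(ids)
    # delete_all issues a single SQL DELETE; no Audit objects are
    # instantiated and no callbacks run, which is what keeps each job fast.
    Audited::Audit.where(id: ids).delete_all
  end
end
```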
Things to check