-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete unwanted audits #9839
Delete unwanted audits #9839
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
module DataMigrations | ||
class DeleteWorkHistoryAudits | ||
TIMESTAMP = 20240917154444 | ||
MANUAL_RUN = false | ||
BATCH_SIZE = 5000 | ||
|
||
def change | ||
relation.in_batches(of: BATCH_SIZE) do |batch| | ||
DeleteAuditsWorker.perform_async(batch.ids) | ||
end | ||
end | ||
|
||
def relation | ||
Audited::Audit | ||
.where('created_at >= ? and created_at <= ?', DateTime.new(2024, 9, 3, 11), DateTime.new(2024, 9, 3, 20)) | ||
CatalinVoineag marked this conversation as resolved.
Show resolved
Hide resolved
|
||
.where(user_type: nil, user_id: nil) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might be worth putting the EXPLAIN for this query in the PR description, or here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The explain looks decent. Using bitmap index scans. |
||
.where(action: :create) | ||
.where(username: '(Automated process)') | ||
.where(associated_type: 'ApplicationChoice') | ||
.where(auditable_type: %w[ApplicationExperience ApplicationWorkHistoryBreak]) | ||
.order(id: :asc) | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
class DeleteAuditsWorker | ||
include Sidekiq::Worker | ||
sidekiq_options queue: :low_priority | ||
|
||
def perform(audit_ids) | ||
Audited::Audit.where(id: audit_ids).delete_all | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
require 'rails_helper' | ||
|
||
RSpec.describe DataMigrations::DeleteWorkHistoryAudits do | ||
describe '#change' do | ||
it 'enques jobs to DeleteAuditsWorker' do | ||
create( | ||
:application_work_history_break_audit, | ||
user_type: nil, | ||
user_id: nil, | ||
created_at: DateTime.new(2024, 9, 3, 12), | ||
action: 'create', | ||
username: '(Automated process)', | ||
) | ||
|
||
expect { described_class.new.change }.to change( | ||
DeleteAuditsWorker.jobs, :size | ||
).by(1) | ||
end | ||
end | ||
|
||
describe '#relation' do | ||
it 'returns the audits that need deleting' do | ||
work_break = create( | ||
:application_work_history_break_audit, | ||
action: 'create', | ||
user_type: nil, | ||
user_id: nil, | ||
username: '(Automated process)', | ||
created_at: DateTime.new(2024, 9, 3, 12), | ||
) | ||
work_break_2 = create( | ||
:application_work_history_break_audit, | ||
action: 'create', | ||
user_type: nil, | ||
user_id: nil, | ||
username: '(Automated process)', | ||
created_at: DateTime.new(2024, 9, 3, 18), | ||
) | ||
work_experience = create( | ||
:application_experience_audit, | ||
action: 'create', | ||
user_type: nil, | ||
user_id: nil, | ||
username: '(Automated process)', | ||
created_at: DateTime.new(2024, 9, 3, 18), | ||
) | ||
work_break_with_provider_username = create( | ||
:application_work_history_break_audit, | ||
action: 'create', | ||
username: 'Provider', | ||
) | ||
work_break_with_update_action = create( | ||
:application_work_history_break_audit, | ||
action: 'update', | ||
username: '(Automated process)', | ||
user_type: nil, | ||
user_id: nil, | ||
) | ||
work_break_outside_time_range = create( | ||
:application_work_history_break_audit, | ||
action: 'create', | ||
username: '(Automated process)', | ||
user_type: nil, | ||
user_id: nil, | ||
created_at: DateTime.new(2024, 9, 4, 18), | ||
) | ||
work_break_with_provider_user_type = create( | ||
:application_work_history_break_audit, | ||
action: 'create', | ||
username: '(Automated process)', | ||
created_at: DateTime.new(2024, 9, 3, 18), | ||
user_type: 'Provider', | ||
) | ||
work_experience_outside_time_range = create( | ||
:application_experience_audit, | ||
action: 'create', | ||
user_type: nil, | ||
user_id: nil, | ||
created_at: DateTime.new(2024, 9, 4, 18), | ||
) | ||
application_form_audit = create( | ||
:application_form_audit, | ||
action: 'create', | ||
user_type: nil, | ||
user_id: nil, | ||
username: '(Automated process)', | ||
created_at: DateTime.new(2024, 9, 3, 12), | ||
) | ||
|
||
expect(described_class.new.relation).to include(work_break, work_break_2, work_experience) | ||
expect(described_class.new.relation).not_to include( | ||
work_break_with_provider_username, | ||
work_break_with_update_action, | ||
work_break_outside_time_range, | ||
work_break_with_provider_user_type, | ||
work_experience_outside_time_range, | ||
application_form_audit, | ||
) | ||
end | ||
end | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
require 'rails_helper' | ||
|
||
RSpec.describe DeleteAuditsWorker do | ||
describe '#perform' do | ||
it 'deletes audits by id' do | ||
audit_1 = create(:application_work_history_break_audit) | ||
audit_2 = create(:application_work_history_break_audit) | ||
audit_3 = create(:application_work_history_break_audit) | ||
audit_ids = [audit_1.id, audit_2.id] | ||
|
||
described_class.new.perform(audit_ids) | ||
|
||
expect(Audited::Audit.where(id: audit_ids)).not_to exist | ||
expect(Audited::Audit.where(id: audit_3.id)).to exist | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will put here my opinion but feel free to choose to go forward or not with my views. It is just me being over cautious.
When you deleted locally the million of records how long did it take?
While it was deleted did you try to use local apply and see if the DB wasn't busy?
My main concern is deleting the 3 million at one go.
I'd recommend to use Batch Delivery to stagger for over a hour (or more?). Process deletions gradually in time would be a safer option.
The reason is deleting millions of records at one go (the workers processing all at the same time) can still result in locking issues on the audited_audit table. This could block other queries that need to access these records while the deletion is in progress.
Also another thing: how much DB disk space it took while the deletion was happening?
PostgreSQL uses MVCC (Multi-Version Concurrency Control) - this mean when a row is deleted, it isn't immediately removed from disk. Instead, it's marked as "dead" and made invisible to future transactions. The actual space is reclaimed later through a process called vacuuming.
So disk space could sky rocket if we delete million of rows.
And similar to table bloat, indexes will accumulate dead entries, which also need to be cleaned up by vacuuming.
If we have enough DB disk space, we shouldn't worried but would be nice to know how much DB disk usage percentage increases to avoid surprises.
Last thing: probably is in your radar, but let's run this large delete operations during off-peak times to minimize any impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did use the batch delivery and it looked like it did a count(to spread them out I think) on all the records that we want to target, 3 million and it loaded all columns of the audit table. The PR only loads the id and it doesn't do the count
So I thought these extra steps were not necessary.
Don't know how much DB disk space we have, I'll try to find out
@dcyoung-dev what do you think? Should we spread the workers a bit?