Delete unwanted audits #9839

Merged: 1 commit merged into main from cv/delete-audits on Nov 7, 2024
Conversation

@CatalinVoineag (Contributor) commented on Sep 17, 2024

Context

Running data migration #9741 created almost 3 million more audit records than it should have.

Because of these audits, we believe the /provider/activity page is now unusable: the query behind the page has to scan too many records.

This PR removes those 3 million audits. It finds all the audits created within the interval the data migration ran and deletes them.

We only intend to delete audits for ApplicationExperience and ApplicationWorkHistoryBreak that are associated with ApplicationChoice; these audits have no user attached and have the username '(Automated process)'.

The EXPLAIN for the query that loops through the audits and enqueues workers is in production Blazer: https://www.apply-for-teacher-training.service.gov.uk/support/blazer/queries/983-delete-audits-query-explain

Changes proposed in this pull request

Data migration

Guidance to review

  • I've run this migration locally, with over 3 million records to enqueue against a DB of over 60 million audits; the migration managed to enqueue all the workers, and the workers were fast in doing their job.

  • This will probably create over 2,000 jobs on the low_priority queue, but the jobs should finish quickly because we delete records via plain SQL rather than Active Record (see the worker sketch below).
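
A minimal sketch of what the worker might look like, assuming Sidekiq; the class name DeleteAuditsWorker comes from the diff below, while the queue option and method body are assumptions:

class DeleteAuditsWorker
  include Sidekiq::Worker
  sidekiq_options queue: :low_priority

  def perform(audit_ids)
    # delete_all issues a single SQL DELETE for the whole batch, skipping
    # Active Record instantiation and callbacks, which keeps each job fast.
    Audited::Audit.where(id: audit_ids).delete_all
  end
end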

Things to check

  • If the code removes any existing feature flags, a data migration has also been added to delete the entry from the database
  • This code does not rely on migrations in the same Pull Request
  • If this code includes a migration adding or changing columns, it also backfills existing records for consistency
  • If this code adds a column to the DB, decide whether it needs to be in analytics yml file or analytics blocklist
  • API release notes have been updated if necessary
  • If it adds a significant user-facing change, is it documented in the CHANGELOG?
  • Required environment variables have been updated and added to the Azure KeyVault
  • Inform data insights team due to database changes
  • Make sure all information from the Trello card is in here
  • Rebased main
  • Cleaned commit history
  • Tested by running locally
  • Add PR link to Trello card

@dcyoung-dev (Collaborator) left a comment


I know this isn't ready for review yet, but I had some ideas about how we can delete these records performantly:

Audited::Audit
  .where('created_at >= ? and created_at <= ?', DateTime.new(2024, 9, 3, 11), DateTime.new(2024, 9, 3, 20))
  .where(user_type: nil, user_id: nil)
  .where(action: :create)
  .where(username: "(Automated process)")
  .where(associated_type: "ApplicationChoice")
  .where(auditable_type: ["ApplicationExperience", "ApplicationWorkHistoryBreak"])
  .order(id: :asc)
  .in_batches(of: 3_000, &:delete_all) # this will SQL-delete 3,000 rows at a time
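
(Batching with in_batches and delete_all keeps each DELETE statement short, so row locks are held only briefly, rather than for the duration of a single 3-million-row DELETE.)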

def relation
  Audited::Audit
    .where('created_at > ? and created_at < ?', DateTime.new(2024, 9, 3, 11), DateTime.new(2024, 9, 3, 20))
    .where(user_type: nil, user_id: nil)
A collaborator commented on this snippet:

It might be worth putting the EXPLAIN for this query in the PR description, or here.
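
As an aside, Active Record can produce that plan directly; a hedged illustration, assuming the same relation can be built in a console:

puts relation.explain # prints PostgreSQL's EXPLAIN output for the relation's SQL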

@CatalinVoineag marked this pull request as ready for review on September 19, 2024 at 09:09
@dcyoung-dev (Collaborator) left a comment


Worth getting a few approvals for this one. Looks performant, and it doesn't touch any other records 👌

@elceebee (Contributor) left a comment


This looks good and it sounds like you've done some realistic testing. Nice work!

@inulty-dfe (Collaborator) left a comment


Just a suggestion on the spec variable names; apart from that, it looks solid.

def change
  relation.in_batches(of: BATCH_SIZE) do |batch|
    DeleteAuditsWorker.perform_async(batch.ids)
  end
end
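
(Note that batch.ids selects only the id column for each batch instead of loading full rows, which is why the reply further down points out that the PR "only loads the id".)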
A collaborator left a comment:
I'll put my opinion here, but feel free to go forward with or without my views. It's just me being overcautious.

When you deleted the millions of records locally, how long did it take?
While the deletion was running, did you try to use local Apply and check that the DB wasn't busy?

My main concern is deleting the 3 million in one go.

I'd recommend using Batch Delivery to stagger the deletions over an hour (or more?). Processing them gradually over time would be a safer option; see the sketch below.
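
A minimal sketch of staggering, assuming plain Sidekiq scheduling rather than the codebase's Batch Delivery helper (the interval is an assumption to tune):

STAGGER_INTERVAL = 2.seconds

relation.in_batches(of: BATCH_SIZE).each_with_index do |batch, index|
  # perform_in schedules each job in the future instead of enqueueing
  # everything at once, spreading ~2,000 jobs over roughly an hour.
  DeleteAuditsWorker.perform_in(index * STAGGER_INTERVAL, batch.ids)
end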

The reason is deleting millions of records at one go (the workers processing all at the same time) can still result in locking issues on the audited_audit table. This could block other queries that need to access these records while the deletion is in progress.

Another thing: how much DB disk space was used while the deletion was happening?
PostgreSQL uses MVCC (Multi-Version Concurrency Control), which means that when a row is deleted it isn't immediately removed from disk. Instead, it's marked as "dead" and made invisible to future transactions; the actual space is reclaimed later through a process called vacuuming.
So disk usage could skyrocket if we delete millions of rows.
And, similar to table bloat, indexes will accumulate dead entries, which also need to be cleaned up by vacuuming.
If we have enough DB disk space we shouldn't worry, but it would be nice to know how much the DB disk usage percentage increases, to avoid surprises.
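
A hedged illustration of watching dead tuples accumulate while the jobs run, from a Rails console; the table name audits is the audited gem's default and is an assumption here:

stats = ActiveRecord::Base.connection.select_one(<<~SQL)
  SELECT n_live_tup, n_dead_tup, last_autovacuum
  FROM pg_stat_user_tables
  WHERE relname = 'audits'
SQL
puts stats # dead tuples are reclaimed only once (auto)vacuum runs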

Last thing: it's probably on your radar already, but let's run these large delete operations during off-peak times to minimize any impact.

@CatalinVoineag (Contributor, Author) left a comment

I did try Batch Delivery, and it looked like it did a count on all 3 million records we want to target (to spread them out, I think) and loaded all columns of the audit table. The PR only loads the id and doesn't do the count.

So I thought those extra steps weren't necessary.

I don't know how much DB disk space we have; I'll try to find out.

@dcyoung-dev what do you think? Should we spread the workers a bit?

@dcyoung-dev (Collaborator) left a comment


I'm still happy with this

app/services/data_migrations/delete_work_history_audits.rb (outdated review thread, resolved)

Commit message:

When we ran the big touch on the 3rd of September 2024, we created about 3 million audits that are not necessary.

This commit deletes those audits.

We only intend to delete audits for ApplicationExperience and ApplicationWorkHistoryBreak that are associated with ApplicationChoice; these audits have no user attached and have the username '(Automated process)'.
@CatalinVoineag merged commit 58f45b0 into main on Nov 7, 2024
23 checks passed
@CatalinVoineag deleted the cv/delete-audits branch on November 7, 2024 at 09:51