Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate orphan objects #3732

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions lib/tasks/multitenant/tenants.rake
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,47 @@ namespace :multitenant do
Rails.logger.error "Inconsistent tenant_ids for:\n#{inconsistent.map {_1.join(" ")}.join("\n")}"
end

desc 'Check and remove orphaned objects (whose tenant is missing), pass "destroy" argument to delete'
task :cleanup_orphans, [:mode] => :environment do |_task, args|
destroy = args[:mode] == "destroy"
Copy link
Contributor

@mayorova mayorova Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the idea is that by default calling the task would only print the logs.
To actually destroy stuff, one would need to run

bundle exec rake "multitenant:tenants:cleanup_orphans[destroy]"

By the way, the logs are probably too verbose... Or maybe that's just the first time, as we have lots of orphans. Once the big batch is deleted, probably it will be fine.


puts "Checking orphaned objects..."
puts "WARNING: the found orphan objects will be destroyed" if destroy
Comment on lines +84 to +85
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
puts "Checking orphaned objects..."
puts "WARNING: the found orphan objects will be destroyed" if destroy
Rails.logger.warn "Checking orphaned objects..."
Rails.logger.warn "WARNING: the found orphan objects will be destroyed" if destroy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akostadinov by default, in rake tasks Rails logger prints to the log file, and not to stdout.
We can force it by adding RAILS_LOG_TO_STDOUT=1 in front of bundle exec rake command, but then it prints all the debug logs, which I am not interested in... to fix it we'd need to also add log level in development environment, managed by an env var.

Currently, in most of rake tasks we use puts, I think it's fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, in container it prints to stdout by default.. anyway there are multiple considerations.

If we want to run things automatially, better would be logging because it will keep track of what we did.

For this task I'm not sure.

If we want to run things once or multiple times only manually, then we need to use puts and warn discriminatively.

puts outputs to stdout. stdout is usually used to the information that needs to be passed down the execution pipeline. Informative messages usually go to stderr so that they don't ess up data that should go down a pipeline.

In this case warn seems more appropriate for these messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you suggest to replace both with warn? what about the rest of puts ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific tasks is meant to be run once. In fact, I am not quite sure we'll ever run it, because it's probably too much 😵

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like all outputs in this PR are for stderr. wrt how many times we run, idk. We can always end-up with orphan objects if deletions fail for whatever reason.


base_models.each do |model|
orphaned_objects = model.where.not(tenant_id: Account.unscoped.providers_with_master.select(:id))

if orphaned_objects.exists?
puts "Found #{orphaned_objects.size} orphaned objects for model #{model.name}:"
seconds_between_batches = 15

orphaned_objects.find_in_batches(batch_size: 100).with_index do |batch, index|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like find_each will simplify this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea (that I haven't finished pushing, it seems), is to add some "sleep" time between each batch, to space the deletion jobs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole thing should happen in a different way. Most likely background deletion. Optimize this within a single rake task will probably be fragile in any case. If you want leave this alone until we figure out background deletion and then we may have a better method to handle this.

puts "Processing batch #{index+1} of model #{model.name}..."
wait_time = (index * seconds_between_batches).seconds
batch.each do |object|
puts "- ID: #{object.id}, Tenant ID: #{object.tenant_id}"
DeletePlainObjectWorker.set(wait: wait_time).perform_later(object) if destroy
end
end
else
puts "No orphaned objects found for model #{model.name}."
end
end

puts 'Orphaned objects check completed.'
end

def base_models
all_models = ApplicationRecord.descendants.select(&:arel_table).reject(&:abstract_class?)
all_models.select! { _1.attribute_names.include? "tenant_id"}
# we only want base STI classes, not the children
all_models.select do |model|
base_class = model.base_class
# either current model is the base_class or we can't find a base class amongst the discovered models (which would be very weird)
base_class == model || all_models.none? { |potential_parent| potential_parent == base_class }
end
end

def update_tenant_ids(tenant_id_block, association_block, condition, **args)
query = args[:table_name].constantize.joining(&association_block).where.has(&condition)
puts "------ Updating #{args[:table_name]} ------"
Expand Down