-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate orphan objects #3732
base: master
Are you sure you want to change the base?
Eliminate orphan objects #3732
Conversation
@akostadinov Updated |
lib/tasks/database_cleanup.rake
Outdated
@@ -6,72 +6,27 @@ namespace :database do | |||
puts 'Checking and removing orphaned objects...' | |||
|
|||
# Tables to exclude from orphaned objects check | |||
excluded_tables = ['accounts', 'audits', 'categories', 'category_types', 'cms_templates', 'legal_term_acceptances', 'legal_term_bindings', | |||
'legal_term_versions', 'proxy_logs', 'schema_migrations', 'service_cubert_infos', 'slugs', 'taggings', 'tags'] | |||
excluded_tables = ['cms_templates'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why exclude this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this exclusion can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the exclusion, but I don't know if a coincidence or not - the task got stuck on Found orphaned objects in cms_templates:
when I tried it in staging...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it doesn't persist, let's think it's a coincidence. Why should cms_templates
be special?
lib/tasks/database_cleanup.rake
Outdated
|
||
if orphaned_objects.exists? | ||
puts "Found orphaned objects in #{model.table_name}:" | ||
orphaned_objects.each { |obj| puts "- ID: #{obj.id}, Tenant ID: #{obj.tenant_id}" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use find_each
here instead of each
to use batching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in ee9e0a8
89a470f
to
8e47c60
Compare
I rebased this, It was too old. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote a couple of comments but this LGTM. I haven't tried to run the script locally though.
It would be good to write a couple of tests. Do we have tests for rake tasks?
lib/tasks/database_cleanup.rake
Outdated
@@ -6,72 +6,27 @@ namespace :database do | |||
puts 'Checking and removing orphaned objects...' | |||
|
|||
# Tables to exclude from orphaned objects check | |||
excluded_tables = ['accounts', 'audits', 'categories', 'category_types', 'cms_templates', 'legal_term_acceptances', 'legal_term_bindings', | |||
'legal_term_versions', 'proxy_logs', 'schema_migrations', 'service_cubert_infos', 'slugs', 'taggings', 'tags'] | |||
excluded_tables = ['cms_templates'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this exclusion can be removed
This PR is stale because it has not received activity for more than 30 days. Remove stale label or comment or this will be closed in 15 days. |
This PR is stale because it has not received activity for more than 30 days. Remove stale label or comment or this will be closed in 15 days. |
23b6efa
to
ee9e0a8
Compare
task :cleanup_orphans, [:mode] => :environment do |_task, args| | ||
puts 'Checking and removing orphaned objects...' | ||
|
||
destroy = args[:mode] == "destroy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the idea is that by default calling the task would only print the logs.
To actually destroy stuff, one would need to run
bundle exec rake "multitenant:tenants:cleanup_orphans[destroy]"
By the way, the logs are probably too verbose... Or maybe that's just the first time, as we have lots of orphans. Once the big batch is deleted, probably it will be fine.
lib/tasks/multitenant/tenants.rake
Outdated
|
||
puts "WARNING: the found orphan objects will be destroyed" if destroy | ||
|
||
provider_account_ids = Account.where(provider: true).pluck(:id) + [Account.master.id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added Master's ID here, to avoid deleting master's objects. Specifically, the master itself was being reported as an orphan 😅 (because it does not have provider: true
)
Found orphaned objects in accounts:
- ID: 1, Tenant ID: 1
And we probably don't want to destroy the master 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're too goodhearted..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this could be hundreds of thousands of objects. Can't we filter this out in the query itself?
lib/tasks/multitenant/tenants.rake
Outdated
|
||
if destroy | ||
puts "Destroying orphan #{model.table_name}..." | ||
orphaned_objects.in_batches(of: 100).destroy_all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added batched deletion, but still not sure if it still may have negative effects on the DB...
We might also need some more logging to see whether the operation was succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to revisit how we iterate over many objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added batched deletion, but still not sure if it still may have negative effects on the DB...
For sure batching is an improvement, what else could we do? We should be sure to run this after the billing jobs have finished and heads up ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would use the new Sidekiq 7.3 iteration jobs. But if we just do in a rake task in the foreground, batches would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to run thins in produciton, we most likely need to use this iteration sidekiq job type. Because running haevy deletion in production may overburden the database. While recent hardware is much faster, I'm not sure how good this would be.
4d3c822
to
e2c5276
Compare
lib/tasks/multitenant/tenants.rake
Outdated
|
||
provider_account_ids = Account.where(provider: true).pluck(:id) + [Account.master.id] | ||
|
||
ApplicationRecord.descendants.each do |model| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can filter the classes further by filtering out children STI classes. Because if we run the query with the base STI class, running with the children will be redundant. SOmething along the lines of:
def base_models
all_models = ApplicationRecord.descendants.select(&:arel_table).reject(&:abstract_class?)
all_models.select! { _1.attribute_names.include? "tenant_id"}
# we only want base STI classes, not the children
all_models.select! do |model|
base_class = model.base_class
# either current model is the base_class or we can't find a base class amongst the discovered models /which would be very weird/
base_class == model || all_models.none? { |potential_parent| potential_parent == base_class }
end
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, thanks!
eea080b
to
a9d31ad
Compare
puts "Checking orphaned objects..." | ||
puts "WARNING: the found orphan objects will be destroyed" if destroy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
puts "Checking orphaned objects..." | |
puts "WARNING: the found orphan objects will be destroyed" if destroy | |
Rails.logger.warn "Checking orphaned objects..." | |
Rails.logger.warn "WARNING: the found orphan objects will be destroyed" if destroy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akostadinov by default, in rake tasks Rails logger prints to the log file, and not to stdout.
We can force it by adding RAILS_LOG_TO_STDOUT=1
in front of bundle exec rake
command, but then it prints all the debug logs, which I am not interested in... to fix it we'd need to also add log level in development environment, managed by an env var.
Currently, in most of rake tasks we use puts
, I think it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, in container it prints to stdout by default.. anyway there are multiple considerations.
If we want to run things automatially, better would be logging because it will keep track of what we did.
For this task I'm not sure.
If we want to run things once or multiple times only manually, then we need to use puts
and warn
discriminatively.
puts
outputs to stdout. stdout is usually used to the information that needs to be passed down the execution pipeline. Informative messages usually go to stderr so that they don't ess up data that should go down a pipeline.
In this case warn
seems more appropriate for these messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you suggest to replace both with warn
? what about the rest of puts
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specific tasks is meant to be run once. In fact, I am not quite sure we'll ever run it, because it's probably too much 😵
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like all outputs in this PR are for stderr
. wrt how many times we run, idk. We can always end-up with orphan objects if deletions fail for whatever reason.
lib/tasks/multitenant/tenants.rake
Outdated
provider_account_ids = Account.where(provider: true).pluck(:id) + [Account.master.id] | ||
|
||
base_models.each do |model| | ||
orphaned_objects = model.where.not(tenant_id: provider_account_ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
orphaned_objects = model.where.not(tenant_id: provider_account_ids) | |
orphaned_objects = model.where.not(tenant_id: Account.unscoped.providers_with_master.select(:id)) |
|
||
if orphaned_objects.exists? | ||
puts "Found #{orphaned_objects.size} orphaned objects for model #{model.name}:" | ||
orphaned_objects.find_in_batches(batch_size: 100).with_index do |batch, index| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like find_each will simplify this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea (that I haven't finished pushing, it seems), is to add some "sleep" time between each batch, to space the deletion jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole thing should happen in a different way. Most likely background deletion. Optimize this within a single rake task will probably be fragile in any case. If you want leave this alone until we figure out background deletion and then we may have a better method to handle this.
Co-authored-by: Joan Lledó <[email protected]>
f3f3411
to
442ed67
Compare
442ed67
to
682bb6e
Compare
This PR is stale because it has not received activity for more than 30 days. Remove stale label or comment or this will be closed in 15 days. |
What this PR does / why we need it:
We need a rake task to go over all tables with the
tenant_id
field and make check whether that tenant still exists, reporting and optionally removing these on the way.complete description is in JIRA
Which issue(s) this PR fixes
https://issues.redhat.com/browse/THREESCALE-10855
Note: I have commented the deleted line for now but added puts so whoever will run it while reviewing can see the puts