Handling uniqueness errors
Johnathan Martin edited this page Jul 15, 2020 · 9 revisions
If you see an error in Honeybadger like this:
RuntimeError: Duplicate workflow step for druid:rn729gn5113 accessionWF sdr-ingest-transfer
To fix this, open the Rails console and find out which version has the duplicate step:
druid = 'druid:rn729gn5113'
WorkflowStep.where(druid: druid, workflow: 'accessionWF', process: 'sdr-ingest-transfer').order(:id).pluck(:version, :process, :id)
=> [[1, "sdr-ingest-transfer", 79045117],
[2, "sdr-ingest-transfer", 84412718],
[2, "sdr-ingest-transfer", 84413318],
[3, "sdr-ingest-transfer", 168576319]]
Here we can see that version 2 has two sdr-ingest-transfer steps.
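The same check can be done mechanically on the plucked rows rather than by eye. This is a minimal plain-Ruby sketch, assuming rows in the [version, process, id] shape shown above. The method name duplicated_steps is hypothetical and is not part of the workflow server code.

```ruby
# Rows as returned by the pluck(:version, :process, :id) call above.
rows = [
  [1, "sdr-ingest-transfer", 79045117],
  [2, "sdr-ingest-transfer", 84412718],
  [2, "sdr-ingest-transfer", 84413318],
  [3, "sdr-ingest-transfer", 168576319]
]

# Hypothetical helper: returns { [version, process] => [ids] } for every
# version/process pair that has more than one step row.
def duplicated_steps(rows)
  rows.group_by { |version, process, _id| [version, process] }
      .select { |_key, group| group.size > 1 }
      .transform_values { |group| group.map(&:last) }
end

p duplicated_steps(rows)
```

For the rows above, the only entry reported is version 2 of sdr-ingest-transfer, with ids 84412718 and 84413318.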
Next, check whether that version has any other duplicate steps. In the case below, the whole workflow is duplicated.
WorkflowStep.where(druid: druid, workflow: 'accessionWF', version: 2).order(:process, :id).pluck(:version, :process, :id)
=> [[2, "content-metadata", 84412712],
[2, "content-metadata", 84413305],
[2, "descriptive-metadata", 84412710],
[2, "descriptive-metadata", 84413303],
[2, "end-accession", 84412721],
[2, "end-accession", 84413325],
[2, "provenance-metadata", 84412717],
[2, "provenance-metadata", 84413316],
[2, "publish", 84412716],
[2, "publish", 84413314],
[2, "remediate-object", 84412714],
[2, "remediate-object", 84413309],
[2, "reset-workspace", 84412720],
[2, "reset-workspace", 84413323],
[2, "rights-metadata", 84412711],
[2, "rights-metadata", 84413304],
[2, "sdr-ingest-received", 84412719],
[2, "sdr-ingest-received", 84413320],
[2, "sdr-ingest-transfer", 84412718],
[2, "sdr-ingest-transfer", 84413318],
[2, "shelve", 84412715],
[2, "shelve", 84413311],
[2, "start-accession", 84412709],
[2, "start-accession", 84413302],
[2, "technical-metadata", 84412713],
[2, "technical-metadata", 84413306]]
We can remove the second copy of each step:
Approach 1: list the ids of the duplicate steps explicitly.
WorkflowStep.find(84413305, 84413303, 84413325, 84413316, 84413314, 84413309, 84413323, 84413304, 84413320, 84413318, 84413311, 84413302, 84413306).map(&:destroy)
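Copying thirteen ids by hand is error-prone. As a sketch in plain Ruby (the helper name second_copy_ids is hypothetical), the list can be derived from the rows plucked in the previous step, assuming the copy to remove is always the one with the higher id:

```ruby
# Rows as returned by the version 2 query above: [version, process, id].
rows = [
  [2, "content-metadata", 84412712],     [2, "content-metadata", 84413305],
  [2, "descriptive-metadata", 84412710], [2, "descriptive-metadata", 84413303],
  [2, "end-accession", 84412721],        [2, "end-accession", 84413325],
  [2, "provenance-metadata", 84412717],  [2, "provenance-metadata", 84413316],
  [2, "publish", 84412716],              [2, "publish", 84413314],
  [2, "remediate-object", 84412714],     [2, "remediate-object", 84413309],
  [2, "reset-workspace", 84412720],      [2, "reset-workspace", 84413323],
  [2, "rights-metadata", 84412711],      [2, "rights-metadata", 84413304],
  [2, "sdr-ingest-received", 84412719],  [2, "sdr-ingest-received", 84413320],
  [2, "sdr-ingest-transfer", 84412718],  [2, "sdr-ingest-transfer", 84413318],
  [2, "shelve", 84412715],               [2, "shelve", 84413311],
  [2, "start-accession", 84412709],      [2, "start-accession", 84413302],
  [2, "technical-metadata", 84412713],   [2, "technical-metadata", 84413306]
]

# Hypothetical helper: for each process, keep the lowest id and return the rest.
# Assumption: the original step is the one with the lowest id.
def second_copy_ids(rows)
  rows.group_by { |_version, process, _id| process }
      .flat_map { |_process, group| group.map(&:last).sort.drop(1) }
end

p second_copy_ids(rows)
```

For the example rows this yields the same thirteen ids as the literal list in approach 1. The resulting array could be passed to WorkflowStep.find in place of the hand-typed ids.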
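Approach 2: select the duplicate steps by id range.
Note the id of the second start-accession step (84413302 in the example). This assumes every step of the duplicate workflow has an id at or above that value, as is the case here.
ids = WorkflowStep.where(druid: druid, workflow: 'accessionWF', version: 2).where("id >= ?", 84413302).order(:id).pluck(:id)
ids.each { |id| WorkflowStep.find(id).destroy }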
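Approach 2 depends on the id cutoff separating the two workflows cleanly. Before destroying anything, that can be verified on the rows plucked for the version. The following is a plain-Ruby sketch with a hypothetical helper name, clean_split?, and is not part of the workflow server code:

```ruby
# Hypothetical guard: true only if, for every process, exactly one row has an
# id below the cutoff (the step to keep) and exactly one has an id at or above
# it (the step to remove).
def clean_split?(rows, cutoff)
  rows.group_by { |_version, process, _id| process }.all? do |_process, group|
    below, at_or_above = group.map(&:last).partition { |id| id < cutoff }
    below.size == 1 && at_or_above.size == 1
  end
end

# A few of the version 2 rows from the example: [version, process, id].
rows = [
  [2, "shelve", 84412715],          [2, "shelve", 84413311],
  [2, "start-accession", 84412709], [2, "start-accession", 84413302]
]

puts clean_split?(rows, 84413302) # cutoff is the second start-accession id
puts clean_split?(rows, 84413311) # a cutoff that is too high
```

With the second start-accession id as the cutoff, the guard passes. With the higher cutoff it fails, because both start-accession rows fall below it.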
Finally, go back to https://robot-console-prod.stanford.edu/failed and retry the failed step.
The methods below automate the same check and cleanup from the Rails console:
# Returns the versions of the workflow that have more than one step for the same process.
def versions_with_dupe_steps(druid, wf_name)
  WorkflowStep.where(druid: druid, workflow: wf_name).group(:process, :version).having('count(version) > 1').pluck('distinct(version)')
end

# Call once with just the druid and wf_name (a dry run) to make sure the right things would get cleaned up.
# Call again with false for the optional third param to actually do the cleanup.
def check(druid, wf_name, is_dry_run = true)
  versions_to_remediate = versions_with_dupe_steps(druid, wf_name)
  # pluck returns an array (never nil), so test for emptiness rather than truthiness
  return 'no duplicates found' if versions_to_remediate.empty?

  puts "versions with duplicate steps: \n#{versions_to_remediate}"
  versions_to_remediate.each do |version|
    steps = WorkflowStep.where(druid: druid, workflow: wf_name, version: version).order(:process, :id).pluck(:process, :id)
    # Double checking here: every process in this version must have exactly two steps.
    steps.group_by(&:first).each do |process, steps_for_process|
      raise "There are #{steps_for_process.count} steps for #{process}" unless steps_for_process.size == 2

      # Rows are ordered by id, so the last one is the later (duplicate) step.
      puts "#{'[would be]' if is_dry_run}Deleting #{steps_for_process.last.last}"
      WorkflowStep.find(steps_for_process.last.last).destroy unless is_dry_run
    end
    puts "#{'[would have]' if is_dry_run}Removed duplicate workflow (steps) for #{druid} #{version}"
  end
end
druid = 'druid:bj234mg6185'
wf_name = 'wedgedWF'
# query to sanity check the `check` output below
WorkflowStep.where(druid: druid, workflow: wf_name).order(:version, :process, :status, :id).pluck(:version, :process, :status, :id)
# output like:
# [[1, "release-members", "completed", 67260989],
# [1, "release-members", "skipped", 55236716],
# [1, "release-publish", "completed", 55236718],
# [1, "release-publish", "completed", 67260990],
# [1, "start", "completed", 55236715],
# [1, "start", "completed", 67260988],
# [1, "update-marc", "completed", 55236720],
# [1, "update-marc", "completed", 67260991],
# [2, "release-members", "queued", 171453262],
# [2, "release-publish", "waiting", 171453263],
# [2, "start", "completed", 171453261],
# [2, "update-marc", "waiting", 171453264]]
# call once with just the druid and wf_name to make sure the right things would get cleaned up, call again with "false" for the optional third param to actually do the cleanup
check(druid, wf_name)
# output like
# versions with duplicate steps:
# [1]
# [would be]Deleting 67260989
# [would be]Deleting 67260990
# [would be]Deleting 67260988
# [would be]Deleting 67260991
# [would have]Removed duplicate workflow (steps) for druid:bj234mg6185 1
# => [1]
# if that output looks right compared to the query results, run the cleanup for real
check(druid, wf_name, false)
# output like
# versions with duplicate steps:
# [1]
# Deleting 67260989
# Deleting 67260990
# Deleting 67260988
# Deleting 67260991
# Removed duplicate workflow (steps) for druid:bj234mg6185 1
# => [1]
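One thing the dry-run output does not show is step status. check always deletes the row with the higher id, and in the sample query output above that means the "completed" release-members row (67260989) is removed while the "skipped" row (55236716) is kept. The following plain-Ruby sketch flags such pairs from the sanity-check query rows so they can be reviewed before the real run. The helper names deletion_preview and status_mismatches are hypothetical, and the sketch assumes, as check does, that a duplicated process has exactly two rows.

```ruby
# Rows as returned by the sanity-check query: [version, process, status, id].
rows = [
  [1, "release-members", "completed", 67260989],
  [1, "release-members", "skipped", 55236716],
  [1, "release-publish", "completed", 55236718],
  [1, "release-publish", "completed", 67260990],
  [1, "start", "completed", 55236715],
  [1, "start", "completed", 67260988],
  [1, "update-marc", "completed", 55236720],
  [1, "update-marc", "completed", 67260991],
  [2, "release-members", "queued", 171453262],
  [2, "release-publish", "waiting", 171453263],
  [2, "start", "completed", 171453261],
  [2, "update-marc", "waiting", 171453264]
]

# Hypothetical helper: for each duplicated version/process pair, report which
# row would be kept (lower id) and which would be deleted (higher id).
def deletion_preview(rows)
  rows.group_by { |version, process, _status, _id| [version, process] }
      .select { |_key, group| group.size == 2 }
      .map do |(version, process), group|
        kept, deleted = group.sort_by(&:last)
        { version: version, process: process,
          kept_id: kept.last, kept_status: kept[2],
          deleted_id: deleted.last, deleted_status: deleted[2] }
      end
end

# Hypothetical helper: only the pairs whose two rows disagree on status.
def status_mismatches(rows)
  deletion_preview(rows).reject { |pair| pair[:kept_status] == pair[:deleted_status] }
end

status_mismatches(rows).each do |pair|
  puts "#{pair[:process]} v#{pair[:version]}: keeping #{pair[:kept_id]} (#{pair[:kept_status]}), " \
       "deleting #{pair[:deleted_id]} (#{pair[:deleted_status]})"
end
```

For the sample rows, only the release-members pair in version 1 is flagged. Whether the kept status is acceptable is a judgment call for whoever runs the cleanup.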