Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete a newly created page if the version import fails #535

Conversation

bensheldon
Copy link
Contributor

@bensheldon bensheldon commented May 21, 2019

This fixes #192.

This PR is a broad wrapping of the import behavior with a rescue block because I'm not quite sure what the failure behavior is of importing. It's possible to target more specific import failures if you can point me in that direction.

I also considered wrapping it in a transaction but I wanted to suggest the rescue block first. If I changed to a transaction, I'd want to do a bit of reordering in that method so I could open the transaction after the API call so that the open transaction is as brief as possible.

@bensheldon bensheldon force-pushed the cleanup_pages_on_import_error branch from 514c87a to 20ed3a0 Compare May 21, 2019 14:56
Copy link
Member

@Mr0grog Mr0grog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this briefly offline, but in taking a good look at this, I’m not sure I feel especially great about it. That’s NOT a knock on the code you wrote here — the original issue was poorly thought through on my part. I feel bad about that.

My big concern here is safety: we can (and usually do) have multiple import jobs running simultaneously, and the large gap of time between creating a page and then potentially removing it means a removing it the way it’s happening here could potentially break another import that’s happening in parallel. For example:

  1. Import job A creates Page X in order to handle Version 1, then
  2. Import job B sees Page X and use it to handle Version 2, then
  3. Import A might fail to create Version 1 and delete Page X, and then
  4. Import B will fail on saving Version 2 (even though it was valid) because the ID of Page X refers to an object that no longer exists.

That’s totally an edge case, but it’s one we don’t have to worry about at all right now, so I’m a little worried about introducing it.

The more I think about it, the transaction idea isn’t ideal either. I think you might be thinking the same, since you already pointed out:

If I changed to a transaction, I'd want to do a bit of reordering in that method so I could open the transaction after the API call so that the open transaction is as brief as possible.

A transaction could cause us to created two records for the same page (while import A is creating a page + version, import B is doing the same).

Another thing we could do while rearranging code here is leverage the fact that version_for_record goes out of its way to avoid persisting the version model it constructs. We could proactively construct the version model and check its validity first (side note: we’d have to temporarily assign it to a dummy page for validations to pass) before bothering to create a page, and never create a page at all if the version would fail to save. That still leaves the super rare scenario of Rails validations succeeding but Postgres constraints failing. We could either just not solve that, or use the transaction to handle it.

Bigger picture, though, all the above is a lot of work and rearranging to handle a rare problem. It would probably make the import code much more prone to accidental breakage in the future, too, because it would be so carefully tip-toeing around a race condition. You’ve already solved the real serious part of this by doing edgi-govdata-archiving/web-monitoring-ui#363, so I think you should only worry about doing all this stuff if you really want to. Let me know whether you plan to continue on this or drop it for now.

Again, sorry my original issue was poorly thought through and set you up for this scenario :(

@bensheldon
Copy link
Contributor Author

@Mr0grog no worries. I’ll close this for now.

@bensheldon bensheldon closed this Jun 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

An import that fails between creating a page and creating a version leaves an empty page record
2 participants