Retrying migrations #37

matthieuprat · 2018-06-13T09:07:32Z

Genuine question: have you considered automatically retrying migrations if they fail because of a lock timeout?

I'm asking because we (@doctolib) faced an issue similar to the one described in "Zero-downtime Postgres migrations - the hard parts". We lowered PG's lock timeout, but our migrations sometimes hit it, which makes our deployment process fail, and we want that process to be as reliable as possible.

I'm not at all sure this "retry" idea is a good one (I can see several downsides), but it at least sounds feasible under certain conditions: for instance, the migration would have to be idempotent, which is the case if it's wrapped in a transaction.
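To make the idea concrete, here is a minimal sketch of such a retry loop, not anything this gem implements. `LockTimeout`, `retry_on_lock_timeout`, and `run_migration` are hypothetical names; the sketch assumes the callable runs the whole migration inside one transaction, so a failed attempt is rolled back and leaves nothing behind.

```python
import time


class LockTimeout(Exception):
    """Hypothetical stand-in for the driver error raised when
    PostgreSQL's lock_timeout expires (SQLSTATE 55P03)."""


def retry_on_lock_timeout(run_migration, max_attempts=5, delay_seconds=1.0,
                          sleep=time.sleep):
    """Call run_migration until it succeeds or max_attempts is reached.

    Assumption: run_migration wraps the migration in a single transaction,
    so a failed attempt is rolled back and retrying is safe.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_migration()
        except LockTimeout:
            if attempt == max_attempts:
                raise  # give up and let the deployment fail loudly
            sleep(delay_seconds)  # give the blocking session time to finish
```

Only the lock timeout is caught here; any other error still fails the migration on the first attempt.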

Sinjo · 2018-10-03T15:30:47Z

Hi @matthieuprat, sorry for the mega slow response!

We're currently looking at the data for why schema migrations fail to apply at GoCardless - in particular when they run into the lock or statement timeout. Our initial impression is that adding some sort of retry mechanism would help reduce the number of failures that need manual retry.

Anything we implement around retries will live on a branch or in an alpha release until we're happy that they're valuable.

We'll keep this ticket updated with what we find out.

matthieuprat · 2018-10-15T08:26:57Z

Thank you for your reply!

We ended up rolling our own solution, which we open-sourced last week: safe-pg-migrations. It's not ready for prime time yet, but it implements a retry mechanism for statements that need to acquire an ACCESS EXCLUSIVE lock. It also logs the blocking statements if a blocked statement fails with a lock timeout.

In our experience, apart from long transactions, lengthy VACUUM FREEZE runs on huge tables also cause lock timeouts.

jfinzel · 2019-01-08T17:37:39Z

+1 for this feature. On really busy OLTP systems it's extremely likely to hit lock_timeout, but just as likely that within 5 minutes of brief retries you will indeed get the lock you need. In our env we have several tools that use this exact pattern for anything requiring ACCESS EXCLUSIVE.
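The pattern described above (brief attempts repeated within an overall time budget) could be sketched roughly as follows. This is an illustration under assumptions, not one of the tools mentioned: `LockTimeout`, `retry_within_budget`, and `acquire_and_run` are hypothetical names, and the 300-second default simply mirrors the "5 minutes" figure.

```python
import time


class LockTimeout(Exception):
    """Hypothetical stand-in for the error raised when lock_timeout expires."""


def retry_within_budget(acquire_and_run, budget_seconds=300.0, pause_seconds=2.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Retry a short, lock_timeout-bounded attempt until a time budget runs out.

    Returns (result, attempts). Re-raises LockTimeout once another pause
    would overrun the budget.
    """
    deadline = clock() + budget_seconds
    attempts = 0
    while True:
        attempts += 1
        try:
            return acquire_and_run(), attempts
        except LockTimeout:
            if clock() + pause_seconds > deadline:
                raise  # budget exhausted: surface the failure
            sleep(pause_seconds)  # brief pause so queued queries can drain
```

Keeping each attempt short matters because a statement waiting for ACCESS EXCLUSIVE also blocks the queries queued behind it; a time budget bounds the total wait without lengthening any single attempt.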
