Add side_by_side mode to update_materialized_view #430

calebhearth · 2024-11-19T23:59:05Z

This adds a side_by_side kwarg to the update_materialized_view method, which builds the new view alongside the old one and then atomically swaps them to reduce downtime at the cost of increasing disk usage. It is plumbed through to migrations as a hash value for the materialized kwarg of update_view.

Fixes #387

mjankowski · 2024-12-11T19:37:32Z

Does usage/docs/README/generator need to list this option?

I was able to figure out how/where to add the option from looking at the diff here, might not be obvious to the general public of materialized view operators?

derekprior · 2024-12-30T17:56:26Z

lib/scenic/adapters/postgres.rb

+          rename_materialized_view(name, drop_name)
+          rename_materialized_view(new_name, name)
+          drop_materialized_view(drop_name)


Why do we:

rename existing view

rename new view

drop existing view

instead of:

drop existing view

rename new view

In the first case, the view is no longer available to the application anyway. Is it that drop is potentially slow?

Not sure. OP may know; I was just rearranging

To guarantee that if you're outside a transaction, there's no potential outage window where the view is missing.

I think we need to be inside a transaction. Really not wild about the state we'd leave someone in if this wasn't inside a transaction. But maybe I'm completely wrong. What do you think?

Agreed - we should be in transactions when performing side_by_side migrations.

Postgres supports SAVEPOINT but it appears that we'd still need to be inside a transaction for it to be useful, so I'm thinking that doesn't help here.

lib/scenic/index.rb

lib/scenic/adapters/postgres/index_reapplication.rb

derekprior · 2025-01-01T01:23:06Z

I've made a very sizable refactor to something I'm happier to maintain. Don't absolutely love the object names introduced here, but like the factoring. Commit message explains much of my thinking:

The initial implementation of side_by_side materialized view creation
worked but had a couple of issues that needed to be resolved and I
wanted to refactor the code for better maintainability.

We had postgres-specific things in the Scenic::Index class, which is
not part of the adapter API. The code was refactored to not rely on
adding the schema name to this object.

Index migration is different from index reapplication, and it felt
like we were reusing IndexReapplication just to get access to the
SAVEPOINT functionality in that class. I extracted IndexCreation
which is now used by IndexReapplication and our new class,
IndexMigration.

Side-by-side logic was moved to a class of its own, SideBySide, for
encapsulation purposes.

Instead of conditionally hashing the view name in the case where the
temporary name overflows the postgres identifier limit, we now always
hash the temporary object names. This just keeps the code simpler and
observed behavior from the outside identical no matter identifier
length. This behavior is tested in the new TemporaryName class.

Removed rename_materialized_view from the public API on the adapter,
as I'd like to make sure that's something we want separate from this
before we do something like that.

Added connection to the public adapter UI for ease of use from our
helper objects. Documented as internal use only.

I added a number of tests for new and previously existing code.

Still to do:

I think we should consider enforcing that side_by_side updates are
done in a transaction. Feels like it would be really bad if we failed
somewhere after renaming indexes on the current view.

We should consider adding a final check to make sure no indexes or
views are left behind that have our temporary name prefix. But I
think that's probably being paranoid?

README documentation and generator support

@calebhearth @Roguelazer @mjankowski - Any thoughts? I haven't tried this in a real app yet, so another to do is to make sure the output when doing this is sufficient and looks good.

derekprior · 2025-01-02T14:16:48Z

I've improved test coverage, added generator support. Still to do:

Ensure we're in a transaction for this migration unless anyone thinks this is a bad idea?
Update README
Test in a real app

calebhearth · 2025-01-02T14:56:38Z

Don’t all migrations run in transactions

renchap · 2025-01-02T15:08:27Z

Don’t all migrations run in transactions

You can disable transactions for some migrations, and this is sometimes very useful if you have migrations that can lock a table for a long time. We do this in Mastodon for example.

derekprior · 2025-01-02T15:22:55Z

@renchap would you want to disable it with side_by_side materialized view migrations?

I view this as an atomic operation. Failing in the middle could leave you, for example, with nonsense index names on the original view that eventually find their way into your schema.rb, but not your coworkers. I don't want those support requests and I think the fact that we're now refreshing before we do anything that requires locking should alleviate the transaction concern.

My options are:

Raise an error if it's called without an open transaction present already. This leaves it to the user to either remove disable_ddl_transaction! or to manually start a transaction in the part of the migration that updates the materialized view.
Silently start a transaction if there isn't one started already.

renchap · 2025-01-02T15:43:04Z

I would prefer your first solution. It would fail with an explicit error message on development so the developer can understand what happens and adjust the migration. Starting a transaction silently can very easily create production outages in my experience.

derekprior · 2025-01-02T23:35:25Z

This is now ready. I updated the output when running side by side updates, made a change to require an open transaction, added an end-to-end test, and more. I also ran this in my own sample app in several different ways.

@renchap @mjankowski any chance either of you could give it a whirl on Mastodon and let me know how it works out?

Would love to get this merged with some more confidence.

mjankowski · 2025-01-03T14:21:15Z

Updated my branch to use latest from this PR, added new migration with SIDE BY SIDE enabled, see output like:

== 20250103141121 UpdateGlobalFollowRecommendationsToVersion3: migrating ======
-- update_view(:global_follow_recommendations, {:version=>3, :revert_to_version=>2, :materialized=>{:side_by_side=>true}})
   -> temporary materialized view _scenic_sbs_53c2caf36ce95257a84e529f90957dc0d96c27c5 has been created
   -> indexes on 'global_follow_recommendations' have been renamed to avoid collisions
   -> index 'index_global_follow_recommendations_on_account_id' on '_scenic_sbs_53c2caf36ce95257a84e529f90957dc0d96c27c5' has been created
   -> materialized view global_follow_recommendations has been dropped
   -> temporary materialized view _scenic_sbs_53c2caf36ce95257a84e529f90957dc0d96c27c5 has been renamed to global_follow_recommendations
   -> 0.0056s
== 20250103141121 UpdateGlobalFollowRecommendationsToVersion3: migrated (0.0056s)

(and tailing full log shows operations happening -- create new view with unique name, rename index, create index, drop old view, rename new view to old view name).

Still have not run with any meaningful large table size or contemplated how changing code and changing view defs around same time would have to interact with deployments/etc - but thats slightly out of scope for the narrow feature here.

derekprior · 2025-01-03T16:14:46Z

Thanks! I updated the logging to make sure we're quoting table names consistently. But I think this is ready for a squash and merge. I'll probably get to that this evening.

Then it's a matter of release. I kind of think the tsort changes merged a while back but unreleased changes might necessitate a 2.0 release, in which case there are other things I'd like to get in too. 🤔

Still have not run with any meaningful large table size or contemplated how changing code and changing view defs around same time would have to interact with deployments/etc - but thats slightly out of scope for the narrow feature here.

Yeah, I was thinking about that yesterday. I don't know if this changes anything meaningfully around migration/deployment practices. But if you come up with anything, let me know.

This adds a `side_by_side` kwarg to the `update_materialized_view` method, which builds the new view alongside the old one and then atomically swaps them to reduce downtime at the cost of increasing disk usage. It is plumbed through to migrations as a hash value for the `materialized` kwarg of `update_view`.

Copilot reviewed 7 out of 22 changed files in this pull request and generated no comments.

Files not reviewed (15)

lib/generators/scenic/view/templates/db/migrate/update_view.erb: Evaluated as low risk
spec/scenic/statements_spec.rb: Evaluated as low risk
lib/scenic/adapters/postgres.rb: Evaluated as low risk
lib/scenic/adapters/postgres/errors.rb: Evaluated as low risk
lib/scenic/adapters/postgres/index_creation.rb: Evaluated as low risk
lib/scenic/adapters/postgres/index_migration.rb: Evaluated as low risk
lib/scenic/adapters/postgres/index_reapplication.rb: Evaluated as low risk
lib/scenic/adapters/postgres/side_by_side.rb: Evaluated as low risk
lib/scenic/adapters/postgres/temporary_name.rb: Evaluated as low risk
spec/scenic/command_recorder_spec.rb: Evaluated as low risk
spec/generators/scenic/view/view_generator_spec.rb: Evaluated as low risk
lib/scenic/statements.rb: Evaluated as low risk
spec/acceptance/user_manages_views_spec.rb: Evaluated as low risk
lib/generators/scenic/materializable.rb: Evaluated as low risk
spec/scenic/adapters/postgres_spec.rb: Evaluated as low risk

Comments suppressed due to low confidence (2)

spec/scenic/adapters/postgres/side_by_side_spec.rb:5

The test coverage for the side_by_side mode should include edge cases such as failure in temporary view creation, index migration failure, and inability to drop the original view.

describe Postgres::SideBySide, :db, :silence do

spec/scenic/adapters/postgres/index_migration_spec.rb:18

Add an assertion to check if the original indexes are renamed to avoid collisions.

expect(indexes_for_original.first.index_name).not_to eq "hi_greeting_idx"

Tip: Turn on automatic Copilot reviews for this repository to get quick feedback on every pull request. Learn more

calebhearth

Looks great overall. A couple of minor nits but I think this could ship as-is if you're feeling good about it.

lib/generators/scenic/materializable.rb

lib/scenic/adapters/postgres/errors.rb

lib/scenic/adapters/postgres/side_by_side.rb

The initial implementation of side_by_side materialized view creation worked but had a couple of issues that needed to be resolved and I wanted to refactor the code for better maintainability. * We had postgres-specific things in the `Scenic::Index` class, which is not part of the adapter API. The code was refactored to not rely on adding the schema name to this object. * Index migration is different from index reapplication, and it felt like we were reusing `IndexReapplication` just to get access to the `SAVEPOINT` functionality in that class. I extracted `IndexCreation` which is now used by `IndexReapplication` and our new class, `IndexMigration`. * Side-by-side logic was moved to a class of its own, `SideBySide`, for encapsulation purposes. * Instead of conditionally hashing the view name in the case where the temporary name overflows the postgres identifier limit, we now always hash the temporary object names. This just keeps the code simpler and observed behavior from the outside identical no matter identifier length. This behavior is tested in the new `TemporaryName` class. * Removed `rename_materialized_view` from the public API on the adapter, as I'd like to make sure that's something we want separate from this before we do something like that. * Added `connection` to the public adapter UI for ease of use from our helper objects. Documented as internal use only. * Require a transaction in order to use `side_by_side`. This prevents leaving the user in a weird state that would be difficult to recover from. * Added `--side-by-side` (and `--side_by_side`) support to the `scenic:view` generator. Also added `--no-data` as an alias for the existing `--no_data` while I was at it. * I added a number of tests for new and previously existing code throughout, including an acceptance level test for `side_by_side`. Our test coverage should be much improved. * Updated README with information on `side_by_side`. Here's a sample of the output from running a `side_by_side` update: ``` == 20250102191533 UpdateSearchesToVersion3: migrating ========================= -- update_view(:searches, {version: 3, revert_to_version: 2, materialized: {side_by_side: true}}) -> temporary materialized view _scenic_sbs_8a03f467c615b126f59617cc510d2abd41296834 has been created -> indexes on 'searches' have been renamed to avoid collisions -> index 'index_searches_on_content' on '_scenic_sbs_8a03f467c615b126f59617cc510d2abd41296834' has been created -> index 'index_searches_on_user_id' on '_scenic_sbs_8a03f467c615b126f59617cc510d2abd41296834' has been created -> materialized view searches has been dropped -> temporary materialized view _scenic_sbs_8a03f467c615b126f59617cc510d2abd41296834 has been renamed to searches -> 0.0299s == 20250102191533 UpdateSearchesToVersion3: migrated (0.0300s) ================ ```

cpb · 2025-02-01T00:57:39Z

FWIW I trialed this in a staging environment and on a new view serving 2 req/s perceived no downtime for rematerializations that took 30 sec or 45 minutes.

calebhearth force-pushed the less-locks branch from 2864288 to 8e9c5c4 Compare November 20, 2024 00:07

renchap mentioned this pull request Dec 11, 2024

Find a way to update materialized views using Scenic mastodon/mastodon#33275

Open

derekprior reviewed Dec 30, 2024

View reviewed changes

lib/scenic/index.rb Outdated Show resolved Hide resolved

derekprior reviewed Dec 30, 2024

View reviewed changes

lib/scenic/index.rb Outdated Show resolved Hide resolved

derekprior reviewed Dec 30, 2024

View reviewed changes

lib/scenic/index.rb Outdated Show resolved Hide resolved

derekprior reviewed Dec 30, 2024

View reviewed changes

lib/scenic/adapters/postgres/index_reapplication.rb Outdated Show resolved Hide resolved

derekprior mentioned this pull request Dec 30, 2024

Add side_by_side mode to update_materialized_view #387

Closed

derekprior force-pushed the less-locks branch from 8e9c5c4 to ec6c73c Compare December 30, 2024 23:40

derekprior force-pushed the less-locks branch from f9701f5 to d992429 Compare January 4, 2025 01:19

derekprior requested a review from Copilot January 4, 2025 01:20

Copilot AI reviewed Jan 4, 2025

View reviewed changes

calebhearth commented Jan 8, 2025

View reviewed changes

lib/generators/scenic/materializable.rb Show resolved Hide resolved

lib/generators/scenic/materializable.rb Show resolved Hide resolved

lib/scenic/adapters/postgres/errors.rb Outdated Show resolved Hide resolved

lib/scenic/adapters/postgres/side_by_side.rb Show resolved Hide resolved

derekprior force-pushed the less-locks branch from 0a58be2 to a2a9be5 Compare January 13, 2025 19:58

derekprior merged commit a9f2cff into main Jan 15, 2025
7 checks passed

derekprior deleted the less-locks branch January 15, 2025 15:38

derekprior mentioned this pull request Jan 15, 2025

make update_view for materialized views create less downtime #386

Closed

renchap mentioned this pull request Jan 23, 2025

Migrate to PostgreSQL lobsters/lobsters#539

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add side_by_side mode to update_materialized_view #430

Add side_by_side mode to update_materialized_view #430

calebhearth commented Nov 19, 2024

mjankowski commented Dec 11, 2024

derekprior Dec 30, 2024

calebhearth Dec 30, 2024

Roguelazer Dec 31, 2024

derekprior Jan 1, 2025

calebhearth Jan 8, 2025

derekprior commented Jan 1, 2025

derekprior commented Jan 2, 2025 •

edited

Loading

calebhearth commented Jan 2, 2025 via email •

edited

Loading

renchap commented Jan 2, 2025

derekprior commented Jan 2, 2025

renchap commented Jan 2, 2025

derekprior commented Jan 2, 2025

mjankowski commented Jan 3, 2025

derekprior commented Jan 3, 2025

calebhearth left a comment

cpb commented Feb 1, 2025

Add side_by_side mode to update_materialized_view #430

Add side_by_side mode to update_materialized_view #430

Conversation

calebhearth commented Nov 19, 2024

mjankowski commented Dec 11, 2024

derekprior Dec 30, 2024

Choose a reason for hiding this comment

calebhearth Dec 30, 2024

Choose a reason for hiding this comment

Roguelazer Dec 31, 2024

Choose a reason for hiding this comment

derekprior Jan 1, 2025

Choose a reason for hiding this comment

calebhearth Jan 8, 2025

Choose a reason for hiding this comment

derekprior commented Jan 1, 2025

derekprior commented Jan 2, 2025 • edited Loading

calebhearth commented Jan 2, 2025 via email • edited Loading

renchap commented Jan 2, 2025

derekprior commented Jan 2, 2025

renchap commented Jan 2, 2025

derekprior commented Jan 2, 2025

mjankowski commented Jan 3, 2025

derekprior commented Jan 3, 2025

Choose a reason for hiding this comment

calebhearth left a comment

Choose a reason for hiding this comment

cpb commented Feb 1, 2025

derekprior commented Jan 2, 2025 •

edited

Loading

calebhearth commented Jan 2, 2025 via email •

edited

Loading