-
-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add side_by_side mode to update_materialized_view #430
Conversation
2864288
to
8e9c5c4
Compare
Does usage/docs/README/generator need to list this option? I was able to figure out how/where to add the option from looking at the diff here, might not be obvious to the general public of materialized view operators? |
lib/scenic/adapters/postgres.rb
Outdated
rename_materialized_view(name, drop_name) | ||
rename_materialized_view(new_name, name) | ||
drop_materialized_view(drop_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we:
- rename existing view
- rename new view
- drop existing view
instead of:
- drop existing view
- rename new view
In the first case, the view is no longer available to the application anyway. Is it that drop
is potentially slow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. OP may know; I was just rearranging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To guarantee that if you're outside a transaction, there's no potential outage window where the view is missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to be inside a transaction. Really not wild about the state we'd leave someone in if this wasn't inside a transaction. But maybe I'm completely wrong. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - we should be in transactions when performing side_by_side migrations.
Postgres supports SAVEPOINT but it appears that we'd still need to be inside a transaction for it to be useful, so I'm thinking that doesn't help here.
8e9c5c4
to
ec6c73c
Compare
I've made a very sizable refactor to something I'm happier to maintain. Don't absolutely love the object names introduced here, but like the factoring. Commit message explains much of my thinking:
@calebhearth @Roguelazer @mjankowski - Any thoughts? I haven't tried this in a real app yet, so another to do is to make sure the output when doing this is sufficient and looks good. |
I've improved test coverage, added generator support. Still to do:
|
Don’t all migrations run in transactions
|
You can disable transactions for some migrations, and this is sometimes very useful if you have migrations that can lock a table for a long time. We do this in Mastodon for example. |
@renchap would you want to disable it with I view this as an atomic operation. Failing in the middle could leave you, for example, with nonsense index names on the original view that eventually find their way into your schema.rb, but not your coworkers. I don't want those support requests and I think the fact that we're now refreshing before we do anything that requires locking should alleviate the transaction concern. My options are:
|
I would prefer your first solution. It would fail with an explicit error message on development so the developer can understand what happens and adjust the migration. Starting a transaction silently can very easily create production outages in my experience. |
This is now ready. I updated the output when running side by side updates, made a change to require an open transaction, added an end-to-end test, and more. I also ran this in my own sample app in several different ways. @renchap @mjankowski any chance either of you could give it a whirl on Mastodon and let me know how it works out? Would love to get this merged with some more confidence. |
Updated my branch to use latest from this PR, added new migration with SIDE BY SIDE enabled, see output like:
(and tailing full log shows operations happening -- create new view with unique name, rename index, create index, drop old view, rename new view to old view name). Still have not run with any meaningful large table size or contemplated how changing code and changing view defs around same time would have to interact with deployments/etc - but thats slightly out of scope for the narrow feature here. |
Thanks! I updated the logging to make sure we're quoting table names consistently. But I think this is ready for a squash and merge. I'll probably get to that this evening. Then it's a matter of release. I kind of think the
Yeah, I was thinking about that yesterday. I don't know if this changes anything meaningfully around migration/deployment practices. But if you come up with anything, let me know. |
This adds a `side_by_side` kwarg to the `update_materialized_view` method, which builds the new view alongside the old one and then atomically swaps them to reduce downtime at the cost of increasing disk usage. It is plumbed through to migrations as a hash value for the `materialized` kwarg of `update_view`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 7 out of 22 changed files in this pull request and generated no comments.
Files not reviewed (15)
- lib/generators/scenic/view/templates/db/migrate/update_view.erb: Evaluated as low risk
- spec/scenic/statements_spec.rb: Evaluated as low risk
- lib/scenic/adapters/postgres.rb: Evaluated as low risk
- lib/scenic/adapters/postgres/errors.rb: Evaluated as low risk
- lib/scenic/adapters/postgres/index_creation.rb: Evaluated as low risk
- lib/scenic/adapters/postgres/index_migration.rb: Evaluated as low risk
- lib/scenic/adapters/postgres/index_reapplication.rb: Evaluated as low risk
- lib/scenic/adapters/postgres/side_by_side.rb: Evaluated as low risk
- lib/scenic/adapters/postgres/temporary_name.rb: Evaluated as low risk
- spec/scenic/command_recorder_spec.rb: Evaluated as low risk
- spec/generators/scenic/view/view_generator_spec.rb: Evaluated as low risk
- lib/scenic/statements.rb: Evaluated as low risk
- spec/acceptance/user_manages_views_spec.rb: Evaluated as low risk
- lib/generators/scenic/materializable.rb: Evaluated as low risk
- spec/scenic/adapters/postgres_spec.rb: Evaluated as low risk
Comments suppressed due to low confidence (2)
spec/scenic/adapters/postgres/side_by_side_spec.rb:5
- The test coverage for the side_by_side mode should include edge cases such as failure in temporary view creation, index migration failure, and inability to drop the original view.
describe Postgres::SideBySide, :db, :silence do
spec/scenic/adapters/postgres/index_migration_spec.rb:18
- Add an assertion to check if the original indexes are renamed to avoid collisions.
expect(indexes_for_original.first.index_name).not_to eq "hi_greeting_idx"
Tip: Turn on automatic Copilot reviews for this repository to get quick feedback on every pull request. Learn more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall. A couple of minor nits but I think this could ship as-is if you're feeling good about it.
The initial implementation of side_by_side materialized view creation worked but had a couple of issues that needed to be resolved and I wanted to refactor the code for better maintainability. * We had postgres-specific things in the `Scenic::Index` class, which is not part of the adapter API. The code was refactored to not rely on adding the schema name to this object. * Index migration is different from index reapplication, and it felt like we were reusing `IndexReapplication` just to get access to the `SAVEPOINT` functionality in that class. I extracted `IndexCreation` which is now used by `IndexReapplication` and our new class, `IndexMigration`. * Side-by-side logic was moved to a class of its own, `SideBySide`, for encapsulation purposes. * Instead of conditionally hashing the view name in the case where the temporary name overflows the postgres identifier limit, we now always hash the temporary object names. This just keeps the code simpler and observed behavior from the outside identical no matter identifier length. This behavior is tested in the new `TemporaryName` class. * Removed `rename_materialized_view` from the public API on the adapter, as I'd like to make sure that's something we want separate from this before we do something like that. * Added `connection` to the public adapter UI for ease of use from our helper objects. Documented as internal use only. * Require a transaction in order to use `side_by_side`. This prevents leaving the user in a weird state that would be difficult to recover from. * Added `--side-by-side` (and `--side_by_side`) support to the `scenic:view` generator. Also added `--no-data` as an alias for the existing `--no_data` while I was at it. * I added a number of tests for new and previously existing code throughout, including an acceptance level test for `side_by_side`. Our test coverage should be much improved. * Updated README with information on `side_by_side`. Here's a sample of the output from running a `side_by_side` update: ``` == 20250102191533 UpdateSearchesToVersion3: migrating ========================= -- update_view(:searches, {version: 3, revert_to_version: 2, materialized: {side_by_side: true}}) -> temporary materialized view _scenic_sbs_8a03f467c615b126f59617cc510d2abd41296834 has been created -> indexes on 'searches' have been renamed to avoid collisions -> index 'index_searches_on_content' on '_scenic_sbs_8a03f467c615b126f59617cc510d2abd41296834' has been created -> index 'index_searches_on_user_id' on '_scenic_sbs_8a03f467c615b126f59617cc510d2abd41296834' has been created -> materialized view searches has been dropped -> temporary materialized view _scenic_sbs_8a03f467c615b126f59617cc510d2abd41296834 has been renamed to searches -> 0.0299s == 20250102191533 UpdateSearchesToVersion3: migrated (0.0300s) ================ ```
0a58be2
to
a2a9be5
Compare
FWIW I trialed this in a staging environment and on a new view serving 2 req/s perceived no downtime for rematerializations that took 30 sec or 45 minutes. |
This adds a
side_by_side
kwarg to theupdate_materialized_view
method, which builds the new view alongside the old one and then atomically swaps them to reduce downtime at the cost of increasing disk usage. It is plumbed through to migrations as a hash value for thematerialized
kwarg ofupdate_view
.Fixes #387