Add `omdb migrations list` command #6160

hawkw · 2024-07-25T19:42:45Z

This commit adds a new OMDB command for listing the contents of the
migration table. The query can be filtered by migration state, and can
also show only the migrations of one or more instances.

leftwo · 2024-07-29T17:28:11Z

Do these changes require anything else in Omicron to work? I tried compiling and running it and it gave me this error"

EVT22200005 # /tmp/omdb db migrations list
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
WARN: found schema version 84.0.0, expected 83.0.0
It's possible the database is running a version that's different from what this
tool understands.  This may result in errors or incorrect output.
Error: listing migrations

Caused by:
    query `SELECT "migration"."id", "migration"."instance_id", "migration"."time_created", "migration"."time_deleted", "migration"."source_state", "migration"."source_propolis_id", "migration"."source_gen", "migration"."time_source_updated", "migration"."target_state", "migration"."target_propolis_id", "migration"."target_gen", "migration"."time_target_updated" FROM "migration" WHERE ("migration"."time_deleted" IS NULL) ORDER BY "migration"."time_created" LIMIT $1` contains a full table/index scan which is explicitly disallowed

hawkw · 2024-07-29T17:37:45Z

Do these changes require anything else in Omicron to work? I tried compiling and running it and it gave me this error"

EVT22200005 # /tmp/omdb db migrations list
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
WARN: found schema version 84.0.0, expected 83.0.0
It's possible the database is running a version that's different from what this
tool understands.  This may result in errors or incorrect output.
Error: listing migrations

Caused by:
    query `SELECT "migration"."id", "migration"."instance_id", "migration"."time_created", "migration"."time_deleted", "migration"."source_state", "migration"."source_propolis_id", "migration"."source_gen", "migration"."time_source_updated", "migration"."target_state", "migration"."target_propolis_id", "migration"."target_gen", "migration"."time_target_updated" FROM "migration" WHERE ("migration"."time_deleted" IS NULL) ORDER BY "migration"."time_created" LIMIT $1` contains a full table/index scan which is explicitly disallowed

Ugh...I thought it worked for me locally a couple days ago, but I may have only tested the variants of the command with instance ID and migration ID arguments. Looks like the bare migrations list command needs to allow the full-table scan, which makes sense. I'll fix that.

smklein · 2024-07-29T17:45:11Z

dev-tools/omdb/src/bin/omdb/db.rs

+    let migrations = query
+        .limit(i64::from(u32::from(fetch_opts.fetch_limit)))
+        .order_by(dsl::time_created)
+        .select(Migration::as_select())
+        .load_async(&*datastore.pool_connection_for_tests().await?)
+        .await
+        .context("listing migrations")?;


I think the full table scan is because we have no index on "time_created", so this query will need to look up all migrations to return any migrations, even less than fetch_limit.

Two ways of fixing this:

Add the index. Pros: works, no full scan. Cons: adds to cockroachdb footprint, isn't free to maintain, a little odd it's only for omdb.

Allow the full table scan (see: ALLOW_FULL_TABLE_SCAN_SQL in this file for examples). Pros: Works, minimally invasive. Cons: If we have a rack with a gargantuan number of migrations, this could cause OOM-ish issues.

AH, yeah, that's probably the issue --- I added the order-by more recently than the last time I tested this command. I was assuming the full-table scan was necessary because we were trying to list all migrations.

A third option would just be to remove the order-by clause and have omdb sort the result set in memory. But, because of the LIMIT clause, that has somewhat different semantics --- we will display the migrations in chronological order, but we won't necessarily have selected the most recent records...which seems pretty bad. So, one of your suggestions is probably better.

Naively, it kinda seems to me like the index is the better option...it seems like we might want to be able to order migrations by time anyway, such as if (for example) we eventually grow a UI for displaying migration history for an instance.

And, I'd rather avoid OOM issues in "real life" racks! I notice, however, that a bunch of existing OMDB commands use the approach of allowing full-table scans instead, though, so maybe there's some reason that we would prefer not to add an index just for OMDB use? Some of those commands look like they're actually just loading the entire table, though?

And, I'd rather avoid OOM issues in "real life" racks! I notice, however, that a bunch of existing OMDB commands use the approach of allowing full-table scans instead, though, so maybe there's some reason that we would prefer not to add an index just for OMDB use? Some of those commands look like they're actually just loading the entire table, though?

It's possible other examples did not do the "right" thing? I say that as someone who may have added one of
those full table scans :)
I also believe I may have added an index just for OMDB.

Hmm, just adding the following index still results in a full-table scan:

CREATE INDEX IF NOT EXISTS migrations_by_time_created ON omicron.public.migration ( time_created );

I wonder if it's necessary for the migration to also include time_deleted, or something?

Hmm, adding time-deleted to the index also doesn't seem to help:

CREATE INDEX IF NOT EXISTS migrations_by_time_created ON omicron.public.migration ( time_created, time_deleted );

I guess I'll just have it allow the full-table scan for now, unless someone else has any idea what I've got wrong with the index?

I don't believe the issue is with the index, but the fact that the query doesn't supply any filtering clauses on the indexed columns, e.g., time_created. There's nothing it can use to restrict the search here, and so this has to be a full-table scan. I believe you can add a (possibly-dummy) time_created to start from, and then the planner should be aware that it can use the right index.

For example:

let migrations = query .limit(i64::from(u32::from(fetch_opts.fetch_limit))) .order_by(dsl::time_created) .filter(dsl::time_created.gt(chrono::DateTime::UNIX_EPOCH)) // Any time, really .select(Migration::as_select()) .load_async(&*datastore.pool_connection_for_tests().await?) .await .context("listing migrations")?;

For example, I ran omicron-dev db-run and then connected to the database. I added the same index you're using here, but a query with no filtering clauses is a full table scan, while one with a filter can use the index:

root@[::1]:32221/omicron> create index if not exists migration_by_creation on migration (time_created); CREATE INDEX Time: 111ms total (execution 111ms / network 0ms) root@[::1]:32221/omicron> select time_created from migration; time_created ---------------- (0 rows) Time: 1ms total (execution 1ms / network 0ms) root@[::1]:32221/omicron> explain select time_created from migration; info ------------------------------------------------------------------------------- distribution: local vectorized: true • scan estimated row count: 1 (100% of the table; stats collected 5 seconds ago) table: migration@migration_by_creation spans: FULL SCAN (7 rows) Time: 1ms total (execution 0ms / network 0ms) root@[::1]:32221/omicron> explain select time_created from migration where time_created > '1970-01-01 1:00:00 UTC'; info -------------------------------------------------------------------------------- distribution: local vectorized: true • scan estimated row count: 1 (100% of the table; stats collected 35 seconds ago) table: migration@migration_by_creation spans: [/'1970-01-01 01:00:00.000001+00:00' - ] (7 rows) Time: 1ms total (execution 1ms / network 0ms)

here's the EXPLAIN output:

distribution: local vectorized: true • sort │ estimated row count: 1 │ order: +time_created │ └── • limit │ estimated row count: 1 │ count: 500 │ └── • sort │ estimated row count: 1 │ order: +time_created │ └── • filter │ estimated row count: 1 │ filter: time_deleted IS NULL │ └── • scan estimated row count: 1 (100% of the table; stats collected 7 minutes ago) table: migration@migration_pkey spans: FULL SCAN index recommendations: 1 1. type: index creation SQL command: CREATE INDEX ON migration (time_deleted) STORING (instance_id, time_created, source_state, source_propolis_id, source_gen, time_source_updated, target_state, target_propolis_id, target_gen, time_target_updated);

and, when running with the --include-deleted flag, it also says:

distribution: local vectorized: true • sort │ estimated row count: 1 │ order: +time_created │ └── • limit │ estimated row count: 1 │ count: 500 │ └── • sort │ estimated row count: 1 │ order: +time_created │ └── • scan estimated row count: 1 (100% of the table; stats collected 7 minutes ago) table: migration@migration_pkey spans: FULL SCAN index recommendations: 1 1. type: index creation SQL command: CREATE INDEX ON migration (time_created) STORING (instance_id, time_deleted, source_state, source_propolis_id, source_gen, time_source_updated, target_state, target_propolis_id, target_gen, time_target_updated);

yup, @bnaecker's suggestion of adding a no-op filter seems to fix it. Thanks Ben!

This commit adds a new OMDB command for listing the contents of the migration table. The query can be filtered by migration state, and can also show only the migrations of one or more instances.

bnaecker

LGTM! Two small questions, both non-blocking. Thanks!

bnaecker · 2024-07-29T21:43:28Z

dev-tools/omdb/src/bin/omdb/db.rs

+    let migrations = query
+        .limit(i64::from(u32::from(fetch_opts.fetch_limit)))
+        .order_by(dsl::time_created)
+        // This is just to prove to CRDB that it can use the
+        // migrations-by-time-created index, it doesn't actually do anything.
+        .filter(dsl::time_created.gt(chrono::DateTime::UNIX_EPOCH))
+        .select(Migration::as_select())
+        .load_async(&*datastore.pool_connection_for_tests().await?)
+        .await
+        .context("listing migrations")?;


This all seems good as far as using the new index. Does it make sense to let the caller / CLI provide timestamps themselves? E.g., they could look at migrations in the last hour, if they wanted, or the code could use the UNIX epoch if nothing was provided.

Part of why I ask is we have given control of the limit in the CLI, but not the starting point.

I'd like to add that as well, but I was thinking about doing it in a subsequent
PR. I could do it now, though!

Oh, if you want to leave it for a follow-up, that's fine too! I think it'd be nice to have both in the limit, but not crucial they land in the same commit. Up to you :)

bnaecker · 2024-07-29T21:44:48Z

nexus/db-model/src/schema_versions.rs

+        KnownVersion::new(85, "region-read-only"),
+        KnownVersion::new(84, "add-migrations-by-time-created-index"),


Did you intend to add the add-migrations-by-time-create-index migration after the region-read-only? I don't think it should matter as long as they're orthogonal.

whoops, i messed up the rebase --- lemme fix that!

leftwo

I'm all good with the code here.
I would like to see some sample output from the command, if possible.

hawkw · 2024-08-03T18:59:31Z

@leftwo:

I'm all good with the code here. I would like to see some sample output from the command, if possible.

For some sample output, see #5749 (comment).

Having actually tested out the command in real life, it turns out the output is...really wide. I'm thinking about changing it to only show the migration ID column in verbose mode, and also potentially hiding the instance ID if the command is run with the flag to select only a single instance.

leftwo · 2024-08-05T16:49:07Z

@leftwo:

I'm all good with the code here. I would like to see some sample output from the command, if possible.

For some sample output, see #5749 (comment).

Having actually tested out the command in real life, it turns out the output is...really wide. I'm thinking about changing it to only show the migration ID column in verbose mode, and also potentially hiding the instance ID if the command is run with the flag to select only a single instance.

Cool, cool. The output is really a matter of taste, and for the one who will be using the command really.
Thanks for showing it.

hawkw · 2024-08-05T16:52:19Z

@leftwo Yeah, for sure. My personal preference is for the default output to fit in a reasonably narrow terminal, because I find the tabular output much harder to interpret if it's been wrapped. So, I think I'm gonna make a few bits verbose mode only.

hawkw · 2024-08-05T17:11:28Z

Okay, I've tweaked the output a bit more (cd33fc9), it should now look more like this:

root@oxz_switch1:~# ./omdb db migration ls --instance-id 604763da-4af9-45b9-a592-aaeb8fee06b4
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:103::3]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:102::3]:32221,[fd00:1122:3344:104::4]:32221,[fd00:1122:3344:104::3]:32221/omicron?sslmode=disable
WARN: found schema version 84.0.0, expected 85.0.0
It's possible the database is running a version that's different from what this
tool understands.  This may result in errors or incorrect output.
CREATED                        SRC_STATE TGT_STATE SRC_VMM                              TGT_VMM
2024-08-03 18:45:29.310857 UTC completed completed ace760d0-3698-4856-bda4-cbf433fcfb4e 704251e4-04dc-4e6e-8a7f-02ee46571543
2024-08-03 18:46:00.927484 UTC completed completed 704251e4-04dc-4e6e-8a7f-02ee46571543 14009464-b2aa-4767-babb-9728af1959b9
2024-08-03 18:46:32.039600 UTC completed completed 14009464-b2aa-4767-babb-9728af1959b9 5580dd81-afb8-4dd6-922a-f776c773f0a9
2024-08-03 18:47:03.114914 UTC completed completed 5580dd81-afb8-4dd6-922a-f776c773f0a9 511a229c-84c3-486d-b080-b3c0d60895cd
2024-08-03 18:47:34.950071 UTC completed completed 511a229c-84c3-486d-b080-b3c0d60895cd b468d316-aaae-45d1-a0a0-7a6cf2cd77f4
2024-08-03 18:48:07.060381 UTC completed completed b468d316-aaae-45d1-a0a0-7a6cf2cd77f4 03736b34-7453-4802-8eb5-03030184bd91
2024-08-03 18:48:38.152076 UTC completed completed 03736b34-7453-4802-8eb5-03030184bd91 cff0be96-c57d-4d21-9215-ebc304a393e9
2024-08-03 18:49:09.691518 UTC completed completed cff0be96-c57d-4d21-9215-ebc304a393e9 61169459-b4ab-4db2-b7d4-844b5318c4ac
2024-08-03 18:49:40.780488 UTC completed completed 61169459-b4ab-4db2-b7d4-844b5318c4ac 61e8015e-dbe3-411a-b88c-fc55a35e4fa6
2024-08-03 18:50:11.792970 UTC completed completed 61e8015e-dbe3-411a-b88c-fc55a35e4fa6 879cb984-51a5-4279-947b-6a9dd7c6925d
2024-08-03 18:50:42.886472 UTC completed completed 879cb984-51a5-4279-947b-6a9dd7c6925d b8239c3f-272d-4207-853d-8d07f268be59
2024-08-03 18:51:14.020375 UTC completed completed b8239c3f-272d-4207-853d-8d07f268be59 5e57b727-7343-493b-a954-3dde0aa98a7d
2024-08-03 18:51:45.087713 UTC completed completed 5e57b727-7343-493b-a954-3dde0aa98a7d df2d0980-6ad1-4ce3-b0f8-f045cc0a06c1
2024-08-03 18:52:16.085818 UTC completed completed df2d0980-6ad1-4ce3-b0f8-f045cc0a06c1 9878f88c-3c43-4786-905e-f8ea9e9d5d6f
2024-08-03 18:52:47.220469 UTC completed completed 9878f88c-3c43-4786-905e-f8ea9e9d5d6f ba12da12-13da-40c6-9904-ffc803f30485
2024-08-03 18:53:18.278158 UTC completed completed ba12da12-13da-40c6-9904-ffc803f30485 ae8ec94f-b02b-4b1d-beea-a6f11ff64972
root@oxz_switch1:~#

The instance ID is included when migrations are not filtered by instance, but I figured it was nice to omit it when we're only looking at a single instance.

hawkw · 2024-08-05T17:19:35Z

ugh, i broke it. please hold...

hawkw requested review from leftwo and gjcolombo July 29, 2024 16:50

smklein reviewed Jul 29, 2024

View reviewed changes

hawkw added 5 commits July 29, 2024 14:00

Add omdb migrations list command

441c814

This commit adds a new OMDB command for listing the contents of the migration table. The query can be filtered by migration state, and can also show only the migrations of one or more instances.

s/omcrion/omicron

b00232e

update error output

968318e

just allow the full-table scan

cc88786

use an index instead

161874d

hawkw force-pushed the eliza/omdb-migration branch from f811e2c to 161874d Compare July 29, 2024 21:33

hawkw requested a review from bnaecker July 29, 2024 21:33

bnaecker approved these changes Jul 29, 2024

View reviewed changes

hawkw added 2 commits July 29, 2024 15:10

apply migrations in the correct order

881061e

every time i use the github merge UI, i end up regretting it

65aa9ea

leftwo approved these changes Jul 31, 2024

View reviewed changes

mess with output format a bit

cd33fc9

reduce repetition of tabled structs

d719e83

hawkw enabled auto-merge (squash) August 5, 2024 18:27

hawkw merged commit bdd1a0d into main Aug 5, 2024
22 checks passed

hawkw deleted the eliza/omdb-migration branch August 5, 2024 19:02

hawkw mentioned this pull request Aug 8, 2024

want time range options for omdb db migration ls #6269

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `omdb migrations list` command #6160

Add `omdb migrations list` command #6160

hawkw commented Jul 25, 2024

leftwo commented Jul 29, 2024

hawkw commented Jul 29, 2024 •

edited

Loading

smklein Jul 29, 2024

hawkw Jul 29, 2024

leftwo Jul 29, 2024

hawkw Jul 29, 2024

hawkw Jul 29, 2024

hawkw Jul 29, 2024

bnaecker Jul 29, 2024

bnaecker Jul 29, 2024 •

edited

Loading

hawkw Jul 29, 2024

hawkw Jul 29, 2024

bnaecker left a comment

bnaecker Jul 29, 2024

hawkw Jul 29, 2024

bnaecker Jul 29, 2024

bnaecker Jul 29, 2024

hawkw Jul 29, 2024

leftwo left a comment

hawkw commented Aug 3, 2024

leftwo commented Aug 5, 2024

hawkw commented Aug 5, 2024

hawkw commented Aug 5, 2024

hawkw commented Aug 5, 2024

		KnownVersion::new(85, "region-read-only"),
		KnownVersion::new(84, "add-migrations-by-time-created-index"),

Add omdb migrations list command #6160

Add omdb migrations list command #6160

Conversation

hawkw commented Jul 25, 2024

leftwo commented Jul 29, 2024

hawkw commented Jul 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bnaecker Jul 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bnaecker left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leftwo left a comment

Choose a reason for hiding this comment

hawkw commented Aug 3, 2024

leftwo commented Aug 5, 2024

hawkw commented Aug 5, 2024

hawkw commented Aug 5, 2024

hawkw commented Aug 5, 2024

Add `omdb migrations list` command #6160

Add `omdb migrations list` command #6160

hawkw commented Jul 29, 2024 •

edited

Loading

bnaecker Jul 29, 2024 •

edited

Loading