Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add omdb migrations list command #6160

Merged
merged 9 commits into from
Aug 5, 2024
Merged

Add omdb migrations list command #6160

merged 9 commits into from
Aug 5, 2024

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Jul 25, 2024

This commit adds a new OMDB command for listing the contents of the
migration table. The query can be filtered by migration state, and can
also show only the migrations of one or more instances.

@hawkw hawkw requested review from leftwo and gjcolombo July 29, 2024 16:50
@leftwo
Copy link
Contributor

leftwo commented Jul 29, 2024

Do these changes require anything else in Omicron to work? I tried compiling and running it and it gave me this error"

EVT22200005 # /tmp/omdb db migrations list
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
WARN: found schema version 84.0.0, expected 83.0.0
It's possible the database is running a version that's different from what this
tool understands.  This may result in errors or incorrect output.
Error: listing migrations

Caused by:
    query `SELECT "migration"."id", "migration"."instance_id", "migration"."time_created", "migration"."time_deleted", "migration"."source_state", "migration"."source_propolis_id", "migration"."source_gen", "migration"."time_source_updated", "migration"."target_state", "migration"."target_propolis_id", "migration"."target_gen", "migration"."time_target_updated" FROM "migration" WHERE ("migration"."time_deleted" IS NULL) ORDER BY "migration"."time_created" LIMIT $1` contains a full table/index scan which is explicitly disallowed

@hawkw
Copy link
Member Author

hawkw commented Jul 29, 2024

Do these changes require anything else in Omicron to work? I tried compiling and running it and it gave me this error"

EVT22200005 # /tmp/omdb db migrations list
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
WARN: found schema version 84.0.0, expected 83.0.0
It's possible the database is running a version that's different from what this
tool understands.  This may result in errors or incorrect output.
Error: listing migrations

Caused by:
    query `SELECT "migration"."id", "migration"."instance_id", "migration"."time_created", "migration"."time_deleted", "migration"."source_state", "migration"."source_propolis_id", "migration"."source_gen", "migration"."time_source_updated", "migration"."target_state", "migration"."target_propolis_id", "migration"."target_gen", "migration"."time_target_updated" FROM "migration" WHERE ("migration"."time_deleted" IS NULL) ORDER BY "migration"."time_created" LIMIT $1` contains a full table/index scan which is explicitly disallowed

Ugh...I thought it worked for me locally a couple days ago, but I may have only tested the variants of the command with instance ID and migration ID arguments. Looks like the bare migrations list command needs to allow the full-table scan, which makes sense. I'll fix that.

Comment on lines 3971 to 4259
let migrations = query
.limit(i64::from(u32::from(fetch_opts.fetch_limit)))
.order_by(dsl::time_created)
.select(Migration::as_select())
.load_async(&*datastore.pool_connection_for_tests().await?)
.await
.context("listing migrations")?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the full table scan is because we have no index on "time_created", so this query will need to look up all migrations to return any migrations, even less than fetch_limit.

Two ways of fixing this:

  1. Add the index. Pros: works, no full scan. Cons: adds to cockroachdb footprint, isn't free to maintain, a little odd it's only for omdb.
  2. Allow the full table scan (see: ALLOW_FULL_TABLE_SCAN_SQL in this file for examples). Pros: Works, minimally invasive. Cons: If we have a rack with a gargantuan number of migrations, this could cause OOM-ish issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AH, yeah, that's probably the issue --- I added the order-by more recently than the last time I tested this command. I was assuming the full-table scan was necessary because we were trying to list all migrations.

A third option would just be to remove the order-by clause and have omdb sort the result set in memory. But, because of the LIMIT clause, that has somewhat different semantics --- we will display the migrations in chronological order, but we won't necessarily have selected the most recent records...which seems pretty bad. So, one of your suggestions is probably better.

Naively, it kinda seems to me like the index is the better option...it seems like we might want to be able to order migrations by time anyway, such as if (for example) we eventually grow a UI for displaying migration history for an instance.

And, I'd rather avoid OOM issues in "real life" racks! I notice, however, that a bunch of existing OMDB commands use the approach of allowing full-table scans instead, though, so maybe there's some reason that we would prefer not to add an index just for OMDB use? Some of those commands look like they're actually just loading the entire table, though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, I'd rather avoid OOM issues in "real life" racks! I notice, however, that a bunch of existing OMDB commands use the approach of allowing full-table scans instead, though, so maybe there's some reason that we would prefer not to add an index just for OMDB use? Some of those commands look like they're actually just loading the entire table, though?

It's possible other examples did not do the "right" thing? I say that as someone who may have added one of
those full table scans :)
I also believe I may have added an index just for OMDB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, just adding the following index still results in a full-table scan:

CREATE INDEX IF NOT EXISTS migrations_by_time_created ON omicron.public.migration (
    time_created
);

I wonder if it's necessary for the migration to also include time_deleted, or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, adding time-deleted to the index also doesn't seem to help:

CREATE INDEX IF NOT EXISTS migrations_by_time_created ON omicron.public.migration (
    time_created, time_deleted
);

I guess I'll just have it allow the full-table scan for now, unless someone else has any idea what I've got wrong with the index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the issue is with the index, but the fact that the query doesn't supply any filtering clauses on the indexed columns, e.g., time_created. There's nothing it can use to restrict the search here, and so this has to be a full-table scan. I believe you can add a (possibly-dummy) time_created to start from, and then the planner should be aware that it can use the right index.

For example:

    let migrations = query
        .limit(i64::from(u32::from(fetch_opts.fetch_limit)))
        .order_by(dsl::time_created)
        .filter(dsl::time_created.gt(chrono::DateTime::UNIX_EPOCH)) // Any time, really
        .select(Migration::as_select())
        .load_async(&*datastore.pool_connection_for_tests().await?)
        .await
        .context("listing migrations")?;

Copy link
Collaborator

@bnaecker bnaecker Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, I ran omicron-dev db-run and then connected to the database. I added the same index you're using here, but a query with no filtering clauses is a full table scan, while one with a filter can use the index:

root@[::1]:32221/omicron> create index if not exists migration_by_creation on migration (time_created);
CREATE INDEX


Time: 111ms total (execution 111ms / network 0ms)

root@[::1]:32221/omicron> select time_created from migration;
  time_created
----------------
(0 rows)


Time: 1ms total (execution 1ms / network 0ms)

root@[::1]:32221/omicron> explain select time_created from migration;
                                     info
-------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • scan
    estimated row count: 1 (100% of the table; stats collected 5 seconds ago)
    table: migration@migration_by_creation
    spans: FULL SCAN
(7 rows)


Time: 1ms total (execution 0ms / network 0ms)

root@[::1]:32221/omicron> explain select time_created from migration where time_created > '1970-01-01 1:00:00 UTC';
                                      info
--------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • scan
    estimated row count: 1 (100% of the table; stats collected 35 seconds ago)
    table: migration@migration_by_creation
    spans: [/'1970-01-01 01:00:00.000001+00:00' - ]
(7 rows)


Time: 1ms total (execution 1ms / network 0ms)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's the EXPLAIN output:

distribution: local
vectorized: true

• sort
│ estimated row count: 1
│ order: +time_created
│
└── • limit
    │ estimated row count: 1
    │ count: 500
    │
    └── • sort
        │ estimated row count: 1
        │ order: +time_created
        │
        └── • filter
            │ estimated row count: 1
            │ filter: time_deleted IS NULL
            │
            └── • scan
                  estimated row count: 1 (100% of the table; stats collected 7 minutes ago)
                  table: migration@migration_pkey
                  spans: FULL SCAN

index recommendations: 1
1. type: index creation
   SQL command: CREATE INDEX ON migration (time_deleted) STORING (instance_id, time_created, source_state, source_propolis_id, source_gen, time_source_updated, target_state, target_propolis_id, target_gen, time_target_updated);

and, when running with the --include-deleted flag, it also says:

distribution: local
vectorized: true

• sort
│ estimated row count: 1
│ order: +time_created
│
└── • limit
    │ estimated row count: 1
    │ count: 500
    │
    └── • sort
        │ estimated row count: 1
        │ order: +time_created
        │
        └── • scan
              estimated row count: 1 (100% of the table; stats collected 7 minutes ago)
              table: migration@migration_pkey
              spans: FULL SCAN

index recommendations: 1
1. type: index creation
   SQL command: CREATE INDEX ON migration (time_created) STORING (instance_id, time_deleted, source_state, source_propolis_id, source_gen, time_source_updated, target_state, target_propolis_id, target_gen, time_target_updated);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, @bnaecker's suggestion of adding a no-op filter seems to fix it. Thanks Ben!

This commit adds a new OMDB command for listing the contents of the
migration table. The query can be filtered by migration state, and can
also show only the migrations of one or more instances.
Copy link
Collaborator

@bnaecker bnaecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Two small questions, both non-blocking. Thanks!

Comment on lines +4250 to +4259
let migrations = query
.limit(i64::from(u32::from(fetch_opts.fetch_limit)))
.order_by(dsl::time_created)
// This is just to prove to CRDB that it can use the
// migrations-by-time-created index, it doesn't actually do anything.
.filter(dsl::time_created.gt(chrono::DateTime::UNIX_EPOCH))
.select(Migration::as_select())
.load_async(&*datastore.pool_connection_for_tests().await?)
.await
.context("listing migrations")?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all seems good as far as using the new index. Does it make sense to let the caller / CLI provide timestamps themselves? E.g., they could look at migrations in the last hour, if they wanted, or the code could use the UNIX epoch if nothing was provided.

Part of why I ask is we have given control of the limit in the CLI, but not the starting point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to add that as well, but I was thinking about doing it in a subsequent
PR. I could do it now, though!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, if you want to leave it for a follow-up, that's fine too! I think it'd be nice to have both in the limit, but not crucial they land in the same commit. Up to you :)

Comment on lines 32 to 33
KnownVersion::new(85, "region-read-only"),
KnownVersion::new(84, "add-migrations-by-time-created-index"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to add the add-migrations-by-time-create-index migration after the region-read-only? I don't think it should matter as long as they're orthogonal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, i messed up the rebase --- lemme fix that!

Copy link
Contributor

@leftwo leftwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all good with the code here.
I would like to see some sample output from the command, if possible.

@hawkw
Copy link
Member Author

hawkw commented Aug 3, 2024

@leftwo:

I'm all good with the code here. I would like to see some sample output from the command, if possible.

For some sample output, see #5749 (comment).

Having actually tested out the command in real life, it turns out the output is...really wide. I'm thinking about changing it to only show the migration ID column in verbose mode, and also potentially hiding the instance ID if the command is run with the flag to select only a single instance.

@leftwo
Copy link
Contributor

leftwo commented Aug 5, 2024

@leftwo:

I'm all good with the code here. I would like to see some sample output from the command, if possible.

For some sample output, see #5749 (comment).

Having actually tested out the command in real life, it turns out the output is...really wide. I'm thinking about changing it to only show the migration ID column in verbose mode, and also potentially hiding the instance ID if the command is run with the flag to select only a single instance.

Cool, cool. The output is really a matter of taste, and for the one who will be using the command really.
Thanks for showing it.

@hawkw
Copy link
Member Author

hawkw commented Aug 5, 2024

@leftwo Yeah, for sure. My personal preference is for the default output to fit in a reasonably narrow terminal, because I find the tabular output much harder to interpret if it's been wrapped. So, I think I'm gonna make a few bits verbose mode only.

@hawkw
Copy link
Member Author

hawkw commented Aug 5, 2024

Okay, I've tweaked the output a bit more (cd33fc9), it should now look more like this:

root@oxz_switch1:~# ./omdb db migration ls --instance-id 604763da-4af9-45b9-a592-aaeb8fee06b4
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:103::3]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:102::3]:32221,[fd00:1122:3344:104::4]:32221,[fd00:1122:3344:104::3]:32221/omicron?sslmode=disable
WARN: found schema version 84.0.0, expected 85.0.0
It's possible the database is running a version that's different from what this
tool understands.  This may result in errors or incorrect output.
CREATED                        SRC_STATE TGT_STATE SRC_VMM                              TGT_VMM
2024-08-03 18:45:29.310857 UTC completed completed ace760d0-3698-4856-bda4-cbf433fcfb4e 704251e4-04dc-4e6e-8a7f-02ee46571543
2024-08-03 18:46:00.927484 UTC completed completed 704251e4-04dc-4e6e-8a7f-02ee46571543 14009464-b2aa-4767-babb-9728af1959b9
2024-08-03 18:46:32.039600 UTC completed completed 14009464-b2aa-4767-babb-9728af1959b9 5580dd81-afb8-4dd6-922a-f776c773f0a9
2024-08-03 18:47:03.114914 UTC completed completed 5580dd81-afb8-4dd6-922a-f776c773f0a9 511a229c-84c3-486d-b080-b3c0d60895cd
2024-08-03 18:47:34.950071 UTC completed completed 511a229c-84c3-486d-b080-b3c0d60895cd b468d316-aaae-45d1-a0a0-7a6cf2cd77f4
2024-08-03 18:48:07.060381 UTC completed completed b468d316-aaae-45d1-a0a0-7a6cf2cd77f4 03736b34-7453-4802-8eb5-03030184bd91
2024-08-03 18:48:38.152076 UTC completed completed 03736b34-7453-4802-8eb5-03030184bd91 cff0be96-c57d-4d21-9215-ebc304a393e9
2024-08-03 18:49:09.691518 UTC completed completed cff0be96-c57d-4d21-9215-ebc304a393e9 61169459-b4ab-4db2-b7d4-844b5318c4ac
2024-08-03 18:49:40.780488 UTC completed completed 61169459-b4ab-4db2-b7d4-844b5318c4ac 61e8015e-dbe3-411a-b88c-fc55a35e4fa6
2024-08-03 18:50:11.792970 UTC completed completed 61e8015e-dbe3-411a-b88c-fc55a35e4fa6 879cb984-51a5-4279-947b-6a9dd7c6925d
2024-08-03 18:50:42.886472 UTC completed completed 879cb984-51a5-4279-947b-6a9dd7c6925d b8239c3f-272d-4207-853d-8d07f268be59
2024-08-03 18:51:14.020375 UTC completed completed b8239c3f-272d-4207-853d-8d07f268be59 5e57b727-7343-493b-a954-3dde0aa98a7d
2024-08-03 18:51:45.087713 UTC completed completed 5e57b727-7343-493b-a954-3dde0aa98a7d df2d0980-6ad1-4ce3-b0f8-f045cc0a06c1
2024-08-03 18:52:16.085818 UTC completed completed df2d0980-6ad1-4ce3-b0f8-f045cc0a06c1 9878f88c-3c43-4786-905e-f8ea9e9d5d6f
2024-08-03 18:52:47.220469 UTC completed completed 9878f88c-3c43-4786-905e-f8ea9e9d5d6f ba12da12-13da-40c6-9904-ffc803f30485
2024-08-03 18:53:18.278158 UTC completed completed ba12da12-13da-40c6-9904-ffc803f30485 ae8ec94f-b02b-4b1d-beea-a6f11ff64972
root@oxz_switch1:~#

The instance ID is included when migrations are not filtered by instance, but I figured it was nice to omit it when we're only looking at a single instance.

@hawkw
Copy link
Member Author

hawkw commented Aug 5, 2024

ugh, i broke it. please hold...

@hawkw hawkw enabled auto-merge (squash) August 5, 2024 18:27
@hawkw hawkw merged commit bdd1a0d into main Aug 5, 2024
22 checks passed
@hawkw hawkw deleted the eliza/omdb-migration branch August 5, 2024 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants