-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add omdb migrations list
command
#6160
Conversation
Do these changes require anything else in Omicron to work? I tried compiling and running it and it gave me this error"
|
Ugh...I thought it worked for me locally a couple days ago, but I may have only tested the variants of the command with instance ID and migration ID arguments. Looks like the bare |
dev-tools/omdb/src/bin/omdb/db.rs
Outdated
let migrations = query | ||
.limit(i64::from(u32::from(fetch_opts.fetch_limit))) | ||
.order_by(dsl::time_created) | ||
.select(Migration::as_select()) | ||
.load_async(&*datastore.pool_connection_for_tests().await?) | ||
.await | ||
.context("listing migrations")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the full table scan is because we have no index on "time_created", so this query will need to look up all migrations to return any migrations, even less than fetch_limit
.
Two ways of fixing this:
- Add the index. Pros: works, no full scan. Cons: adds to cockroachdb footprint, isn't free to maintain, a little odd it's only for omdb.
- Allow the full table scan (see:
ALLOW_FULL_TABLE_SCAN_SQL
in this file for examples). Pros: Works, minimally invasive. Cons: If we have a rack with a gargantuan number of migrations, this could cause OOM-ish issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AH, yeah, that's probably the issue --- I added the order-by more recently than the last time I tested this command. I was assuming the full-table scan was necessary because we were trying to list all migrations.
A third option would just be to remove the order-by clause and have omdb
sort the result set in memory. But, because of the LIMIT clause, that has somewhat different semantics --- we will display the migrations in chronological order, but we won't necessarily have selected the most recent records...which seems pretty bad. So, one of your suggestions is probably better.
Naively, it kinda seems to me like the index is the better option...it seems like we might want to be able to order migrations by time anyway, such as if (for example) we eventually grow a UI for displaying migration history for an instance.
And, I'd rather avoid OOM issues in "real life" racks! I notice, however, that a bunch of existing OMDB commands use the approach of allowing full-table scans instead, though, so maybe there's some reason that we would prefer not to add an index just for OMDB use? Some of those commands look like they're actually just loading the entire table, though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, I'd rather avoid OOM issues in "real life" racks! I notice, however, that a bunch of existing OMDB commands use the approach of allowing full-table scans instead, though, so maybe there's some reason that we would prefer not to add an index just for OMDB use? Some of those commands look like they're actually just loading the entire table, though?
It's possible other examples did not do the "right" thing? I say that as someone who may have added one of
those full table scans :)
I also believe I may have added an index just for OMDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, just adding the following index still results in a full-table scan:
CREATE INDEX IF NOT EXISTS migrations_by_time_created ON omicron.public.migration (
time_created
);
I wonder if it's necessary for the migration to also include time_deleted
, or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, adding time-deleted
to the index also doesn't seem to help:
CREATE INDEX IF NOT EXISTS migrations_by_time_created ON omicron.public.migration (
time_created, time_deleted
);
I guess I'll just have it allow the full-table scan for now, unless someone else has any idea what I've got wrong with the index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe the issue is with the index, but the fact that the query doesn't supply any filtering clauses on the indexed columns, e.g., time_created
. There's nothing it can use to restrict the search here, and so this has to be a full-table scan. I believe you can add a (possibly-dummy) time_created
to start from, and then the planner should be aware that it can use the right index.
For example:
let migrations = query
.limit(i64::from(u32::from(fetch_opts.fetch_limit)))
.order_by(dsl::time_created)
.filter(dsl::time_created.gt(chrono::DateTime::UNIX_EPOCH)) // Any time, really
.select(Migration::as_select())
.load_async(&*datastore.pool_connection_for_tests().await?)
.await
.context("listing migrations")?;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, I ran omicron-dev db-run
and then connected to the database. I added the same index you're using here, but a query with no filtering clauses is a full table scan, while one with a filter can use the index:
root@[::1]:32221/omicron> create index if not exists migration_by_creation on migration (time_created);
CREATE INDEX
Time: 111ms total (execution 111ms / network 0ms)
root@[::1]:32221/omicron> select time_created from migration;
time_created
----------------
(0 rows)
Time: 1ms total (execution 1ms / network 0ms)
root@[::1]:32221/omicron> explain select time_created from migration;
info
-------------------------------------------------------------------------------
distribution: local
vectorized: true
• scan
estimated row count: 1 (100% of the table; stats collected 5 seconds ago)
table: migration@migration_by_creation
spans: FULL SCAN
(7 rows)
Time: 1ms total (execution 0ms / network 0ms)
root@[::1]:32221/omicron> explain select time_created from migration where time_created > '1970-01-01 1:00:00 UTC';
info
--------------------------------------------------------------------------------
distribution: local
vectorized: true
• scan
estimated row count: 1 (100% of the table; stats collected 35 seconds ago)
table: migration@migration_by_creation
spans: [/'1970-01-01 01:00:00.000001+00:00' - ]
(7 rows)
Time: 1ms total (execution 1ms / network 0ms)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here's the EXPLAIN
output:
distribution: local
vectorized: true
• sort
│ estimated row count: 1
│ order: +time_created
│
└── • limit
│ estimated row count: 1
│ count: 500
│
└── • sort
│ estimated row count: 1
│ order: +time_created
│
└── • filter
│ estimated row count: 1
│ filter: time_deleted IS NULL
│
└── • scan
estimated row count: 1 (100% of the table; stats collected 7 minutes ago)
table: migration@migration_pkey
spans: FULL SCAN
index recommendations: 1
1. type: index creation
SQL command: CREATE INDEX ON migration (time_deleted) STORING (instance_id, time_created, source_state, source_propolis_id, source_gen, time_source_updated, target_state, target_propolis_id, target_gen, time_target_updated);
and, when running with the --include-deleted
flag, it also says:
distribution: local
vectorized: true
• sort
│ estimated row count: 1
│ order: +time_created
│
└── • limit
│ estimated row count: 1
│ count: 500
│
└── • sort
│ estimated row count: 1
│ order: +time_created
│
└── • scan
estimated row count: 1 (100% of the table; stats collected 7 minutes ago)
table: migration@migration_pkey
spans: FULL SCAN
index recommendations: 1
1. type: index creation
SQL command: CREATE INDEX ON migration (time_created) STORING (instance_id, time_deleted, source_state, source_propolis_id, source_gen, time_source_updated, target_state, target_propolis_id, target_gen, time_target_updated);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, @bnaecker's suggestion of adding a no-op filter seems to fix it. Thanks Ben!
This commit adds a new OMDB command for listing the contents of the migration table. The query can be filtered by migration state, and can also show only the migrations of one or more instances.
f811e2c
to
161874d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Two small questions, both non-blocking. Thanks!
let migrations = query | ||
.limit(i64::from(u32::from(fetch_opts.fetch_limit))) | ||
.order_by(dsl::time_created) | ||
// This is just to prove to CRDB that it can use the | ||
// migrations-by-time-created index, it doesn't actually do anything. | ||
.filter(dsl::time_created.gt(chrono::DateTime::UNIX_EPOCH)) | ||
.select(Migration::as_select()) | ||
.load_async(&*datastore.pool_connection_for_tests().await?) | ||
.await | ||
.context("listing migrations")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all seems good as far as using the new index. Does it make sense to let the caller / CLI provide timestamps themselves? E.g., they could look at migrations in the last hour, if they wanted, or the code could use the UNIX epoch if nothing was provided.
Part of why I ask is we have given control of the limit in the CLI, but not the starting point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to add that as well, but I was thinking about doing it in a subsequent
PR. I could do it now, though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, if you want to leave it for a follow-up, that's fine too! I think it'd be nice to have both in the limit, but not crucial they land in the same commit. Up to you :)
KnownVersion::new(85, "region-read-only"), | ||
KnownVersion::new(84, "add-migrations-by-time-created-index"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you intend to add the add-migrations-by-time-create-index
migration after the region-read-only
? I don't think it should matter as long as they're orthogonal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, i messed up the rebase --- lemme fix that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all good with the code here.
I would like to see some sample output from the command, if possible.
For some sample output, see #5749 (comment). Having actually tested out the command in real life, it turns out the output is...really wide. I'm thinking about changing it to only show the migration ID column in verbose mode, and also potentially hiding the instance ID if the command is run with the flag to select only a single instance. |
Cool, cool. The output is really a matter of taste, and for the one who will be using the command really. |
@leftwo Yeah, for sure. My personal preference is for the default output to fit in a reasonably narrow terminal, because I find the tabular output much harder to interpret if it's been wrapped. So, I think I'm gonna make a few bits verbose mode only. |
Okay, I've tweaked the output a bit more (cd33fc9), it should now look more like this: root@oxz_switch1:~# ./omdb db migration ls --instance-id 604763da-4af9-45b9-a592-aaeb8fee06b4
note: database URL not specified. Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:103::3]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:102::3]:32221,[fd00:1122:3344:104::4]:32221,[fd00:1122:3344:104::3]:32221/omicron?sslmode=disable
WARN: found schema version 84.0.0, expected 85.0.0
It's possible the database is running a version that's different from what this
tool understands. This may result in errors or incorrect output.
CREATED SRC_STATE TGT_STATE SRC_VMM TGT_VMM
2024-08-03 18:45:29.310857 UTC completed completed ace760d0-3698-4856-bda4-cbf433fcfb4e 704251e4-04dc-4e6e-8a7f-02ee46571543
2024-08-03 18:46:00.927484 UTC completed completed 704251e4-04dc-4e6e-8a7f-02ee46571543 14009464-b2aa-4767-babb-9728af1959b9
2024-08-03 18:46:32.039600 UTC completed completed 14009464-b2aa-4767-babb-9728af1959b9 5580dd81-afb8-4dd6-922a-f776c773f0a9
2024-08-03 18:47:03.114914 UTC completed completed 5580dd81-afb8-4dd6-922a-f776c773f0a9 511a229c-84c3-486d-b080-b3c0d60895cd
2024-08-03 18:47:34.950071 UTC completed completed 511a229c-84c3-486d-b080-b3c0d60895cd b468d316-aaae-45d1-a0a0-7a6cf2cd77f4
2024-08-03 18:48:07.060381 UTC completed completed b468d316-aaae-45d1-a0a0-7a6cf2cd77f4 03736b34-7453-4802-8eb5-03030184bd91
2024-08-03 18:48:38.152076 UTC completed completed 03736b34-7453-4802-8eb5-03030184bd91 cff0be96-c57d-4d21-9215-ebc304a393e9
2024-08-03 18:49:09.691518 UTC completed completed cff0be96-c57d-4d21-9215-ebc304a393e9 61169459-b4ab-4db2-b7d4-844b5318c4ac
2024-08-03 18:49:40.780488 UTC completed completed 61169459-b4ab-4db2-b7d4-844b5318c4ac 61e8015e-dbe3-411a-b88c-fc55a35e4fa6
2024-08-03 18:50:11.792970 UTC completed completed 61e8015e-dbe3-411a-b88c-fc55a35e4fa6 879cb984-51a5-4279-947b-6a9dd7c6925d
2024-08-03 18:50:42.886472 UTC completed completed 879cb984-51a5-4279-947b-6a9dd7c6925d b8239c3f-272d-4207-853d-8d07f268be59
2024-08-03 18:51:14.020375 UTC completed completed b8239c3f-272d-4207-853d-8d07f268be59 5e57b727-7343-493b-a954-3dde0aa98a7d
2024-08-03 18:51:45.087713 UTC completed completed 5e57b727-7343-493b-a954-3dde0aa98a7d df2d0980-6ad1-4ce3-b0f8-f045cc0a06c1
2024-08-03 18:52:16.085818 UTC completed completed df2d0980-6ad1-4ce3-b0f8-f045cc0a06c1 9878f88c-3c43-4786-905e-f8ea9e9d5d6f
2024-08-03 18:52:47.220469 UTC completed completed 9878f88c-3c43-4786-905e-f8ea9e9d5d6f ba12da12-13da-40c6-9904-ffc803f30485
2024-08-03 18:53:18.278158 UTC completed completed ba12da12-13da-40c6-9904-ffc803f30485 ae8ec94f-b02b-4b1d-beea-a6f11ff64972
root@oxz_switch1:~# The instance ID is included when migrations are not filtered by instance, but I figured it was nice to omit it when we're only looking at a single instance. |
ugh, i broke it. please hold... |
This commit adds a new OMDB command for listing the contents of the
migration table. The query can be filtered by migration state, and can
also show only the migrations of one or more instances.