(7/5) [nexus-db-queries] Benchmark for VMM reservation #7498
base: main
Conversation
This is a cursory first review; I want to look closer at the actual benchmarks, but figured I'd go ahead and leave some quick notes on the surrounding code.
//! It may be worth refactoring some of these functions to a test utility
//! crate to avoid the duplication.
Alternatively, we could perhaps just publicly export them under a "test utils" feature, or something? That might be a simpler solution to de-duplicate this without having to go make a whole new crate for them?
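For illustration, a minimal sketch of what a feature-gated export could look like (the `testing` feature name, module layout, and helper are assumptions, not what the PR actually does):

```rust
// In the library's lib.rs. The corresponding Cargo.toml would declare an
// empty feature, e.g. `testing = []` under `[features]`.
//
// Helpers compiled under this cfg are available to the crate's own tests
// and, when the feature is enabled, to benches and downstream crates,
// without spinning up a separate test-utility crate.
#[cfg(any(test, feature = "testing"))]
pub mod test_utils {
    /// Hypothetical shared helper used by both unit tests and benchmarks.
    pub fn test_project_name() -> String {
        "benchmark-project".to_string()
    }
}
```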
I did this refactor in 36ddafc
ContentionQuery {
    sql: "SELECT table_name, index_name, num_contention_events::TEXT FROM crdb_internal.cluster_contended_indexes",
    description: "Indexes which are experiencing contention",
},
ContentionQuery {
    sql: "SELECT table_name,num_contention_events::TEXT FROM crdb_internal.cluster_contended_tables",
    description: "Tables which are experiencing contention",
},
ContentionQuery {
    sql: "WITH c AS (SELECT DISTINCT ON (table_id, index_id) table_id, index_id, num_contention_events AS events, cumulative_contention_time AS time FROM crdb_internal.cluster_contention_events) SELECT i.descriptor_name as table_name, i.index_name, c.events::TEXT, c.time::TEXT FROM crdb_internal.table_indexes AS i JOIN c ON i.descriptor_id = c.table_id AND i.index_id = c.index_id ORDER BY c.time DESC LIMIT 10;",
    description: "Top ten longest contention events, grouped by table + index",
},
Sorry to be annoying, but... I don't suppose we could wrap some of these queries?
Updated!
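For reference, one common way to wrap long SQL literals in Rust is the string-continuation escape: a trailing backslash continues the literal onto the next line and drops that line's leading whitespace. A sketch of the style only (the constant name is hypothetical, and the PR may format things differently):

```rust
// The trailing backslash keeps the query a single flat string at runtime,
// while letting the source wrap at a readable width.
const CONTENDED_INDEXES_SQL: &str =
    "SELECT table_name, index_name, num_contention_events::TEXT \
     FROM crdb_internal.cluster_contended_indexes";
```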
// You can also set the "SHOW_CONTENTION" environment variable to display
// additional data from CockroachDB tables about contention statistics.
lovely!
print!("|"); | ||
total_len += width + 3; | ||
} | ||
println!(""); |
huh, i thought clippy disliked printlns with empty strings in them, but perhaps i made that up...
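For what it's worth, clippy does have a style lint along these lines (`clippy::println_empty_string`, if I'm remembering the name right), which prefers the zero-argument form:

```rust
// Flagged by clippy's println_empty_string style lint:
println!("");
// Suggested replacement, identical output:
println!();
```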
This looks good to me, though I have a few small comments. Thanks for putting this together!
One other general remark: it'd be good to have some checked-in instructions for running the tests (assuming I haven't overlooked them). Do I just run `cargo bench`? Is there some initial setup required to establish a baseline that Criterion can detect regressions against?
// For example: if the total number of vmms has no impact on the next provisioning
// request, we should see similar durations for "100 vmms reserved" vs "1 vmm
// reserved". However, if more vmms actually make reservation times slower, we'll see
// the "100 vmm" case take longer than the "1 vmm" case. The same goes for tasks:
nit: partial comment? (The line ends with a colon.)
whoops, fixed in b8c5024
// due to contention. We normally bubble this out to users,
// rather than stalling the request, but in this particular
// case, we choose to retry immediately.
Is it possible/worthwhile to report the number of retries as a lower-is-better benchmarked metric, or does criterion just deal with durations? Extra retries will hurt the average duration, too, but I can see it being useful to have an explicit metric for this, especially if "real" retries produce a user error.
It looks like the answer may be "no" per this bit of the docs ("Note that as of version 0.3.0, only timing measurements are supported, and only a single measurement can be used for one benchmark. These restrictions may be lifted in future versions."). Alas. I'm leaving the comment here, though, in case there's some other clever way to display this that I haven't thought of yet.
Yeah, I think it's okay right now that we're tracking time -- if it helps, it's a decent proxy for retries, even though, as you say, we don't actually acknowledge the number of user-visible retries here.
I'm mostly doing this so I can force the system into contention, which is where benchmarking is most insightful anyway.
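If it ever becomes useful, one low-tech way to surface retry counts next to Criterion's timing output is a shared counter printed after each benchmark group finishes. A rough sketch with hypothetical names, since Criterion itself only records time and this would just be a side channel:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Incremented wherever the reservation loop hits a retryable
// serialization/contention error.
static RETRIES: AtomicUsize = AtomicUsize::new(0);

fn record_retry() {
    RETRIES.fetch_add(1, Ordering::Relaxed);
}

// Called once a benchmark group finishes; resets the counter so the next
// group starts from zero.
fn report_retries(group: &str) {
    let total = RETRIES.swap(0, Ordering::Relaxed);
    eprintln!("[{group}] observed {total} reservation retries");
}
```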
// Number of vmms to provision from the task-under-test
vmms: usize,
Just to make sure I'm following everything: this also determines the number of instances the test creates, and each task will end up allocating one VMM for each of these instances (i.e., the VMMs/instances are not partitioned by task). Is that correct? (I'm 99% sure it is from the phrase "task-under-test," just making sure I haven't missed something.)
That is correct, each "task" operates on a totally distinct set of instances/vmms from the other tasks.
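A toy illustration of that partitioning (hypothetical helper, just to spell out the shape): with `tasks = 4` and `vmms = 8`, 32 instances exist in total and each task reserves VMMs for exactly its own 8.

```rust
// Returns, for each task, the indices of the instances it owns; no index
// appears in more than one task's slice.
fn instances_per_task(tasks: usize, vmms: usize) -> Vec<Vec<usize>> {
    (0..tasks)
        .map(|task| (task * vmms..(task + 1) * vmms).collect())
        .collect()
}
```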
},
],
},
// TODO create a test for "policy = Fail" groups.
nit: you might've been planning to file one already, but this seems like it's probably worth an issue
Filed #7628
I tried to comment this in omicron/nexus/db-queries/benches/sled_reservation.rs (lines 93 to 104, as of fba0614).
But I can make a README too! Criterion has command-line args that can be used to specify how to create a baseline, but you can also create a baseline by just "running it once, then running it again later". The baseline is just "whatever you ran last time".
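For the record, the Criterion CLI flags being referred to are (I believe) `--save-baseline` and `--baseline`. Something along these lines, assuming the bench target is named `sled_reservation` and the package is `nexus-db-queries`:

```sh
# Save the current results under a named baseline...
cargo bench -p nexus-db-queries --bench sled_reservation -- --save-baseline before

# ...then, after making changes, compare against it explicitly.
cargo bench -p nexus-db-queries --bench sled_reservation -- --baseline before

# With no flags, Criterion simply compares against whatever ran last time.
cargo bench -p nexus-db-queries --bench sled_reservation
```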
Following up on the affinity work, I wanted to validate that the additional logic for affinity groups does not make the performance of the instance reservation query any worse than it was before.