Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need to review entries in inv_dataset table #6615

Open
askfongjojo opened this issue Sep 20, 2024 · 10 comments · Fixed by #6632
Open

Need to review entries in inv_dataset table #6615

askfongjojo opened this issue Sep 20, 2024 · 10 comments · Fixed by #6632
Assignees
Milestone

Comments

@askfongjojo
Copy link

I see that there is an unusally large number of records in this table that has no time_deleted or state fields:

root@[fd00:1122:3344:109::3]:32221/omicron> \d inv_dataset
     column_name    | data_type | is_nullable | column_default | generation_expression |      indices       | is_hidden
--------------------+-----------+-------------+----------------+-----------------------+--------------------+------------
  inv_collection_id | UUID      |    false    | NULL           |                       | {inv_dataset_pkey} |   false
  sled_id           | UUID      |    false    | NULL           |                       | {inv_dataset_pkey} |   false
  id                | UUID      |    true     | NULL           |                       | {inv_dataset_pkey} |   false
  name              | STRING    |    false    | NULL           |                       | {inv_dataset_pkey} |   false
  available         | INT8      |    false    | NULL           |                       | {inv_dataset_pkey} |   false
  used              | INT8      |    false    | NULL           |                       | {inv_dataset_pkey} |   false
  quota             | INT8      |    true     | NULL           |                       | {inv_dataset_pkey} |   false
  reservation       | INT8      |    true     | NULL           |                       | {inv_dataset_pkey} |   false
  compression       | STRING    |    false    | NULL           |                       | {inv_dataset_pkey} |   false
(9 rows)

root@[fd00:1122:3344:109::3]:32221/omicron> select count(*) from inv_dataset;
   count
------------
  35641826

root@[fd00:1122:3344:109::3]:32221/omicron> select sled_id, count(*) from inv_dataset group by sled_id;
                sled_id                |  count
---------------------------------------+----------
  f15774c1-b8e5-434f-a493-ec43f96cba06 | 3129834
  bd96ef7c-4941-4729-b6f7-5f47feecbc4b | 2054724
  71def415-55ad-46b4-ba88-3ca55d7fb287 | 3064761
  a2adea92-b56e-44fc-8a0d-7d63b5fd3b93 | 3015194
  dd83e75a-1edf-4aa1-89a0-cd8b2091a7cd | 3221035
  0c7011f7-a4bf-4daf-90cc-1c2410103300 | 2894176
  2707b587-9c7f-4fb0-a7af-37c3b7a9a0fa | 2943800
  7b862eb6-7f50-4c2f-b9a6-0d12ac913d3c | 2948692
  87c2c4fc-b0c7-4fef-a305-78f0ed265bbc | 3009928
  5f6720b8-8a31-45f8-8c94-8e699218f28b | 3131293
  b886b58a-1e3f-4be1-b9f2-0c2e66c6bc88 | 3016263
  db183874-65b5-4263-a1c1-ddb2737ae0e9 | 3208511
(12 rows)

It may be worth confirming if this table is supposed to grow indefinitely. Other inv_ table seems to be pruned and kept to reasonable sizes (maybe related to #5305).

@askfongjojo askfongjojo changed the title Need to review Need to review entries in inv_dataset table Sep 20, 2024
@smklein smklein self-assigned this Sep 20, 2024
@davepacheco
Copy link
Collaborator

Thanks for catching this!!

Yeah it seems pretty likely that this table is missing from the inventory delete code. We can certainly fix that problem but it's a little harder to fix existing systems. We could try something like:

DELETE FROM inv_dataset WHERE inv_collection_id NOT IN (SELECT id FROM inv_collection);

But that's a potentially huge transaction and it'd be safer to do it in batches. It looks like you can do something like this but I haven't tested it:

DELETE FROM inv_dataset WHERE inv_collection_id NOT IN (SELECT id FROM inv_collection) LIMIT 1000;

and we could run that in a loop until it produces no results.

We could:

  • do this by hand the next time we upgrade each customer's system
  • write an omdb command to do this and do it at the same time
  • write a background task to do this

I'd probably lean towards the omdb command. There's a related question of: is this too urgent to wait for the next upgrade?

@askfongjojo
Copy link
Author

The issue doesn't seem to affect the system negatively at the moment AFAICT (I found it only by chance). We can do the cleanup at the next customer rack update after fixing the inventory delete code.

@jgallagher
Copy link
Contributor

We could:

* do this by hand the next time we upgrade each customer's system

* write an omdb command to do this and do it at the same time

* write a background task to do this

I'd probably lean towards the omdb command. There's a related question of: is this too urgent to wait for the next upgrade?

I this this is only on dogfood, right? Release 10 was cut on Aug 31, and the most recent main commit in it was a032d2a. inv_dataset was added in #6167, which was merged on Sep 6, and the git history on main confirms it's a fair bit newer than a032d2a.

Given that, I'd probably vote "do this by hand once we've patched this up and updated dogfood"? Sooner if it is becoming problematic?

@smklein
Copy link
Collaborator

smklein commented Sep 23, 2024

@askfongjojo , thank you for filing this issue, this was 100% an oversight on my part. I have a fix ready at at #6632 , which also adds an extra test to try to catch future cases where an inventory subcollection is added but not correctly removed.

Looking at SHOW TABLES on dogfood, I see that there are a couple other tables in addition to inv_dataset which have high-volume and no garbage collection. I filed #6635 for these other cases.

smklein added a commit that referenced this issue Sep 23, 2024
Fixes #6615

Does not patch existing systems (dogfood) which may have unnecessary
inventory entries
@jgallagher
Copy link
Contributor

(Reopening just so we remember to manually clean up dogfood after 165ddc2 is deployed)

@jgallagher jgallagher reopened this Sep 23, 2024
@davepacheco davepacheco added this to the 11 milestone Sep 23, 2024
@leftwo
Copy link
Contributor

leftwo commented Sep 24, 2024

I this this is only on dogfood, right? Release 10 was cut on Aug 31, and the most recent main commit in it was a032d2a. inv_dataset was added in #6167, which was merged on Sep 6, and the git history on main confirms it's a fair bit newer than a032d2a.

Given that, I'd probably vote "do this by hand once we've patched this up and updated dogfood"? Sooner if it is becoming problematic?

Could we confirm on all customer locations that they don't have it with:

select count(*) from inv_dataset;

Expecting to get an error that no such table exists?

@jgallagher
Copy link
Contributor

Could we confirm on all customer locations that they don't have it with:

select count(*) from inv_dataset;

Expecting to get an error that no such table exists?

Yes. It'd be nicer if we could confirm via omdb; the same PR that added inv_dataset updated omdb db inventory collections show ... to output physical disks + zpools + datasets. Do you think "run omdb and confirm none of that information is present" is a good enough proxy, or should we log into crdb manually and confirm the table isn't there?

@leftwo
Copy link
Contributor

leftwo commented Sep 24, 2024

If there is an omdb command that looks for it, that should be enough to confirm it is not present.
The only reason I might want to check before we update would be if we thought we might find
something while we have the rack parked, we could do cleanup before upgrading.

We could also take a newer omdb binary and use that to probe for the table, which might be better
as we won't have to open database.

@smklein
Copy link
Collaborator

smklein commented Sep 24, 2024

Yes. It'd be nicer if we could confirm via omdb; the same PR that added inv_dataset updated omdb db inventory collections show ... to output physical disks + zpools + datasets. Do you think "run omdb and confirm none of that information is present" is a good enough proxy, or should we log into crdb manually and confirm the table isn't there?

So the assumptions would be:

  • The omdb version and the Nexus version are tightly coupled (if Nexus isn't using inv_datasets, then omdb isn't querying for them)
  • A "lack of inv_datasets in omdb" implies they are also not present in Nexus?

I think this check would be pretty much identical to "checking the version of software deployed on the rack", as it would tell you if your codebase is "before inv_dataset" or "after inv_dataset" -- and hopefully, with our release cadence, the "after inv_dataset world" also includes #6632 .

FWIW, this is the output of omdb db inventory collections show on dogfood:

sled f15774c1-b8e5-434f-a493-ec43f96cba06 (role = Gimlet, serial BRM44220005)                                                                                                                                                                  
    found at:    2024-09-24 17:07:53.400006 UTC from http://[fd00:1122:3344:105::1]:12345                                                                                                                                                      
    address:     [fd00:1122:3344:105::1]:12345                                                                                                                                                                                                 
    usable hw threads:   128                                                                                                                                                                                                                   
    usable memory (GiB): 1011                                                                                                                                                                                                                  
    reservoir (GiB):     809                                                                                                                                                                                                                   
    physical disks:                                                                                                                                                                                                                            
      U2: DiskIdentity { vendor: "1b96", model: "WUS4C6432DSP3X3", serial: "A084A60D" } in 0                                                                                                                                                   
      U2: DiskIdentity { vendor: "1b96", model: "WUS4C6432DSP3X3", serial: "A084A6EF" } in 1                                                                                                                                                   
      U2: DiskIdentity { vendor: "1b96", model: "WUS4C6432DSP3X3", serial: "A084A5E5" } in 2                                                                                                                                                   
      U2: DiskIdentity { vendor: "1b96", model: "WUS4C6432DSP3X3", serial: "A084A6CB" } in 3                                                                                                                                                   
      U2: DiskIdentity { vendor: "1b96", model: "WUS4C6432DSP3X3", serial: "A084A63C" } in 4                                                                                                                                                   
      U2: DiskIdentity { vendor: "1b96", model: "WUS4C6432DSP3X3", serial: "A084A7F3" } in 5                                                                                                                                                   
      U2: DiskIdentity { vendor: "1b96", model: "WUS4C6432DSP3X3", serial: "A084A703" } in 6                                                                                                                                                   
      U2: DiskIdentity { vendor: "1b96", model: "WUS4C6432DSP3X3", serial: "A084A7C0" } in 7                                                                                                                                                   
      U2: DiskIdentity { vendor: "1b96", model: "WUS4C6432DSP3X3", serial: "A084A606" } in 8                                                                                                                                                   
      U2: DiskIdentity { vendor: "1b96", model: "WUS4C6432DSP3X3", serial: "A084A795" } in 9                                                                                                                                                   
      M2: DiskIdentity { vendor: "1344", model: "Micron_7300_MTFDHBG1T9TDF", serial: "2141327471AD" } in 17                                                                                                                                    
      M2: DiskIdentity { vendor: "1344", model: "Micron_7300_MTFDHBG1T9TDF", serial: "214132748879" } in 18                                                                                                                                    
    zpools                                                                                                                                                                                                                                     
      0ae29053-29a2-489e-a1e6-6aec0ecd05f8: total size: 2976 GiB                                                                                                                                                                               
      17eff217-f0b1-4353-b133-0f68bbd5ceaa: total size: 2976 GiB                                                                                                                                                                               
      4358f47f-f21e-4cc8-829e-0c7fc2400a59: total size: 2976 GiB                                                                                                                                                                               
      4eb2e4eb-41d8-496c-9a5a-687d7e004aa4: total size: 2976 GiB                                                                                                                                                                               
      57650e05-36ff-4de8-865f-b9562bdb67f5: total size: 2976 GiB                                                                                                                                                                               
      aadf48eb-6ff0-40b5-a092-1fdd06c03e11: total size: 2976 GiB
      b358fb1e-f52a-4a63-9aab-170225509b37: total size: 2976 GiB
      d1cb6b7d-2b92-4b7d-8a4d-551987f0277e: total size: 2976 GiB
      eb1234a5-fdf7-4977-94d5-2eef25ce56a1: total size: 2976 GiB
      f8b11629-ced6-412a-9c3f-d169b99ee996: total size: 2976 GiB
    datasets:
      oxp_0ae29053-29a2-489e-a1e6-6aec0ecd05f8 - id: none, compression: off
        available: 2414356254208 B, used: 681241424384 B
        reservation: None, quota: None
      oxp_0ae29053-29a2-489e-a1e6-6aec0ecd05f8/crucible - id: b3e9fee2-24d2-44e7-8539-a6918e85cf2b, compression: off
        available: 2414356254208 B, used: 652740293 KiB
        reservation: None, quota: None
      oxp_0ae29053-29a2-489e-a1e6-6aec0ecd05f8/crypt - id: none, compression: off
        available: 2414356254208 B, used: 12664662528 B
        reservation: None, quota: None
      oxp_0ae29053-29a2-489e-a1e6-6aec0ecd05f8/crypt/debug - id: none, compression: gzip-9
        available: 104857501 KiB, used: 99 KiB
        reservation: None, quota: Some(ByteCount(107374182400))
      oxp_0ae29053-29a2-489e-a1e6-6aec0ecd05f8/crypt/zone - id: none, compression: off
        available: 2414356254208 B, used: 612149 KiB
        reservation: None, quota: None
      oxp_0ae29053-29a2-489e-a1e6-6aec0ecd05f8/crypt/zone/oxz_crucible_b3e9fee2-24d2-44e7-8539-a6918e85cf2b - id: none, compression: off
        available: 2414356254208 B, used: 298954 KiB
        reservation: None, quota: None
      oxp_0ae29053-29a2-489e-a1e6-6aec0ecd05f8/crypt/zone/oxz_propolis-server_24b4948e-a9af-4abd-876c-3252305c4296 - id: none, compression: off
... <redacted, a lot more datasets>

@smklein
Copy link
Collaborator

smklein commented Sep 24, 2024

One issue with looking for the presence or absence of datasets in omdb is that an old omdb on a new version of Nexus also wouldn't report them.

My suggestion: inv_dataset was added in schema version 96.0.0. If you run the command:

$ omdb db inventory collections list

You should see output about the schema version.

# this is from dogfood
note: database schema version matches expected (98.0.0)

If a customer has a schema < 96.0.0, they don't have any inv_dataset entries, because the table does not exist. This is what I would expect for someone running R10.

@morlandi7 morlandi7 modified the milestones: 11, 12 Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants