
[nexus] Expunge disk internal API, omdb commands #5994

Merged: 82 commits, Jul 15, 2024

Commits
cee78bc
Start requiring zone filesystem argument
smklein Jun 20, 2024
3be4b6e
Deprecate the old service format
smklein Jun 20, 2024
8756076
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jun 20, 2024
0aac450
Merge branch 'deprecate-services-migration' into nexus-zone-filesyste…
smklein Jun 20, 2024
3833549
Plumbing through filesystem_pool, still need to make it optional
smklein Jun 21, 2024
aea4bdb
Merge branch 'main' into deprecate-services-migration
smklein Jun 21, 2024
b58352f
review feedback
smklein Jun 21, 2024
a04e9c7
no bail just warn
smklein Jun 21, 2024
4615f1b
Merge branch 'deprecate-services-migration' into nexus-zone-filesyste…
smklein Jun 21, 2024
9f09c32
Merge branch 'main' into deprecate-services-migration
smklein Jun 21, 2024
9db3042
Merge branch 'deprecate-services-migration' into nexus-zone-filesyste…
smklein Jun 21, 2024
a96fc81
optional value
smklein Jun 21, 2024
9858dbf
are we optional yet
smklein Jun 21, 2024
f1e6f7a
lie about filesystem_pools for simulated sled agent
smklein Jun 21, 2024
8a9ade7
Patch test_builder_zones
smklein Jun 24, 2024
d7c462c
Fix test_silos_external_dns_end_to_end
smklein Jun 24, 2024
3c59610
patch v3 schema
smklein Jun 24, 2024
1270098
Patch blueprint edit
smklein Jun 24, 2024
f48fba3
Add schema change
smklein Jun 24, 2024
684932d
fmt
smklein Jun 24, 2024
87b8df9
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jun 24, 2024
52406a6
helios tests
smklein Jun 24, 2024
acaf91f
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jun 24, 2024
b1339d4
Cleanup
smklein Jun 24, 2024
fcea2f1
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jun 25, 2024
53027a3
only pick in-service zpools from reconfigurator - regression test wanted
smklein Jun 25, 2024
ae41399
Merge zpool selection fns
smklein Jun 25, 2024
5b38070
Add colocation test
smklein Jun 25, 2024
f0ab1c2
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jun 26, 2024
b883eec
Ensure expunged disks are not in use after omicron_physical_disks_ensure
smklein Jun 27, 2024
83c7cdf
Fix tests, add comments
smklein Jun 27, 2024
6869d92
Zone bundler
smklein Jun 28, 2024
4292158
Plumb 'PathInPool' structure
smklein Jul 1, 2024
17db428
Destroy instances
smklein Jul 1, 2024
32596df
Remove unused zone code
smklein Jul 1, 2024
d83a553
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jul 1, 2024
d6618e7
Merge branch 'nexus-zone-filesystems-2' into physical_disks_ensure_le…
smklein Jul 1, 2024
2c6eb01
fix helios tests
smklein Jul 1, 2024
e4123a9
Add TODO, re: concurrency safety
smklein Jul 1, 2024
98278d4
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jul 1, 2024
fa91e75
Merge branch 'nexus-zone-filesystems-2' into physical_disks_ensure_le…
smklein Jul 1, 2024
1207c9e
very WIP - adjusting generation
smklein Jul 2, 2024
892a7ca
Stop self-managing disks
smklein Jul 2, 2024
15b8d21
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 2, 2024
c0e8e07
Fix imports
smklein Jul 2, 2024
654a4ce
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 2, 2024
187aea3
generation number unity
smklein Jul 2, 2024
7c5a67f
Merge branch 'main' into stop-self-managing-disks
smklein Jul 2, 2024
b50007b
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 2, 2024
c2ee842
Remove self-managing test too
smklein Jul 2, 2024
a437cc2
imports
smklein Jul 2, 2024
d9ab0e2
Merge branch 'main' into stop-self-managing-disks
smklein Jul 2, 2024
3d91d67
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 2, 2024
c7d4e2e
Merge branch 'main' into stop-self-managing-disks
smklein Jul 2, 2024
7751f12
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 2, 2024
8f2301d
Safe against concurrent updates
smklein Jul 2, 2024
48c3578
Patch firmware tests
smklein Jul 2, 2024
e933a46
Add a bunch of logging
smklein Jul 2, 2024
c71504a
[wip] Expunge disk internal API, omdb commands
smklein Jul 3, 2024
05a9f38
More omdb commands, physical disk filtering, safer expunge
smklein Jul 3, 2024
df9835e
copy-pastes, minor diskfilter cleanup
smklein Jul 3, 2024
154a071
review feedback
smklein Jul 3, 2024
691bc85
tx naming
smklein Jul 5, 2024
e360dae
more explicit instance termination
smklein Jul 5, 2024
1c7fb2f
Merge branch 'physical_disks_ensure_lets_go' into omdb-disk-expungement
smklein Jul 5, 2024
bc80de5
typo fix
smklein Jul 5, 2024
ec013d9
better handling of oneshot tx in instance manager
smklein Jul 5, 2024
a818de2
use_only_these_disks
smklein Jul 5, 2024
ec75032
Merge branch 'physical_disks_ensure_lets_go' into omdb-disk-expungement
smklein Jul 5, 2024
f242e0a
Mark vmm failed
smklein Jul 5, 2024
9d93c82
Merge branch 'physical_disks_ensure_lets_go' into omdb-disk-expungement
smklein Jul 5, 2024
77931fd
Merge branch 'main' into stop-self-managing-disks
smklein Jul 12, 2024
6babd19
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 12, 2024
5ca0f8e
Merge branch 'physical_disks_ensure_lets_go' into omdb-disk-expungement
smklein Jul 12, 2024
d8a5465
Merge branch 'main' into stop-self-managing-disks
smklein Jul 12, 2024
d57ec70
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 12, 2024
da39958
Merge branch 'physical_disks_ensure_lets_go' into omdb-disk-expungement
smklein Jul 12, 2024
9e1d729
Merge branch 'main' into stop-self-managing-disks
smklein Jul 15, 2024
426daf1
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 15, 2024
6c15263
Merge branch 'physical_disks_ensure_lets_go' into omdb-disk-expungement
smklein Jul 15, 2024
65a03ce
review feedback
smklein Jul 15, 2024
9181512
fmt
smklein Jul 15, 2024
94 changes: 89 additions & 5 deletions dev-tools/omdb/src/bin/omdb/db.rs
@@ -62,6 +62,7 @@ use nexus_db_model::IpAttachState;
use nexus_db_model::IpKind;
use nexus_db_model::NetworkInterface;
use nexus_db_model::NetworkInterfaceKind;
use nexus_db_model::PhysicalDisk;
use nexus_db_model::Probe;
use nexus_db_model::Project;
use nexus_db_model::Region;
@@ -96,7 +97,10 @@ use nexus_types::deployment::Blueprint;
use nexus_types::deployment::BlueprintZoneDisposition;
use nexus_types::deployment::BlueprintZoneFilter;
use nexus_types::deployment::BlueprintZoneType;
use nexus_types::deployment::DiskFilter;
use nexus_types::deployment::SledFilter;
use nexus_types::external_api::views::PhysicalDiskPolicy;
use nexus_types::external_api::views::PhysicalDiskState;
use nexus_types::external_api::views::SledPolicy;
use nexus_types::external_api::views::SledState;
use nexus_types::identity::Resource;
@@ -281,12 +285,14 @@ pub struct DbFetchOptions {
enum DbCommands {
/// Print information about the rack
Rack(RackArgs),
- /// Print information about disks
+ /// Print information about virtual disks
Disks(DiskArgs),
/// Print information about internal and external DNS
Dns(DnsArgs),
/// Print information about collected hardware/software inventory
Inventory(InventoryArgs),
/// Print information about physical disks
PhysicalDisks(PhysicalDisksArgs),
/// Save the current Reconfigurator inputs to a file
ReconfiguratorSave(ReconfiguratorSaveArgs),
/// Print information about regions
@@ -407,8 +413,8 @@ enum InventoryCommands {
Cabooses,
/// list and show details from particular collections
Collections(CollectionsArgs),
- /// show all physical disks every found
- PhysicalDisks(PhysicalDisksArgs),
+ /// show all physical disks ever found
+ PhysicalDisks(InvPhysicalDisksArgs),
/// list all root of trust pages ever found
RotPages,
}
@@ -437,14 +443,21 @@ struct CollectionsShowArgs {
}

#[derive(Debug, Args, Clone, Copy)]
- struct PhysicalDisksArgs {
+ struct InvPhysicalDisksArgs {
smklein (Collaborator, Author): A disk in the inventory is a different thing than a "control plane physical disk", so this was renamed to be more explicit.

#[clap(long)]
collection_id: Option<CollectionUuid>,

#[clap(long, requires("collection_id"))]
sled_id: Option<SledUuid>,
}
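For context, these are two different views of disks: the inventory subcommand reports hardware as observed by sled-agents in a particular collection, while the separate `omdb db physical-disks` command added in this PR reports control-plane disk records. Illustrative invocations, assuming clap's default kebab-case naming for the subcommands:

omdb db inventory physical-disks --collection-id <COLLECTION_ID>
omdb db inventory physical-disks --collection-id <COLLECTION_ID> --sled-id <SLED_ID>
omdb db physical-disks -F in-service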

#[derive(Debug, Args)]
struct PhysicalDisksArgs {
/// Show disks that match the given filter
#[clap(short = 'F', long, value_enum)]
filter: Option<DiskFilter>,
smklein (Collaborator, Author): Happy to add more filters, just wanted at least one option here.

}
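DiskFilter is pulled in from nexus_types::deployment and exposed directly as a clap value enum. A minimal sketch of its assumed shape (the authoritative definition lives in nexus-types; only the in-service variant is confirmed by the default used below, the rest is assumption):

#[derive(Clone, Copy, Debug, clap::ValueEnum)]
pub enum DiskFilter {
    /// All disks, regardless of policy or state (assumed variant)
    All,
    /// Disks that are in service
    InService,
}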

#[derive(Debug, Args)]
struct ReconfiguratorSaveArgs {
/// where to save the output
@@ -611,6 +624,15 @@ impl DbArgs {
)
.await
}
DbCommands::PhysicalDisks(args) => {
cmd_db_physical_disks(
&opctx,
&datastore,
&self.fetch_opts,
args,
)
.await
}
DbCommands::ReconfiguratorSave(reconfig_save_args) => {
cmd_db_reconfigurator_save(
&opctx,
@@ -1385,6 +1407,68 @@ async fn cmd_db_disk_physical(
Ok(())
}

#[derive(Tabled)]
#[tabled(rename_all = "SCREAMING_SNAKE_CASE")]
struct PhysicalDiskRow {
id: Uuid,
serial: String,
vendor: String,
model: String,
sled_id: Uuid,
policy: PhysicalDiskPolicy,
state: PhysicalDiskState,
}
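The policy and state columns render the external API view types imported at the top of this file. For reading the table, their assumed shapes are roughly as follows (a sketch; the authoritative definitions live in nexus_types::external_api::views):

// Sketch only: variants inferred from context, not copied from the source.
pub enum PhysicalDiskPolicy {
    /// The disk is in service.
    InService,
    /// The disk has been permanently removed from service.
    Expunged,
}

pub enum PhysicalDiskState {
    /// The disk is currently active.
    Active,
    /// The disk has been permanently removed from service.
    Decommissioned,
}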

impl From<PhysicalDisk> for PhysicalDiskRow {
fn from(d: PhysicalDisk) -> Self {
PhysicalDiskRow {
id: d.id(),
serial: d.serial.clone(),
vendor: d.vendor.clone(),
model: d.model.clone(),
sled_id: d.sled_id,
policy: d.disk_policy.into(),
state: d.disk_state.into(),
}
}
}

/// Run `omdb db physical-disks`.
async fn cmd_db_physical_disks(
opctx: &OpContext,
datastore: &DataStore,
fetch_opts: &DbFetchOptions,
args: &PhysicalDisksArgs,
) -> Result<(), anyhow::Error> {
let limit = fetch_opts.fetch_limit;
let filter = match args.filter {
Some(filter) => filter,
None => {
eprintln!(
"note: listing all in-service disks \
(use -F to filter, e.g. -F in-service)"
);
DiskFilter::InService
}
};

let disks = datastore
.physical_disk_list(&opctx, &first_page(limit), filter)
.await
.context("listing physical disks")?;
check_limit(&disks, limit, || String::from("listing physical disks"));

let rows = disks.into_iter().map(PhysicalDiskRow::from);
let table = tabled::Table::new(rows)
.with(tabled::settings::Style::empty())
.with(tabled::settings::Padding::new(1, 1, 0, 0))
.to_string();

println!("{}", table);

Ok(())
}
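With Style::empty and the SCREAMING_SNAKE_CASE rename above, the output is a plain space-padded table along these lines (every value here is invented for illustration; exact enum rendering depends on the view types' Display impls):

ID                                   SERIAL       VENDOR      MODEL      SLED_ID                              POLICY      STATE
3f1e9d42-0c5a-4b8e-9f00-000000000001 BRM00000001  FakeVendor  FakeModel  9e2b7c11-0a3d-4f6e-8d00-000000000002 in_service  active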

// SERVICES

// Snapshots
@@ -3187,7 +3271,7 @@ async fn cmd_db_inventory_cabooses(
async fn cmd_db_inventory_physical_disks(
conn: &DataStoreConnection<'_>,
limit: NonZeroU32,
- args: PhysicalDisksArgs,
+ args: InvPhysicalDisksArgs,
) -> Result<(), anyhow::Error> {
#[derive(Tabled)]
#[tabled(rename_all = "SCREAMING_SNAKE_CASE")]
159 changes: 159 additions & 0 deletions dev-tools/omdb/src/bin/omdb/nexus.rs
@@ -24,6 +24,7 @@ use nexus_client::types::BackgroundTask;
use nexus_client::types::BackgroundTasksActivateRequest;
use nexus_client::types::CurrentStatus;
use nexus_client::types::LastResult;
use nexus_client::types::PhysicalDiskPath;
use nexus_client::types::SledSelector;
use nexus_client::types::UninitializedSledId;
use nexus_db_queries::db::lookup::LookupPath;
@@ -33,6 +34,7 @@ use nexus_types::internal_api::background::RegionReplacementDriverStatus;
use nexus_types::inventory::BaseboardId;
use omicron_uuid_kinds::CollectionUuid;
use omicron_uuid_kinds::GenericUuid;
use omicron_uuid_kinds::PhysicalDiskUuid;
use omicron_uuid_kinds::SledUuid;
use reedline::DefaultPrompt;
use reedline::DefaultPromptSegment;
@@ -256,6 +258,8 @@ enum SledsCommands {
Add(SledAddArgs),
/// Expunge a sled (DANGEROUS)
Expunge(SledExpungeArgs),
/// Expunge a disk (DANGEROUS)
ExpungeDisk(DiskExpungeArgs),
}

#[derive(Debug, Args)]
@@ -277,6 +281,17 @@ struct SledExpungeArgs {
sled_id: SledUuid,
}

#[derive(Debug, Args)]
struct DiskExpungeArgs {
// expunge is _extremely_ dangerous, so we also require a database
// connection to perform some safety checks
#[clap(flatten)]
db_url_opts: DbUrlOptions,

/// Physical disk ID
physical_disk_id: PhysicalDiskUuid,
}
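Because DbUrlOptions is flattened into the arguments, the subcommand accepts omdb's usual database-connection options alongside the disk ID. An illustrative invocation, with the URL and UUID invented and omdb's destructive-operation gate (check_allow_destructive, wired up below) already satisfied:

omdb nexus sleds expunge-disk 3f1e9d42-0c5a-4b8e-9f00-000000000001 --db-url postgresql://root@[::1]:26257/omicron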

impl NexusArgs {
/// Run an `omdb nexus` subcommand.
pub(crate) async fn run_cmd(
@@ -401,6 +416,13 @@ impl NexusArgs {
let token = omdb.check_allow_destructive()?;
cmd_nexus_sled_expunge(&client, args, omdb, log, token).await
}
NexusCommands::Sleds(SledsArgs {
command: SledsCommands::ExpungeDisk(args),
}) => {
let token = omdb.check_allow_destructive()?;
cmd_nexus_sled_expunge_disk(&client, args, omdb, log, token)
.await
}
}
}
}
@@ -1569,3 +1591,140 @@ async fn cmd_nexus_sled_expunge(
);
Ok(())
}

/// Runs `omdb nexus sleds expunge-disk`
async fn cmd_nexus_sled_expunge_disk(
client: &nexus_client::Client,
args: &DiskExpungeArgs,
omdb: &Omdb,
log: &slog::Logger,
_destruction_token: DestructiveOperationToken,
) -> Result<(), anyhow::Error> {
use nexus_db_queries::context::OpContext;

let datastore = args.db_url_opts.connect(omdb, log).await?;
let opctx = OpContext::for_tests(log.clone(), datastore.clone());
let opctx = &opctx;

// First, we need to look up the disk so we can look up its identity information.
let (_authz_physical_disk, physical_disk) =
LookupPath::new(opctx, &datastore)
.physical_disk(args.physical_disk_id.into_untyped_uuid())
.fetch()
.await
.with_context(|| {
format!(
"failed to find physical disk {}",
args.physical_disk_id
)
})?;

// Helper to get confirmation messages from the user.
let mut line_editor = Reedline::create();
let mut read_with_prompt = move |message: &str| {
let prompt = DefaultPrompt::new(
DefaultPromptSegment::Basic(message.to_string()),
DefaultPromptSegment::Empty,
);
if let Ok(reedline::Signal::Success(input)) =
line_editor.read_line(&prompt)
{
Ok(input)
} else {
bail!("expungement aborted")
}
};

// Now check whether the disk is present in the most recent
// inventory collection.
match datastore
.inventory_get_latest_collection(opctx)
.await
.context("loading latest collection")?
{
Some(collection) => {
let disk_identity = omicron_common::disk::DiskIdentity {
vendor: physical_disk.vendor.clone(),
serial: physical_disk.serial.clone(),
model: physical_disk.model.clone(),
};

let mut sleds_containing_disk = vec![];

for (sled_id, sled_agent) in collection.sled_agents {
for sled_disk in sled_agent.disks {
if sled_disk.identity == disk_identity {
sleds_containing_disk.push(sled_id);
}
}
}

match sleds_containing_disk.len() {
0 => {}
1 => {
eprintln!(
"WARNING: physical disk {} is PRESENT in the most \
recent inventory collection (spotted at {}). It is \
dangerous to expunge a disk that is still running, and \
safer to expunge a disk from a system where it has been \
removed. Are you sure you want to proceed anyway?",
args.physical_disk_id, collection.time_done,
);
let confirm = read_with_prompt("y/N")?;
if confirm != "y" {
eprintln!("expungement not confirmed: aborting");
return Ok(());
}
}
_ => {
// This should be impossible due to a unique database index,
// "vendor_serial_model_unique".
//
// Even if someone tried moving a disk, it would need to be
// decommissioned before being re-commissioned elsewhere.
//
// However, we still print out an error message here in the
// (unlikely) event that it happens anyway.
eprintln!(
"ERROR: physical disk {} is PRESENT MULTIPLE TIMES in \
the most recent inventory collection (spotted at {}). \
This should not be possible, and is an indication of a \
database issue.",
args.physical_disk_id, collection.time_done,
);
bail!("Physical Disk appeared on multiple sleds");
}
}
}
None => {
eprintln!(
"ERROR: cannot verify that the physical disk inventory status \
because there are no inventory collections present. Please \
make sure that the physical disk has been physically removed, \
or ensure that inventory may be collected."
);
bail!("No inventory");
}
}

eprintln!(
"WARNING: This operation will PERMANENTLY and IRRECOVABLY mark physical disk \
{} ({}) expunged. To proceed, type the physical disk's serial number.",
args.physical_disk_id,
physical_disk.serial,
);
let confirm = read_with_prompt("disk serial number")?;
if confirm != physical_disk.serial {
eprintln!("disk serial number not confirmed: aborting");
return Ok(());
}

client
.physical_disk_expunge(&PhysicalDiskPath {
disk_id: args.physical_disk_id.into_untyped_uuid(),
})
.await
.context("expunging disk")?;
eprintln!("expunged disk {}", args.physical_disk_id);
Ok(())
}
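End to end, a confirmed expungement would look something like the exchange below (IDs and serial invented; this assumes the disk was absent from the latest inventory collection, so the earlier y/N prompt is skipped):

WARNING: This operation will PERMANENTLY and IRRECOVERABLY mark physical disk
3f1e9d42-0c5a-4b8e-9f00-000000000001 (BRM00000001) expunged. To proceed, type the physical disk's serial number.
disk serial number> BRM00000001
expunged disk 3f1e9d42-0c5a-4b8e-9f00-000000000001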