
blippy: Blueprint Checker Proposal #6987

Open
@smklein

Description


See #6973 for background context, as well as the ad-hoc meeting on 11/4 with @jgallagher, @davepacheco, and @andrewjstone on this topic.

Background

  • The planner and reconfigurator-cli both use the blueprint builder to construct blueprints.
  • The planner is used by Nexus, and likely has a more conservative bias toward constructing valid blueprints.
  • The reconfigurator-cli acts as something of a "system override", and wants to construct blueprints that are "valid enough", but which may deviate from the constructions that the planner might create.
  • Defining which abnormalities are valid or invalid is somewhat subtle. For example:
    • Assigning the same underlay address to distinct services is probably always invalid. This could be categorized as a hard error.
    • Deploying multiple services with incompatible versions is invalid, but should the reconfigurator-cli be prohibited from ever constructing such a blueprint?
    • Deploying a blueprint with "no Nexuses": this could be viewed as a deviation from policy, but on production systems it will create an inoperable system. How should we categorize the validity of a blueprint with this configuration?
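As a rough sketch of what one of the "hard error" checks might look like, detecting a reused underlay address is just a group-by over zones. The `Zone` type and `duplicate_underlay_ips` function below are hypothetical and heavily simplified; they are not the real blueprint types:

```rust
use std::collections::BTreeMap;
use std::net::Ipv6Addr;

/// Hypothetical, simplified stand-in for a zone in a blueprint.
/// Real blueprint zone configs carry much more information.
#[derive(Debug, Clone)]
pub struct Zone {
    pub id: u32,
    pub underlay_ip: Ipv6Addr,
}

/// Returns every underlay address assigned to more than one zone,
/// along with the IDs of the zones sharing it (sorted by address).
pub fn duplicate_underlay_ips(zones: &[Zone]) -> Vec<(Ipv6Addr, Vec<u32>)> {
    // Group zone IDs by the underlay address they claim.
    let mut by_ip: BTreeMap<Ipv6Addr, Vec<u32>> = BTreeMap::new();
    for zone in zones {
        by_ip.entry(zone.underlay_ip).or_default().push(zone.id);
    }
    // Any address claimed by two or more zones is a conflict.
    by_ip.into_iter().filter(|(_, ids)| ids.len() > 1).collect()
}
```

Using a `BTreeMap` keeps the reported conflicts in a deterministic order, which matters if checker output is diffed or snapshot-tested.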

Categorizing Validity

We need to define these error cases (that is, which deviations from an "okay" blueprint are acceptable and which are not) as we decide:

  • What is valid for the blueprint builder API to produce?
  • What is valid for the reconfigurator-cli to emit?

We've discussed using at least the following categories, though there may be more:

  • Blueprint OK, matches policy: The blueprint is valid, and we cannot find any ways in which it deviates from the policy the planner would use.
  • Blueprint OK, but deviates from policy: The blueprint could be deployed, but does not match our policy. For example: if our policy is to deploy three Nexus zones, a blueprint in this category might be attempting to deploy "two" or "four" Nexus zones.
  • Blueprint Erroneous: There are many flavors here, but this category includes:
    • The blueprint cannot be deployed (we know ahead of time that a sled agent could or should reject it)
    • The blueprint would render the system inoperable (e.g. delete all Nexus zones)
    • The blueprint contains an internal inconsistency (e.g., data modified without bumping the generation number)
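One possible way to encode these categories is an ordered severity enum, so the overall verdict for a blueprint is simply the worst finding. This is a sketch only; `Severity`, `Finding`, and `verdict` are hypothetical names, not an existing API:

```rust
/// Hypothetical severity levels mirroring the three categories above.
/// Variants are declared from least to most severe, so the derived `Ord`
/// lets `max()` pick the worst finding.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    /// Blueprint OK, matches policy.
    Ok,
    /// Blueprint OK, but deviates from policy (e.g. two Nexus zones when policy says three).
    PolicyDeviation,
    /// Blueprint erroneous: undeployable, inoperable, or internally inconsistent.
    Erroneous,
}

/// A single hypothetical checker finding.
#[derive(Debug, Clone)]
pub struct Finding {
    pub severity: Severity,
    pub message: String,
}

/// The overall verdict is the worst severity among all findings;
/// a blueprint with no findings is treated as matching policy.
pub fn verdict(findings: &[Finding]) -> Severity {
    findings.iter().map(|f| f.severity).max().unwrap_or(Severity::Ok)
}
```

Keeping individual findings (rather than only the verdict) would let a tool print every issue, not just the worst one.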

Identifying Validity

This issue proposes a blueprint checker (perhaps called blippy) which can inspect a blueprint and identify "how valid" the blueprint appears, with categorization of how far the blueprint deviates from the norm.

We could use blippy in the following spots:

  • As a standalone tool for inspecting blueprints
  • As part of the blueprint builder, to help the planner validate that it has not created a "known erroneous" blueprint
  • As part of the reconfigurator-cli, to help users confirm that their changes only deviate from policy and do not violate correctness guarantees (or, perhaps, to let people make such changes anyway, but with many warnings)
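A minimal sketch of how one check and the per-caller acceptance rules above might fit together. Everything here is hypothetical: `BlueprintSummary`, `Policy`, `check_nexus_count`, `Caller`, the `allow_erroneous` override, and `decide` are illustrative names, and the trimmed-down `Severity` enum is repeated so the snippet stands alone. The assumption is that both callers tolerate policy deviations with warnings, the planner always rejects erroneous blueprints, and the CLI rejects them unless the user explicitly overrides:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity { Ok, PolicyDeviation, Erroneous }

/// Hypothetical minimal blueprint summary; real blueprints hold far more state.
pub struct BlueprintSummary { pub nexus_zones: usize }

/// Hypothetical policy input.
pub struct Policy { pub target_nexus_zones: usize }

/// Zero Nexus zones leaves the system inoperable (Erroneous);
/// any other mismatch with the target count is only a policy deviation.
pub fn check_nexus_count(bp: &BlueprintSummary, policy: &Policy) -> Severity {
    if bp.nexus_zones == 0 {
        Severity::Erroneous
    } else if bp.nexus_zones != policy.target_nexus_zones {
        Severity::PolicyDeviation
    } else {
        Severity::Ok
    }
}

/// Which tool is asking; each applies its own acceptance threshold.
pub enum Caller {
    Planner,
    ReconfiguratorCli { allow_erroneous: bool },
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision { Accept, AcceptWithWarnings, Reject }

pub fn decide(caller: &Caller, severity: Severity) -> Decision {
    match (caller, severity) {
        (_, Severity::Ok) => Decision::Accept,
        (_, Severity::PolicyDeviation) => Decision::AcceptWithWarnings,
        // The planner must never emit a known-erroneous blueprint.
        (Caller::Planner, Severity::Erroneous) => Decision::Reject,
        // The CLI only allows it with an explicit override, and loudly.
        (Caller::ReconfiguratorCli { allow_erroneous }, Severity::Erroneous) => {
            if *allow_erroneous { Decision::AcceptWithWarnings } else { Decision::Reject }
        }
    }
}
```

Keeping the checks separate from the caller's decision means the same checker could back the standalone tool, the builder, and the CLI, with only the thresholds differing.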

Labels

Update System (Replacing old bits with newer, cooler bits), development (Bugs, paper cuts, feature requests, or other thoughts on making omicron development better)
