Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config to skip DHCP boot to gnoi.FactoryReset. #204

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

daymare
Copy link

@daymare daymare commented Jun 26, 2024

For Bootz testing we want an interface to skip DHCP boot to be able to write smaller integration tests. One test should DHCP boot up to the point of reaching out to the Bootz server. Another test should skip DHCP boot and test the remaining bootz protocol to get a functioning device.

Copy link

google-cla bot commented Jun 26, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@LimeHat
Copy link

LimeHat commented Jun 26, 2024

Does this really belong in FactoryReset?

IMO a separate service would be more appropriate, something like BootParameters.proto

@daymare
Copy link
Author

daymare commented Jun 26, 2024

@mhines01: as SME on Bootz
@melzhan: FYI
@sam-homa: FYI

@robshakir please review.

@daymare
Copy link
Author

daymare commented Jun 26, 2024

Does this really belong in FactoryReset?

IMO a separate service would be more appropriate, something like BootParameters.proto

That would mean we would have two separate services for factory reset and engaging bootz? One with and one without DHCP boot.

@LimeHat
Copy link

LimeHat commented Jun 26, 2024

That would mean we would have two separate services for factory reset and engaging bootz?

I'd say that would be a separate service to provision this "pseudo-dhcp" configuration of bootz to the device (that will be active during the next bootz cycle). Bootz trigger doesn't have to change (you can still call the FactoryReset to initiate the process).

In general, bootz and factoryReset are independent from each other. In your specific use case, factoryReset is typically a trigger to initiate bootz (if the device was shipped from a factory in bootz-enabled mode or was enabled to use bootz by any other means), but that's only a specific use case.
There are devices that do not support bootz or rely on other ZTP protocols, or don't do any ZTP at all, and still implement the FactoryReset. Overloading the FactoryReset service with parameters that are not directly related to the reset is suboptimal in my view.

Fundamentally this is about skipping DHCP Boot not Bootz in particular. The SZTP RFC (And by extension the Bootz spec) specifies a "bootstrap-server-list" in the boot option. I chose bootstrap_server_uris because it is more descriptive as to what is actually supposed to go in the variable.
@daymare
Copy link
Author

daymare commented Jul 2, 2024

I see a couple of options here:

  1. Keep as is.
  2. Move to a separate RPC in factory_reset: something like FactoryReset.SkipDHCPBoot. So the user would call SkipDHCPBoot() to set the management config and then Start().
  3. Move to gnoi/bootconfig: something like BootConfig.SetManagementBoot.
  4. Make a new service as you mentioned.

I think it's important to recognize that fundamentally this is about skipping DHCP boot after factory reset. We are doing this to support Bootz testing today, but it's not specific to bootz (Any more than DHCP is specific to SZTP or Bootz) and in the future we may want to use this for other testing or other purposes beyond Bootz.

With that in mind:

  1. Keep as is: I think what we have is fine.
  2. Move to a separate RPC in factory_reset: Might be best. This way the RPC will fail if a device does not update to support the RPC and it is more explicit. But it's maybe a little more complex than it needs to be.
  3. Move to gnoi/bootconfig: I don't like this. The BootConfig service sounds right at first glance, but it is serving a very different purpose and we would have to redefine its purpose to move this there.
  4. Make a new service: I also don't like this. This is fundamentally tied to FactoryReset. We already have FactoryReset and BootConfig so it will be confusing to add an additional BootParameters service.

For now I am keeping it as is and just changed bootz_uris to boot_server_uris to make it generic.

@@ -47,6 +62,8 @@ message ResetError {
bool factory_os_unsupported = 1;
// Zero fill is not supported.
bool zero_fill_unsupported = 2;
// Management config is not supported.
bool mgmt_config_unsupported = 5;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this has been flagged previously by Marcus, but I'll mention it again here in the interest of having the discussion more broadly. This field won't be a reliable indicator of what's going to happen.

Old devices that aren't aware of the mgmt_config field in the request also won't be aware of the mgmt_config_unsupported field in the response and therefore won't be able to set it to indicate their lack of awareness. Instead, they'll just silently ignore mgmt_config altogether. You won't be able to distinguish between an old device that ignored this part of your request and a new device that processed it like you requested.

Splitting the boot parameters from the FactoryReset service (like was proposed in another comment on the pull request) would make it easier to know what's going on. If you sent an old device something like SetBootstrapServerList in a BootParameters service, it would respond with an unimplemented error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I covered this in my comment above...
Again, I don't want to make a new service and clutter the namespace when this is fundamentally tied to FactoryReset. I think adding a new RPC to factory reset would be a better way to solve this problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump the version of the service. Clients then have the option to ensure the service implements the new message.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump the version of the service. Clients then have the option to ensure the service implements the new message.

gnoi package names don't include versions, hence it will not be visible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we are talking about the same thing, but you should be able to query service options via file descriptors via reflection if the server implementes it. Its convoluted, but its there.

extend google.protobuf.FileOptions {

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. Indeed, I was thinking about a different thing (traditional grpc versioning, where the version is part of the namespace/package name). Thx for clarifying.

@shanealmeida
Copy link

For now I am keeping it as is and just changed bootz_uris to boot_server_uris to make it generic.

You can't make it generic because boot servers aren't generic. A device that only supports SZTP can't talk to a Bootz server and vice versa. You need to be able to say what the server is in addition to where the server is.

@daymare
Copy link
Author

daymare commented Jul 2, 2024

For now I am keeping it as is and just changed bootz_uris to boot_server_uris to make it generic.

You can't make it generic because boot servers aren't generic. A device that only supports SZTP can't talk to a Bootz server and vice versa. You need to be able to say what the server is in addition to where the server is.

The Bootz protocol specifically states the DHCP option is the same as SZTP (OPTION_V4_SZTP_REDIRECT or OPTION_V6_SZTP_REDIRECT). The difference is the URI format. So we specify what the server is by specifying bootz in the URI.
See part 1 of the API flow here: https://github.com/openconfig/bootz?tab=readme-ov-file#api-flow

@LimeHat
Copy link

LimeHat commented Jul 2, 2024

I think it's important to recognize that fundamentally this is about skipping DHCP boot after factory reset.

Yes, but fundamentally factory reset has nothing to do with how the device boots. gRPC-based services provide a clear way to demarcate this boundary and we should leverage that.

Make a new service: I also don't like this. This is fundamentally tied to FactoryReset. We already have FactoryReset and BootConfig so it will be confusing to add an additional BootParameters service.

I principally disagree. I don't see any tie to factoryreset, with the exception of the fact that it is used to initiate bootz in your environment.

@shanealmeida
Copy link

For now I am keeping it as is and just changed bootz_uris to boot_server_uris to make it generic.

You can't make it generic because boot servers aren't generic. A device that only supports SZTP can't talk to a Bootz server and vice versa. You need to be able to say what the server is in addition to where the server is.

The Bootz protocol specifically states the DHCP option is the same as SZTP (OPTION_V4_SZTP_REDIRECT or OPTION_V6_SZTP_REDIRECT). The difference is the URI format. So we specify what the server is by specifying bootz in the URI. See part 1 of the API flow here: https://github.com/openconfig/bootz?tab=readme-ov-file#api-flow

Just because Bootz and SZTP use URIs in similar ways doesn't mean boot_server_uris is somehow generic.

An SZTP device cannot talk to a Bootz server. It would have to parse the URI (e.g., bootz://example.com) and reject it as invalid (following section 8.3 of the SZTP RFC). Similarly, a Bootz device cannot talk to an SZTP server. It would have to parse the URI (e.g., https://example.com) and reject it (following a combination of the API flow section and the SZTP RFC).

What happens when a new boot protocol comes along that isn't SZTP but still uses https://example.com URIs? If the device supports SZTP and the new protocol, it would have to contact the server, try SZTP, fail, contact the server again, try the new protocol, and hopefully succeed.

I don't think it's a good idea to rely on the device to parse URIs to figure out what it's supposed to do, and I don't think it's a good idea to shoehorn Bootz and SZTP URIs into the same field just because they happen to look alike.

@LimeHat
Copy link

LimeHat commented Jul 3, 2024

To elaborate on my previous comment, we can ask ourselves a few questions to determine why there is no "fundamental tie."

  • can bootz be initiated without doing a factory reset? yes
  • can factory reset be used with devices that don't support bootz, or with disabled bootz? yes

The new message defines parameters that bootz should consume during the next boot cycle (if enabled). It configures bootz to behave in a certain, special way. That's not the job of the "reset" functionality/service.

@coveralls
Copy link

coveralls commented Jul 3, 2024

Pull Request Test Coverage Report for Build 9763749476

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 1.142%

Totals Coverage Status
Change from base Build 9747766369: 0.0%
Covered Lines: 166
Relevant Lines: 14537

💛 - Coveralls

Copy link
Member

@dplore dplore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of making this specific to DHCP, why not add a feature to simply push a config on top of / modifying the factory default config?

ManagementConfig mgmt_config = 4;
}

message ManagementConfig {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to use https://github.com/openconfig/bootz/blob/main/proto/bootz.proto#L225-L236 here? The idea being:

  • factory reset the config
  • apply the attached config. (which might include disabling dhcp and anything else you want to override)
  • reboot

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that this is not a traditional config.

This is a special config, that should be consumed at the start of the bootz process (which is typically invoked before any startup/running-config is involved) and by the bootz daemon only (it has no meaning to the other system components).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bootconfig message contains a parameter bootloader_config which is intended to contain items that are invoked/processed before any startup/running config is involved. If an implementation requires bootloader_config to contain some data to enable/disable the DHCP client used during bootz, that is fine. If the implementation uses OC config to do this, that is also fine.

If my comment is not on-track with your concerns, maybe you can share more details about the implementation so we can try and map it to bootz. Happy to do this offline with you if necessary and then we can follow up with some proposal here.

message BootConfig {
  // Proprietary key-value parameters that are required as part of boot
  // configuration (e.g., feature flags, or vendor-specific hardware knobs).
  google.protobuf.Struct metadata = 1;
  // Native format vendor configuration.
  bytes vendor_config = 2;
  // JSON rendered OC configuration.
  bytes oc_config = 3;
  // Bootloader key-value parameters that are required as part of boot
  // configuration.
  google.protobuf.Struct bootloader_config = 4;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LimeHat let me know what you think. Would like to keep this PR moving.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an implementation requires bootloader_config to contain some data to enable/disable the DHCP client used during bootz, that is fine. If the implementation uses OC config to do this, that is also fine.

There's a bit more to this proposal than a simple enablement/disablement of DHCP.
The core part of the bootz protocol is a DHCP response. It includes not only information about IP config, but also the bootstrap URL(s).

This proposal, in essence, is a way to substitute a real DHCP response with a pre-defined static message received via gNOI (pseudo-DHCP).

There are different ways to store this data. It can be stored in bootloader parameters (as you indicated, but this will be more than a flag), in can be stored in persistent memory (e.g. as a file or another implementation-dependent data structure). Probably it can't be stored as OpenConfig data (for instance, I don't think you can specify bootstrap URLs; and if the author wants to preserve compatibility with FactoryReset, it cannot be stored as a regular bootstrap cfg which is supposed to be erased during the reset). As a user, you probably don't to be concerned with implementation-specific details in order to use this.

I also don't want to couple this data with BootConfig message, for a number of reasons, including

  • in my opinion, these functions are logically independent, and,
  • overloading this structure will require the user to know and submit unneeded data to simply enable this pseudo-dhcp thing (e.g. the bytes vendor_config field will have to be present),
  • the whole SetBootConfigRequest RPC is designed to replace the data that was received from a bootstrap server, and that data only.

Personally, I still think that a new service is the best approach. If you'd prefer to extend the existing BootConfig service, I would suggest to create a new RPC for this feature, that seems like a reasonable middle ground.

Feel free to ping me in chat if you want to discuss further offline.

@rgwilton
Copy link

rgwilton commented Aug 1, 2024

Is this proposed change specifically only for testing? I.e., are there any real deployment cases that would make use of these API parameters and functionality. I'm slightly nervous of a proposal to extend the public API if it is only intended to be used during testing scenarios.

@daymare
Copy link
Author

daymare commented Aug 29, 2024

As discussed in the Community meeting a few weeks ago we are going to create a test only space to put this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants