Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Reboot proto #99

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Update Reboot proto #99

wants to merge 2 commits into from

Conversation

xw-g
Copy link
Contributor

@xw-g xw-g commented Sep 30, 2022

Propose to:

  • Clarify the behavior when the list of subcomponents is not set.
  • Because RebootStatusRequest supports a list of subcomponents, the RebootStatusResponse should also have a way to identify status per subcomponent.

Another option is to deprecate the list of subcomponents in all reboot related proto, and only allow the request to target a single subcomponent. But then it raises more questions, like:

  • Can the single subcomponent be an optional field?
  • If it can be an optional field, what the default behavior if it's not set?
  • If the default behavior is to apply to the whole device (including all subcomponents), then any status reports should also be per component anyway.

@xw-g
Copy link
Contributor Author

xw-g commented Sep 30, 2022

@marcushines to review.

@coveralls
Copy link

coveralls commented Sep 30, 2022

Pull Request Test Coverage Report for Build 3160674502

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 0.0%

Totals Coverage Status
Change from base Build 3093533408: 0.0%
Covered Lines: 0
Relevant Lines: 0

💛 - Coveralls

@xw-g
Copy link
Contributor Author

xw-g commented Sep 30, 2022

string reason = 4; // Reason for reboot.
uint32 count = 5; // Number of reboots since active.
}
repeated Status statuses = 6;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#not sure the logics correctly - if send a statusrequest that is empty what is the expected return? a statuses with a len of 1?

what if there is no reboot in process?
does the system return a single status message saying no reboot or empty statuses

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the Status have the path that is been queried as part of the response?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • ah, good catch. I copied and pasted the fields too fast and forgot to add the subcomponent paths. I just added it.

  • Reading the existing fields (e.g. count, active), seems like the original idea is to return reboot status (i.e. active=false) even no active reboot or planned reboot on the sub-component? The original idea makes sense to me. RebootStatusRequest reports all reboots activities which covers the history and planning. Therefore, to align with the original idea I think we can clarify here saying that if sub-component not set, return status on all sub-components. Thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xw-g The last entry of Reboot Status is covered in the openconfig-platform: last-reboot-reason.

    leaf last-reboot-reason {
      type identityref {
        base oc-platform-types:COMPONENT_REBOOT_REASON;
      }
      description
        "This reports the reason of the last reboot of the component.";
    }

Should the RebootStatusRequest be applicable to only current/planned reboot requests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would return error here if unset - as the total number of components may be large

@@ -108,7 +108,8 @@ message RebootRequest {
uint64 delay = 2;
// Informational reason for the reboot.
string message = 3;
// Optional sub-components to reboot.
// Optional, sub-components to reboot.
// If it's not set, reboot the whole network device.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm i feel like we should require the component to reboot since rebooting a chassis as the default seems like a rather large hammer to "default" to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does require the user/client to be careful with the big hammer.

I had the similar concern, but I also debated with myself that if the field is required, then we probably want to have a very standard way (across all the vendors) to say this Foo component/sub-component means the whole device for all vendors? Is it safe to use /components/component/name (identity=CHASSIS) here? @dplore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the subcomponent be renamed to component? What is the logic behind naming it a subcomponent? In openconfig-platform we have component -> subcomponent relationship. If we assume that reboot works on paths defined in openconfig-platform, then maybe using a subcomponent name is not entirely logical.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes component makes sense - and if not set return not found or invalid parameter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then I suppose to reboot the chassis users should use the /components/component[name=chassis] or something similar to indicate the the control plane modules should reboot?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should require a single component. If the component is not specified, there should be an error.

Also agreed that we should explicitly clarify that reboot of /components/component[name=chassis] indicates a full system reboot.

Comment on lines +163 to +164
types.Path subcomponent =
1; // sub-component that the reboot status is for.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary newline?

@hellt
Copy link
Contributor

hellt commented Oct 29, 2022

Hi all,
Since this PR is targeting the Reboot RPC, I would like to clarify what is the common thinking for the RebootResponse when you reboot the active CPM or the whole chassis.

If gNOI server runs on CPM_A, then what is expected of a system when you reboot this CPM or the whole chassis? Should the response be sent back before the component goes for a reboot? or should it just go for a reboot which results in a TCP session to break?

/cc @marcushines

@rgwilton
Copy link

If the concern is about accidentally rebooting a chassis, as opposed to a linecard, then perhaps the simplest thing to do would be to split these into two separate reboot RPC operations?

E.g., RebootEntireDevice and RebootSubComponent?

I'm also not sure that I see a great reason to allow multiple components to be rebooted in a single RPC, generally rebooting stuff should be pretty rare and not performance critical.

Comment on lines +111 to +112
// Optional, sub-components to reboot.
// If it's not set, reboot the whole network device.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Optional, sub-components to reboot.
// If it's not set, reboot the whole network device.
// This string must contain the /components/component/name to reboot
// If not set, INVALID_ARGUMENT should be returned.
// If the request is accepted by the device, a gRPC OK should be returned.
// Reboot of /components/component[name=chassis] indicates a full system reboot.
// In the case of a full device reboot, a gRPC OK should still be returned and the gRPC
// session closed. This indicates to the client that the request was accepted and the
// reboot will proceed.
//

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dplore

  // In the case of a full device reboot without delay, a gRPC OK should still be returned and the gRPC 
  // session closed.  This indicates to the client that the request was accepted and the 
  // reboot will proceed. With delay, gRPC OK will be returned

uint32 count = 5 [deprecated = true]; // Number of reboots since active.

message Status {
types.Path subcomponent =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use component instead of sub-component? A flat list of components should do the job. Any use cases for hierarchy of components can be determined via gNMI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Component should be mandatory. The component tree is very large on most devices.

@dplore dplore linked an issue Apr 21, 2023 that may be closed by this pull request
@xw-g
Copy link
Contributor Author

xw-g commented Apr 21, 2023

Internal bug b/245550570.

@LimeHat
Copy link

LimeHat commented Feb 21, 2024

@dplore I'm curious, is there a plan to move this PR forward? Been stale for almost a year.

@dplore
Copy link
Member

dplore commented Feb 22, 2024

It's been pushed out for some time. I have queued it in my sprint plan, but currently is about 3 weeks out for review/refactoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RebootStatus RPC ambiguity
8 participants