The Healer operation is used both for healing a volume and for checking its health. It is up to the CSI driver whether to implement the healer mechanism or to rely on health checks alone. Signed-off-by: Prasanna Kumar Kalever <[email protected]>
Prasanna Kumar Kalever committed Feb 8, 2022
1 parent 32fa508, commit 96a9434
Showing 1 changed file with 124 additions and 0 deletions.
@@ -0,0 +1,124 @@
# CSI-Addons Operation: Healer

## Terminology

| Term     | Definition                                                                            |
| -------- | ------------------------------------------------------------------------------------- |
| VolumeID | The identifier of the volume generated by the plugin.                                 |
| CO       | Container Orchestration system that communicates with plugins using CSI service RPCs. |
| SP       | Storage Provider, the vendor of a CSI plugin implementation.                          |
| RPC      | [Remote Procedure Call](https://en.wikipedia.org/wiki/Remote_procedure_call).         |

## Objective

Define a standard that will enable storage providers (SP) to
perform node-level volume health checks and healing operations.

### Goals in MVP

The new extension will define a procedure that

* can be called for existing volumes
* interacts with the Node-Plugin to check the health condition of the volume
* makes it possible for the SP to heal volumes that are in an abnormal
  condition

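The goals above can be read as a check-then-optionally-heal flow. The following is a minimal sketch of that flow, not the specified behavior: `node_healer`, `check_health`, and `heal` are hypothetical names, the response class is a local stand-in rather than generated protobuf code, and reporting a successfully healed volume as not abnormal is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class NodeHealerResponse:
    """Local stand-in for the NodeHealerResponse message (not generated code)."""
    abnormal: bool
    message: str


def node_healer(
    volume_id: str,
    check_health: Callable[[str], Optional[str]],
    heal: Optional[Callable[[str], bool]] = None,
) -> NodeHealerResponse:
    """Check a volume and, if the SP provides a healer, try to repair it.

    check_health returns None for a healthy volume, or a problem description.
    heal is optional (healing is SP-specific) and returns True on success.
    Both callbacks are hypothetical hooks, not part of the specification.
    """
    problem = check_health(volume_id)
    if problem is None:
        return NodeHealerResponse(abnormal=False, message="volume is healthy")
    if heal is None:
        # Health-check-only driver: report the condition and stop.
        return NodeHealerResponse(abnormal=True, message=problem)
    if heal(volume_id):
        # Assumption: a healed volume is reported as normal again.
        return NodeHealerResponse(abnormal=False, message=f"healed: {problem}")
    return NodeHealerResponse(abnormal=True, message=f"heal failed: {problem}")
```

A driver that only implements health checks would call `node_healer` without a `heal` callback and simply report the condition.
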
### Non-Goals in MVP

* Healing logic is not specified here: implementing it is OPTIONAL and
  entirely SP-specific

## Solution Overview

This specification defines an interface along with the minimum operational and
packaging recommendations for a storage provider (SP) to implement
health check and heal operations for volumes. The interface declares the
RPCs that a plugin MUST expose.

## RPC Interface

* **Node Service**: The Node plugin MUST implement this RPC.

```protobuf
syntax = "proto3";
package healer;

import "github.com/container-storage-interface/spec/lib/go/csi/csi.proto";
import "google/protobuf/descriptor.proto";

option go_package = "github.com/csi-addons/spec/lib/go/healer";

// HealerNode holds the RPC method for running heal operations on the
// active (staged/published) volume.
service HealerNode {
  // NodeHealer is a procedure that gets called on the CSI NodePlugin.
  rpc NodeHealer (NodeHealerRequest)
  returns (NodeHealerResponse) {}
}
```

### NodeHealer

```protobuf
// NodeHealerRequest contains the information needed to identify the
// location where the volume is mounted so that local filesystem or
// block-device operations to heal the volume can be executed.
message NodeHealerRequest {
  // The ID of the volume. This field is REQUIRED.
  string volume_id = 1;

  // The path on which the volume is available. This field is REQUIRED.
  // This field overrides the general CSI size limit.
  // SP SHOULD support the maximum path length allowed by the operating
  // system/filesystem, but, at a minimum, SP MUST accept a max path
  // length of at least 128 bytes.
  string volume_path = 2;

  // The path where the volume is staged, if the plugin has the
  // STAGE_UNSTAGE_VOLUME capability, otherwise empty.
  // If not empty, it MUST be an absolute path in the root
  // filesystem of the process serving this request.
  // This field is OPTIONAL.
  // This field overrides the general CSI size limit.
  // SP SHOULD support the maximum path length allowed by the operating
  // system/filesystem, but, at a minimum, SP MUST accept a max path
  // length of at least 128 bytes.
  string staging_target_path = 3;

  // Volume capability describing how the CO intends to use this volume.
  // This allows the SP to determine whether the volume is being used as
  // a block device or as a mounted file system. For example, if the
  // volume is being used as a block device, the SP MAY choose to skip
  // filesystem-level healing operations. If volume_capability is
  // omitted, the SP MAY determine the access_type from the given
  // volume_path and perform healing accordingly.
  // This field is OPTIONAL.
  csi.v1.VolumeCapability volume_capability = 4;

  // Secrets required by the plugin to complete the healer operation.
  // This field is OPTIONAL.
  map<string, string> secrets = 5 [(csi.v1.csi_secret) = true];
}

// NodeHealerResponse holds the information about the result of the
// NodeHealerRequest call.
message NodeHealerResponse {
  // Normal volumes are available for use and operating optimally.
  // An abnormal volume does not meet these criteria.
  // This field is REQUIRED.
  bool abnormal = 1;

  // The message describing the condition of the volume.
  // This field is REQUIRED.
  string message = 2;
}
```

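The field rules in the message comments above translate into a few request checks on the plugin side. The sketch below is illustrative only: `NodeHealerRequest` here is a local dataclass stand-in (with `volume_capability` left out for brevity), `validate_request` is a hypothetical helper, and returning `INVALID_ARGUMENT` for a relative `staging_target_path` is an assumption, since the specification only names that code for missing required fields.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

INVALID_ARGUMENT = 3  # gRPC status code for a missing required field


@dataclass
class NodeHealerRequest:
    """Local stand-in mirroring the fields of the NodeHealerRequest message."""
    volume_id: str = ""
    volume_path: str = ""
    staging_target_path: str = ""
    secrets: Dict[str, str] = field(default_factory=dict)


def validate_request(req: NodeHealerRequest) -> Optional[Tuple[int, str]]:
    """Return None if the request is acceptable, else (grpc_code, reason)."""
    if not req.volume_id:
        return (INVALID_ARGUMENT, "volume_id is REQUIRED")
    if not req.volume_path:
        return (INVALID_ARGUMENT, "volume_path is REQUIRED")
    # staging_target_path is OPTIONAL, but MUST be absolute when present.
    # Assumption: a relative path is rejected with INVALID_ARGUMENT.
    if req.staging_target_path and not posixpath.isabs(req.staging_target_path):
        return (INVALID_ARGUMENT, "staging_target_path MUST be an absolute path")
    return None
```

A plugin could run such a check before touching the volume, so that malformed requests fail fast without any filesystem or block-device access.
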
#### NodeHealer Errors

| Condition | gRPC Code | Description | Recovery Behavior |
| --------- | --------- | ----------- | ----------------- |
| Missing required field | 3 INVALID_ARGUMENT | Indicates that a required field is missing from the request. | Caller MUST fix the request by adding the missing required field before retrying. |
| Volume does not exist | 5 NOT_FOUND | Indicates that a volume corresponding to the specified `volume_id` does not exist. | Caller MUST verify that the `volume_id` is correct and that the volume is accessible and has not been deleted before retrying with exponential backoff. |
| Call not implemented | 12 UNIMPLEMENTED | The invoked RPC is not implemented by the CSI-driver or is disabled in the driver's current mode of operation. | Caller MUST NOT retry. |
| Operation pending for volume | 10 ABORTED | Indicates that there is already an operation pending for the specified `volume_id`. In general the CSI-Addons CO plugin is responsible for ensuring that there is no more than one call "in-flight" per `volume_id` at a given time. However, in some circumstances, the CSI-Addons CO plugin MAY lose state (for example when it crashes and restarts), and MAY issue multiple calls simultaneously for the same `volume_id`. The CSI-driver SHOULD handle this as gracefully as possible, and MAY return this error code to reject secondary calls. | Caller SHOULD ensure that there are no other calls pending for the specified `volume_id`, and then retry with exponential backoff. |
| Not authenticated | 16 UNAUTHENTICATED | The invoked RPC does not carry secrets that are valid for authentication. | Caller SHALL either fix the secrets provided in the RPC, or otherwise regalvanize said secrets such that they will pass authentication by the Plugin for the attempted RPC, after which point the caller MAY retry the attempted RPC. |
| Error is unknown | 2 UNKNOWN | Indicates that an unknown error occurred. | Caller MUST study the logs before retrying. |
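
On the caller side, the recovery behaviors in the table above can be reduced to a small lookup plus an exponential backoff schedule. This is a hedged sketch: the `Code` values are the standard gRPC status numbers from the table, while the `RECOVERY` labels, `may_retry_unchanged`, and `backoff_delays` (including its default base, factor, and cap) are hypothetical names and values chosen for illustration.

```python
from enum import IntEnum
from typing import Iterator


class Code(IntEnum):
    """The gRPC status codes listed in the NodeHealer error table."""
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    NOT_FOUND = 5
    ABORTED = 10
    UNIMPLEMENTED = 12
    UNAUTHENTICATED = 16


# Hypothetical labels for the recovery behaviors described in the table.
RECOVERY = {
    Code.INVALID_ARGUMENT: "fix-request",
    Code.NOT_FOUND: "verify-then-backoff",
    Code.UNIMPLEMENTED: "never-retry",
    Code.ABORTED: "backoff",
    Code.UNAUTHENTICATED: "fix-secrets",
    Code.UNKNOWN: "inspect-logs",
}


def may_retry_unchanged(code: Code) -> bool:
    """True only if resending the same request later is allowed."""
    return RECOVERY[code] in ("verify-then-backoff", "backoff")


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, attempts: int = 5) -> Iterator[float]:
    """Yield exponentially growing delays in seconds, capped at `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor
```

With the defaults, `list(backoff_delays())` yields `[1.0, 2.0, 4.0, 8.0, 16.0]`. Codes for which `may_retry_unchanged` returns `False` need a changed request, new secrets, or operator attention before any further attempt.
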