Commit

xing-yang committed Jan 27, 2020
1 parent 4392202 commit e001825
Showing 1 changed file with 80 additions and 80 deletions.
160 changes: 80 additions & 80 deletions keps/sig-storage/20190530-pv-health-monitor.md
@@ -31,7 +31,6 @@ status: implementable
- [Use Cases](#use-cases)
- [Proposal](#proposal)
- [Implementation](#implementation)
- [API changes](#api-changes)
- [CSI changes](#csi-changes)
- [Add GetVolume RPC](#add-getvolume-rpc)
- [Add Node Volume Health Function](#add-node-volume-health-function)
@@ -43,6 +42,7 @@ status: implementable
- [Alternatives](#alternatives)
- [Alternative option 1](#alternative-option-1)
- [Alternative option 2](#alternative-option-2)
- [Alternative option 3](#alternative-option-3)
- [Optional HTTP(RPC) service](#optional-httprpc-service)
- [Graduation Criteria](#graduation-criteria)
- [Alpha -> Beta](#alpha---beta)
@@ -78,11 +78,11 @@ If the volume is mounted on a pod and used by an application, the following prob
* Filesystem may be out of capacity.
* Volume may be unmounted by accident outside of Kubernetes.

If the CSI driver has implemented the CSI volume health function proposed in this design document, Kubernetes could communicate with the CSI driver to retrieve any errors detected by the underlying storage system. Kubernetes can store this data on the associated PVC so that the user can inspect this information and decide how to handle it. For example, if the volume is out of capacity, the user can request a volume expansion to get more space. Kubernetes may also use volume health information stored on the PVC to automatically reconcile.
If the CSI driver has implemented the CSI volume health function proposed in this design document, Kubernetes could communicate with the CSI driver to retrieve any errors detected by the underlying storage system. Kubernetes can report an event and log an error about this PVC so that the user can inspect this information and decide how to handle it. For example, if the volume is out of capacity, the user can request a volume expansion to get more space. In the first phase, volume health monitoring is informational only, as it is only reported in events and logs. In the future, we will also look into how Kubernetes may use volume health information to automatically reconcile.

There could be conditions that cannot be reported by a CSI driver. There could be a network failure where Kubernetes may not be able to get a response from the CSI driver; in this case, a call to the CSI driver may time out. There could be network congestion that causes slow responses. One or more nodes to which the volume is attached may be down. These conditions can be monitored and detected by the volume health controller so that the user knows what has happened.

The Kubernetes components that may modify the PVC status with volume health information include the following:
The Kubernetes components that monitor the volumes and report events with volume health information include the following:

* An external monitoring agent on each Kubernetes worker node.
* An external monitoring controller on the master node.
@@ -93,11 +93,11 @@ Details will be described in the following proposal section.

Volume monitoring is the main focus of this proposal. Reactions are not in the scope of this proposal.

- If a volume provisioned by the CSI driver is deleted, we need to mark the corresponding PVC to inform users.
- If a volume provisioned by the CSI driver is deleted, we need to report an event and log a message to inform users.

- If a mounting error occurs, we need to mark the PVC.

- If a mounting error occurs, we need to report an event and log a message.

- If other errors occur, we also need to mark the PVC.
- If other errors occur, we also need to report an event and log a message.

The main architecture is shown below:

@@ -111,86 +111,19 @@ The following areas will be the focus of this proposal at first:

Three main parts are involved here in the architecture.

- API changes: The Status field in the PVC will be used to mark volumes if they are unhealthy.

- External Controller:
  - The external controller will be deployed as a sidecar together with the CSI controller driver, similar to how the external-provisioner sidecar is deployed.
  - Trigger controller RPC to check the health condition of the CSI volumes.
  - The external controller sidecar will also watch for node failure events.

- External Agent:
  - The external agent will be deployed as a sidecar together with the CSI node driver on every Kubernetes worker node.
  - Trigger node RPC to check PVs’ mounting conditions.
  - Trigger node RPC to check volumes' mounting conditions.
  - Note that currently we do not have CSI support for local storage. As a workaround, the agent may check local volumes directly at first, and then move the checks to RPC interfaces when CSI support is ready.


## Implementation

### API changes

The Status field in the PVC will be used to mark volumes if they are unhealthy.

```
// PersistentVolumeClaimHealthConditionType defines the health condition of PV claim.
// Valid values are "HealthFailure", "HealthWarning", "HealthUnknown".
type PersistentVolumeClaimHealthConditionType string

// These are valid health conditions of PVC
const (
    // PersistentVolumeClaimHealthFailure - Volume health failure indicates a severe problem that makes the volume unusable
    PersistentVolumeClaimHealthFailure PersistentVolumeClaimHealthConditionType = "HealthFailure"
    // PersistentVolumeClaimHealthWarning - Volume health warning indicates there is a problem but volume is still usable
    PersistentVolumeClaimHealthWarning PersistentVolumeClaimHealthConditionType = "HealthWarning"
    // PersistentVolumeClaimHealthUnknown - Volume health unknown indicates the health condition of the volume is unknown
    PersistentVolumeClaimHealthUnknown PersistentVolumeClaimHealthConditionType = "HealthUnknown"
)

// PersistentVolumeClaimHealthCondition represents the current health condition of PV claim
type PersistentVolumeClaimHealthCondition struct {
    Type      PersistentVolumeClaimHealthConditionType
    Status    ConditionStatus
    ErrorCode string
    // +optional
    LastProbeTime metav1.Time
    // +optional
    LastTransitionTime metav1.Time
    // +optional
    Reason string
    // +optional
    Message string
}

// PersistentVolumeClaimStatus represents the status of PV claim
type PersistentVolumeClaimStatus struct {
    // Phase represents the current phase of PersistentVolumeClaim
    // +optional
    Phase PersistentVolumeClaimPhase
    // AccessModes contains all ways the volume backing the PVC can be mounted
    // +optional
    AccessModes []PersistentVolumeAccessMode
    // Represents the actual resources of the underlying volume
    // +optional
    Capacity ResourceList
    // +optional
    Conditions []PersistentVolumeClaimCondition
    // +optional
    HealthConditions []PersistentVolumeClaimHealthCondition
}

// ConditionStatus defines conditions of resources
type ConditionStatus string

// These are valid condition statuses. "ConditionTrue" means a resource is in the condition;
// "ConditionFalse" means a resource is not in the condition; "ConditionUnknown" means kubernetes
// can't decide if a resource is in the condition or not. In the future, we could add other
// intermediate conditions, e.g. ConditionDegraded.
const (
    ConditionTrue    ConditionStatus = "True"
    ConditionFalse   ConditionStatus = "False"
    ConditionUnknown ConditionStatus = "Unknown"
)
```

### CSI changes

The Container Storage Interface (CSI) specification will be modified to provide volume health checks, leveraging existing RPCs and adding new ones.
@@ -449,27 +382,94 @@ message NodeServiceCapability {
### External controller

#### CSI interface
Call GetVolume() RPC for volumes periodically to check the health condition of volumes themselves. To avoid stale information being stored on a PVC, each periodic update will mark the PVC with the latest health information, replacing previous health information added by the external controller. If the PVC becomes healthy after being marked as unhealthy previously, the controller should remove the previous information.

As mentioned earlier, reaction is not in the scope of this proposal but will be considered in the future. Before reacting to any negative health condition, the controller responsible for the reaction should call GetVolume() again to ensure the information is up to date.
Call GetVolume() RPC for volumes periodically to check the health condition of volumes themselves.
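
As an illustration only, the sketch below shows one way the external controller sidecar could drive this polling loop and surface results as PVC events. The `VolumeHealthChecker` interface, the `VolumeCondition` struct, and the event reasons are assumptions made for this sketch; they are not the generated CSI Go bindings or a fixed part of this proposal.

```
package monitor

import (
    "time"

    v1 "k8s.io/api/core/v1"
    "k8s.io/client-go/tools/record"
)

// VolumeCondition is a simplified view of the health information the
// proposed GetVolume() RPC could return.
type VolumeCondition struct {
    Abnormal bool
    Message  string
}

// VolumeHealthChecker abstracts the controller-side CSI RPC used by the sidecar.
type VolumeHealthChecker interface {
    GetVolume(volumeID string) (VolumeCondition, error)
}

// checkVolumesPeriodically polls the health of every monitored PVC's volume and
// reports a Warning event on the PVC when the driver flags an abnormal condition.
func checkVolumesPeriodically(
    checker VolumeHealthChecker,
    recorder record.EventRecorder,
    listPVCs func() []*v1.PersistentVolumeClaim,
    volumeID func(*v1.PersistentVolumeClaim) string,
    interval time.Duration,
    stopCh <-chan struct{},
) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for {
        select {
        case <-stopCh:
            return
        case <-ticker.C:
            for _, pvc := range listPVCs() {
                condition, err := checker.GetVolume(volumeID(pvc))
                if err != nil {
                    // A timeout here may indicate a network failure; report it as well.
                    recorder.Event(pvc, v1.EventTypeWarning, "VolumeHealthUnknown", err.Error())
                    continue
                }
                if condition.Abnormal {
                    recorder.Event(pvc, v1.EventTypeWarning, "VolumeConditionAbnormal", condition.Message)
                }
            }
        }
    }
}
```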

#### Node failure event
Watch node failure events.
In the case of a node failure, the controller will mark all local PVCs on that node.
In the case of a node failure, the controller will report an event for all PVCs on that node.
For network storage, in the case of a node failure, the controller will log an event.
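
A minimal sketch of how the controller sidecar could react to node status changes, assuming it watches Node objects through an informer; the `pvcsOnNode` lookup and the `NodeFailure` event reason are placeholders invented for this illustration, not part of the proposal.

```
package monitor

import (
    "fmt"

    v1 "k8s.io/api/core/v1"
    "k8s.io/client-go/tools/record"
)

// nodeReady reports whether the node's Ready condition is True.
func nodeReady(node *v1.Node) bool {
    for _, cond := range node.Status.Conditions {
        if cond.Type == v1.NodeReady {
            return cond.Status == v1.ConditionTrue
        }
    }
    return false
}

// handleNodeUpdate reports a Warning event on every PVC that is used on a
// node that is no longer ready.
func handleNodeUpdate(
    node *v1.Node,
    pvcsOnNode func(nodeName string) []*v1.PersistentVolumeClaim,
    recorder record.EventRecorder,
) {
    if nodeReady(node) {
        return
    }
    for _, pvc := range pvcsOnNode(node.Name) {
        recorder.Event(pvc, v1.EventTypeWarning, "NodeFailure",
            fmt.Sprintf("node %s hosting this volume is not ready", node.Name))
    }
}
```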

### External agent

#### CSI interface
Call NodeGetVolumeStats() RPC to check the mounting and other health conditions.
To avoid stale information being stored on a PVC, each periodic update will mark the PVC with the latest health information, replacing previous health information added by the external agent. If the PVC becomes healthy after previously being marked as unhealthy, the agent should remove the information added earlier.

Call both GetVolume() and NodeGetVolumeStats() RPCs for local storage when local storage CSI support is enabled.
For now, check local volumes directly by the agent.
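
For the direct check mentioned above, here is a Linux-only sketch of how the agent could verify that a local volume's target path is still mounted; the helper and its path handling are simplified and purely illustrative.

```
package agent

import (
    "os"
    "strings"
)

// isMounted reports whether targetPath appears as a mount point in /proc/mounts.
// If it returns false for a path that should be mounted, the agent would report
// an event and log a message for the corresponding PVC.
func isMounted(targetPath string) (bool, error) {
    data, err := os.ReadFile("/proc/mounts")
    if err != nil {
        return false, err
    }
    for _, line := range strings.Split(string(data), "\n") {
        fields := strings.Fields(line)
        if len(fields) >= 2 && fields[1] == targetPath {
            return true, nil
        }
    }
    return false, nil
}
```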

### Alternatives

#### Alternative option 1

The Status field in the PVC will be used to mark volumes if they are unhealthy. The external monitoring controller sidecar will be responsible for monitoring the volumes and updating the PVC status field when needed.

```
// PersistentVolumeClaimHealthConditionType defines the health condition of PV claim.
// Valid values are "HealthFailure", "HealthWarning", "HealthUnknown".
type PersistentVolumeClaimHealthConditionType string

// These are valid health conditions of PVC
const (
    // PersistentVolumeClaimHealthFailure - Volume health failure indicates a severe problem that makes the volume unusable
    PersistentVolumeClaimHealthFailure PersistentVolumeClaimHealthConditionType = "HealthFailure"
    // PersistentVolumeClaimHealthWarning - Volume health warning indicates there is a problem but volume is still usable
    PersistentVolumeClaimHealthWarning PersistentVolumeClaimHealthConditionType = "HealthWarning"
    // PersistentVolumeClaimHealthUnknown - Volume health unknown indicates the health condition of the volume is unknown
    PersistentVolumeClaimHealthUnknown PersistentVolumeClaimHealthConditionType = "HealthUnknown"
)

// PersistentVolumeClaimHealthCondition represents the current health condition of PV claim
type PersistentVolumeClaimHealthCondition struct {
    Type      PersistentVolumeClaimHealthConditionType
    Status    ConditionStatus
    ErrorCode string
    // +optional
    LastProbeTime metav1.Time
    // +optional
    LastTransitionTime metav1.Time
    // +optional
    Reason string
    // +optional
    Message string
}

// PersistentVolumeClaimStatus represents the status of PV claim
type PersistentVolumeClaimStatus struct {
    // Phase represents the current phase of PersistentVolumeClaim
    // +optional
    Phase PersistentVolumeClaimPhase
    // AccessModes contains all ways the volume backing the PVC can be mounted
    // +optional
    AccessModes []PersistentVolumeAccessMode
    // Represents the actual resources of the underlying volume
    // +optional
    Capacity ResourceList
    // +optional
    Conditions []PersistentVolumeClaimCondition
    // +optional
    HealthConditions []PersistentVolumeClaimHealthCondition
}

// ConditionStatus defines conditions of resources
type ConditionStatus string

// These are valid condition statuses. "ConditionTrue" means a resource is in the condition;
// "ConditionFalse" means a resource is not in the condition; "ConditionUnknown" means kubernetes
// can't decide if a resource is in the condition or not. In the future, we could add other
// intermediate conditions, e.g. ConditionDegraded.
const (
    ConditionTrue    ConditionStatus = "True"
    ConditionFalse   ConditionStatus = "False"
    ConditionUnknown ConditionStatus = "Unknown"
)
```

To avoid stale information being stored on a PVC, each periodic update will mark the PVC with the latest health information, replacing previous health information added by the external controller. If the PVC becomes healthy after being marked as unhealthy previously, the controller should remove the previous information.
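
As a sketch of the replace-not-append behavior described above, using the types from the block above; the helper names and the transition-time handling are illustrative assumptions, not part of the proposal.

```
// Assumes the same package as the API types shown above;
// metav1 is k8s.io/apimachinery/pkg/apis/meta/v1.
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// setHealthCondition records the latest health condition on the PVC status,
// replacing any earlier condition of the same type so that stale information
// never accumulates.
func setHealthCondition(status *PersistentVolumeClaimStatus, newCond PersistentVolumeClaimHealthCondition) {
    newCond.LastProbeTime = metav1.Now()
    for i, cond := range status.HealthConditions {
        if cond.Type != newCond.Type {
            continue
        }
        if cond.Status == newCond.Status {
            newCond.LastTransitionTime = cond.LastTransitionTime
        } else {
            newCond.LastTransitionTime = newCond.LastProbeTime
        }
        status.HealthConditions[i] = newCond
        return
    }
    newCond.LastTransitionTime = newCond.LastProbeTime
    status.HealthConditions = append(status.HealthConditions, newCond)
}

// clearHealthConditions removes all previously reported conditions once the
// volume is healthy again.
func clearHealthConditions(status *PersistentVolumeClaimStatus) {
    status.HealthConditions = nil
}
```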

As mentioned earlier, reaction is not in the scope of this proposal but will be considered in the future. Before reacting to any negative health condition, the controller responsible for the reaction should call GetVolume() again to ensure the information is up to date.

#### Alternative option 2

If the agent on the node cannot be used to modify the PVC status and the monitoring logic cannot be added to Kubelet directly, we can introduce a CRD to represent the volume health. This volume health CRD is in the same namespace as the PVC that it is monitoring. It contains the PVC name and the health conditions as defined in the main option. It needs to have a one-to-one mapping with the PVC. In the PVC status, there will be a `volumeHealthName` field pointing back to the volume health CRD.

Both the controller and the agent can create the volume health CRD for a PVC if it does not exist yet. Only one volume health CRD should be created for a PVC. Only the controller can set the `volumeHealthName` field in the PVC status.
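
The full CRD definition for this option is collapsed in the diff below; purely as an illustration, its Go types could look roughly like this (the `VolumeHealth` name and its fields are assumptions made for this sketch, not the actual proposal).

```
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// VolumeHealth is a namespaced object with a one-to-one mapping to a PVC.
type VolumeHealth struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   VolumeHealthSpec   `json:"spec"`
    Status VolumeHealthStatus `json:"status,omitempty"`
}

// VolumeHealthSpec identifies the PVC (in the same namespace) being monitored.
type VolumeHealthSpec struct {
    PersistentVolumeClaimName string `json:"persistentVolumeClaimName"`
}

// VolumeHealthStatus reuses the health conditions defined for option 1.
type VolumeHealthStatus struct {
    // +optional
    HealthConditions []PersistentVolumeClaimHealthCondition `json:"healthConditions,omitempty"`
}
```
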
@@ -512,7 +512,7 @@ type PersistentVolumeClaimStatus struct {
}
```

#### Alternative option 2
#### Alternative option 3

We can also reuse PV Taints and introduce a new Taint called PVUnhealthMessage for the PV health condition: its key is fixed (PVUnhealthMessage) and its value can be set to different messages.
