Disaster Recovery Server

The Qubership Disaster Recovery Daemon (DRD) is a service that establishes communication between the Site Manager and the current cluster operator or disaster recovery controller.

DRD provides the following features:

- Disaster Recovery Server implements the Site Manager contract and manages the current mode in the DR resource.
- Disaster Recovery Controller makes it possible to implement a DR controller for services that do not have an operator.

An example of a DRD chart template is presented here.

Disaster Recovery Server

Common Information

DRD provides all the REST endpoints required by the Site Manager contract and reads its data from a Kubernetes Custom Resource through the Kubernetes API. By default, the cluster operator manages this Custom Resource and contains the service switchover logic; DRD only triggers the cluster operator by changing the Custom Resource.

DRD is delivered as a Docker image and is configured through a list of environment variables. It can be deployed in Kubernetes as a separate pod or as a sidecar container in the operator pod.

Environment variables

| Name | Format | Description | Example | Required |
|------|--------|-------------|---------|----------|
| `NAMESPACE` | A string. | The name of the service namespace. | `rabbitmq-service` | `true` |
| `RESOURCE_FOR_DR` | Four words in a single string separated by a single space. | This parameter specifies the four values used to find the Kubernetes Custom Resource: group, version, resource, and name of the Custom Resource. The first word, group, can be empty `""`. | `netcracker.com v2 rabbitmqservices rabbitmq-service` | `true` |
| `USE_DEFAULT_PATHS` | A single boolean word. | If this parameter is `true`, the default values are used instead of the `DISASTER_RECOVERY_*` environment variable values. | `true` | `false` |
| `DISASTER_RECOVERY_MODE_PATH` | Several words separated by a dot. | This parameter specifies the path to the disaster recovery `mode` field in the Custom Resource. | `spec.disasterRecovery.mode` | `true` if `USE_DEFAULT_PATHS` is not set to `true` |
| `DISASTER_RECOVERY_NOWAIT_PATH` | Several words separated by a dot. | This parameter specifies the path to the disaster recovery `no-wait` field in the Custom Resource. | `spec.disasterRecovery.noWait` | `true` if `USE_DEFAULT_PATHS` is not set to `true` |
| `DISASTER_RECOVERY_STATUS_MODE_PATH` | Several words separated by a dot. | This parameter specifies the path to the disaster recovery status `mode` field in the Custom Resource. | `status.disasterRecoveryStatus.mode` | `true` if `USE_DEFAULT_PATHS` is not set to `true` |
| `DISASTER_RECOVERY_STATUS_STATUS_PATH` | Several words separated by a dot. | This parameter specifies the path to the disaster recovery status `status` field in the Custom Resource. | `status.disasterRecoveryStatus.status` | `true` if `USE_DEFAULT_PATHS` is not set to `true` |
| `DISASTER_RECOVERY_STATUS_COMMENT_PATH` | Several words separated by a dot. | This parameter specifies the path to the disaster recovery status `comment` field in the Custom Resource. | `status.disasterRecoveryStatus.comment` | `false` |
| `DISASTER_RECOVERY_NOWAIT_AS_STRING` | A single boolean word. | If this parameter is `true`, the disaster recovery daemon uses string values for the `no-wait` parameter; otherwise a boolean value is used. | `false` | `false` |
| `HEALTH_MAIN_SERVICES_ACTIVE` | Several word pairs separated by commas. Each pair contains two words separated by a single space. The first word is a Kubernetes workload type and the second one is the workload name. | This parameter specifies the main services for the health check on the active side. | `deployment kafka-1,deployment kafka-2` | `true` |
| `HEALTH_ADDITIONAL_SERVICES_ACTIVE` | Several word pairs separated by commas. Each pair contains two words separated by a single space. The first word is a Kubernetes workload type and the second one is the workload name. | This parameter specifies the additional services for the health check on the active side. | `deployment rabbitmq-backup-daemon` | `false` |
| `HEALTH_MAIN_SERVICES_STANDBY` | Several word pairs separated by commas. Each pair contains two words separated by a single space. The first word is a Kubernetes workload type and the second one is the workload name. | This parameter specifies the main services for the health check on the standby side. If the parameter is empty or absent, the health status is always `UP` on the standby side. | `deployment kafka-1,deployment kafka-2` | `false` |
| `HEALTH_ADDITIONAL_SERVICES_STANDBY` | Several word pairs separated by commas. Each pair contains two words separated by a single space. The first word is a Kubernetes workload type and the second one is the workload name. | This parameter specifies the additional services for the health check on the standby side. | `deployment rabbitmq-backup-daemon` | `false` |
| `HEALTH_MAIN_SERVICES_DISABLED` | Several word pairs separated by commas. Each pair contains two words separated by a single space. The first word is a Kubernetes workload type and the second one is the workload name. | This parameter specifies the main services for the health check on the `disabled` side. If the parameter is empty or absent, the health status is always `UP` on the `disabled` side. | `deployment kafka-1,deployment kafka-2` | `false` |
| `HEALTH_ADDITIONAL_SERVICES_DISABLED` | Several word pairs separated by commas. Each pair contains two words separated by a single space. The first word is a Kubernetes workload type and the second one is the workload name. | This parameter specifies the additional services for the health check on the `disabled` side. | `deployment rabbitmq-backup-daemon` | `false` |
| `SITE_MANAGER_SERVICE_ACCOUNT_NAME` | A single word. | This parameter specifies the Site Manager service account name. | `site-manager` | `false` |
| `SITE_MANAGER_NAMESPACE` | A single word. | This parameter specifies the Site Manager namespace. | `site-manager` | `false` |
| `SITE_MANAGER_CUSTOM_AUDIENCE` | A single word. | This parameter specifies the Site Manager custom audience applied to the token during authentication. | `sm-services` | `false` |
| `SERVER_PORT` | A number. | This parameter specifies the DRD server port. The default value is `8068`. | `8069` | `false` |
| `ADDITIONAL_HEALTH_ENDPOINT` | A string. | This parameter specifies an additional health endpoint. The endpoint response contains information about the full cluster health state (if `EXTERNAL_FULL_HEALTH_ENABLED` is `true`) or the additional cluster health state (if `EXTERNAL_FULL_HEALTH_ENABLED` is `false`). In the second case, the result is calculated in the same way as for the `HEALTH_ADDITIONAL_SERVICES` variables. | `http://(POD_IP):8069/healthz` | `false` |
| `EXTERNAL_FULL_HEALTH_ENABLED` | A boolean string. | If this parameter is `true`, the `ADDITIONAL_HEALTH_ENDPOINT` variable is used as the external full health endpoint. In this case, the `HEALTH_*` environment variables are not necessary. | `true` | `false` |
| `TLS_ENABLED` | A boolean string. | If this parameter is `true`, TLS is enabled for the DRD container. | `false` | `false` |
| `CERTS_PATH` | Path string. | This parameter specifies the path to the folder with TLS certificates in the DRD container. | `/tls/` | `false` |
| `CIPHER_SUITES` | Comma-separated list of strings. Each word is a suite name supported by Go, e.g. `TLS_RSA_WITH_3DES_EDE_CBC_SHA`. | This parameter specifies the list of cipher suites used to negotiate the security settings for a network connection over the TLS or SSL network protocol. | `""` | `false` |
| `TREAT_STATUS_AS_FIELD` | A boolean. | This parameter specifies whether the resource status should be treated as a regular field. It is necessary when the resource referenced by `DISASTER_RECOVERY_STATUS_STATUS_PATH` does not have a Status sub-resource. In that case, the status is written as a field of the chosen resource. For example, this may apply to some custom resources or to ConfigMaps. | `false` | `false` |
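
The value formats of `RESOURCE_FOR_DR` and the `HEALTH_*_SERVICES_*` variables can be illustrated with a small parser. This is only a sketch of how such strings might be split, not DRD's actual parsing code; the `ResourceRef` and `Workload` types and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// ResourceRef holds the four parts of a RESOURCE_FOR_DR value (hypothetical type).
type ResourceRef struct {
	Group, Version, Resource, Name string
}

// parseResourceForDR splits "group version resource name" on single spaces.
// A group written as "" (two quote characters) is treated as the empty group.
func parseResourceForDR(value string) (ResourceRef, error) {
	parts := strings.Split(value, " ")
	if len(parts) != 4 {
		return ResourceRef{}, fmt.Errorf("expected 4 space-separated words, got %d", len(parts))
	}
	group := parts[0]
	if group == `""` {
		group = ""
	}
	return ResourceRef{Group: group, Version: parts[1], Resource: parts[2], Name: parts[3]}, nil
}

// Workload is one "type name" pair from a HEALTH_* variable (hypothetical type).
type Workload struct {
	Kind, Name string
}

// parseWorkloads splits "deployment kafka-1,deployment kafka-2" into pairs.
// An empty value yields no workloads.
func parseWorkloads(value string) ([]Workload, error) {
	if strings.TrimSpace(value) == "" {
		return nil, nil
	}
	var result []Workload
	for _, item := range strings.Split(value, ",") {
		fields := strings.Fields(item)
		if len(fields) != 2 {
			return nil, fmt.Errorf("invalid workload pair %q", item)
		}
		result = append(result, Workload{Kind: fields[0], Name: fields[1]})
	}
	return result, nil
}

func main() {
	ref, err := parseResourceForDR(`"" v1 configmaps example-service-dr-config`)
	fmt.Println(ref, err)

	workloads, err := parseWorkloads("deployment kafka-1,deployment kafka-2")
	fmt.Println(workloads, err)
}
```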

REST API

DRD REST server provides three methods of interaction:

GET healthz method allows finding out the state of the current cluster side.
```
curl -XGET localhost:8068/healthz
```
Where 8068 is the default server port.

The response to such a request is as follows:
```
{"status":"up"}
```
Where:
- status is the current state of the cluster side. The four possible status values are as follows:
  - up - All service's workloads are ready.
  - degraded - Some of the service's workloads (the main health service or additional health service) are not ready.
  - down - The main health service is down.
  - disabled - The service is switched off.

GET sitemanager method allows finding out the mode of the current cluster side and the actual state of the switchover procedure.
```
curl -XGET localhost:8068/sitemanager
```
Where 8068 is the default server port.

The response to such a request is as follows:
```
{"mode":"standby","status":"done"}
```
Where:
- mode is the mode in which the cluster side is deployed. The possible mode values are as follows:
  - active - The service accepts external requests from clients.
  - standby - The service does not accept external requests from clients.
  - disabled - The service does not accept external requests from clients.
- status is the current state of switchover for the service cluster side. The three possible status values are as follows:
  - running - The switchover is in progress.
  - done - The switchover is successful.
  - failed - Something went wrong during the switchover.
- comment is the message which contains a detailed description of the problem.

POST sitemanager method allows switching mode for the current side of the service cluster.
```
curl -XPOST -H "Content-Type: application/json" localhost:8068/sitemanager -d '{"mode":"<MODE>"}'
```
Where:
- 8068 is the default server port.
- <MODE> is the mode to be applied to the cluster side. The possible mode values are as follows:
  - active - The service accepts external requests from clients.
  - standby - The service does not accept external requests from clients.
  - disabled - The service does not accept external requests from clients.

The response to such a request is as follows:
```
{"mode":"standby"}
```
Where:
- mode is the mode that is applied to the cluster side. The possible values are active, standby, and disabled.
- status is the state of the request on the REST server. The only possible value is failed, when something goes wrong while processing the request.
- comment is the message which contains a detailed description of the problem and is only filled out if the status value is failed.
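
For Go clients, the JSON bodies described above could be modelled as in the sketch below. The struct, the function names, and the polling rule in `switchoverFinished` are assumptions made for illustration; only the field names `mode`, `status`, and `comment` and their values come from the descriptions above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SiteManagerState mirrors the fields of the sitemanager responses (hypothetical type).
type SiteManagerState struct {
	Mode    string `json:"mode"`
	Status  string `json:"status,omitempty"`
	Comment string `json:"comment,omitempty"`
}

// encodeModeRequest builds the body for POST /sitemanager.
func encodeModeRequest(mode string) ([]byte, error) {
	return json.Marshal(map[string]string{"mode": mode})
}

// decodeState parses a sitemanager response body.
func decodeState(body []byte) (SiteManagerState, error) {
	var state SiteManagerState
	err := json.Unmarshal(body, &state)
	return state, err
}

// switchoverFinished reports whether a caller polling GET /sitemanager can stop.
// Assumption: polling stops on "failed", or on "done" once the wanted mode is reported.
func switchoverFinished(state SiteManagerState, wantMode string) (bool, error) {
	switch state.Status {
	case "failed":
		return true, fmt.Errorf("switchover failed: %s", state.Comment)
	case "done":
		return state.Mode == wantMode, nil
	default: // "running" or not yet reported
		return false, nil
	}
}

func main() {
	body, _ := encodeModeRequest("standby")
	fmt.Println(string(body))

	state, err := decodeState([]byte(`{"mode":"standby","status":"done"}`))
	fmt.Println(state, err)

	finished, err := switchoverFinished(state, "standby")
	fmt.Println(finished, err)
}
```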

Authentication

All DRD Site Manager endpoints can be secured with Kubernetes JWT service account tokens. The Site Manager Kubernetes token must be passed in the Authorization request header. Examples for the DRD REST endpoints:

```
curl -XGET -H "Authorization: Bearer <TOKEN>" localhost:8068/healthz

curl -XGET -H "Authorization: Bearer <TOKEN>" localhost:8068/sitemanager

curl -XPOST -H "Content-Type: application/json" -H "Authorization: Bearer <TOKEN>" localhost:8068/sitemanager -d '{"mode":"<MODE>"}'
```

Where TOKEN is a Site Manager Kubernetes token.

Authentication will be enabled only if both SITE_MANAGER_SERVICE_ACCOUNT_NAME and SITE_MANAGER_NAMESPACE environment variables are specified. If these environment variables are not specified, the authentication will be disabled.

If authentication is enabled and the SITE_MANAGER_CUSTOM_AUDIENCE environment variable is specified, the custom audience is applied to the TokenReview request.
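
A Go client would attach the token in the same way as the curl examples. The helper below is a minimal sketch using only the standard library; the function name is hypothetical, and how the token is obtained is left to the caller.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// newSiteManagerRequest builds a request for a DRD endpoint (hypothetical helper).
// The Authorization header is set only when a token is given, and the
// Content-Type header only when a JSON body is sent.
func newSiteManagerRequest(method, url, token string, body io.Reader) (*http.Request, error) {
	req, err := http.NewRequest(method, url, body)
	if err != nil {
		return nil, err
	}
	if body != nil {
		req.Header.Set("Content-Type", "application/json")
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := newSiteManagerRequest(
		http.MethodPost,
		"http://localhost:8068/sitemanager",
		"<TOKEN>", // placeholder, as in the curl examples
		strings.NewReader(`{"mode":"active"}`),
	)
	if err != nil {
		fmt.Println("cannot build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Authorization"))
	// The request can then be sent with http.DefaultClient.Do(req).
}
```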

Example of Configurations

Custom Resource

Custom Resource with default paths:

```
apiVersion: qubership.org/v1
kind: MyService
metadata:
  name: example-service
  namespace: my-namespace
spec:
  disasterRecovery:
    mode: 'standby'
    noWait: false
status:
  disasterRecoveryStatus:
    comment: 'replication has finished successfully'
    mode: 'standby'
    status: 'done'
```

Environment Variables:

```
- name: NAMESPACE
  value: 'my-namespace'
- name: RESOURCE_FOR_DR
  value: 'qubership.org v1 myservices example-service'
- name: USE_DEFAULT_PATHS
  value: 'true'
- name: HEALTH_MAIN_SERVICES_ACTIVE
  value: 'StatefulSet example-service'
```

Config Map

Config Map:

```
kind: ConfigMap
apiVersion: v1
metadata:
  name: example-service-dr-config
  namespace: my-namespace
data:
  mode: 'standby'
  noWait: 'false'
  status_comment: 'replication has finished successfully'
  status_mode: 'standby'
  status_status: 'done'
```

Environment Variables:

```
- name: NAMESPACE
  value: 'my-namespace'
- name: RESOURCE_FOR_DR
  value: '"" v1 configmaps example-service-dr-config'
- name: USE_DEFAULT_PATHS
  value: 'false'
- name: DISASTER_RECOVERY_MODE_PATH
  value: 'data.mode'
- name: DISASTER_RECOVERY_NOWAIT_PATH
  value: 'data.noWait'
- name: DISASTER_RECOVERY_STATUS_MODE_PATH
  value: 'data.status_mode'
- name: DISASTER_RECOVERY_STATUS_STATUS_PATH
  value: 'data.status_status'
- name: DISASTER_RECOVERY_STATUS_COMMENT_PATH
  value: 'data.status_comment'
- name: DISASTER_RECOVERY_NOWAIT_AS_STRING
  value: 'true'
- name: HEALTH_MAIN_SERVICES_ACTIVE
  value: 'StatefulSet example-service'
```
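
Both configurations point DRD at fields through dot-separated paths such as `spec.disasterRecovery.mode` or `data.mode`. The sketch below shows one way such a path could be resolved against a resource decoded into nested maps. It is an illustration only, not DRD's implementation, and `lookupPath` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupPath walks a nested map along a dot-separated path and returns
// the value found there, or false if any segment is missing.
func lookupPath(obj map[string]interface{}, path string) (interface{}, bool) {
	var current interface{} = obj
	for _, key := range strings.Split(path, ".") {
		m, ok := current.(map[string]interface{})
		if !ok {
			return nil, false // an intermediate value is not a map
		}
		current, ok = m[key]
		if !ok {
			return nil, false // the key does not exist at this level
		}
	}
	return current, true
}

func main() {
	// The same shape as the Custom Resource example with default paths.
	resource := map[string]interface{}{
		"spec": map[string]interface{}{
			"disasterRecovery": map[string]interface{}{
				"mode":   "standby",
				"noWait": false,
			},
		},
	}

	mode, found := lookupPath(resource, "spec.disasterRecovery.mode")
	fmt.Println(mode, found)

	_, found = lookupPath(resource, "status.disasterRecoveryStatus.mode")
	fmt.Println(found)
}
```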

Disaster Recovery Extension

Disaster Recovery Daemon makes it possible to implement a controller that watches for changes to the Disaster Recovery resource, for cases when a service does not have its own operator.

A DRD extension is a Go application that starts the server and the controller with a function that implements custom DR logic and/or custom health check logic.

Extension Repository

- Create a repository or folder for your Go application.
- In go.mod, add a dependency on github.com/Netcracker/qubership-disaster-recovery-daemon with the actual version.
- Implement Main.go, which starts the server and the controller with a function that implements custom DR logic and custom health check logic.
- Build a Docker image with your Go application.
- Add a new deployment or container for the DRD application to your Helm chart, with the corresponding environment variables.

Configuration

To start a custom server or controller, you need to provide a configuration object with the necessary parameters.

Configuration can be loaded from any source by implementing the configuration loader interface config.ConfigLoader. By default, DRD provides only an environment variable configuration loader, config.DefaultEnvConfigLoader, which reads the corresponding environment variables.

```
cfgLoader := config.GetDefaultEnvConfigLoader()
cfg, err := config.NewConfig(cfgLoader)
```

DR Server and Health

To create and start the DR server, you need a previously created configuration:

```
server.NewServer(cfg).Run()
```

You can also specify a custom health check function (by default, DRD uses pod readiness probes to calculate health):

```
server.NewServer(cfg).WithHealthFunc(healthFunc, false).Run()
```

The contract for the health check function is:

```
WithHealthFunc(healthFunc func(request entity.HealthRequest) (entity.HealthResponse, error), fullHealth bool)
```

entity.HealthRequest contains the following fields:

- mode is the current DR mode of the cluster side. Type: string. Values: active, standby, or disabled. This field is required.

entity.HealthResponse contains the following fields:

- status is the result of the health check operation. Type: string. Values: up, down, or degraded. This field is required.
- comment is a comment on the health check operation. Type: string.

The fullHealth argument specifies whether the custom health function overrides the pod readiness health check (fullHealth: true) or is used as an additional health status (fullHealth: false). If fullHealth is false, the following rules apply:

- All pods are ready: UP, additional health: UP -> UP
- Some pods are not ready: DEGRADED, additional health: UP -> DEGRADED
- All pods are down: DOWN, additional health: UP -> DOWN
- All pods are ready: UP, additional health: DOWN or DEGRADED -> DEGRADED

NOTE: The health check function is optional. If no function is specified, the default approach based on HEALTH_MAIN_SERVICES_ACTIVE and the other HEALTH_* environment variables is used.
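
The combination rules listed above for fullHealth: false can be written as a small function. This is a sketch, not the daemon's code: the `Status` type and `combineHealth` are hypothetical names, and the handling of combinations that the list does not mention (for example, pods DOWN together with additional health DOWN) is an assumption.

```go
package main

import "fmt"

// Status is a health status value (hypothetical local type).
type Status string

const (
	Up       Status = "up"
	Degraded Status = "degraded"
	Down     Status = "down"
)

// combineHealth merges the pod readiness status with the additional
// health status according to the four documented rules.
// Assumption for unlisted cases: pods DOWN always yields DOWN, and any
// other combination that is not UP+UP yields DEGRADED.
func combineHealth(pods, additional Status) Status {
	switch {
	case pods == Down:
		return Down
	case pods == Up && additional == Up:
		return Up
	default:
		return Degraded
	}
}

func main() {
	fmt.Println(combineHealth(Up, Up))
	fmt.Println(combineHealth(Degraded, Up))
	fmt.Println(combineHealth(Down, Up))
	fmt.Println(combineHealth(Up, Down))
}
```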

DR Controller

To create and start the controller, you need a previously created configuration and a DR controller function (drFunction below):

```
controller.NewController(cfg).
        WithFunc(drFunction).
        WithRetry(3, time.Second * 5).
        Run()
```

WithFunc takes the DR controller function. A DR function must be set for the DR controller.

The contract for the DR controller function is:

```
func(controllerRequest entity.ControllerRequest) (entity.ControllerResponse, error)
```

entity.ControllerRequest contains the following fields:

- mode is the disaster recovery mode from the resource. Type: string. Values: active, standby, or disabled.
- noWait is a flag indicating that this is a failover operation. Type: bool.
- eventType is the type of the resource event. Type: string. Values: ADDED, MODIFIED, or DELETED.
- object is the original DR resource object.

entity.ControllerResponse contains the following fields:

- mode is the disaster recovery mode after the DR operation has been performed. Type: string. Values: active, standby, or disabled. This field is required.
- status is the result of the DR operation. Type: string. Values: done, running, or failed. This field is required.
- comment is a comment on the DR operation. Type: string.

The result of the operation is saved to the DR resource.

WithRetry takes the number of attempts and the delay for the retry policy. The controller retries only if an error occurs during function execution; if the function returns a failed status, no retry is performed. If no retry parameters are specified, the controller calls the function only once.
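
The retry behaviour described above can be sketched as follows. This is not the controller's implementation: `Result`, `runWithRetry`, and `errTemporary` are hypothetical names, and the sketch assumes that the number of attempts counts the total number of calls.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result stands in for the DR function response (hypothetical local type).
type Result struct {
	Mode, Status, Comment string
}

// errTemporary is a sample error used to trigger retries in this sketch.
var errTemporary = errors.New("temporary failure")

// runWithRetry calls fn up to attempts times, sleeping delay between calls.
// Only a returned error triggers another attempt; a Result with status
// "failed" and a nil error is returned immediately without a retry.
func runWithRetry(attempts int, delay time.Duration, fn func() (Result, error)) (Result, error) {
	if attempts < 1 {
		attempts = 1 // no retry parameters: call the function once
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		res, err := fn()
		if err == nil {
			return res, nil
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return Result{}, lastErr
}

func main() {
	calls := 0
	res, err := runWithRetry(3, 10*time.Millisecond, func() (Result, error) {
		calls++
		if calls < 3 {
			return Result{}, errTemporary // the first two attempts fail
		}
		return Result{Mode: "active", Status: "done"}, nil
	})
	fmt.Println(calls, res, err)
}
```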

Example

Below is an example of Main.go for the Config Map resource presented above:

```
package main

import (
	"log"

	"github.com/Netcracker/qubership-disaster-recovery-daemon/api/entity"
	"github.com/Netcracker/qubership-disaster-recovery-daemon/client"
	"github.com/Netcracker/qubership-disaster-recovery-daemon/config"
	"github.com/Netcracker/qubership-disaster-recovery-daemon/controller"
	"github.com/Netcracker/qubership-disaster-recovery-daemon/server"
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

func main() {
	// Make a config loader
	cfgLoader := config.GetDefaultEnvConfigLoader()

	// Build a config
	cfg, err := config.NewConfig(cfgLoader)
	if err != nil {
		log.Fatalln(err.Error())
	}

	// Easy way to create a Kubernetes client if necessary
	kubeClient := client.MakeKubeClientSet()

	// Start DRD server with custom health function inside. This func calculates only additional health status (fullHealth: false)
	go server.NewServer(cfg).
		WithHealthFunc(func(request entity.HealthRequest) (entity.HealthResponse, error) {
			// Do some health check logic, e.g. using the Kubernetes client kubeClient.CoreV1()...
			_ = kubeClient // placeholder so the example compiles until real checks use the client
			return entity.HealthResponse{Status: entity.UP}, nil
		}, false).
		Run()

	// Start DRD controller with external DR function
	controller.NewController(cfg).
		WithFunc(drFunction).
		Run()
}

// DR function implementation
func drFunction(controllerRequest entity.ControllerRequest) (entity.ControllerResponse, error) {
	var configMap v1.ConfigMap
	// Convert unstructured object to expected type (ConfigMap in our case)
	err := runtime.DefaultUnstructuredConverter.FromUnstructured(controllerRequest.Object, &configMap)
	if err != nil {
		return entity.ControllerResponse{}, err
	}
	// Do some DR logic
	return entity.ControllerResponse{
		SwitchoverState: entity.SwitchoverState{
			Mode:    controllerRequest.Mode,
			Status:  entity.DONE,
			Comment: "switchover successfully done",
		},
	}, nil
}
```
