Add discovered deployment-rbd dr resources
The ManagedClusterBinding belongs to the ramen-ops namespace and
not to the application, so we keep it in the ramen-ops directory.

The rest of the resources are in the deployment-rbd directory. I'm not
sure how easy it will be to share a base kustomization with other
workloads, so let's start with something simple.

Unfinished:
- disable DR
- undeploy

Signed-off-by: Nir Soffer <[email protected]>
nirs committed Jun 4, 2024
1 parent e97d598 commit 97df338
Showing 6 changed files with 435 additions and 0 deletions.
```
kubectl delete -k subscription/deployment-k8s-regional-rbd
```

At this point the application is managed again by *OCM*.

## Deploy OCM discovered application

The sample application is configured to run on cluster `dr1`. To deploy
it on cluster `dr1` and make it possible to fail over or relocate to
cluster `dr2` we need to create the namespace on both clusters:

```
kubectl create ns deployment-rbd --context dr1
kubectl create ns deployment-rbd --context dr2
```

To deploy the application, apply the deployment-rbd workload to the
`deployment-rbd` namespace on cluster `dr1`:

```
kubectl apply -k workloads/deployment/k8s-regional-rbd -n deployment-rbd --context dr1
```

To view the deployed application use:

```
kubectl get deploy,pod,pvc -n deployment-rbd --context dr1
```

Example output:

```
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/busybox 1/1 1 1 24s
NAME READY STATUS RESTARTS AGE
pod/busybox-6bbf88b9f8-fz2kn 1/1 Running 0 24s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/busybox-pvc Bound pvc-c45a3892-167b-4dbc-a250-09c5f288c766 1Gi RWO rook-ceph-block <unset> 24s
```

## Enabling DR for OCM discovered application

Unlike with OCM managed applications, the DR resources for all
discovered applications live in the `ramen-ops` namespace.

To prepare the `ramen-ops` namespace, apply the managed clusterset
binding resource. This needs to be done only once, before enabling DR
for the first discovered application.

```
kubectl apply -f dr/discovered/ramen-ops/binding.yaml --context hub
```

Example output:

```
managedclustersetbinding.cluster.open-cluster-management.io/default created
```
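To verify that the binding was created, you can list it in its namespace (a sketch; `managedclustersetbinding` is the resource kind shown in the output above):

```
kubectl get managedclustersetbinding -n ramen-ops --context hub
```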

To enable DR for the application, apply the DR resources to the hub
cluster:

```
kubectl apply -k dr/discovered/deployment-rbd --context hub
```

Example output:

```
placement.cluster.open-cluster-management.io/deployment-rbd-placement created
placementdecision.cluster.open-cluster-management.io/deployment-rbd-placement-decision created
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc created
```

To set the application placement, patch the placement decision resource
in the `ramen-ops` namespace on the hub:

```
kubectl patch placementdecision deployment-rbd-placement-decision \
--subresource status \
--patch '{"status": {"decisions": [{"clusterName": "dr1", "reason": "dr1"}]}}' \
--type merge \
--namespace ramen-ops \
--context hub
```

Example output:

```
placementdecision.cluster.open-cluster-management.io/deployment-rbd-placement-decision patched (no change)
```
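To read the decision back, you can query the status fields directly (a sketch; the jsonpath expression assumes the patch above was applied):

```
kubectl get placementdecision deployment-rbd-placement-decision \
    --namespace ramen-ops \
    --context hub \
    -o jsonpath='{.status.decisions}'
```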

At this point *Ramen* takes over and starts protecting the application.

*Ramen* creates a `VolumeReplicationGroup` resource in the `ramen-ops`
namespace in cluster `dr1`:

```
kubectl get vrg -l app=deployment-rbd -n ramen-ops --context dr1
```

Example output:

```
NAME DESIREDSTATE CURRENTSTATE
deployment-rbd-drpc primary Primary
```

*Ramen* also creates a `VolumeReplication` resource, setting up
replication for the application PVC from the primary cluster to the
secondary cluster:

```
kubectl get vr busybox-pvc -n deployment-rbd --context dr1
```

Example output:

```
NAME AGE VOLUMEREPLICATIONCLASS PVCNAME DESIREDSTATE CURRENTSTATE
busybox-pvc 10m vrc-sample busybox-pvc primary Primary
```
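To inspect the replication state in more detail, you can print the resource conditions (a sketch; the `.status.conditions` path is an assumption based on standard VolumeReplication status reporting):

```
kubectl get vr busybox-pvc -n deployment-rbd --context dr1 \
    -o jsonpath='{.status.conditions}'
```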

## Failing over an OCM discovered application

In case of a disaster you can force the application to run on the other
cluster. The application will start on the other cluster using the data
from the last replication; data written since the last replication is
lost.

In the ramen testing environment we can simulate a disaster by pausing
the minikube VM running cluster `dr1`:

```
virsh -c qemu:///system suspend dr1
```

Example output:

```
Domain 'dr1' suspended
```
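To confirm that the cluster is no longer reachable, you can issue any request against it with a short timeout (a sketch; the command is expected to time out while `dr1` is suspended):

```
kubectl get nodes --context dr1 --request-timeout 5s
```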

At this point the application is not accessible. To recover from the
disaster, we can fail over the application to the secondary cluster.

To start a `Failover` action, patch the application `DRPlacementControl`
resource in the `ramen-ops` namespace on the hub cluster. We need to set
the `action` and `failoverCluster`:

```
kubectl patch drpc deployment-rbd-drpc \
--patch '{"spec": {"action": "Failover", "failoverCluster": "dr2"}}' \
--type merge \
--namespace ramen-ops \
--context hub
```

Example output:

```
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc patched
```

The application will start on the failover cluster ("dr2"). Nothing will
change on the primary cluster ("dr1") since it is still paused.

To watch the application status while failing over, run:

```
kubectl get drpc deployment-rbd-drpc -n ramen-ops --context hub -o wide -w
```

Example output:

```
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
deployment-rbd-drpc 17m dr1 dr2 Failover FailedOver WaitForReadiness 2024-06-04T18:10:44Z False
deployment-rbd-drpc 18m dr1 dr2 Failover FailedOver WaitForReadiness 2024-06-04T18:10:44Z False
deployment-rbd-drpc 18m dr1 dr2 Failover FailedOver Cleaning Up 2024-06-04T18:10:44Z False
deployment-rbd-drpc 18m dr1 dr2 Failover FailedOver WaitOnUserToCleanUp 2024-06-04T18:10:44Z False
```

*Ramen* will proceed until the point where the application should be
deleted from the primary cluster ("dr1"). Note the progression
`WaitOnUserToCleanUp`.
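If you prefer to block instead of watching interactively, you can wait for the cleanup progression (a sketch mirroring the `kubectl wait` usage later in this document; waiting on `.status.progression` is an assumption based on the PROGRESSION column above):

```
kubectl wait drpc deployment-rbd-drpc \
    --for jsonpath='{.status.progression}=WaitOnUserToCleanUp' \
    --namespace ramen-ops \
    --timeout 5m \
    --context hub
```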

The application is running now on cluster `dr2`:

```
kubectl get deploy,pod,pvc -n deployment-rbd --context dr2
```

Example output:

```
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/busybox 1/1 1 1 3m58s
NAME READY STATUS RESTARTS AGE
pod/busybox-6bbf88b9f8-fz2kn 1/1 Running 0 3m58s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/busybox-pvc Bound pvc-c45a3892-167b-4dbc-a250-09c5f288c766 1Gi RWO rook-ceph-block <unset> 4m11s
```

To complete the failover, we need to recover the primary cluster, so we
can start replication from the secondary cluster to the primary cluster.

In the ramen testing environment, we can resume the minikube VM running
cluster `dr1`:

```
virsh -c qemu:///system resume dr1
```

Example output:

```
Domain 'dr1' resumed
```

When the cluster becomes accessible again, you need to delete the
application from the primary cluster yourself, since *Ramen* does not
delete applications on your behalf:

```
kubectl delete -k workloads/deployment/k8s-regional-rbd -n deployment-rbd --context dr1
```

Example output:

```
persistentvolumeclaim "busybox-pvc" deleted
deployment.apps "busybox" deleted
```
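To confirm the cleanup, you can list the workload resources on the primary cluster again; the command mirrors the earlier status check and should report that no resources are found:

```
kubectl get deploy,pod,pvc -n deployment-rbd --context dr1
```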

To wait until the application data is replicated again to the other
cluster run:

```
kubectl wait drpc deployment-rbd-drpc \
--for condition=Protected \
--namespace ramen-ops \
--timeout 5m \
--context hub
```

Example output:

```
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc condition met
```

To check the application DR status run:

```
kubectl get drpc deployment-rbd-drpc -n ramen-ops --context hub -o wide
```

Example output:

```
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
deployment-rbd-drpc 28m dr1 dr2 Failover FailedOver Completed 2024-06-04T18:10:44Z 11m24.41686883s True
```

The failover has completed, and the application data is replicated again
to the primary cluster.

## Relocate an OCM discovered application

To move the application back to the primary cluster after a disaster you
can use the `Relocate` action. You will delete the application on the
secondary cluster, and *Ramen* will start it on the primary cluster. No
data is lost during this operation.

To start the relocate operation, patch the application
`DRPlacementControl` resource in the `ramen-ops` namespace on the hub.
We need to set `action` and if needed, `preferredCluster`:

```
kubectl patch drpc deployment-rbd-drpc \
--patch '{"spec": {"action": "Relocate", "preferredCluster": "dr1"}}' \
--type merge \
--namespace ramen-ops \
--context hub
```

Example output:

```
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc patched
```

*Ramen* will prepare for relocation, and proceed until the point where
the application should be deleted from the secondary cluster. To watch
the progress run:

```
kubectl get drpc deployment-rbd-drpc -n ramen-ops --context hub -o wide -w
```

Example output:

```
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
deployment-rbd-drpc 91m dr1 dr2 Relocate Initiating PreparingFinalSync 2024-06-04T19:25:52Z True
deployment-rbd-drpc 92m dr1 dr2 Relocate Relocating RunningFinalSync 2024-06-04T19:25:52Z True
deployment-rbd-drpc 92m dr1 dr2 Relocate Relocating WaitOnUserToCleanUp 2024-06-04T19:25:52Z False
```

When *Ramen* shows the progression `WaitOnUserToCleanUp` you need to
delete the application from the secondary cluster:

```
kubectl delete -k workloads/deployment/k8s-regional-rbd -n deployment-rbd --context dr2
```

Example output:

```
persistentvolumeclaim "busybox-pvc" deleted
deployment.apps "busybox" deleted
```

At this point *Ramen* will proceed with starting the application on the
primary cluster, and setting up replication to the secondary cluster.

To wait until the application is relocated to the primary cluster, run:

```
kubectl wait drpc deployment-rbd-drpc \
--for jsonpath='{.status.phase}=Relocated' \
--namespace ramen-ops \
--timeout 5m \
--context hub
```

Example output:

```
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc condition met
```

To wait until the application is replicating data again to the secondary
cluster, wait for the `Protected` condition:

```
kubectl wait drpc deployment-rbd-drpc \
--for condition=Protected \
--namespace ramen-ops \
--timeout 5m \
--context hub
```

Example output:

```
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc condition met
```

To check the application DR status run:

```
kubectl get drpc deployment-rbd-drpc -n ramen-ops --context hub -o wide
```

Example output:

```
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
deployment-rbd-drpc 103m dr1 dr2 Relocate Relocated Completed 2024-06-04T19:25:52Z 3m46.040128192s True
```

The relocate has completed, and the application data is replicated again
to the secondary cluster.