Add discovered deployment-rbd dr resources
The ManagedClusterBinding belongs to the ramen-ops namespace and
not to the application, so we keep it in the ramen-ops directory.

The rest of the resources are in the deployment-rbd directory. I'm not
sure how easy it will be to share a base kustomization with other
workloads, so let's start with something simple.

Unfinished:
- disable DR
- undeploy

Signed-off-by: Nir Soffer <[email protected]>
nirs committed Jun 4, 2024
1 parent e97d598 commit 97df338
Showing 6 changed files with 435 additions and 0 deletions.
```
kubectl delete -k subscription/deployment-k8s-regional-rbd
```

At this point the application is managed again by *OCM*.

## Deploy OCM discovered application

The sample application is configured to run on cluster `dr1`. To deploy
it on cluster `dr1` and make it possible to fail over or relocate to
cluster `dr2` we need to create the namespace on both clusters:

```
kubectl create ns deployment-rbd --context dr1
kubectl create ns deployment-rbd --context dr2
```

To deploy the application, apply the deployment-rbd workload to the
`deployment-rbd` namespace on cluster `dr1`:

```
kubectl apply -k workloads/deployment/k8s-regional-rbd -n deployment-rbd --context dr1
```

To view the deployed application use:

```
kubectl get deploy,pod,pvc -n deployment-rbd --context dr1
```

Example output:

```
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/busybox 1/1 1 1 24s
NAME READY STATUS RESTARTS AGE
pod/busybox-6bbf88b9f8-fz2kn 1/1 Running 0 24s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/busybox-pvc Bound pvc-c45a3892-167b-4dbc-a250-09c5f288c766 1Gi RWO rook-ceph-block <unset> 24s
```

## Enabling DR for OCM discovered application

Unlike with OCM managed applications, the DR resources for all
discovered applications live in the `ramen-ops` namespace.

To prepare the `ramen-ops` namespace, apply the managed clusterset
binding resource. This needs to be done only once, before enabling DR
for the first discovered application.

```
kubectl apply -f dr/discovered/ramen-ops/binding.yaml --context hub
```

Example output:

```
managedclustersetbinding.cluster.open-cluster-management.io/default created
```
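To verify that the binding was created, you can list it in its namespace (a sketch; `managedclustersetbinding` is the resource kind shown in the output above):

```
kubectl get managedclustersetbinding -n ramen-ops --context hub
```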

To enable DR for the application, apply the DR resources to the hub
cluster:

```
kubectl apply -k dr/discovered/deployment-rbd --context hub
```

Example output:

```
placement.cluster.open-cluster-management.io/deployment-rbd-placement created
placementdecision.cluster.open-cluster-management.io/deployment-rbd-placement-decision created
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc created
```

To set the application placement, patch the placement decision resource
in the `ramen-ops` namespace on the hub:

```
kubectl patch placementdecision deployment-rbd-placement-decision \
--subresource status \
--patch '{"status": {"decisions": [{"clusterName": "dr1", "reason": "dr1"}]}}' \
--type merge \
--namespace ramen-ops \
--context hub
```

Example output:

```
placementdecision.cluster.open-cluster-management.io/deployment-rbd-placement-decision patched (no change)
```
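To read the decision back, you can query the status fields directly (a sketch; the jsonpath expression assumes the patch above was applied):

```
kubectl get placementdecision deployment-rbd-placement-decision \
    --namespace ramen-ops \
    --context hub \
    -o jsonpath='{.status.decisions}'
```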

At this point *Ramen* takes over and starts protecting the application.

*Ramen* creates a `VolumeReplicationGroup` resource in the `ramen-ops`
namespace in cluster `dr1`:

```
kubectl get vrg -l app=deployment-rbd -n ramen-ops --context dr1
```

Example output:

```
NAME DESIREDSTATE CURRENTSTATE
deployment-rbd-drpc primary Primary
```

*Ramen* also creates a `VolumeReplication` resource, setting up
replication for the application PVC from the primary cluster to the
secondary cluster:

```
kubectl get vr busybox-pvc -n deployment-rbd --context dr1
```

Example output:

```
NAME AGE VOLUMEREPLICATIONCLASS PVCNAME DESIREDSTATE CURRENTSTATE
busybox-pvc 10m vrc-sample busybox-pvc primary Primary
```
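To inspect the replication state in more detail, you can print the resource conditions (a sketch; the `.status.conditions` path is an assumption based on standard VolumeReplication status reporting):

```
kubectl get vr busybox-pvc -n deployment-rbd --context dr1 \
    -o jsonpath='{.status.conditions}'
```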

## Failing over an OCM discovered application

In case of a disaster you can force the application to run on the other
cluster. The application will start on the other cluster using the data
from the last replication; data written since the last replication is
lost.

In the ramen testing environment we can simulate a disaster by pausing
the minikube VM running cluster `dr1`:

```
virsh -c qemu:///system suspend dr1
```

Example output:

```
Domain 'dr1' suspended
```
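To confirm that the cluster is no longer reachable, you can issue any request against it with a short timeout (a sketch; the command is expected to time out while `dr1` is suspended):

```
kubectl get nodes --context dr1 --request-timeout 5s
```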

At this point the application is not accessible. To recover from the
disaster, we can fail over the application to the secondary cluster.

To start a `Failover` action, patch the application `DRPlacementControl`
resource in the `ramen-ops` namespace on the hub cluster. We need to set
the `action` and `failoverCluster`:

```
kubectl patch drpc deployment-rbd-drpc \
--patch '{"spec": {"action": "Failover", "failoverCluster": "dr2"}}' \
--type merge \
--namespace ramen-ops \
--context hub
```

Example output:

```
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc patched
```

The application will start on the failover cluster ("dr2"). Nothing will
change on the primary cluster ("dr1") since it is still paused.

To watch the application status while failing over, run:

```
kubectl get drpc deployment-rbd-drpc -n ramen-ops --context hub -o wide -w
```

Example output:

```
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
deployment-rbd-drpc 17m dr1 dr2 Failover FailedOver WaitForReadiness 2024-06-04T18:10:44Z False
deployment-rbd-drpc 18m dr1 dr2 Failover FailedOver WaitForReadiness 2024-06-04T18:10:44Z False
deployment-rbd-drpc 18m dr1 dr2 Failover FailedOver Cleaning Up 2024-06-04T18:10:44Z False
deployment-rbd-drpc 18m dr1 dr2 Failover FailedOver WaitOnUserToCleanUp 2024-06-04T18:10:44Z False
```

*Ramen* will proceed until the point where the application should be
deleted from the primary cluster ("dr1"). Note the progression
`WaitOnUserToCleanUp`.
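If you prefer to block instead of watching interactively, you can wait for the cleanup progression (a sketch mirroring the `kubectl wait` usage later in this document; waiting on `.status.progression` is an assumption based on the PROGRESSION column above):

```
kubectl wait drpc deployment-rbd-drpc \
    --for jsonpath='{.status.progression}=WaitOnUserToCleanUp' \
    --namespace ramen-ops \
    --timeout 5m \
    --context hub
```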

The application is running now on cluster `dr2`:

```
kubectl get deploy,pod,pvc -n deployment-rbd --context dr2
```

Example output:

```
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/busybox 1/1 1 1 3m58s
NAME READY STATUS RESTARTS AGE
pod/busybox-6bbf88b9f8-fz2kn 1/1 Running 0 3m58s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/busybox-pvc Bound pvc-c45a3892-167b-4dbc-a250-09c5f288c766 1Gi RWO rook-ceph-block <unset> 4m11s
```

To complete the failover, we need to recover the primary cluster, so we
can start replication from the secondary cluster to the primary cluster.

In the ramen testing environment, we can resume the minikube VM running
cluster `dr1`:

```
virsh -c qemu:///system resume dr1
```

Example output:

```
Domain 'dr1' resumed
```

When the cluster becomes accessible again, you need to delete the
application from the primary cluster yourself, since *Ramen* does not
delete applications on your behalf:

```
kubectl delete -k workloads/deployment/k8s-regional-rbd -n deployment-rbd --context dr1
```

Example output:

```
persistentvolumeclaim "busybox-pvc" deleted
deployment.apps "busybox" deleted
```
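To confirm the cleanup, you can list the workload resources on the primary cluster again; the command mirrors the earlier status check and should report that no resources are found:

```
kubectl get deploy,pod,pvc -n deployment-rbd --context dr1
```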

To wait until the application data is replicated again to the other
cluster run:

```
kubectl wait drpc deployment-rbd-drpc \
--for condition=Protected \
--namespace ramen-ops \
--timeout 5m \
--context hub
```

Example output:

```
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc condition met
```

To check the application DR status run:

```
kubectl get drpc deployment-rbd-drpc -n ramen-ops --context hub -o wide
```

Example output:

```
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
deployment-rbd-drpc 28m dr1 dr2 Failover FailedOver Completed 2024-06-04T18:10:44Z 11m24.41686883s True
```

The failover has completed, and the application data is replicated again
to the primary cluster.

## Relocate an OCM discovered application

To move the application back to the primary cluster after a disaster you
can use the `Relocate` action. You will delete the application on the
secondary cluster, and *Ramen* will start it on the primary cluster. No
data is lost during this operation.

To start the relocate operation, patch the application
`DRPlacementControl` resource in the `ramen-ops` namespace on the hub.
We need to set `action` and if needed, `preferredCluster`:

```
kubectl patch drpc deployment-rbd-drpc \
--patch '{"spec": {"action": "Relocate", "preferredCluster": "dr1"}}' \
--type merge \
--namespace ramen-ops \
--context hub
```

Example output:

```
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc patched
```

*Ramen* will prepare for relocation, and proceed until the point where
the application should be deleted from the secondary cluster. To watch
the progress run:

```
kubectl get drpc deployment-rbd-drpc -n ramen-ops --context hub -o wide -w
```

Example output:

```
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
deployment-rbd-drpc 91m dr1 dr2 Relocate Initiating PreparingFinalSync 2024-06-04T19:25:52Z True
deployment-rbd-drpc 92m dr1 dr2 Relocate Relocating RunningFinalSync 2024-06-04T19:25:52Z True
deployment-rbd-drpc 92m dr1 dr2 Relocate Relocating WaitOnUserToCleanUp 2024-06-04T19:25:52Z False
```

When *Ramen* shows the progression `WaitOnUserToCleanUp` you need to
delete the application from the secondary cluster:

```
kubectl delete -k workloads/deployment/k8s-regional-rbd -n deployment-rbd --context dr2
```

Example output:

```
persistentvolumeclaim "busybox-pvc" deleted
deployment.apps "busybox" deleted
```

At this point *Ramen* will proceed with starting the application on the
primary cluster, and setting up replication to the secondary cluster.

To wait until the application is relocated to the primary cluster, run:

```
kubectl wait drpc deployment-rbd-drpc \
--for jsonpath='{.status.phase}=Relocated' \
--namespace ramen-ops \
--timeout 5m \
--context hub
```

Example output:

```
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc condition met
```

To wait until the application is replicating data again to the secondary
cluster, wait for the `Protected` condition:

```
kubectl wait drpc deployment-rbd-drpc \
--for condition=Protected \
--namespace ramen-ops \
--timeout 5m \
--context hub
```

Example output:

```
drplacementcontrol.ramendr.openshift.io/deployment-rbd-drpc condition met
```

To check the application DR status run:

```
kubectl get drpc deployment-rbd-drpc -n ramen-ops --context hub -o wide
```

Example output:

```
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
deployment-rbd-drpc 103m dr1 dr2 Relocate Relocated Completed 2024-06-04T19:25:52Z 3m46.040128192s True
```

The relocate has completed, and the application data is replicated again
to the secondary cluster.