retry command for GCS poll way of running (#668)
aksharauke authored Oct 20, 2023
1 parent 386defc commit 1d29dd1
Showing 1 changed file with 27 additions and 3 deletions: docs/troubleshoot/minimal.md
@@ -36,7 +36,7 @@ The following error scenarios are possible currently when doing low downtime migrations
1. Other SpannerExceptions - which are marked for retry
1. In addition, there is a possibility of severe errors that require manual intervention, for example, an error during transformation.

Points 1 to 4 above are retryable errors - the Dataflow job automatically retries them at 10-minute intervals, up to 500 times. In most cases this is enough for the retryable records to succeed; however, if records still fail after all retries are exhausted, they are moved to the ‘severe’ error category. Such ‘severe’ errors can be retried later with the ‘retryDLQ’ mode of the Dataflow job (discussed below in the ‘Retry command’ section).
Points 1 to 4 above are retryable errors - the Dataflow job automatically retries them at 10-minute intervals, up to 500 times. In most cases this is enough for the retryable records to succeed; however, if records still fail after all retries are exhausted, they are moved to the ‘severe’ error category. Such ‘severe’ errors can be retried later with the ‘retryDLQ’ mode of the Dataflow job (discussed [below](#to-re-run-for-reprocessing-dlq-directory)).
The following scenarios result in records being skipped; they are not really errors:

1. Invalid structure of records read from Datastream output
@@ -76,15 +76,39 @@ Migration progress can be tracked by monitoring the Dataflow job and following c…

It can happen that in retryDLQ mode there are still permanent errors. To confirm that all the retryable errors have been processed and only permanent errors remain, look at the ‘Successful events’ count: it remains constant after every retry iteration, while the ‘elementsReconsumedFromDeadLetterQueue’ counter increments on each iteration.
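
For convenience — a sketch not present in the original doc — these counters can also be polled from the command line with `gcloud dataflow metrics list`; the job ID and region are placeholders, and the exact output format may differ:

```sh
# Sketch: list the user-defined counters of the retryDLQ job and
# filter for the two counters discussed above. <job-id> and
# <region-name> are placeholders for your job.
gcloud dataflow metrics list <job-id> \
  --region=<region-name> \
  --source=user \
  --format="table(name.name, scalar)" \
  | grep -E "Successful events|elementsReconsumedFromDeadLetterQueue"
```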

### Retry command
### Re-run commands

#### To re-run the regular flow

To re-run the regular flow, fire the same command as the original job. Note: this only works when not using Pub/Sub subscriptions for GCS files (that is, when the job polls GCS). Processing starts all over again, meaning the same Datastream outputs get reprocessed.

```sh
gcloud dataflow flex-template run <jobName> \
--project=<project-name> --region=<region-name> \
--template-file-gcs-location=gs://dataflow-templates-southamerica-west1/2023-09-12-00_RC00/flex/Cloud_Datastream_to_Spanner \
--num-workers 1 --max-workers 50 \
--enable-streaming-engine \
--parameters databaseId=<database id>,deadLetterQueueDirectory=<GCS location of the DLQ directory>,inputFilePattern=<gcs location of the datastream output>,instanceId=<spanner-instance-id>,sessionFilePath=<GCS location of the session json>,streamName=<data stream name>,transformationContextFilePath=<path to transformation context json>
```

These job parameters can be taken from the original job.
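
If the original command is no longer at hand, one way to recover the configuration — an editor's sketch, not part of the original doc — is to describe the original job and read the parameters out of its JSON description:

```sh
# Sketch: dump the original job's description; the flex-template
# parameters appear in the job's JSON output. <original-job-id> and
# <region-name> are placeholders.
gcloud dataflow jobs describe <original-job-id> \
  --region=<region-name> \
  --format=json
```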

#### To re-run for reprocessing DLQ directory

This reprocesses the records marked as ‘severe’ errors from the DLQ.
Before running this Dataflow job, check whether the main Dataflow job still has a non-zero retryable error count. If there are referential error records, check that the dependent table data has been populated completely from the source database.
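
As a quick spot check — a sketch with a hypothetical table name, not part of the original doc — row counts of a dependent table can be compared between Spanner and the source database:

```sh
# Sketch: count rows of a dependent table on Spanner; compare the
# result against the same count on the source database. `Singers`
# is a hypothetical table name.
gcloud spanner databases execute-sql <database id> \
  --instance=<spanner-instance-id> \
  --sql="SELECT COUNT(*) FROM Singers"
```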

A sample command to run the Dataflow job in retryDLQ mode:

```sh
# Before (the single-line form removed by this commit):
gcloud beta dataflow flex-template run <jobname> --region=<the region where the dataflow job must run> --template-file-gcs-location=gs://dataflow-templates/latest/flex/Cloud_Datastream_to_Spanner --additional-experiments=use_runner_v2 --parameters inputFilePattern=<GCS location of the input file pattern>,streamName=<Datastream name>,instanceId=<Spanner Instance Id>,databaseId=<Spanner Database Id>,sessionFilePath=<GCS path to session file>,deadLetterQueueDirectory=<GCS path to the DLQ>,runMode=retryDLQ
# After (the multi-line form added by this commit; note that the
# --parameters value must stay on a single line, since spaces inside
# it would split the argument):
gcloud dataflow flex-template run <jobname> \
  --region=<the region where the dataflow job must run> \
  --template-file-gcs-location=gs://dataflow-templates/latest/flex/Cloud_Datastream_to_Spanner \
  --additional-experiments=use_runner_v2 \
  --parameters inputFilePattern=<GCS location of the input file pattern>,streamName=<Datastream name>,instanceId=<Spanner Instance Id>,databaseId=<Spanner Database Id>,sessionFilePath=<GCS path to session file>,deadLetterQueueDirectory=<GCS path to the DLQ>,runMode=retryDLQ
```

The following parameters can be taken from the regular forward migration Dataflow job: …
