-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for Generic error for persistent task on starting replication #1003
base: main
Are you sure you want to change the base?
Conversation
…tore state Signed-off-by: Nishant Goel <[email protected]>
@@ -828,7 +828,11 @@ open class IndexReplicationTask(id: Long, type: String, action: String, descript | |||
} | |||
} catch(e: Exception) { | |||
val err = "Unable to initiate restore call for $followerIndexName from $leaderAlias:${leaderIndex.name}" | |||
val aliasErrMsg = "cannot rename index [${leaderIndex.name}] into [$followerIndexName] because of conflict with an alias with the same name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relying on exception message isn't robust as any simple change in string will impact the logic.
Have we exhausted other ways to detect? (Exception type, error code etc). If so, we may want to check on substring instead of complete sentence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to show specific error message in case of alias conflict. Main concern is to handle failed state in case of restore state. If string will change in future as well, then also we get the failed state with message related to "unable to initiate restore" and more specific error will come in logs
(!replicateIndexReq.waitForRestore && replicationState == ReplicationState.RESTORING) | ||
(!replicateIndexReq.waitForRestore && replicationState == ReplicationState.RESTORING) || | ||
(!replicateIndexReq. waitForRestore && replicationState == ReplicationState. FAILED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does handling for FAILED exist?
(Also a nit - there is a space after the .
)
Does this need any update to test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh yes thanks for pointing out the space after .
@@ -828,7 +828,11 @@ open class IndexReplicationTask(id: Long, type: String, action: String, descript | |||
} | |||
} catch(e: Exception) { | |||
val err = "Unable to initiate restore call for $followerIndexName from $leaderAlias:${leaderIndex.name}" | |||
val aliasErrMsg = "cannot rename index [${leaderIndex.name}] into [$followerIndexName] because of conflict with an alias with the same name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scenario is one of cases in which the replication fails to start and proper message is not thrown. We need to make it generic to cover all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, For other cases if something fails in restore state then we will get this message in index replication status API "Unable to initiate restore call for followerIndexName from leaderAlias:{leaderIndex.name}".
Initially when failed state was not handled properly, we get generic error of "Timed out when waiting for persistent task after 1m". Which might happened because of failure in restore state or error in any other state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the failed state message with whatever exception message will come
Signed-off-by: Nishant Goel <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1003 +/- ##
============================================
- Coverage 76.15% 74.64% -1.52%
+ Complexity 1042 1026 -16
============================================
Files 141 141
Lines 4785 4784 -1
Branches 527 527
============================================
- Hits 3644 3571 -73
- Misses 789 863 +74
+ Partials 352 350 -2 ☔ View full report in Codecov by Sentry. |
Description
This change will fix the generic error coming in case of replication failed in restore phase.
This change will also return the acknowledgement as 200 when replication failed in restore phase and put the failed reason in metadata store, which can be view via replication status API
Issues Resolved
#707
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.