fix resuming subtasks #90
base: v0.10.5_patched
ResumingTask resumingTask = ImmutableResumingTask.builder()
        .sourceTaskId(archivedTask.getId())
        .fullName(archivedTask.getFullName())
        .config(TaskConfig.validate(workflowTask.getConfig()))
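Only the config here is INSERTed from workflowTask rather than from archivedTask.
This is because, when the task definition has changed, the config must come from the new revision.
Concretely, without this fix the retry in the following integration test fails.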
RetryIT.java
+step1:
  sh>: touch ${outdir}/1-1.out
+step2:
  +a:
    sh>: touch ${outdir}/1-2a.out
  +b:
    fail>: step2b fail

+step1:
  sh>: touch ${outdir}/2-1.out
+step2:
  +a:
    sh>: touch ${outdir}/2-2a.out
  +b:
    sh>: touch ${outdir}/2-2b.out
    id = store.addSubtask(attemptId, task);
} else {
    TaskStateCode state;
    switch(archivedTask.getState()) {
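The original TaskControl#addTasks branched between BLOCKED and SUCCESS depending on whether resumingTasks was present. However, 6aa7cb9 changed the assumption that resume only fetches SUCCESS tasks, so branching on the archived task's state became necessary.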
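The state-based branching described in this comment can be sketched roughly as follows. This is only an illustration: the `State` enum is a hypothetical stand-in for `TaskStateCode` with a few values, and the mapping policy (keep SUCCESS, restart everything else from BLOCKED) is an assumption, not the mapping the patch actually uses.

```java
class ResumeStateSketch {
    // Hypothetical stand-in for TaskStateCode; only a few states are modeled.
    enum State { BLOCKED, SUCCESS, ERROR, GROUP_ERROR, CANCELED }

    // Assumed policy for this sketch: a task that already succeeded stays SUCCESS,
    // every other archived state is re-added as BLOCKED so that it runs again.
    static State initialStateFor(State archivedState) {
        switch (archivedState) {
            case SUCCESS:
                return State.SUCCESS;
            default:
                return State.BLOCKED;
        }
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + initialStateFor(s));
        }
    }
}
```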
@@ -185,6 +188,133 @@ private static long addTasks(TaskControlStore store,
        return rootTaskId;
    }

    private static long addInitialTasks(TaskControlStore store, long attemptId, long rootTaskId,
The existing TaskControl#addTasks is also referenced from other places, so its resumingTasks argument could not simply be changed to ArchivedTasks.
addInitialTasks was therefore implemented as a new method.
See: Missing parameters at INSERT time
        .collect(Collectors.toMap(WorkflowTask::getFullName, task -> indexToId.get(workflowTasks.indexOf(task))));

archivedTasks.stream()
        .filter(archivedTask -> !taskNameAndIds.keySet().contains(archivedTask.getFullName())
Exclude tasks that have already been INSERTed.
archivedTasks.stream()
        .filter(archivedTask -> !taskNameAndIds.keySet().contains(archivedTask.getFullName())
                && archivedTask.getFullName().contains("^sub"))
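Matching any task name that contains "^" would also pick up subtasks that should be excluded, such as "^failure-alert" and "^error", so the filter has to be restricted to subtasks in the narrow sense.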
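A minimal sketch of the two filter conditions discussed in these comments (skip names that are already inserted, keep only names containing "^sub"). The helper name `selectSubtasksToRestore` and the sample task names are hypothetical and only illustrate the string matching; the real code works on ArchivedTask objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class SubtaskFilterSketch {
    // Hypothetical helper: keeps only "^sub" tasks that have not been inserted yet.
    static List<String> selectSubtasksToRestore(List<String> archivedNames, Set<String> insertedNames) {
        return archivedNames.stream()
                .filter(name -> !insertedNames.contains(name) && name.contains("^sub"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative names only; "^failure-alert" and "^error" must not be selected.
        List<String> archived = Arrays.asList(
                "+wf", "+wf+step1", "+wf+step2",
                "+wf+step2^sub", "+wf+step2^sub+a",
                "+wf^failure-alert", "+wf+step2^error");
        Set<String> inserted = new HashSet<>(Arrays.asList("+wf", "+wf+step1", "+wf+step2"));
        System.out.println(selectSubtasksToRestore(archived, inserted));
    }
}
```

A filter on "^" alone would also return the `^failure-alert` and `^error` entries from the sample list, which is the case the comment warns about.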
archivedTasks.stream()
        .filter(archivedTask -> !taskNameAndIds.keySet().contains(archivedTask.getFullName())
                && archivedTask.getFullName().contains("^sub"))
        .sorted(Comparator.comparingInt((t) -> (int) t.getId()))
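The later INSERT into the task_dependencies table requires upstream tasks to be INSERTed into the tasks table first so that their IDs are assigned. The tasks are therefore sorted in advance by their Task ID in the attempt that preceded retry failed.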
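The ordering requirement can be sketched as below. The `Archived` class is a hypothetical stand-in for ArchivedTask. The sketch assumes the ID is a `long` (which the `(int)` cast in the diff suggests) and uses `Comparator.comparingLong`, which would avoid narrowing the ID; whether that matters for the ID ranges in practice is not established by the source.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortByArchivedIdSketch {
    // Hypothetical stand-in for ArchivedTask: only the fields the sort needs.
    static final class Archived {
        final long id;
        final String fullName;

        Archived(long id, String fullName) {
            this.id = id;
            this.fullName = fullName;
        }
    }

    // Returns full names in ascending order of the archived task ID, so that
    // tasks created earlier (including upstream tasks) are inserted first.
    static List<String> insertionOrder(List<Archived> tasks) {
        return tasks.stream()
                .sorted(Comparator.comparingLong((Archived t) -> t.id))
                .map(t -> t.fullName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Archived> tasks = Arrays.asList(
                new Archived(12L, "+wf+step2^sub+b"),
                new Archived(10L, "+wf+step2^sub"),
                new Archived(11L, "+wf+step2^sub+a"));
        System.out.println(insertionOrder(tasks));
    }
}
```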
        return true;
    }
    return false;
})
.map(archived -> ResumingTask.of(archived))
Return the ArchivedTask as-is instead of converting it to ResumingTask.
See: Missing parameters at INSERT time
-        tasks, ImmutableList.of(),
-        false, true, true,
-        resumingTasks);
+long taskId = addInitialTasks(store, attemptId, rootTaskId, tasks, archivedTasks);
Changed to addInitialTasks from a463168 (#90).
 .map(task -> {
-    if (!task.getParentId().isPresent()) {
+    if (!task.getParentId().isPresent() && task.getState() == TaskStateCode.SUCCESS) {
Resuming an attempt that has already succeeded is prohibited, so this is adjusted to match the filter change at L310.
!task.getParentId().isPresent()
means "a task with no parentId = the rootTask".
(rootTask is SUCCESS = the attempt is also SUCCESS)
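The condition in this comment (no parentId means root task, and a SUCCESS root means the attempt succeeded) can be sketched as follows. The `Task` class, the `State` enum, and the method `attemptSucceeded` are hypothetical; `java.util.Optional` stands in for whichever Optional type the real code uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class RootTaskCheckSketch {
    // Hypothetical stand-in for TaskStateCode.
    enum State { SUCCESS, ERROR, GROUP_ERROR }

    // Hypothetical stand-in for an archived task.
    static final class Task {
        final long id;
        final Optional<Long> parentId;
        final State state;

        Task(long id, Optional<Long> parentId, State state) {
            this.id = id;
            this.parentId = parentId;
            this.state = state;
        }
    }

    // True when the root task (the one without a parentId) is SUCCESS,
    // which is treated here as "the attempt itself succeeded".
    static boolean attemptSucceeded(List<Task> tasks) {
        return tasks.stream()
                .anyMatch(t -> !t.parentId.isPresent() && t.state == State.SUCCESS);
    }

    public static void main(String[] args) {
        List<Task> failedAttempt = Arrays.asList(
                new Task(1L, Optional.empty(), State.GROUP_ERROR),
                new Task(2L, Optional.of(1L), State.SUCCESS),
                new Task(3L, Optional.of(1L), State.ERROR));
        System.out.println(attemptSucceeded(failedAttempt));
    }
}
```

With the extra state check, a root task in a failed state no longer triggers the "already succeeded" path even though it has no parentId.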
-List<Long> successTasks = tasks.stream()
-        .filter(task -> task.getState() == TaskStateCode.SUCCESS)
+List<Long> ids = tasks.stream()
+        .filter(t-> statuses.contains(t.getState()))
The tasks to resume are no longer limited to SUCCESS.
See: Only SUCCESS is fetched
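A small sketch of selecting task IDs by a set of accepted states instead of a single hard-coded SUCCESS check. The `Task` class, the `State` enum, and the choice of states passed in are all hypothetical; which states the patch actually accepts is not shown in this excerpt.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ResumableIdsSketch {
    // Hypothetical stand-in for TaskStateCode.
    enum State { SUCCESS, ERROR, GROUP_ERROR, CANCELED }

    // Hypothetical stand-in for an archived task.
    static final class Task {
        final long id;
        final State state;

        Task(long id, State state) {
            this.id = id;
            this.state = state;
        }
    }

    // Collects the IDs of tasks whose state is in the accepted set.
    static List<Long> idsInStates(List<Task> tasks, Set<State> statuses) {
        return tasks.stream()
                .filter(t -> statuses.contains(t.state))
                .map(t -> t.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task(1L, State.GROUP_ERROR),
                new Task(2L, State.SUCCESS),
                new Task(3L, State.ERROR),
                new Task(4L, State.CANCELED));
        // The accepted states here are an arbitrary example.
        System.out.println(idsInStates(tasks, EnumSet.of(State.SUCCESS, State.ERROR)));
    }
}
```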
taskNameAndIds.put(archivedSubtask.getFullName(), id);

archivedTasks.stream()
As in TaskControl#addTasks, an INSERT into the task_dependencies table is required, so the upstream IDs are looked up from the task names and INSERTed.
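The name-based lookup can be sketched like this: the upstream IDs recorded in the previous attempt are translated to names, and the names are translated to the IDs assigned in the new attempt. The method `resolveUpstreamIds` and both map parameters are hypothetical; in the diff, the name-to-new-ID role appears to be played by `taskNameAndIds`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UpstreamLookupSketch {
    // Translates upstream IDs from the archived attempt into IDs of the new attempt,
    // going through the task's full name. Fails fast if an upstream task has not
    // been inserted yet, which is why insertion order matters.
    static List<Long> resolveUpstreamIds(List<Long> archivedUpstreamIds,
                                         Map<Long, String> archivedIdToName,
                                         Map<String, Long> nameToNewId) {
        List<Long> resolved = new ArrayList<>();
        for (Long oldId : archivedUpstreamIds) {
            String name = archivedIdToName.get(oldId);
            Long newId = (name == null) ? null : nameToNewId.get(name);
            if (newId == null) {
                throw new IllegalStateException("upstream task not inserted yet: " + oldId);
            }
            resolved.add(newId);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<Long, String> archivedIdToName = new HashMap<>();
        archivedIdToName.put(10L, "+wf+step2^sub+a");
        Map<String, Long> nameToNewId = new HashMap<>();
        nameToNewId.put("+wf+step2^sub+a", 110L);
        System.out.println(resolveUpstreamIds(Arrays.asList(10L), archivedIdToName, nameToNewId));
    }
}
```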
I went through the whole change, and the approach looks fine to me.
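Overview
A patch to resolve the problem described in "20231218 Investigation of Digdag retry failed and Subtask".
An explanation of the implementation has been written up for review; please see "20240227 Approach to handling the Reference Error caused by Digdag retry failed".
Verification
Ran the integration tests.
(However, some tests require credentials, and the upstream Digdag CI does not reach 100% coverage either, so this verification is also not at 100% coverage.)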
TODO