WIP: execution replay #299

nh13 · 2017-06-09T04:09:21Z

No description provided.

nh13 · 2017-06-09T07:55:54Z

How to test:

Run a pipeline

java -Xmx8G -jar dagr-0.2.0-SNAPSHOT.jar --report report.txt --experimental-execution SleepyPipeline -j -n 3 -s 42

Ensure that the only dependency is Task 2 depends on task 0 by looking at the logging statements:

[2017/00/00 00:00:00 | SleepyPipeline | Info] Task 2 will depend on task 0

Modify the execution_log.txt file to remove the last to succeeded tasks and save it as "replay.txt". It should look something like this:

#DEFINITION,SIMPLE_NAME,NAME,CODE,PARENT_CODE,CHILD_NUMBER
#RELATIONSHIP,PARENT_CODE,CHILD_CODE
#STATUS,TASK_CODE,STATUS_NAME,STATUS_ORDINAL
DEFINITION,SleepyPipeline,SleepyPipeline,2120598994,-1,0
STATUS,2120598994,Pending,0
STATUS,2120598994,Queued,1
DEFINITION,SleepInJvmTask,task-0,823563107,2120598994,0
RELATIONSHIP,2120598994,823563107
DEFINITION,SleepInJvmTask,task-1,600184819,2120598994,1
RELATIONSHIP,2120598994,600184819
DEFINITION,SleepInJvmTask,task-2,642647395,2120598994,2
RELATIONSHIP,2120598994,642647395
STATUS,2120598994,Running,3
STATUS,823563107,Pending,0
STATUS,600184819,Pending,0
STATUS,642647395,Pending,0
STATUS,823563107,Queued,1
STATUS,600184819,Queued,1
STATUS,823563107,Submitted,2
STATUS,600184819,Submitted,2
STATUS,600184819,Running,3
STATUS,823563107,Running,3
STATUS,600184819,SucceededExecution,10
STATUS,823563107,SucceededExecution,10
STATUS,642647395,Queued,1
STATUS,642647395,Submitted,2
STATUS,642647395,Running,3

Now run it with the replay file:

java -Xmx8G -jar dagr-0.2.0-SNAPSHOT.jar --report report.txt --experimental-execution --replay-log replay.txt SleepyPipeline -j -n 3 -s 42

Validate via the logging to stdout that the task-0 and task-1 are manually succeeded, while the SleepyPipeline and task-2 execute. The new execution_log.txt will look something like this:

#DEFINITION,SIMPLE_NAME,NAME,CODE,PARENT_CODE,CHILD_NUMBER
#RELATIONSHIP,PARENT_CODE,CHILD_CODE
#STATUS,TASK_CODE,STATUS_NAME,STATUS_ORDINAL
DEFINITION,SleepyPipeline,SleepyPipeline,1255049730,-1,0
STATUS,1255049730,Pending,0
STATUS,1255049730,Queued,1
DEFINITION,SleepInJvmTask,task-0,458297460,1255049730,0
RELATIONSHIP,1255049730,458297460
DEFINITION,SleepInJvmTask,task-1,1770706846,1255049730,1
RELATIONSHIP,1255049730,1770706846
DEFINITION,SleepInJvmTask,task-2,2060167275,1255049730,2
RELATIONSHIP,1255049730,2060167275
STATUS,1255049730,Running,3
STATUS,458297460,Pending,0
STATUS,1770706846,Pending,0
STATUS,2060167275,Pending,0
STATUS,458297460,Queued,1
STATUS,1770706846,Queued,1
STATUS,458297460,ManuallySucceeded,11
STATUS,1770706846,ManuallySucceeded,11
STATUS,2060167275,Queued,1
STATUS,2060167275,Submitted,2
STATUS,2060167275,Running,3
STATUS,2060167275,SucceededExecution,10
STATUS,1255049730,SucceededExecution,10

codecov-io · 2017-06-09T08:19:17Z

Codecov Report

Merging #299 into master will increase coverage by 2.13%.
The diff coverage is 94.94%.

@@            Coverage Diff             @@
##           master     #299      +/-   ##
==========================================
+ Coverage   91.38%   93.51%   +2.13%     
==========================================
  Files          31       46      +15     
  Lines        1137     1742     +605     
  Branches      106      127      +21     
==========================================
+ Hits         1039     1629     +590     
- Misses         98      113      +15

Impacted Files	Coverage Δ
...e/src/main/scala/dagr/core/tasksystem/Linker.scala	`100% <ø> (ø)`	⬆️
...c/main/scala/dagr/core/tasksystem/Dependable.scala	`72.72% <ø> (ø)`	⬆️
...n/scala/dagr/core/tasksystem/SimpleInJvmTask.scala	`100% <ø> (ø)`	⬆️
...src/main/scala/dagr/core/tasksystem/UnitTask.scala	`100% <ø> (ø)`	⬆️
...src/main/scala/dagr/core/tasksystem/Pipeline.scala	`94.73% <ø> (+5.26%)`	⬆️
...rc/main/scala/dagr/core/config/Configuration.scala	`98.38% <ø> (+1.56%)`	⬆️
.../main/scala/dagr/core/tasksystem/Schedulable.scala	`100% <ø> (ø)`	⬆️
...main/scala/dagr/core/tasksystem/ShellCommand.scala	`75% <ø> (ø)`	⬆️
core/src/main/scala/dagr/core/exec/Scheduler.scala	`100% <ø> (ø)`
...ain/scala/dagr/core/execsystem2/TaskExecutor.scala	`0% <0%> (ø)`
... and 50 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 19dbac9...d9d7caa. Read the comment docs.

nh13 · 2017-06-10T19:10:41Z

@tfenne I am very close to having replay supported in the "old" execsystem, but I am running out of time for today!

nh13 · 2017-06-11T05:35:14Z

No tests for replay but it works in my hands for both execution systems!

nh13 · 2017-06-18T19:48:09Z

@tfenne I now have test coverage across all the code I want.

I was thinking we could do a multi-stage review if you are up for it? I see three main sections:

The abstraction of various execution classes (dagr.core.exec), including a generic executor, task status, and classes for callbacks (ex. when a task status changes).
The changes to the existing executor. A lot of these changes were made to be able to support both the existing and new executor.
The new execution system.

Probably the order should be 3, 2, 1. It may be hard to do each entirely separately, but I think it should be possible with only a minimal look at the others while reviewing one. Thoughts on how to proceed?

nh13 · 2017-06-18T19:48:27Z

build.sbt

@@ -95,7 +95,7 @@ lazy val commonSettings = Seq(
  testOptions in Test  += Tests.Argument(TestFrameworks.ScalaTest, "-h", Option(System.getenv("TEST_HTML_REPORTS")).getOrElse(htmlReportsDirectory)),
  testOptions in Test  += Tests.Argument("-l", "LongRunningTest"), // ignores long running tests
  // uncomment for full stack traces
-  //testOptions in Test  += Tests.Argument("-oD"),
+  testOptions in Test  += Tests.Argument("-oDF"),


TODO: comment me

nh13 · 2017-06-18T19:48:39Z

build.sbt

@@ -123,6 +123,7 @@ lazy val core = Project(id="dagr-core", base=file("core"))
      "com.fulcrumgenomics" %%  "commons"           %  "0.2.0-SNAPSHOT",
      "com.fulcrumgenomics" %%  "sopt"              %  "0.2.0-SNAPSHOT",
      "com.github.dblock"   %   "oshi-core"         %  "3.3",
+	  "com.beachape"        %%  "enumeratum"        %  "1.5.12",


TODO: tabs -> spaces

nh13 · 2017-06-19T17:32:23Z

core/src/main/scala/dagr/core/reporting/ExecutionLogger.scala

+  case class Definition(simpleName: String, name: String, code: Int, parentCode: Int, childNumber: Int) extends ToStringIsSimpleNameUpperCase {
+    /** Returns true if the the other definition represents the same task as the this definition.  Assumes that
+      * the parents have the same number of child tasks. */
+    def equivalent(that: Definition): Boolean = {


@tfenne this is the where we determine two tasks are "equivalent" across executions. You'll likely find that it is too conservative, so any ideas would be helpful.

nh13 · 2017-06-19T17:36:33Z

core/src/main/scala/dagr/core/exec/TaskCache.scala

+      val equivalentChildren = currentChildDefinitions.exists { left =>
+        currentChildDefinitions.exists { right => left != right && left.equivalent(right) }
+      }
+


@tfenne this is the where we determine if a pipeline-like task (a task that returns other tasks) is equivalent across executions. We look for:

The task must produce the same # of children.

None of the children produced are "equivalent" to each other (i.e. one cannot be misinterpreted for the other across executions).

Each child has one and only one equivalent "definition" from the previous execution, meaning for each child task we found one and only one potential matching task from the previous execution.

If the above are true, we evaluate whether or not to run the children separately (by their previous status), otherwise, we force all the children to run.

nh13 · 2017-06-19T17:38:03Z

core/src/test/scala/dagr/core/exec/SimpleTaskCacheTest.scala

+    }
+    Logger.out = tmpOut
+  }
+}


TODO: add a test of a pipeline within a pipeline within a pipeline, such that the middle-level pipeline produces different pipelines on the second execution, and so the middle and bottom level pipelines are forced to execute.

Refactor task and exec system prior to new execution system.

This system can be used with the --experimental-execution option.

Outline of tests for TaskCache

nh13 force-pushed the nh_exec2_replay branch 5 times, most recently from d439a16 to 7c85a28 Compare June 9, 2017 07:50

nh13 force-pushed the nh_exec2_replay branch from 7c85a28 to 5fbb3b2 Compare June 9, 2017 07:58

nh13 force-pushed the nh_exec2 branch from e4e7933 to 508e2c1 Compare June 9, 2017 08:11

nh13 force-pushed the nh_exec2_replay branch from 5fbb3b2 to 6e35f52 Compare June 9, 2017 08:12

nh13 force-pushed the nh_exec2_replay branch from 0d5d71d to db43946 Compare June 10, 2017 19:10

nh13 force-pushed the nh_exec2_replay branch from db43946 to dddfbb9 Compare June 11, 2017 05:33

nh13 mentioned this pull request Jun 11, 2017

New Exec system #278

Closed

nh13 changed the base branch from nh_exec2 to master June 11, 2017 05:34

nh13 force-pushed the nh_exec2_replay branch 4 times, most recently from 561451c to de947d5 Compare June 11, 2017 06:14

nh13 mentioned this pull request Jun 17, 2017

Improve the documentation of Dagr #303

Open

58 tasks

nh13 commented Jun 18, 2017

View reviewed changes

nh13 commented Jun 19, 2017

View reviewed changes

nh13 added 3 commits July 3, 2017 10:56

Fixup full-stack traces

df92b42

Support replay and a major refactor.

67084cb

Refactor task and exec system prior to new execution system.

Adding a new execution system.

fe97ea1

This system can be used with the --experimental-execution option.

nh13 added 5 commits July 3, 2017 10:56

Improving test coverage

7a1225e

ExecutionLoggerTest.scala

1bcba2d

TODO: remove me

01f3c0b

Outline of tests for TaskCache

Refactor TaskCache, now with tests.

09a048f

Small fixups

d9d7caa

nh13 force-pushed the nh_exec2_replay branch from 286b795 to d9d7caa Compare July 3, 2017 17:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: execution replay #299

WIP: execution replay #299

nh13 commented Jun 9, 2017

nh13 commented Jun 9, 2017 •

edited

Loading

codecov-io commented Jun 9, 2017 •

edited

Loading

nh13 commented Jun 10, 2017

nh13 commented Jun 11, 2017

nh13 commented Jun 18, 2017

nh13 Jun 18, 2017

nh13 Jun 18, 2017

nh13 Jun 19, 2017

nh13 Jun 19, 2017

nh13 Jun 19, 2017

WIP: execution replay #299

Are you sure you want to change the base?

WIP: execution replay #299

Conversation

nh13 commented Jun 9, 2017

nh13 commented Jun 9, 2017 • edited Loading

codecov-io commented Jun 9, 2017 • edited Loading

Codecov Report

nh13 commented Jun 10, 2017

nh13 commented Jun 11, 2017

nh13 commented Jun 18, 2017

nh13 Jun 18, 2017

Choose a reason for hiding this comment

nh13 Jun 18, 2017

Choose a reason for hiding this comment

nh13 Jun 19, 2017

Choose a reason for hiding this comment

nh13 Jun 19, 2017

Choose a reason for hiding this comment

nh13 Jun 19, 2017

Choose a reason for hiding this comment

nh13 commented Jun 9, 2017 •

edited

Loading

codecov-io commented Jun 9, 2017 •

edited

Loading