---

### Batch Queue Resource Scheduling for Workflow Applications
by Zhang, Koelbel, Cooper

[PDF](./papers/zhang_batch.pdf)
#### Relevant quotes:

- "Our work seeks to reduce the workflow turnaround time by
  intelligently using batch queues without relying on reservations"
- "One advanced feature of Maui that we do use is the start time estimation functionality"
- "Our work draws inspiration from [pilot jobs], but attempts to choose a more propitious size for the [pilot jobs]"
#### General approach:

- Decide on a group of next levels in the DAG to run as a single job, based on
  estimated wait time divided by runtime
- Submit a corresponding pilot job
- When the pilot job starts, start running the tasks and
  repeat the above once to submit the next pilot job
  - <b>So while a pilot job runs, another one travels through the queue</b>
- If a running job expires with tasks still running / not done,
  then "put them back in the DAG" and cancel all pilot jobs
- Repeat until everything is executed
The "meat" is in the first step: how to decide how many levels to aggregate
into a job. Their answer is a greedy algorithm that looks for the
configuration with the lowest wait-time/runtime ratio, but stops at the
first local minimum it finds (trying 1 level, 2 levels, etc.)

One question: they never say how many hosts they request for a pilot job.
I believe they ask for the full parallelism. Of course, it would be easy to
consider all kinds of options there as well. An easy extension of their
work?
#### More details
```
- applyGroupingHeuristic(DAG)
- Wait for an event
- If event is "pilot job started":
    - schedule all the tasks for that pilot job
    - applyGroupingHeuristic(DAG)
- If event is "pilot job has expired":
    - cancel all pending pilot jobs
    - "undo" tasks that were killed or not executed
      and put them back into the DAG
    - applyGroupingHeuristic(DAG)
- If event is "a task has finished":
    - mark it as done, make children ready
    - if the pilot job that did that task has no more tasks
      to do, then terminate that pilot job
```

The above has at most one pending pilot job in the queue. The authors talk about "interference" between pilot jobs...
The "smarts" of the above algorithm are in the applyGroupingHeuristic() function.
```
applyGroupingHeuristic():
- job_description = groupByLevel()
- If the number of consecutive levels is the full DAG
  height then:
    - If the expected runtime < expected wait time / 2,
      then run the whole DAG as one batch job
    - Otherwise, use one batch job per task
- Otherwise, submit a pilot job for running the consecutive
  levels together
```

The above algorithm calls the groupByLevel() function. In case that function says "run the whole DAG", a simple
heuristic decides between the two options: "as one batch job" or "as n jobs for n tasks".
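The decision rule above can be sketched as a small runnable function (the name `choose_submission` and the string return values are ours, for illustration only):

```python
def choose_submission(levels, dag_height, wait_time, runtime):
    """Decide what to submit, given groupByLevel()'s pick of consecutive levels."""
    if levels == dag_height:
        # The grouping covers the whole DAG: run it as one batch job only
        # if its runtime is small compared to the expected queue wait.
        if runtime < wait_time / 2.0:
            return "one batch job for the whole DAG"
        return "one batch job per task"
    # Otherwise, bundle the chosen consecutive levels into one pilot job.
    return "one pilot job for %d levels" % levels

print(choose_submission(2, 4, 20.0, 30.0))  # partial grouping -> pilot job
```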
```
groupByLevel():
- ratio[0] = +infinity
- For i=1 to DAG.height:
    - Consider running the next i levels in one pilot job
      (using as many hosts as the parallelism? not specified)
    - ratio[i] = estimated wait time / execution time
    - if ratio[i] > ratio[i-1], break
- return (i-1, estimated wait time, execution time, number_of_hosts)
```

groupByLevel() is really the main heuristic: it decides how many levels to
aggregate into one job.
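A runnable Python sketch of this heuristic, where `estimate_wait_time()` is a toy stand-in for Maui's start time estimation (the real estimator, and the number of hosts requested, are not specified in the paper):

```python
def estimate_wait_time(runtime):
    # Toy model (our assumption): wait time grows superlinearly with
    # the requested walltime, so the ratio eventually turns back up.
    return 10.0 + 0.01 * runtime ** 2

def group_by_level(level_runtimes):
    """Greedily add DAG levels while the wait/runtime ratio improves;
    stop at the first local minimum, as in the paper's heuristic."""
    best = None
    prev_ratio = float("inf")
    for i in range(1, len(level_runtimes) + 1):
        runtime = sum(level_runtimes[:i])
        wait = estimate_wait_time(runtime)
        ratio = wait / runtime
        if ratio > prev_ratio:
            break  # first local minimum found
        best = (i, wait, runtime)
        prev_ratio = ratio
    return best

print(group_by_level([5.0, 20.0, 10.0, 50.0]))  # stops after three levels
```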
#### Papers referencing it

- We looked at them and only the Shahid, Raza, .. paper (see below) seems marginally related
---

### Using Imbalance Metrics to Optimize Task Clustering in Scientific Workflow Executions
by Chen et al.

[PDF](./papers/chen-fgcs-2014.pdf)
This is a pretty straightforward paper: each task execution has some overhead, so we should cluster tasks to reduce that overhead.

A few key assumptions:

- Tasks are sequential (fine)
- When aggregating tasks into a job, the tasks run in sequence (not fine)
- There is a clustering overhead (which we could simulate easily)
- All clustering is off-line

Two clustering approaches:

- "Vertical clustering"
- "Horizontal clustering"
Vertical clustering is about data locality (we don't care).

Horizontal clustering is simple: when creating clusters of independent tasks
in a level, try to create clusters with runtimes as close as possible, to
promote load balancing.
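One simple way to realize this (our sketch, not necessarily the paper's exact method) is a longest-processing-time greedy: sort tasks by decreasing runtime and always assign to the currently lightest cluster:

```python
import heapq

def horizontal_cluster(task_runtimes, num_clusters):
    """Group independent tasks of one level into clusters whose total
    runtimes are as close to each other as possible (LPT-style greedy)."""
    # Heap of (current total runtime, cluster index, task list).
    clusters = [(0.0, i, []) for i in range(num_clusters)]
    heapq.heapify(clusters)
    for rt in sorted(task_runtimes, reverse=True):
        load, i, tasks = heapq.heappop(clusters)  # lightest cluster so far
        tasks.append(rt)
        heapq.heappush(clusters, (load + rt, i, tasks))
    return [tasks for _, _, tasks in sorted(clusters, key=lambda c: c[1])]

print(horizontal_cluster([8.0, 7.0, 6.0, 5.0, 4.0], 2))
```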
This paper has a good Related Work section, which should serve as
inspiration.
---

### Level based batch scheduling strategy with idle slot reduction under DAG constraints for computational grid
by Shahid, Raza, Sajid

[PDF](./papers/1-s2.0-S0164121215001260-main.pdf)
Pretty far from what we do. They have sets of DAGs. They schedule sets of
DAG levels to reduce flow time (HEFT within each level, careful scheduling
of DAG levels to outperform some global HEFT). No notion of clustering to
hide submission overhead. No notion of background load (the machine is
dedicated). The metric is over the set of workflows.
Not even clear that we should reference it.
The "communication overlap" is simply picking whether to run a job locally or
on a "remote" node.
---
#!/usr/bin/env python3
#
# A really simple Python script for quick visualization of a schedule.
# Workflow tasks are drawn in red, background tasks in green.
########################################################################

import sys

import matplotlib.patches as patches
import matplotlib.pyplot as pp


if __name__ == '__main__':

    if len(sys.argv) != 2:
        print("Usage: ./plot_gantt_chart.py <csv file>", file=sys.stderr)
        sys.exit(1)

    # Read all input
    fname = sys.argv[1]
    with open(fname) as f:
        lines = [x.strip() for x in f.readlines()]

    # Create a plot
    pp.figure()
    ax = pp.gca()
    pp.xlabel('Time')
    pp.ylabel('Processors')

    # Bounding box of everything drawn so far (-1 means "not set yet")
    min_x = -1
    max_x = -1
    min_y = -1
    max_y = -1

    # Cycle workflow-task rectangles through shades of red
    min_red = 0.15
    max_red = 1.00
    current_red = min_red

    for line in lines:
        tokens = line.split(",")
        start_time = float(tokens[8])
        finish_time = float(tokens[3])
        desired_color = tokens[5].split(":")[1].split('"')[0]

        if desired_color == "red":
            rgb = (current_red, 0.2, 0.2)
            current_red += .17
            if current_red > max_red:
                current_red = min_red
        else:
            rgb = (0.0, 0.7, 0.0)

        # The processor field is a space-separated list of ranges, e.g. "0-3 7"
        processors = tokens[0]
        for meta_token in processors.split(' '):
            proc_tokens = meta_token.split('-')
            min_proc = int(proc_tokens[0])
            if len(proc_tokens) == 2:
                max_proc = int(proc_tokens[1])
            else:
                max_proc = int(proc_tokens[0])

            # Draw one rectangle per processor range
            rect = patches.Rectangle((start_time, min_proc),
                                     finish_time - start_time,
                                     max_proc - min_proc + 1,
                                     facecolor=rgb, edgecolor="black",
                                     linewidth=0.2)
            ax.add_patch(rect)

            if (min_x == -1) or (min_x > start_time):
                min_x = start_time
            if (max_x == -1) or (max_x < finish_time):
                max_x = finish_time
            if (min_y == -1) or (min_y > min_proc):
                min_y = min_proc
            if (max_y == -1) or (max_y < max_proc + 1):
                max_y = max_proc + 1

    print(min_x, max_x, min_y, max_y)

    ax.set_xlim([min_x, max_x])
    ax.set_ylim([min_y, max_y])
    pp.savefig('/tmp/gantt.pdf')
    print("Figure saved in /tmp/gantt.pdf")
---
#!/usr/bin/env python3
#
# A really simple Python script for quick visualization of a schedule.
# Workflow tasks are drawn in red, background tasks in green.
########################################################################

import sys

import matplotlib.patches as patches
import matplotlib.pyplot as pp


if __name__ == '__main__':

    if len(sys.argv) != 2:
        print("Usage: ./plot_gantt_chart.py <csv file>", file=sys.stderr)
        sys.exit(1)

    # Read all input
    fname = sys.argv[1]
    with open(fname) as f:
        lines = [x.strip() for x in f.readlines()]

    # Create a plot
    pp.figure()
    ax = pp.gca()
    pp.xlabel('Time')
    pp.ylabel('Processors')

    # Bounding box of everything drawn so far (-1 means "not set yet")
    min_x = -1
    max_x = -1
    min_y = -1
    max_y = -1

    # Cycle workflow-task rectangles through shades of red
    min_red = 0.15
    max_red = 1.00
    current_red = min_red

    for line in lines:
        tokens = line.split(",")
        start_time = float(tokens[8])
        finish_time = float(tokens[3])
        desired_color = tokens[5].split(":")[1].split('"')[0]

        if desired_color == "red":
            rgb = (current_red, 0.2, 0.2)
            current_red += .17
            if current_red > max_red:
                current_red = min_red
        else:
            rgb = (0.0, 0.7, 0.0)

        # The processor field is a space-separated list of ranges, e.g. "0-3 7"
        processors = tokens[0]
        for meta_token in processors.split(' '):
            proc_tokens = meta_token.split('-')
            min_proc = int(proc_tokens[0])
            if len(proc_tokens) == 2:
                max_proc = int(proc_tokens[1])
            else:
                max_proc = int(proc_tokens[0])

            # Draw one rectangle per processor range
            rect = patches.Rectangle((start_time, min_proc),
                                     finish_time - start_time,
                                     max_proc - min_proc + 1,
                                     facecolor=rgb, edgecolor="black",
                                     linewidth=0.2)
            ax.add_patch(rect)

            if (min_x == -1) or (min_x > start_time):
                min_x = start_time
            if (max_x == -1) or (max_x < finish_time):
                max_x = finish_time
            if (min_y == -1) or (min_y > min_proc):
                min_y = min_proc
            if (max_y == -1) or (max_y < max_proc + 1):
                max_y = max_proc + 1

    print(min_x, max_x, min_y, max_y)

    ax.set_xlim([min_x, max_x])
    ax.set_ylim([min_y, max_y])
    pp.savefig('/tmp/gantt.pdf')
    print("Figure saved in /tmp/gantt.pdf")