
Commit

Merge pull request #67 from christopherwharrop/hotfix/slurm-recover-jobid

Hotfix/slurm recover jobid
christopherwharrop authored May 30, 2019
2 parents 91569ac + 59f51d7 commit 48b200a
Showing 1 changed file with 11 additions and 3 deletions.
14 changes: 11 additions & 3 deletions lib/workflowmgr/slurmbatchsystem.rb
@@ -289,6 +289,10 @@ def submit(task)
queued_jobs=""
errors=""
exit_status=0

+# Wait a few seconds for information to propagate before trying to look if job was still submitted
+sleep(5)

begin

# Get the username of this process
@@ -316,9 +320,9 @@ def submit(task)
# Look for a job that matches the randomID we inserted into the comment
queued_jobs.split("\n").each { |job|

-# Skip headers
-next if job=~/CLUSTER/
-next if job=~/JOBID/
+# Skip headings
+next if job[0..4] == 'JOBID'
+next if job[0..7] == 'CLUSTER:'

# Extract job id
jobid=job[0..39].strip
@@ -331,6 +335,10 @@ def submit(task)
end
}

+WorkflowMgr.stderr("WARNING: Unable to retrieve jobid after sbatch failed with socket time out when submitting #{task.attributes[:name]}",1)
+
+return nil,output

else
WorkflowMgr.stderr("WARNING: job submission failed: #{output}", 1)
return nil,output
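The second hunk replaces the regex header checks with prefix comparisons. One plausible effect, sketched below, is that a job line which merely contains the text JOBID or CLUSTER somewhere (for example in its name) is no longer skipped. The column layout, job names, and the helpers `row`, `sample_lines`, `ids_by_regex`, and `ids_by_prefix` are hypothetical and exist only for this illustration. Only the two skip conditions and the `job[0..39].strip` extraction are taken from the diff.

```ruby
# Build a fixed-width line with a 40-character job id field.
# The layout is an assumption made for this example, not real squeue output.
def row(jobid, name, comment)
  format('%-40s%-20s%s', jobid, name, comment)
end

# Hypothetical listing: two header lines and two jobs, one of which
# has "JOBID" inside its name.
def sample_lines
  [
    'CLUSTER: example',
    row('JOBID', 'NAME', 'COMMENT'),
    row('1234567', 'check_JOBID_parse', 'a1b2c3'),
    row('7654321', 'post', 'd4e5f6')
  ]
end

# Old behavior: skip any line that contains CLUSTER or JOBID anywhere.
def ids_by_regex(lines)
  lines.reject { |job| job =~ /CLUSTER/ || job =~ /JOBID/ }
       .map { |job| job[0..39].strip }
end

# New behavior: skip only lines that start with the header text.
def ids_by_prefix(lines)
  lines.reject { |job| job[0..4] == 'JOBID' || job[0..7] == 'CLUSTER:' }
       .map { |job| job[0..39].strip }
end

puts "regex:  #{ids_by_regex(sample_lines).inspect}"
puts "prefix: #{ids_by_prefix(sample_lines).inspect}"
```

With this made-up input, the regex version drops job 1234567 because its name contains JOBID, while the prefix version keeps both jobs.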
