Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Virtualize filenames as in-container paths from point of view of WDL command #4527

Merged
merged 6 commits into from
Jul 25, 2023

Conversation

adamnovak
Copy link
Member

@adamnovak adamnovak commented Jul 11, 2023

This should fix problems with WDL file manipulation functions not working right when called from command placeholder substitutions.

Now we have the "virtual" file name space be the inside of the container path space during WDL task commands, while the devirtualized path space is always the host side path space.

This should fix #4514 for files that aren't just converted from strings where that isn't really allowed, and should also fix #4515.

It doesn't really do anything for #4516, but is sort of relevant. With this, you only get the in-container path inside the command itself. If you go File to String elsewhere in the task, you get a URI.

Changelog Entry

To be copied to the draft changelog by merger:

  • WDL commands now see in-container paths for files but devirtualize to out-of-container paths

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

@adamnovak
Copy link
Member Author

I tested this manually with:

cat >test.wdl <<'EOF'
version 1.0
workflow wf {
  input {}
  call t {
  }
  output {
    File out = t.out
  }
}

task t {
  input {}
  File f = write_lines(["hi"]) 
  command <<<
    echo ~{f} > output.txt
    echo ~{sep="," read_lines(f)} >>output.txt
    echo ~{size(f)} >>output.txt
    echo ~{write_lines(["new file in placeholder"])} >>output.txt 
  >>>
  output {
    File out = "output.txt"
  }
}
EOF
DOCKER_HOST=unix:///Users/anovak/.docker/run/docker.sock toil-wdl-runner test.wdl --logDebug --retryCount 0

The DOCKER_HOST part is just for me, to make sure Toil talks to Docker Desktop. at its non-standard location.

The output looks like this now (instead of crashing):

/mnt/miniwdl_task_container/work/_miniwdl_inputs/0/tmp4au6lvoy
hi
3.000000
/mnt/miniwdl_task_container/work/_miniwdl_inputs/0/tmpxbsc7t9s

I think this is right: Files coming into the command via bindings and via expressions evaluated in placeholders both produce in-container paths, but Files can still be operated on by the standard library functions.

@adamnovak adamnovak requested a review from stxue1 July 11, 2023 16:53
@adamnovak
Copy link
Member Author

Shloka said this works for her real workflow.

@stxue1
Copy link
Contributor

stxue1 commented Jul 11, 2023

This looks good to me. The fix runs right on my machine and it looks similar to how MiniWDL deals with placeholder expressions. A bunch of the conformance tests all run properly on my end as well.


if result is None:
# We really shouldn't have files in here that we didn't virtualize.
raise RuntimeError(f"File {filename} in container is not mounted from the host and can't be opened form the host")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened form the host

Copy link
Member

@DailyDreaming DailyDreaming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Minor comment.

@adamnovak adamnovak merged commit 7cf8c33 into master Jul 25, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants