-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Name WDL output files in a human-readable way #5046
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just minor comments. It seems unlikely, but I just wanted to ask if there's a possibility of longer filenames (possibly too long) with the deduplication.
Thanks for adding the tests!
src/toil/wdl/wdltoil.py
Outdated
""" | ||
Set up the standard library. | ||
|
||
|
||
:param task_path: Dotted WDL name of the part of the wrokflow this library is working for. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrokflow
src/toil/wdl/wdltoil.py
Outdated
:param enforce_existence: If true, then if a file is detected as | ||
nonexistent, raise an error. Else, let it pass through | ||
:param share_files_with: Use the same file upload and download paths as | ||
the provided standard library. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description makes share_files_with
sound like a bool
. Could it be a little more descriptive?
file_to_mountpoint: Dict[str, str], | ||
current_directory_override: Optional[str] = None, | ||
share_files_with: Optional[ToilWDLStdLibBase] = None | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could a note in the docstring be added for share_files_with
?
This fixes #5008 by cramming WDL task names into the encoded Toil file URIs, along with the UUIDs that identify source directories. When we make a directory to hold the files from a given UUID, we will name it after the task (or workflow) that uploaded the files.
If I run:
I get an output directory like this:
If a task uploads files from multiple directories, we will start adding deduplicating numbers onto the ends of the directories.
Weird cases like a scatter node in a workflow or a subworkflow uploading a file from a directory also referenced by the main workflow should work, but only one WDL task path gets to name the directory we create.
This adds some complexity to devirtualizing files because now you need to keep track of this additional bit of state. We might want to roll it up into an object with the caches. I had to change the
share_files
function to a kwarg on the standard library constructor because I neither wanted nor needed to deal with what would happen if you tried to merge this new state.Since we have one copy of the download logic for outputs and for task input files, tasks should also get nicer input file names for free now.
Changelog Entry
To be copied to the draft changelog by merger:
Reviewer Checklist
issues/XXXX-fix-the-thing
in the Toil repo, or from an external repo.camelCase
that want to be insnake_case
.docs/running/{cliOptions,cwl,wdl}.rst
Merger Checklist