Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for flexible columns in manifest for vector alignment genotyping pipeline #71

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tnguyensanger
Copy link
Contributor

@tnguyensanger tnguyensanger commented Dec 9, 2020

Addresses #70

NB: this will fail miniwdl validation. But it passes womtool validation and runs on cromwell server v49. I suspect it's because miniwdl doesn't support read_objects() WDL function from WDL v1.0, but not 100% sure.

Array[Object] lanelet_infos = read_objects(per_sample_manifest_file)

The preferred approach is to get miniwdl to recognize the WDL as valid, whether that means to modify the WDL or miniwdl settings. Another approach is to switch github actions to use womtool instead of miniwdl for validation. However, womtool syntax validation is not nearly as thorough as miniwdl.

… as they contain at least sample_id, irods_path
@tnguyensanger
Copy link
Contributor Author

tnguyensanger commented Dec 10, 2020

Surfacing communications with @gbggrant. TL;DR: Try using a Struct instead of Object.

From: George Grant [email protected]
Date: Thursday, 10 December 2020 at 13:39
To: Thuy Nguyen [email protected]
Subject: Re: Advice on miniwdl [EXT]

Hi Thuy,

Here's what I got back from Mike Lin ([email protected]):

Hi George, yea we've so far punted on this since the miniwdl project started shortly after the OpenWDL group moved towards deprecating Object. The object{} literal syntax is there only for building structs. It could be finished if it really unlocked some high-value use case, but otherwise we were hoping to get away with this approach. Is there anything insufficient with the newer structs that prevent using them here? (Of course I understand it's not always possible to invest time in fixing what ain't broke)

Not 100% sure I understand what he's suggesting - but it sounds like you should be able to replace Object with a Struct. I'll forward on the email chain to you if you want to follow up directly with him.

George

@tnguyensanger
Copy link
Contributor Author

I tried replacing Array[Object] with Array[Map[String, String]] and got the following error:

$ miniwdl check ~/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl
(/Users/tn6/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl Ln 16 Col 1) Failed to import ImportShortReadAlignment.wdl
(ImportShortReadAlignment.wdl Ln 21 Col 1) Failed to import ImportShortReadLaneletAlignment.wdl
(ImportShortReadLaneletAlignment.wdl Ln 36 Col 46) No such function: read_objects
      Array[Map[String, String]] lanelet_infos = read_objects(per_sample_manifest_file)

I also tried using Array[struct]. I have to admit I have little experience with wdl structs, so I’m not sure if I used the correct syntax. However, I still get the same read_objects error.

$  miniwdl check ~/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl
(/Users/tn6/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl Ln 16 Col 1) Failed to import ImportShortReadAlignment.wdl
(ImportShortReadAlignment.wdl Ln 21 Col 1) Failed to import ImportShortReadLaneletAlignment.wdl
(ImportShortReadLaneletAlignment.wdl Ln 37 Col 35) No such function: read_objects
      Array[Manifest] lanelet_infos = read_objects(per_sample_manifest_file)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ImportShortReadLaneletAlignment.wdl Ln 40 Col 25) Not an array
        String irods_path = lanelet_infos[idx][LANELET_INFO_COLNAME_IRODS_PATH]
                            ^^^^^^^^^^^^^^^^^^

Where Manifest struct is defined in Manifest.wdl:

version 1.0
 
struct Manifest {
  String sample_id
  String irods_path
}

And imported by ImportShortReadLaneletAlignment.wdl:

import "Manifest.wdl"

Copy link
Contributor

@gbggrant gbggrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Not sure what to do about miniwdl validation error.


command <<<

python <<CODE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be inset (both for the the CODE block, and then for the embedded python)

@tnguyensanger
Copy link
Contributor Author

More communications with Mike Lin, developer of Miniwdl. Tl;dr: try miniwdl v0.10.0:
https://github.com/chanzuckerberg/miniwdl/releases/tag/v0.10.0

From: Mike Lin [email protected]
Date: Monday, 4 January 2021 at 06:45
To: Thuy Nguyen [email protected]
Cc: George Grant [email protected]
Subject: Re: Validation error from miniwdl [EXT]

Dear Thuy, the new miniwdl v0.10.0 [github.com] implements read_object() and read_objects() but only to load Map[String,String] (which can in turn be coerced to a struct type) -- it still lacks general Object support. I hope this helps you, any feedback welcome! Mike

On Tue, Dec 15, 2020 at 8:34 PM Thuy Nguyen [email protected] wrote:
Hi

We quite like miniwdl for wdl validation. It’s easy to use and is very good at catching syntax issues. Adding read_objects() support is most preferable for us. What is your rough estimate for when such support could be available? I realize we have been discussing all of this quite informally. I’m happy to create an official feature request in the github if it’s easier.

Many thanks!
Thuy

From: Mike Lin [email protected]
Date: Monday, 14 December 2020 at 23:11
To: Thuy Nguyen [email protected]
Cc: George Grant [email protected]
Subject: Re: Validation error from miniwdl [EXT]

Dear Thuy, sorry for these problems -- miniwdl's support for Object is incomplete for the reasons I mentioned with George below, and read_object() and read_objects() in particular are absent. We can work with you to converge on a solution though.
• If you're open to moving towards structs, read_json() should currently work (but would require the manifest to be formatted as JSON instead of TSV)
• We could add read_object() and read_objects() only for the purpose of initializing structs from TSV
• Fully completing Object support is the least-preferred option for us for the reasons mentioned below but it's possible if it really unblocks something critical
Let me know what you think, any other feedback welcome too!

Hi George, yea we've so far punted on this since the miniwdl project started shortly after the OpenWDL group moved towards deprecating Object. The object{} literal syntax is there only for building structs. It could be finished if it really unlocked some high-value use case, but otherwise we were hoping to get away with this approach. Is there anything insufficient with the newer structs that prevent using them here? (Of course I understand it's not always possible to invest time in fixing what ain't broke)

On Mon, Dec 14, 2020 at 11:50 AM Thuy Nguyen [email protected] wrote:
Hi Mike

I’m the developer that George was referring to earlier. I hope you and George don’t mind if I contact you directly. I’m wondering if read_objects() from WDL v1.0 is supported by miniwdl?

Our use case is to read in a TSV with a header such that the columns can be in any order. I wasn’t sure of any other way in WDL v1.0 to do this other than Array[Objects] arr_of_row_objects = read_objects(tsv_with_header). However, I’m definitely up for alternatives.

I tried replacing Array[Object] with Array[Map[String, String]] and got the following error:

$ miniwdl check ~/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl
(/Users/tn6/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl Ln 16 Col 1) Failed to import ImportShortReadAlignment.wdl
(ImportShortReadAlignment.wdl Ln 21 Col 1) Failed to import ImportShortReadLaneletAlignment.wdl
(ImportShortReadLaneletAlignment.wdl Ln 36 Col 46) No such function: read_objects
Array[Map[String, String]] lanelet_infos = read_objects(per_sample_manifest_file)

I also tried using Array[struct]. I have to admit I have little experience with wdl structs, so I’m not sure if I used the correct syntax. However, I still get the same read_objects error.

$ miniwdl check ~/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl
(/Users/tn6/gitrepo/pipelines/releases/BatchImportShortReadAlignmentAndGenotyping/BatchImportShortReadAlignmentAndGenotyping_0.1.0.wdl Ln 16 Col 1) Failed to import ImportShortReadAlignment.wdl
(ImportShortReadAlignment.wdl Ln 21 Col 1) Failed to import ImportShortReadLaneletAlignment.wdl
(ImportShortReadLaneletAlignment.wdl Ln 37 Col 35) No such function: read_objects
Array[Manifest] lanelet_infos = read_objects(per_sample_manifest_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ImportShortReadLaneletAlignment.wdl Ln 40 Col 25) Not an array
String irods_path = lanelet_infos[idx][LANELET_INFO_COLNAME_IRODS_PATH]
^^^^^^^^^^^^^^^^^^

Where Manifest struct is defined in Manifest.wdl:

version 1.0

struct Manifest {
String sample_id
String irods_path
}

And imported by ImportShortReadLaneletAlignment.wdl:

import "Manifest.wdl"

Any suggestions on how to get past the “No such function: read_objects” error?

Many thanks!
Thuy Nguyen
Senior Bioinformatician
Malaria Programme
Wellcome Sanger Institute

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants