Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nf-core download: Container image for Bowtie2 align module does not resolve #2392

Closed
MatthiasZepper opened this issue Aug 8, 2023 · 3 comments
Assignees
Labels
bug Something isn't working download nf-core download won't-fix Cost of fixing is not justified / It is impossible
Milestone

Comments

@MatthiasZepper
Copy link
Member

Description of the bug

While downloading the newest release '2.1.2' of atacseq, I noticed that nf-core download fails to extract the container image string from the most recent versions of the Bowtie2 align module. Since Bowtie2 is a very frequently used module, this will soon affect further pipelines with new releases, too.

I will have time for that earliest in two weeks, so if somebody wants to investigate, go for it!

Command used and terminal output

nf-core download atacseq -r '2.1.2' -r '2.0' --outdir "atacseq" -s 'singularity' -l "quay.io" -l "docker.io" -t

Downloading atacseq
Tue Aug  8 19:35:57 CEST 2023

                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'

    nf-core/tools version 2.10.dev0 - https://nf-co.re


INFO     Saving 'nf-core/atacseq'
          Pipeline revision: '2.1.2, 2.0, 1.2.2'
          Use containers: 'singularity'
          Container library: 'quay.io, docker.io'
          Using $NXF_SINGULARITY_CACHEDIR': /home/zepper/singularity/nextflowcache'
          Successfully read 568 containers from the remote '$NXF_SINGULARITY_CACHEDIR' contents.
          Output file: 'atacseq.git'
          Enabled for seqeralabs® Nextflow Tower: 'True'
INFO     Collecting workflow from GitHub
INFO     Processing workflow revision 2.1.2, found 35 container images in total.
Collecting container images ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% • 0/35 completedINFO     31 containers are already cached remotely and won't be retrieved.
Downloading singularity images ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 35/35 completed
ERROR    Cannot parse container string in '/home/zepper/.config/nfcore/nf-core/atacseq/modules/nf-core/bowtie2/align/main.nf':

             ${ workflow.containerEngine ==

         ⚠ Skipping this singularity image.
INFO     Processing workflow revision 2.0, found 41 container images in total.
Collecting container images ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% • 0/41 completedINFO     41 containers are already cached remotely and won't be retrieved.
Image file exists at destination ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 41/41 completed

System information

No response

@MatthiasZepper MatthiasZepper added the bug Something isn't working label Aug 8, 2023
@MatthiasZepper MatthiasZepper added this to the 2.10 milestone Aug 8, 2023
@MatthiasZepper MatthiasZepper self-assigned this Aug 8, 2023
@MatthiasZepper
Copy link
Member Author

With great relief, this error can be somewhat downgraded.

A first exploration of the bug showed that it affects version 2.0, but not in 2.1.2. The presentation of the error message is misleading in this respect, since the error is printed above the INFO Processing workflow revision 2.0, found 41 container images in total. and thus seems to be raised by the other revision.

Obviously, issues in older revisions are vexing, but not as critical as problems with up-to-date versions of pipelines.

@MatthiasZepper
Copy link
Member Author

MatthiasZepper commented Aug 17, 2023

Sadly, this is a won't fix case:

    container "${ workflow.containerEngine == "singularity" && !task.ext.singularity_pull_docker_container ?
        "https://depot.galaxyproject.org/singularity/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:a0ffedb52808e102887f6ce600d092675bf3528a-0" :
        "quay.io/biocontainers/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:a0ffedb52808e102887f6ce600d092675bf3528a-0" }"

The problem with this module is, that the same quote character is used for the wrapping quotes and (without being escaped) also for the inner quotes. I am surprised that Nextflow accepts this without hiccups, but I do not see a way to accommodate the regex to that.

I had already worked with a simpler regex r"container\\s*[\"\']([^\"\']*)[\"\']" before, but that unfortunately abridged the matches in more recent DSL2, because the other quote character typically appears inside the container string. So a simple "Either single or double quote" did not fly.

Therefore, we now use a regex that is "aware" of the outer quote type being used. The other quote character is ignored and so are escaped versions of the outer quote type:

re.compile(r"container\s+[\\s{}=$]*(?P<quote>[\'\"])(?P<param>(?:.(?!\1))*.?)\1[\\s}]*", re.DOTALL)
  • container\\s+[\\s{}$=]* matches the literal word "container" followed by whitespace, brackets, equal or variable names
  • (?P<quote>[\'\"]) The quote character is captured into the quote group \1.
  • The pattern (?:.(?!\1))*.? is used to match any character (.) not followed by the closing quote character (?!\1).
  • This capture happens greedy *, but we add a .? to ensure that we don't match the whole file until the last occurrence of the closing quote character, but rather stop at the first occurrence.
  • \1 inserts the matched quote character into the regex, either " or '.
  • It may be followed by whitespace or closing bracket [\s}]*.
  • re.DOTALL is used to account for the string to be spread out across multiple lines.

Evidently, this regex depends on different quotes being used. I also can't match on the correct double quote being accompanied by a closing bracket, since other modules use arbitrary white spaces before/after at this position. So accommodating the regex for this module would break it for others.

@MatthiasZepper MatthiasZepper added the won't-fix Cost of fixing is not justified / It is impossible label Aug 17, 2023
@MatthiasZepper
Copy link
Member Author

Just for future reference: The same issue was also aready discussed in issues of atacseq and chipseq.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working download nf-core download won't-fix Cost of fixing is not justified / It is impossible
Projects
None yet
Development

No branches or pull requests

2 participants