optimize PURGEDUPS_SPLITFA process #98

Open
MartinPippel opened this issue May 6, 2024 · 1 comment

@MartinPippel
Contributor

Is your feature request related to a problem? Please describe.
Not really, but this step could potentially run faster if it did not copy the compressed file. It also seems that the following line:

def useGzip = !( assembly instanceof List ? assembly.every{ it.name.endsWith(".gz") } : assembly.name.endsWith(".gz") )

is not doing what it's supposed to do (probably due to the negation symbol at the beginning?). In my case it's copying the compressed file:

cat hifiasm-raw-default.asm.bp.p_ctg.fasta.gz > MYSPECIESNAME_hifiasm-purged-default_hap0.merged.fasta.gz

Describe the solution you'd like
I think Dengfeng's split_fa script can read compressed files directly from stdin (see here), so the PURGEDUPS_SPLITFA process could potentially be reduced to:

script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    cat ${prefix}.merged.fasta.gz | split_fa $args - | gzip -c > ${prefix}.split.fasta.gz

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        purgedups: \$( purge_dups -h |& sed '3!d; s/.*: //' )
    END_VERSIONS
    """

One potential pitfall is a user who also specifies the - option through the external arguments (task.ext.args), since the script above already passes it.
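
A minimal sketch of how such a clash could be detected before the command runs. The helper name has_stdin_dash and the flag --placeholder-flag are hypothetical and are not part of purge_dups or the module; the check only looks for a standalone - token separated by spaces.

```shell
# Hypothetical helper: succeeds (exit 0) if the argument string already
# contains a standalone "-" token, i.e. the user asked for stdin themselves.
# Only space-separated tokens are considered in this sketch.
has_stdin_dash() {
    case " $1 " in
        *" - "*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example: user-supplied arguments (placeholder flag, not a real split_fa option).
args="--placeholder-flag -"

if has_stdin_dash "$args"; then
    echo "args already contain '-'; the script should not append it again"
else
    echo "args do not contain '-'; safe to append it"
fi
```

Whether the module should warn, fail, or silently drop the duplicate is a design decision this sketch leaves open.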

@mahesh-panchal
Collaborator

It also seems that the following line:

def useGzip = !( assembly instanceof List ? assembly.every{ it.name.endsWith(".gz") } : assembly.name.endsWith(".gz") )

is not doing what it's supposed to do (probably due to the negation symbol at the beginning?). In my case it's copying the compressed file:

cat hifiasm-raw-default.asm.bp.p_ctg.fasta.gz > MYSPECIESNAME_hifiasm-purged-default_hap0.merged.fasta.gz

This part is correct. Concatenating gzip files results in a valid gzipped file.
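
A small self-contained check of that claim, as a sketch: two tiny FASTA files are compressed separately, joined with cat, and the result is tested and decompressed. The file names and sequences are made up for illustration.

```shell
set -eu

# Work in a throwaway directory.
workdir=$(mktemp -d)

# Compress two small FASTA records independently.
printf '>ctg1\nACGT\n' | gzip -c > "$workdir/a.fasta.gz"
printf '>ctg2\nTTGA\n' | gzip -c > "$workdir/b.fasta.gz"

# Plain concatenation of the compressed files, no recompression.
cat "$workdir/a.fasta.gz" "$workdir/b.fasta.gz" > "$workdir/merged.fasta.gz"

# gzip -t verifies integrity; gzip -dc decompresses every member in order.
gzip -t "$workdir/merged.fasta.gz"
merged=$(gzip -dc "$workdir/merged.fasta.gz")
expected=$(printf '>ctg1\nACGT\n>ctg2\nTTGA\n')

printf '%s\n' "$merged"

rm -rf "$workdir"
```

The merged file is a multi-member gzip stream, so tools that decompress with gzip see the records of both inputs back to back.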
