New module: custom/splitfastqbylane #2837

Open
wants to merge 12 commits into
base: master

anoronh4
Contributor

@anoronh4 anoronh4 commented Feb 3, 2023

This custom module uses awk to split a single FASTQ file or FASTQ pair into multiple FASTQ files or pairs, each containing reads from a single flowcell and lane. It is intended for raw data that has already been merged across lanes. The awk statement reads the input file and routes each record to a different output file based on the contents of the FASTQ header.
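
For readers who want to see the idea without reading the awk, here is a minimal Python sketch of the same kind of split. It is not the module's code: it assumes Illumina Casava 1.8+ style headers (`@instrument:run:flowcell:lane:tile:x:y ...`), and the function names `lane_key` and `split_by_lane` are hypothetical, chosen only for this illustration. It groups records in memory rather than writing files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def lane_key(header: str) -> Tuple[str, str]:
    """Return (flowcell, lane) from a FASTQ header line.

    Assumption: Casava 1.8+ layout, where the third and fourth
    colon-separated fields are the flowcell ID and the lane.
    """
    name = header[1:].split(" ", 1)[0]  # drop '@' and the comment part
    fields = name.split(":")
    if not header.startswith("@") or len(fields) < 4:
        raise ValueError(f"unexpected FASTQ header: {header!r}")
    return fields[2], fields[3]


def split_by_lane(lines: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group 4-line FASTQ records by the (flowcell, lane) in their header."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:  # header, sequence, '+', qualities
            groups[lane_key(record[0])].extend(record)
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")
    return dict(groups)
```

For paired input, the same grouping would be applied to each mate file separately so that the per-lane R1 and R2 outputs stay in matching order.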

PR checklist

Closes #2836

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs?
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware

@anoronh4 self-assigned this on Feb 3, 2023
@anoronh4 changed the title from "added custom/splitfastqbylane module" to "New module: custom/splitfastqbylane" on Feb 8, 2023
@anoronh4 removed their assignment on Feb 14, 2023
@SPPearce
Contributor

SPPearce commented Jun 3, 2024

Is this module still useful?

@anoronh4
Contributor Author

anoronh4 commented Aug 8, 2024

> Is this module still useful?

In my and my colleague's opinion, yes. We still receive FASTQ files that were previously merged, and for base quality recalibration purposes we align them separately so that each read gets a read-group label based on the flowcell/lane it came from. I asked for feedback on this module some time ago.
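
As a rough illustration of the read-group labelling described above, the sketch below builds a SAM `@RG` header line from a FASTQ header. It assumes Casava 1.8+ style headers and uses `flowcell.lane` for both `ID` and `PU`, which is a common convention but an assumption here, not something this PR specifies. The function name `read_group_from_header` and the `sample` argument are hypothetical.

```python
def read_group_from_header(header: str, sample: str) -> str:
    """Build a tab-separated SAM @RG line from a FASTQ header.

    Assumptions: the header follows the Casava 1.8+ layout
    (@instrument:run:flowcell:lane:...), and ID/PU are both set to
    'flowcell.lane'. Adjust to your own read-group scheme.
    """
    name = header[1:].split(" ", 1)[0]  # drop '@' and the comment part
    fields = name.split(":")
    if not header.startswith("@") or len(fields) < 4:
        raise ValueError(f"unexpected FASTQ header: {header!r}")
    unit = f"{fields[2]}.{fields[3]}"  # flowcell.lane
    return "\t".join(
        ["@RG", f"ID:{unit}", f"PU:{unit}", f"SM:{sample}", "PL:ILLUMINA"]
    )
```

Reads split by flowcell and lane can then each be aligned with their own read group, so that downstream tools see one group per lane.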

@SPPearce
Contributor

SPPearce commented Aug 8, 2024

Yes, I see that nobody reviewed the module 😔.
If you are still interested, we can get it finished off and merged in.
It will need swapping to nf-test, but I can help with that.

@adamrtalbot
Contributor

> Is this module still useful?
>
> In my and my colleague's opinion, yes. We still have previously merged fastq files that come our way, and for base quality recalibration purposes, we align them separately so that each read gets a read-group label based on the flowcell/lane it came from. I asked for feedback on this module some time ago.

BQSR will read the lane tag in the BAM file, assuming you're using it correctly.

Other than that, the code looks fine, but we should update to nf-test now.

@SPPearce
Contributor

So shall we swap this to nf-test and merge it in?
