Save run-merged reads for single run samples #262
Comments
I understand the confusion. Since you supplied that single FASTQ to the pipeline, I guess you do have it somewhere, but I can see that it would be more convenient to have all reads together after run merging.
Yes, I agree it is not ideal. Unfortunately, this is an intrinsic consequence of the reactive design of Nextflow: it's not trivial to work out which is the 'final' FASTQ file that we could publish to a final directory, particularly when you are mixing different data inputs (in this case single-run vs multi-run). We've had the same problem in eager for a very long time, too. The possible options I can see are:
(3) might be possible, but it will need a lot of thought going into it, and time to test all possible scenarios.
EDIT: I HAVE AN IDEA @Midnighter please validate: what about having a
I need to look at the context to comment on your idea. I was wondering: why not send all FASTQ files to the merging process and publish everything, even if they are single runs?
A cat process that cats a single file into a new file is also redundant: it's just a copy, so to me that's a waste of computing resources, time, and disk space.
It could basically be a no-op for a single input file but would still allow publishing all of its output files in one directory easily. I'll think about your comment above tomorrow, though.
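A minimal sketch of that no-op idea as a standalone Nextflow DSL2 process, assuming single-end reads for brevity; the process name, output naming, and publish directory are made up for illustration and are not the actual pipeline module:

```groovy
// Sketch of a "no-op cat" merging process; names and paths are assumptions.
process MERGE_RUNS {
    tag "$meta.id"
    publishDir "${params.outdir}/run_merging", mode: 'copy'

    input:
    tuple val(meta), path(reads, stageAs: 'input*/*')

    output:
    tuple val(meta), path("${meta.id}.merged.fastq.gz"), emit: reads

    script:
    def read_list = reads instanceof List ? reads : [reads]
    if ( read_list.size() > 1 )
        // Multiple runs: genuinely concatenate them.
        """
        cat ${read_list.join(' ')} > ${meta.id}.merged.fastq.gz
        """
    else
        // Single run: nothing to merge; link the file so it is still emitted
        // and published under the same name without a redundant full copy.
        """
        ln -s ${read_list[0]} ${meta.id}.merged.fastq.gz
        """
}
```

Symlinking instead of cat-ing avoids the redundant copy in the work directory, although publishing with `mode: 'copy'` would still materialise one copy in the results.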
OK, I am not so familiar with the Nextflow design; I would likely end up with some sort of spaghetti of conditionals. Just stating that you have to activate both `--save_hostremoval_unmapped` and `--save_runmerged_reads` would not solve the issue, because you would now duplicate all the data for the samples with multiple sequencing runs.
Sorry, to be clear: none of the above are actual solutions, just things partly addressing the problem, even if the answer is 'we can't do it'. Ultimately, working out which final step is run isn't so hard (even if it takes a lot of conditionals); the problem is trying to track samples in the exceptional case where the file to be saved needs to come from an upstream step. In a way, this is a different type of metadata than just detecting which processes have been turned on or off.
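One conceivable way to carry that per-sample metadata, sketched under assumptions (the `run_accession` and `run_count` field names are illustrative, not the pipeline's actual meta keys): record the number of runs per sample while grouping, so later steps can decide for each sample where its final file comes from.

```groovy
// Sketch: tag each sample's meta map with how many runs it has.
// Field names are assumptions, not the pipeline's actual meta keys.
ch_reads
    .map { meta, reads ->
        // Drop run-level keys so runs of the same sample share one meta map.
        def sample_meta = meta.findAll { it.key != 'run_accession' }
        [ sample_meta, reads ]
    }
    .groupTuple()
    .map { meta, reads ->
        // run_count lets downstream logic tell single-run from multi-run.
        [ meta + [ run_count: reads.size() ], reads.flatten() ]
    }
    .set { ch_grouped_reads }
```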
So, for example, for the conditional to save the output from
and for host removal
I think this should work, but the question now is how do we extract/collect the
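For context, saving a module's output in nf-core pipelines is usually controlled by `publishDir` directives in `conf/modules.config`, so the conditionals would presumably live there. A sketch under assumptions; the process selectors and the proposed `params.save_reads_for_taxprofiling` flag are hypothetical:

```groovy
// conf/modules.config — illustrative sketch only; selector names and the
// proposed save_reads_for_taxprofiling flag are assumptions.
process {
    withName: 'CAT_FASTQ' {
        publishDir = [
            path: { "${params.outdir}/run_merging" },
            mode: params.publish_dir_mode,
            // Save merged reads if explicitly requested, or if the user
            // asked for the final pre-profiling reads.
            enabled: params.save_runmerged_reads || params.save_reads_for_taxprofiling
        ]
    }

    withName: '.*HOSTREMOVAL.*' {
        publishDir = [
            path: { "${params.outdir}/host_removal" },
            mode: params.publish_dir_mode,
            // Save host-removed reads if explicitly requested, or if they
            // are final because no run merging happens afterwards.
            enabled: params.save_hostremoval_unmapped ||
                     ( params.save_reads_for_taxprofiling && !params.perform_runmerging )
        ]
    }
}
```

The catch is that such conditionals are static per process: they cannot distinguish a single-run sample (which never enters the merge process) from a multi-run one, which is exactly the per-sample metadata problem described above.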
If I can get this to work, maybe we could go one step better and at the same time directly generate a samplesheet of the processed reads that can go straight into other pipelines, e.g. mag (idea inspired by @rotifergirl).
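A rough sketch of how such a samplesheet could be generated from the final reads channel with `collectFile`; the channel name and CSV columns are assumptions, and the exact columns would need to match what mag expects:

```groovy
// Sketch: collect the processed reads into a CSV samplesheet for
// downstream pipelines. Channel name and columns are assumptions.
ch_reads_for_profiling
    .map { meta, reads ->
        def fq = reads instanceof List ? reads : [reads]
        "${meta.id},${fq[0]},${fq.size() > 1 ? fq[1] : ''}"
    }
    .collect()
    .map { rows -> ( ['sample,fastq_1,fastq_2'] + rows ).join('\n') }
    .collectFile(
        name: 'processed_reads_samplesheet.csv',
        storeDir: "${params.outdir}/downstream_samplesheets"
    )
```

Note that the paths written here would point into the work directory unless the corresponding files are also published, which is one more reason the final reads need to be published consistently.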
Should also x-ref: nf-core/eager#945
Description of the bug
Maybe there is a misunderstanding about what particular parameters of the pipeline mean, but I think the way the parameter `--save_runmerged_reads` currently behaves when run merging is activated is not what I expected it to do. I am not sure whether to file this under bug or enhancement, so apologies for calling it a bug.

I have the following use case: I have ten samples with multiple runs that I would like to have merged prior to taxonomic profiling, but a single sample with only one run that doesn't require merging. I would like to have the FastQ files after pre-processing and, if necessary, run merging for all of these samples.
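For illustration, the input looks roughly like this, assuming the usual taxprofiler samplesheet layout; file names and run accessions are made up, and only two of the ten multi-run samples are shown:

```csv
sample,run_accession,instrument_platform,fastq_1,fastq_2,fasta
sample1,run1,ILLUMINA,sample1_run1_R1.fastq.gz,sample1_run1_R2.fastq.gz,
sample1,run2,ILLUMINA,sample1_run2_R1.fastq.gz,sample1_run2_R2.fastq.gz,
sample2,run1,ILLUMINA,sample2_run1_R1.fastq.gz,sample2_run1_R2.fastq.gz,
sample2,run2,ILLUMINA,sample2_run2_R1.fastq.gz,sample2_run2_R2.fastq.gz,
sample11,run1,ILLUMINA,sample11_run1_R1.fastq.gz,sample11_run1_R2.fastq.gz,
```

Here `sample11` is the single-run sample whose reads currently never reach the results directory.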
At the moment, enabling `--save_runmerged_reads` will give me the FastQ files for all ten samples that required run merging, but not for the single-run sample that doesn't need it. I haven't enabled any other parameters for keeping the files, e.g. after host removal (`--save_hostremoval_unmapped`). Therefore, I am currently lacking the FastQ file for this single sample.

I personally would have expected that the FastQ files of samples that don't need run merging are still exported into the `results` directory. If this isn't happening on purpose, then I would suggest adding a parameter `--save_reads_for_taxprofiling`, or something along those lines, that allows getting the final FastQ files per sample. I would like to avoid having to export the reads both after host removal and after run merging, because that would duplicate all the data for the samples that were run merged.

Command used and terminal output
nextflow run nf-core/taxprofiler -r dev -profile eva --input samplelist.csv --databases databaselist.csv --perform_shortread_qc --shortread_qc_tool adapterremoval --shortread_qc_mergepairs --shortread_qc_includeunmerged --shortread_qc_minlength 35 --perform_shortread_hostremoval --perform_runmerging --save_runmerged_reads --run_kraken2
Relevant files
No response
System information
No response