Add comment about setting a default prefix that isn't just meta.id #2608

Open. Wants to merge 4 commits into base: main
8 changes: 7 additions & 1 deletion sites/docs/src/content/docs/guidelines/components/modules.md
@@ -390,13 +390,19 @@ Channel names MUST follow `snake_case` convention and be all lower case.
Output file (and/or directory) names SHOULD just consist of only `${prefix}` and the file-format suffix (e.g. `${prefix}.fq.gz` or `${prefix}.bam`).

- This is primarily for re-usability so that other developers have complete flexibility to name their output files however they wish when using the same module.
- As a result of using this syntax, if the module has the same named inputs and outputs then you can add a line in the `script` section like below (another example [here](https://github.com/nf-core/modules/blob/e20e57f90b6787ac9a010a980cf6ea98bd990046/modules/lima/main.nf#L37)) which will raise an error asking the developer to change the `args.prefix` variable to rename the output files so they don't clash.
- As a result of using this syntax, if the module could _potentially_ have the same named inputs and outputs, add a line in the `script` section like below (another example [here](https://github.com/nf-core/modules/blob/e20e57f90b6787ac9a010a980cf6ea98bd990046/modules/lima/main.nf#L37)), which will raise an error asking the developer to change the `args.prefix` variable to rename the output files so they don't clash.

```nextflow
script:
if ("$bam" == "${prefix}.bam") error "Input and output names are the same, set prefix in module configuration to disambiguate!"
```

- If the input and output files are likely to have the same name, then an appropriate default prefix may be set, for example:
SPPearce marked this conversation as resolved.
Member

Suggested change
- If the input and output files are likely to have the same name, then an appropriate default prefix may be set, for example:
- If the input and output files are likely to have the same name, then an appropriate default prefix MAY be set, for example:

I feel it should be left to the developer to decide what the resulting file is called. The error should make them aware of this.

Member

Setting the default prefix does give the control to the developer, as they can override it in `modules.config`. The problem is when it's hard-coded into the output path.
I also added the `-C` bash flag to the shell directive in the template, so that should also prevent accidental clobbering.
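
For context, a shell directive with the `-C` (noclobber) flag might look roughly like the config fragment below. This is a sketch under assumptions: the exact option list in the nf-core template may differ, and only `-C` is taken from the comment above.

```nextflow
// nextflow.config (sketch): process-wide shell directive with noclobber enabled.
// The surrounding options are assumptions for illustration, not a copy of the template.
process.shell = [
    "bash",
    "-C",        // noclobber: `>` redirection fails if the target file already exists
    "-e",        // exit on the first failing command
    "-u",        // treat unset variables as errors
    "-o",        // the next element names the shell option to set
    "pipefail"   // a pipeline fails if any command in it fails
]
```

Note that noclobber only guards `>` redirection; a tool that writes its output through its own option (such as an `-o` flag) can still overwrite an existing file.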

Member

> The problem is when it's hard-coded into the output path.

I don't think I follow... isn't the suggestion here to technically embed a hardcoded string?

Member

@mahesh-panchal Jul 8, 2024

No, the example clearly shows `prefix` having a default of `"${meta.id}_sorted"`. If I wanted something different, I would just update it in the config:

    ext.prefix = { "${meta.id}_mysorted" }

The command should still look like:

    mycommand --input $file > ${prefix}.out

but the file goes from `<meta.id>_sorted.out` to `<meta.id>_mysorted.out`.
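
For readers less familiar with the config side, an override like this is typically scoped to one module with a process selector. A minimal sketch, assuming a hypothetical module named `EXAMPLE_SORT` (the name is illustrative and does not come from this PR):

```nextflow
// conf/modules.config (sketch): override the module's default prefix.
// 'EXAMPLE_SORT' is a hypothetical process name used only for illustration.
process {
    withName: 'EXAMPLE_SORT' {
        // The closure is evaluated per task, so meta.id resolves for each sample.
        ext.prefix = { "${meta.id}_mysorted" }
    }
}
```

The module code itself stays untouched; only the value that `task.ext.prefix` resolves to changes.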

Contributor Author

Yes, it is setting a default, but it can be entirely overwritten in the usual way.

Contributor Author

> Disagree on both points (1. it should, pipeline devs should be very aware of the output files, 2. I don't think we should sacrifice flexibility just to avoid a small config file).
>
> But like I said - 'small mound' 😆 Maybe ask for one more opinion and you can merge

I mean, we could say that pipeline devs should make each module from scratch because that would be purer. I don't understand your objection here. This sacrifices no flexibility whatsoever, it defines a default prefix that can then be overwritten in exactly the usual way.

Member

@mahesh-panchal Jul 9, 2024

I also don't understand the objections. Could you clarify with a toy example what the objection is?

Contributor

I follow James here. IMO it's better to have a common way of doing things in modules so you don't have to check how it's done in every module. If I know every module has the meta.id as default, then I don't have to check the default every time. Adding different defaults could become confusing for some developers. But also not a hill I'm willing to die on :)

Member

My experience with using nf-core modules written by someone else is usually that I'll have to check the code anyway, at the least to know what the output channels are named and what inputs are expected. If I then see the

    if ( file.name == "${prefix}.ext" ){

I'll know there's a possibility of a filename collision. However, this is where I would expect the module to have a sensible default that avoids a filename collision if I just plug and play. More often than not, though, the current state is that once I use it, I discover the default is not actually sensible and have to modify my `modules.config` to deal with it. This, to me, is a time waster. If we used "sensible defaults" rather than `meta.id`, I would see the default when I inspect the module and could change the prefix if I wanted something different; I'd rather not have to assume that `${prefix}.ext` will cause a filename collision unless I set my own prefix. That's at least my experience, which is why I'm for this update.

Contributor

Maybe we should move this discussion to the next maintainers meeting? This seems like a topic that could fit very well in the meetings


```nextflow
def prefix = task.ext.prefix ?: "${meta.id}_sorted"
```
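
Putting the two pieces together, a module could combine a non-colliding default prefix with the collision check. The following is a minimal sketch, not part of the proposed guideline text: the process name `EXAMPLE_SORT` and the `samtools sort` call are assumptions chosen only for illustration.

```nextflow
// Hypothetical module for illustration only; the name and tool are assumptions.
process EXAMPLE_SORT {
    tag "$meta.id"

    input:
    tuple val(meta), path(bam)

    output:
    // Input files are not captured by output globs by default.
    tuple val(meta), path("*.bam"), emit: bam

    script:
    // The default prefix differs from meta.id, so an input named <meta.id>.bam does not clash.
    def prefix = task.ext.prefix ?: "${meta.id}_sorted"
    // Still guard against a configured prefix that recreates the clash.
    if ("$bam" == "${prefix}.bam") error "Input and output names are the same, set prefix in module configuration to disambiguate!"
    """
    samtools sort -o ${prefix}.bam $bam
    """
}
```

With this default the module works out of the box for an input named `<meta.id>.bam`, while the check still fires if a pipeline developer sets `ext.prefix` to a value that reproduces the input name.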

## Input/output options

### Required `path` channel inputs