Simplify samshee module #268

Open
grst opened this issue Oct 1, 2024 · 4 comments
Labels
enhancement Improvement for existing functionality

Comments

@grst
Member

grst commented Oct 1, 2024

Description of feature

Samshee now has a proper CLI, see
https://github.com/lit-regensburg/samshee?tab=readme-ov-file#commandline-tool

With that, I think we should

  • get rid of the custom python script
  • turn samshee into an nf-core/modules module.
@grst grst added the enhancement Improvement for existing functionality label Oct 1, 2024
@grst
Member Author

grst commented Oct 7, 2024

I think this is a bit higher priority now, as I had a samplesheet that failed (and shouldn't have) with samshee v0.1.12 but passes with v0.2.0.

In particular, lines like

[Header]
Description,,,,,,,,

that have an empty value in the second column led to an error.
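
To illustrate the kind of line in question, here is a minimal sketch of a tolerant parse of the [Header] section using only the Python standard library. It is not samshee's implementation; the function name `parse_header_section` and the extra `FileFormatVersion` row are assumptions made for the example. It only shows that a key followed by nothing but commas can reasonably be read as a key with an empty value.

```python
import csv
import io


def parse_header_section(text: str) -> dict:
    """Collect key/value pairs from the [Header] section of a sectioned sheet.

    Hypothetical helper for illustration only; not part of samshee.
    """
    header = {}
    in_header = False
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and lines that consist only of commas.
        if not row or not any(cell.strip() for cell in row):
            continue
        first = row[0].strip()
        # A line such as "[Header]" or "[Reads]" starts a new section.
        if first.startswith("[") and first.endswith("]"):
            in_header = first == "[Header]"
            continue
        if in_header:
            # A missing or empty second column is kept as an empty string.
            header[first] = row[1].strip() if len(row) > 1 else ""
    return header


sheet = "[Header]\nDescription,,,,,,,,\nFileFormatVersion,2,,,,,,,\n"
print(parse_header_section(sheet))  # {'Description': '', 'FileFormatVersion': '2'}
```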

So let's prioritize this and make a samshee v0.2.0 module please.

@apeltzer @atrigila @nschcolnicov

@nschcolnicov nschcolnicov self-assigned this Oct 7, 2024
@nschcolnicov
Contributor

Created a PR for this: @grst @atrigila nf-core/modules#6749. Let me know what you think.

@grst
Member Author

grst commented Oct 8, 2024

It wasn't evident to me what the default logic of the samshee CLI is, so I did some digging:

  • When the output format of samshee is Illumina V2, it will always validate the samplesheet as an Illumina V2 sheet, even when no --schema option is present.
  • Therefore, a V1 samplesheet will always fail unless --output-format sectioned is given. People with V1 samplesheets can either skip samshee altogether or specify this as a tool option. Either way, samshee doesn't properly validate V1 samplesheets beyond checking that the content is parseable, since no suitable schema is available so far. I think this is OK behavior as long as it is properly documented.
  • The samplesheet could have no data section and still be valid. If we want to catch that, we'd need to add --schema '{"required": ["BCLConvert_Data"]}'. We can discuss whether we should enable certain rules in the pipeline by default, but the standard V2 samplesheet logic is probably already a good start.
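
As a rough illustration of what the `required` rule in that schema expresses, the sketch below applies it by hand to a parsed sheet. It assumes the sheet is represented as a dict keyed by section name, which may not match samshee's internal representation, and `missing_sections` is a hypothetical helper that implements only the JSON Schema `required` keyword, not full schema validation.

```python
import json


def missing_sections(sections: dict, schema_json: str) -> list:
    """Return the section names listed under "required" that are absent.

    Hypothetical helper; covers only the JSON Schema `required` keyword.
    """
    schema = json.loads(schema_json)
    return [name for name in schema.get("required", []) if name not in sections]


SCHEMA = '{"required": ["BCLConvert_Data"]}'

# Assumed in-memory shape of a parsed sheet: section name -> content.
sheet_without_data = {"Header": {"FileFormatVersion": "2"}, "BCLConvert_Settings": {}}
sheet_with_data = {**sheet_without_data, "BCLConvert_Data": [{"Sample_ID": "S1"}]}

print(missing_sections(sheet_without_data, SCHEMA))  # ['BCLConvert_Data']
print(missing_sections(sheet_with_data, SCHEMA))     # []
```

Without such a rule, the first sheet would count as valid even though it contains no samples.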

@nschcolnicov
Contributor

nschcolnicov commented Oct 10, 2024

Sorry for taking so long to get back to you, I was focused on the template update. Regarding --output-format sectioned: I tested it on the test profile, for example, and the validation still fails. Regardless, this is something we could add to the pipeline via a parameter such as "samplesheet_version" that switches this argument on or off depending on its value.
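
A minimal sketch of that toggle, written in Python for illustration rather than as pipeline code. The accepted values "v1" and "v2" and the function name `samshee_extra_args` are assumptions; only the --output-format sectioned argument comes from the discussion above. It shows the mapping from parameter to arguments only and does not address the failure seen with the test profile.

```python
def samshee_extra_args(samplesheet_version: str) -> list:
    """Map a hypothetical samplesheet_version parameter to extra samshee arguments."""
    version = samplesheet_version.strip().lower()
    if version == "v1":
        # V1 sheets have no suitable schema, so only check that they parse.
        return ["--output-format", "sectioned"]
    if version == "v2":
        # Rely on the default Illumina V2 validation.
        return []
    raise ValueError(f"unsupported samplesheet_version: {samplesheet_version!r}")


print(samshee_extra_args("v1"))  # ['--output-format', 'sectioned']
print(samshee_extra_args("v2"))  # []
```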
