
Bioinformatics concepts and best practice should be more thoroughly addressed #55

Open
tbooth opened this issue Jul 17, 2024 · 2 comments
Labels
reviewer Issues arising from comments on https://github.com/carpentries-lab/reviews/issues/17

Comments

Collaborator

tbooth commented Jul 17, 2024

Comments from @cmeesters on this topic:

  • fastx is totally outdated; tools like cutadapt are by now good replacements. It is OK to use fastx for didactics, though, as participants can view all steps of quality processing in detail. A note on the state of the art should be given regardless.

  • the assembly part comes out of the blue and is unrelated to everything before. If you want it, you need additional material, describing the background. Best put it into a separate chapter (or several), then.

  • genome assembly is an intricate challenge; recommending a relatively outdated tool like velvet is dangerous, as there are numerous follow-up implementations tailored to various genome types.

tbooth added the reviewer label Jul 17, 2024
Collaborator Author

tbooth commented Jul 17, 2024

The choice of fastx and velvet is deliberate. These tools are simple and stable, and they serve the purpose of the tutorial, which is to show how to orchestrate commands with Snakemake.

I will, as suggested, add notes that these are not the recommended tools for real analysis work. I don't propose to comment on what the state of the art is, as this is beyond the scope of the lesson and introduces a further maintenance burden on the lesson maintainer (i.e., me).

For the genome assembly, the only thing we need to know is that Velvet is a program that will take a bunch of short reads (in paired FASTQ files) and try to build them into long contigs (output as a FASTA file), and we are aiming to make the longest possible contig by tuning a parameter called "K". Everything else is a distraction from this defined task!
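
As a rough illustration of that defined task (not part of the lesson itself), the "pick the K that gives the longest contig" step could be sketched in Python as below. The function names and the toy FASTA strings are made up for this example, and no Velvet command is run here; in practice each FASTA would be the output of a separate assembly run at a given K.

```python
def contig_lengths(fasta_text):
    """Return the length of every sequence in a FASTA-formatted string."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def longest_contig(fasta_text):
    """Length of the longest contig, or 0 if the FASTA holds no records."""
    return max(contig_lengths(fasta_text), default=0)


def best_k(assemblies):
    """Given {k: fasta_text}, return the k whose assembly has the longest contig."""
    return max(assemblies, key=lambda k: longest_contig(assemblies[k]))


# Hypothetical toy assemblies, one per value of K (not real Velvet output).
toy_assemblies = {
    19: ">contig1\nACGT\n",
    21: ">contig1\nACGTACGT\n>contig2\nAC\n",
    23: ">contig1\nACGTA\n",
}

if __name__ == "__main__":
    for k, fasta in sorted(toy_assemblies.items()):
        print(k, longest_contig(fasta))
    print("best K:", best_k(toy_assemblies))
```

In the lesson's setting, the per-K assemblies would be produced by Snakemake rules rather than held in a dictionary; the sketch only shows the comparison that the tuning exercise boils down to.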

When actually teaching the course, several learners have made the same points given above and asked to go into more detail on the assembly process or other bioinformatics topics. But this is not the place to learn the "intricate challenge" of actual genome assembly, and if any learner starts thinking it is, then we are in trouble. I will add instructor notes that this should be clearly emphasised.

Collaborator Author

tbooth commented Jul 17, 2024

Also:

Kallisto performs (wording according to docs) a "pseudoalignment" - it is not a classical aligner and should not be mentioned as such.

Indeed - I'll correct this.
