hisat-3n conda installation #374

Open
isaacvock opened this issue May 31, 2022 · 3 comments

Comments

@isaacvock

Quick question/suggestion: is there a way to install hisat-3n via conda, or are there plans to make a hisat-3n conda recipe? It would be very helpful for integrating into Snakemake alignment workflows.

@paulinarosales

Are there any options following up on this issue? At the moment I'm finding it very hard to install hisat-3n on an HPC system, and I'm not able to use it in Snakemake workflows.

@isaacvock
Author

There hasn't been an update to the hisat-3n branch (or any branch for that matter) in about 2 years, so the original developers/maintainers may have graduated or moved on.

I have used hisat-3n to align TimeLapse-seq/SLAM-seq data, but can say from experience that the benefit of using it is somewhat minor. STAR does a pretty good job of accurately recovering high-mutation-content reads on simulated data, even at simulated s4U incorporation rates of around 10%. "Accurate recovery of high-mutation-content reads" is judged by the distribution of mutation rates in reads, and the extent to which: 1) the R package I developed (bakR) provides an estimate for the mutation rate in new reads close to the true simulated value, and 2) the estimated fraction of reads that are new is consistently close to the true simulated fraction new. I think STAR was originally designed to be particularly robust to genomic mutations, and thus generally does not penalize mutations as heavily as other alignment inconsistencies, which is probably what helps it in this setting. Therefore, I have just defaulted to using STAR in my pipelines (bam2bakR and fastq2EZbakR, the latter still under development).
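
To make the "mutation rate in reads" criterion concrete, here is a minimal sketch of a per-read T-to-C rate. It assumes an ungapped alignment where the reference and read strings have equal length; the function name `t_to_c_rate` is hypothetical and is not taken from bakR or any of the pipelines mentioned above.

```python
def t_to_c_rate(ref: str, read: str) -> float:
    """Fraction of reference T positions that the read reports as C.

    Assumes an ungapped, same-length alignment (an illustrative simplification).
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must be the same length (ungapped)")
    t_sites = 0
    conversions = 0
    for ref_base, read_base in zip(ref.upper(), read.upper()):
        if ref_base == "T":
            t_sites += 1
            if read_base == "C":
                conversions += 1
    # Reads covering no reference T carry no information; report 0.0 here.
    return conversions / t_sites if t_sites else 0.0
```

Comparing the distribution of this value across reads aligned from simulated data against the simulated rate is one way to see whether high-mutation-content reads are being lost.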

The other options besides just using STAR and accepting some loss of high-mutation content reads include:

  1. grandRescue, part of the gedi suite from the Erhard lab, uses STAR in conjunction with some custom tooling to better recover high mutation content reads. They also make a good point in their manuscript that there is nothing that stops you from doing 3-base genome alignment with STAR (and thus simulating the benefits of HISAT-3N), minus the challenge of having to manually impute where T's were originally located in your genome.
  2. NextGenMap has a --slamseq scoring setting that specifically eliminates penalties for T-to-C mutations. This is nice as you get the benefit of aligning to a higher-complexity 4-base genome, while also not penalizing mutations of interest. The downside is that NextGenMap is not splice-aware, so if you are working with total RNA-seq data, it's best to align to a transcriptome and thus throw out reads from pre-mRNA.
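
The 3-base idea in option 1 can be sketched in a few lines: convert every T to C in both the genome and the reads before alignment, and record where the T's were so they can be restored afterwards. This is only an illustration under simplifying assumptions (forward strand only, plain strings rather than FASTA/FASTQ files); the function names are hypothetical and this is not how gedi or HISAT-3N implement it.

```python
def to_three_base(seq: str):
    """Convert T to C; return the converted sequence and the original T positions."""
    seq = seq.upper()
    t_positions = [i for i, base in enumerate(seq) if base == "T"]
    return seq.replace("T", "C"), t_positions


def restore_ts(converted: str, t_positions):
    """Put the T's back at the recorded positions."""
    bases = list(converted)
    for i in t_positions:
        bases[i] = "T"
    return "".join(bases)


genome = "ACTTGCAT"
read = "ACTCGCAT"  # same locus with one T-to-C mutation

converted_genome, genome_ts = to_three_base(genome)
converted_read, _ = to_three_base(read)

# After conversion the mutated read matches the genome exactly,
# so the aligner sees no mismatch at the T-to-C site.
print(converted_genome == converted_read)
print(restore_ts(converted_genome, genome_ts) == genome)
```

Reads from the opposite strand show the same event as A-to-G, so a full workflow would also need the complementary A-to-G conversion; that part is left out here.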

@paulinarosales

Thank you very much for the elaborate response and helpful comments! I'll have a look at the recommended options :)
