hisat-3n conda installation #374

isaacvock · 2022-05-31T16:15:47Z

Quick question/suggestion: is there a way to install hisat-3n via conda, or are there plans to make a hisat-3n conda recipe? It would be very helpful for integrating into Snakemake alignment workflows.

paulinarosales · 2024-03-28T14:46:15Z

Are there any options following this Issue? At the moment I'm finding it very hard to install hisat-3n on an HPC system and I'm not able to use it for Snakemake workflows

isaacvock · 2024-03-30T12:44:47Z

There hasn't been an update to the hisat-3n branch (or any branch for that matter) in about 2 years, so the original developers/maintainers may have graduated or moved on.

I have used hisat-3n to align TimeLapse-seq/SLAM-seq data, but can say from experience that the benefit of using it is somewhat minor. STAR does a pretty good job of accurately recovering high-mutation-content reads on simulated data, even at simulated s4U incorporation rates of around 10%. "Accurate recovery of high-mutation-content reads" is judged by the distribution of mutation rates in reads, and by the extent to which 1) the R package I developed (bakR) provides an estimate for the mutation rate in new reads that is close to the true simulated value, and 2) the estimated fraction of reads that are new is consistently close to the true simulated fraction new. I think STAR was originally designed to be particularly robust to genomic mutations, and thus generally does not penalize mutations as heavily as other alignment inconsistencies, which is probably what helps it in this setting. Therefore, I have just defaulted to using STAR in my pipelines (bam2bakR and fastq2EZbakR, the latter still under development).
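To make the "distribution of mutation rate in reads" criterion concrete, here is a minimal Python sketch of a per-read T-to-C mutation rate. It is not how bakR or any of the pipelines above compute it. The function name `tc_mutation_rate` and the toy sequences are hypothetical, and it assumes gapless, same-length read/reference pairs on the forward strand.

```python
def tc_mutation_rate(ref_segment: str, read: str) -> float:
    """Fraction of reference T's that appear as C in the aligned read.

    Assumes a gapless alignment, so both strings have the same length.
    """
    if len(ref_segment) != len(read):
        raise ValueError("read and reference segment must have equal length")
    ref_segment, read = ref_segment.upper(), read.upper()
    n_ref_t = sum(1 for r in ref_segment if r == "T")
    if n_ref_t == 0:
        return 0.0  # no T's in the reference, so no T-to-C events are possible
    n_tc = sum(1 for r, b in zip(ref_segment, read) if r == "T" and b == "C")
    return n_tc / n_ref_t


# Hypothetical toy data: one unmutated read and one read with two T-to-C events.
pairs = [
    ("TTAGTCTT", "TTAGTCTT"),
    ("TTAGTCTT", "TCAGCCTT"),
]
rates = [tc_mutation_rate(ref, read) for ref, read in pairs]
print(rates)
```

Comparing a histogram of such per-read rates before and after alignment on simulated data is one way to see whether an aligner is losing the high-mutation tail.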

The other options besides just using STAR and accepting some loss of high-mutation content reads include:

grandRescue, part of the gedi suite from the Erhard lab, uses STAR in conjunction with some custom tooling to better recover high mutation content reads. They also make a good point in their manuscript that there is nothing that stops you from doing 3-base genome alignment with STAR (and thus simulating the benefits of HISAT-3N), minus the challenge of having to manually impute where T's were originally located in your genome.
NextGenMap has a --slamseq scoring setting that specifically eliminates penalties for T-to-C mutations. This is nice as you get the benefit of aligning to a higher-complexity 4-base genome, while also not penalizing mutations of interest. The downside is that NextGenMap is not splice aware, so if you are working with total RNA-seq data, it's best to align to a transcriptome and thus throw out reads from pre-mRNA.
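The 3-base idea mentioned above can be illustrated with a toy Python sketch. This is not what HISAT-3N, STAR, or grandRescue actually do. The function names are hypothetical, the "aligner" is a plain exact substring search, and only the forward strand is handled. The point is just to show why the original 4-base sequences have to be kept around to recover where the T's were.

```python
from __future__ import annotations


def to_three_base(seq: str) -> str:
    """Collapse T into C so T-to-C conversions no longer look like mismatches."""
    return seq.upper().replace("T", "C")


def locate_in_three_base(ref: str, read: str) -> int:
    """Toy stand-in for an aligner: exact search in 3-base space (-1 if absent)."""
    return to_three_base(ref).find(to_three_base(read))


def t_to_c_positions(ref: str, read: str, offset: int) -> list[int]:
    """Use the ORIGINAL 4-base sequences to find reference T's read as C."""
    ref, read = ref.upper(), read.upper()
    return [
        offset + i
        for i, base in enumerate(read)
        if ref[offset + i] == "T" and base == "C"
    ]


# Hypothetical toy example: the read covers ref[3:10] with two T-to-C events.
ref = "GATTACAGTTCA"
read = "CACAGCT"

offset = locate_in_three_base(ref, read)          # placement found in 3-base space
mutations = t_to_c_positions(ref, read, offset)   # recovered from 4-base sequences
print(offset, mutations)
```

In 3-base space the read matches perfectly, so the two conversions cost nothing during placement. The mutation calls only reappear once the placement is compared against the unconverted reference.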

paulinarosales · 2024-04-02T12:07:57Z

Thank you very much for the elaborate response and helpful comments! I'll have a look at the recommended options :)
