ALU, one of the most successful transposable elements, remains actively mobile in the human genome, with a copy number well in excess of 1 million. Detecting ALU insertions is challenging, however, because both PCR and single-cell genome amplification generate chimera artifacts. These artifacts often produce false-positive insertion calls (Fig1, Fig2).
lowFrequencyInsertion is a tool designed for the sensitive detection of low-frequency ALU insertions. It rules out chimera artifacts by exploiting the fact that the ALU length (~300 bp) is close to the typical fragment length (~400 bp) of next-generation sequencing libraries (Fig2).
- novoalign version: 3.09.04
- parallel[1] version: 20220722
- pysam version: 0.19.1
- python version: 3.10.5
- samtools version: 1.15.1
lowFI
Detect ALU insertions supported by specific soft-clipped read pairs.
Usage: lowFI [options]
[-i <input file, the absolute path is necessary, bam/sam, mandatory>]
[-o <output file name, suffix will be added automatically, mandatory>]
[-u <upper limit of soft-clipped part length, limit itself is included, optional, default: 130>]
[-l <lower limit of soft-clipped part length, limit itself is included, optional, default: 20>]
[-p <number of jobs to be run in parallel, optional, default: 2>]
[-m <memory per thread used for samtools sort, optional, default: 2G>]
[-T <ALU consensus sequences novoalign index file, mandatory>]
[-G <Genome novoalign index file, mandatory>]
[-R <ALU annotation file, bed, mandatory>]
[-X <nonreference insertion detection result, bed, optional>]
[-h <help>]
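The -l and -u options bound the length of the soft-clipped part of a read, with both limits included. The sketch below shows one plausible reading of that filter, applied to SAM CIGAR strings in pure Python. It is an illustration only, not lowFI's actual implementation: the function names `soft_clip_lengths` and `passes_clip_filter` are hypothetical, and the assumption that a read qualifies when either end has a soft clip in range is ours.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clip lengths for a CIGAR string.

    Hard clips (H) can flank soft clips, so they are skipped first.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    left = ops[0][0] if ops[0][1] == "S" else 0
    # Only look at the last operation if it is distinct from the first one.
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)


def passes_clip_filter(cigar, lower=20, upper=130):
    """Assumed rule: keep a read if either soft clip is within [lower, upper].

    The defaults mirror the documented defaults of -l (20) and -u (130);
    both limits are inclusive, as stated in the usage text.
    """
    return any(lower <= n <= upper for n in soft_clip_lengths(cigar))


if __name__ == "__main__":
    for cigar in ("30S70M", "10S90M", "70M25S", "100M"):
        print(cigar, soft_clip_lengths(cigar), passes_clip_filter(cigar))
```

With real BAM input, the same check could be driven from the CIGAR field of each alignment record, for example as exposed by pysam.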
[1] Tange, Ole. (2018). GNU Parallel 2018. In GNU Parallel 2018 (p. 112). Ole Tange. https://doi.org/10.5281/zenodo.1146014