miPyRNA

miPyRNA: a Python-based package for small RNA-Seq data analysis

Today, massive amounts of data are generated by Next-Generation Sequencing (NGS) technologies, enabling the exploration of small RNA profiles, including microRNAs (miRNAs). In recent years, numerous algorithms, statistical methods, and software tools have been developed to address the specific steps of miRNA analysis, such as identification, quantification, and differential expression analysis. However, a streamlined and reproducible workflow for miRNA data analysis remains a significant challenge.

To address this, we developed miPyRNA, a Python package designed for efficient, manageable, and reproducible miRNA analysis of NGS data. It integrates existing command-line tools with custom Python scripts rather than confining users to a pre-defined workflow, so individual steps can be combined as needed. This approach supports identification of miRNAs, differential expression analysis, and downstream functional studies, helping researchers investigate the regulatory roles of miRNAs in biological processes.

Input

miPyRNA requires a tab-separated input file that lists the samples and their read files. The input template is:

# Project title/Information lines should start with #
SampleName	Replication	Identifier	File1	File2
Add Full Sample Name Here	Add Replication Here	Add Sample Identifier Here	Add Sample File Name Here	Add Reverse File Here if Paired-End

Example input file:

#Arabidopsis transcriptome study under high light stress
SampleName	Replication	Identifier	File1	File2
GL0.5h1	GL0.5h1	GL0.5	SRR6767632_001.fastq.gz	SRR6767632_002.fastq.gz
GL0.5h2	GL0.5h2	GL0.5	SRR6767633_001.fastq.gz	SRR6767633_002.fastq.gz
GL6h1	GL6h1	GL6	SRR6767634_001.fastq.gz	SRR6767634_002.fastq.gz
GL6h2	GL6h2	GL6	SRR6767635_001.fastq.gz	SRR6767635_002.fastq.gz
GL12h1	GL12h1	GL12	SRR6767636_001.fastq.gz	SRR6767636_002.fastq.gz
GL12h2	GL12h2	GL12	SRR6767637_001.fastq.gz	SRR6767637_002.fastq.gz
GL24h1	GL24h1	GL24	SRR6767639_001.fastq.gz	SRR6767639_002.fastq.gz
GL24h2	GL24h2	GL24	SRR6767640_001.fastq.gz	SRR6767640_002.fastq.gz
GL48h1	GL48h1	GL48	SRR6767642_001.fastq.gz	SRR6767642_002.fastq.gz
GL48h2	GL48h2	GL48	SRR6767643_001.fastq.gz	SRR6767643_002.fastq.gz
GL72h1	GL72h1	GL72	SRR6767644_001.fastq.gz	SRR6767644_002.fastq.gz
GL72h2	GL72h2	GL72	SRR6767645_001.fastq.gz	SRR6767645_002.fastq.gz
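As an illustration of how a sample sheet in this layout could be read, the sketch below parses the tab-separated columns and groups replicates by their Identifier. It is not miPyRNA's own parser; the function names `read_sample_sheet` and `group_by_identifier` and the added `paired` key are hypothetical and used only for this example.

```python
import csv
import io
from collections import defaultdict


def read_sample_sheet(handle):
    """Parse a tab-separated sample sheet into a list of dicts (illustrative only)."""
    # Skip blank lines and project/information lines that start with '#'.
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    samples = []
    for row in reader:
        sample = {
            key: (value or "").strip()
            for key, value in row.items()
            if key is not None  # ignore surplus columns beyond the header
        }
        # File2 is only filled in for paired-end data.
        sample["paired"] = bool(sample.get("File2"))
        samples.append(sample)
    return samples


def group_by_identifier(samples):
    """Map each Identifier (condition) to the sample names that belong to it."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["Identifier"]].append(sample["SampleName"])
    return dict(groups)


# Two rows taken from the example input file above.
example = (
    "#Arabidopsis transcriptome study under high light stress\n"
    "SampleName\tReplication\tIdentifier\tFile1\tFile2\n"
    "GL6h1\tGL6h1\tGL6\tSRR6767634_001.fastq.gz\tSRR6767634_002.fastq.gz\n"
    "GL6h2\tGL6h2\tGL6\tSRR6767635_001.fastq.gz\tSRR6767635_002.fastq.gz\n"
)
samples = read_sample_sheet(io.StringIO(example))
groups = group_by_identifier(samples)
```

For a file on disk, the same function can be called with an open file handle, for example `read_sample_sheet(open("samples.txt"))`, where the file name is a placeholder.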

Analysis approach

miPyRNA Small RNA-Seq Data Analysis Workflow

Steps

Quality Control
- Perform an initial assessment of raw sequencing reads to ensure data quality.
- Use tools like FastQC or custom scripts to evaluate sequence quality, GC content, and adapter contamination.
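To make the idea of a custom quality-control script concrete, here is a minimal sketch that computes read count, GC content, and mean Phred quality from FASTQ records. It assumes four lines per record and Phred+33 encoding, and it is far simpler than FastQC; the helper names `fastq_records` and `summarize` are hypothetical and not part of miPyRNA.

```python
import io


def fastq_records(handle):
    """Yield (sequence, quality) pairs from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield seq, qual


def summarize(handle, phred_offset=33):
    """Return read count, GC percentage, and mean per-base Phred quality."""
    n_reads = n_bases = gc = qual_sum = 0
    for seq, qual in fastq_records(handle):
        n_reads += 1
        n_bases += len(seq)
        upper = seq.upper()
        gc += upper.count("G") + upper.count("C")
        qual_sum += sum(ord(char) - phred_offset for char in qual)
    if n_bases == 0:
        return {"reads": 0, "gc_percent": 0.0, "mean_quality": 0.0}
    return {
        "reads": n_reads,
        "gc_percent": 100.0 * gc / n_bases,
        "mean_quality": qual_sum / n_bases,
    }


# Two made-up reads: 'I' encodes Phred 40 and '5' encodes Phred 20.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n5555\n")
stats = summarize(demo)
```

Compressed input such as `SRR6767632_001.fastq.gz` could be read by passing `gzip.open(path, "rt")` as the handle.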
Adapter Trimming
- Remove adapter sequences and low-quality bases from the raw reads using tools like Cutadapt or Trimmomatic.
- Generate clean, high-quality reads for downstream analysis.
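The sketch below illustrates the basic idea behind 3' adapter removal using exact string matching only. It does not tolerate mismatches and does no quality trimming, so it is not a substitute for Cutadapt or Trimmomatic. The adapter sequence shown is assumed to be the Illumina small RNA 3' adapter and should be replaced with the one used in your library preparation; `trim_adapter` and `min_overlap` are hypothetical names for this example.

```python
def trim_adapter(seq, adapter, min_overlap=5):
    """Remove a 3' adapter from a read using exact matching (illustrative only)."""
    # Case 1: the full adapter occurs in the read; cut at its first occurrence.
    index = seq.find(adapter)
    if index != -1:
        return seq[:index]
    # Case 2: only a prefix of the adapter is present at the 3' end of the read.
    # Try the longest possible prefix first and stop at min_overlap bases.
    longest = min(len(adapter) - 1, len(seq))
    for k in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    # No adapter found: return the read unchanged.
    return seq


# Assumed adapter sequence; check the kit documentation for your own data.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
INSERT = "TGAGGTAGTAGGTTGTATAGTT"  # example 22-nt insert used only for the demo

full = trim_adapter(INSERT + ADAPTER + "AAAA", ADAPTER)
partial = trim_adapter(INSERT + ADAPTER[:10], ADAPTER)
untouched = trim_adapter("ACGTACGTACGT", ADAPTER)
```

A minimum overlap guards against removing a few bases that match the start of the adapter by chance; the value 5 here is an arbitrary choice for the sketch.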
Read Mapping
- Align trimmed reads to the reference genome or small RNA databases (e.g., miRBase) using tools like Bowtie or HISAT2, optimized for small RNA sequences.
miRNA Identification
- Use deep learning-based models for identifying known and novel miRNAs in plants and animals.
- Train and implement neural networks tailored for miRNA recognition, leveraging features such as sequence composition, secondary structure, and evolutionary conservation.
- Predict secondary structures and validate novel miRNA candidates.
Quantification
- Calculate expression levels of identified miRNAs in terms of reads per million (RPM) or normalized counts.
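As a worked example of RPM normalization, the sketch below scales each raw count by one million over the library total. Using the sum of the listed counts as the denominator is an assumption made for this example; a pipeline may instead use the total number of mapped reads. The counts are invented for illustration and `reads_per_million` is a hypothetical helper, not a miPyRNA function.

```python
def reads_per_million(counts):
    """Convert raw read counts to reads per million (RPM)."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for an empty library.
        return {name: 0.0 for name in counts}
    return {name: count * 1_000_000 / total for name, count in counts.items()}


# Invented counts for three miRNAs in one sample.
counts = {"miR156a": 500, "miR166a": 1500, "miR396b": 0}
rpm = reads_per_million(counts)
```

With these counts the library total is 2000 reads, so miR156a is 500 / 2000 of the library, which corresponds to 250000 RPM.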
Differential Expression Analysis
- Perform statistical analysis to identify differentially expressed miRNAs between conditions using tools like DESeq2, edgeR, or limma.
Functional Annotation
- Annotate target genes of miRNAs using target prediction algorithms such as TargetScan or miRanda.
- Perform enrichment analyses (e.g., Gene Ontology, KEGG) for target genes.
Visualization
- Generate plots such as expression heatmaps, volcano plots, and scatter plots to interpret results effectively.
- Provide a graphical summary of significant miRNAs and their targets.
Report Generation
- Compile results into a detailed, reproducible report, including raw and processed data, figures, and analysis logs.

The deep learning models used in the miRNA identification step are intended to improve the accuracy and specificity of miRNA identification in both plants and animals.

Development Environment and Prerequisite

miPyRNA was developed on Linux and has been tested on Linux and OS X. The main prerequisite is Python > 3.7. The external dependencies are:

Flexbar – flexible barcode and adapter removal https://github.com/seqan/flexbar
Trimmomatic: A flexible read trimming tool for Illumina NGS data http://www.usadellab.org/cms/?page=trimmomatic
Trim Galore https://github.com/FelixKrueger/TrimGalore
SortMeRNA https://github.com/sortmerna/sortmerna
STAR Aligner https://github.com/alexdobin/STAR
HISAT2 http://daehwankimlab.github.io/hisat2/
Bowtie2 https://github.com/BenLangmead/bowtie2
Subread https://subread.sourceforge.net/
HTSeq https://github.com/simon-anders/htseq
Samtools https://github.com/samtools/samtools
Bamtools https://github.com/pezmaster31/bamtools
R Language https://cran.r-project.org/
DESeq2 https://bioconductor.org/packages/release/bioc/html/DESeq2.html
edgeR https://bioconductor.org/packages/release/bioc/html/edgeR.html
Python 3 https://www.python.org/downloads/
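Before running an analysis it can help to confirm which of these tools are visible on the PATH. The following sketch uses `shutil.which` from the Python standard library. The executable names are assumptions based on how these tools are commonly installed (for example, Subread is checked through `featureCounts` and HTSeq through `htseq-count`), and they may differ on your system; this check is not part of miPyRNA itself.

```python
import shutil

# Display name -> executable name to look up (assumed names; adjust as needed).
TOOLS = {
    "Flexbar": "flexbar",
    "Trimmomatic": "trimmomatic",
    "Trim Galore": "trim_galore",
    "SortMeRNA": "sortmerna",
    "STAR": "STAR",
    "HISAT2": "hisat2",
    "Bowtie2": "bowtie2",
    "Subread": "featureCounts",
    "HTSeq": "htseq-count",
    "Samtools": "samtools",
    "Bamtools": "bamtools",
    "R": "Rscript",
}


def find_tools(tools=TOOLS):
    """Return a dict mapping each tool to its resolved path, or None if missing."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


report = find_tools()
for name, path in report.items():
    print(f"{name:12s} {path or 'NOT FOUND'}")
```

A tool reported as missing may still be installed under a different name or inside a conda environment that is not currently active.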

Installation

miPyRNA Installation Guide

This guide explains how to install miPyRNA using either a Miniconda environment or Docker for cross-platform compatibility.

1. Create a Dedicated Miniconda3 Environment

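To set up miPyRNA in a Miniconda environment, first download and install Miniconda using the installer for your platform:

https://docs.conda.io/en/latest/miniconda.html#linux-installers

Clone the repository from GitHub and change into it:

git clone https://github.com/navduhan/mipyrna.git

cd mipyrna

Create the environment from the provided YAML file:

conda env create -f mipyrna_environment.yaml

Activate the newly created environment (its name is set in mipyrna_environment.yaml), then install the package into it:

pip install .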

2. Create a Docker Image from the Dockerfile for Cross-Platform Use

Clone the repository from GitHub and build the image:

```bash
git clone https://github.com/navduhan/mipyrna.git

cd mipyrna

docker build -t mipyrna .
```

Run mipyrna

mipyrna -h

Queries and Contact

Written by Naveen Duhan ([email protected]),

Kaundal Bioinformatics Lab, Utah State University,

Released under the terms of the GNU General Public License v3

In case of technical problems (bugs etc.) please contact Naveen Duhan ([email protected])

For any questions on the scientific aspects of the miPyRNA-0.2 method, please contact:

Rakesh Kaundal, ([email protected])

Naveen Duhan, ([email protected])
