miPyRNA

miPyRNA: a python-based package for small RNA-Seq data analysis

Today, massive amounts of data are generated by Next-Generation Sequencing (NGS) technologies, enabling the exploration of small RNA profiles, including microRNAs (miRNAs). In recent years, numerous algorithms, statistical methods, and software tools have been developed to address the specific steps of miRNA analysis, such as identification, quantification, and differential expression analysis. However, a streamlined and reproducible workflow for miRNA data analysis remains a significant challenge.

To address this, we have developed a Python package, miPyRNA, designed specifically for efficient, manageable, and reproducible miRNA analysis from NGS data. This tool integrates current software with custom Python scripts, providing users with a versatile platform for miRNA data processing. Unlike other tools that confine users to pre-defined workflows, miPyRNA allows for greater flexibility by combining widely used command-line tools with tailored Python-based functionality. This approach enables fast and accurate identification of miRNAs, differential expression analysis, and downstream functional studies, empowering researchers to gain deeper insights into the regulatory roles of miRNAs in biological processes.

Input

miPyRNA requires an input file that describes the samples and their read files. The input template and an example file are shown below:

# Project title/Information lines should start with #
SampleName Replication Identifier File1 File2
Add Full Sample Name Here Add Replication Here Add Sample Identifier Here Add Sample File Name Here Add Reverse File Here if Paired-End

Example input file:

#Arabidopsis transcriptome study under high light stress
SampleName Replication Identifier File1 File2
GL0.5h1 GL0.5h1 GL0.5 SRR6767632_001.fastq.gz SRR6767632_002.fastq.gz
GL0.5h2 GL0.5h2 GL0.5 SRR6767633_001.fastq.gz SRR6767633_002.fastq.gz
GL6h1 GL6h1 GL6 SRR6767634_001.fastq.gz SRR6767634_002.fastq.gz
GL6h2 GL6h2 GL6 SRR6767635_001.fastq.gz SRR6767635_002.fastq.gz
GL12h1 GL12h1 GL12 SRR6767636_001.fastq.gz SRR6767636_002.fastq.gz
GL12h2 GL12h2 GL12 SRR6767637_001.fastq.gz SRR6767637_002.fastq.gz
GL24h1 GL24h1 GL24 SRR6767639_001.fastq.gz SRR6767639_002.fastq.gz
GL24h2 GL24h2 GL24 SRR6767640_001.fastq.gz SRR6767640_002.fastq.gz
GL48h1 GL48h1 GL48 SRR6767642_001.fastq.gz SRR6767642_002.fastq.gz
GL48h2 GL48h2 GL48 SRR6767643_001.fastq.gz SRR6767643_002.fastq.gz
GL72h1 GL72h1 GL72 SRR6767644_001.fastq.gz SRR6767644_002.fastq.gz
GL72h2 GL72h2 GL72 SRR6767645_001.fastq.gz SRR6767645_002.fastq.gz
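As an illustration of the format above, the following is a minimal sketch of how such a sample sheet could be read in Python. It is not miPyRNA's own parser: the `Sample` class and the `read_sample_sheet` function are hypothetical names, and the sketch assumes that columns are separated by whitespace, that the first non-comment line is the header, and that `File2` is omitted for single-end data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One row of the sample sheet (hypothetical helper, not part of miPyRNA)."""
    name: str
    replication: str
    identifier: str
    file1: str
    file2: Optional[str] = None  # stays None when no reverse read file is given


def read_sample_sheet(text: str) -> List[Sample]:
    """Parse a whitespace-delimited sample sheet into Sample records.

    Assumptions: lines starting with '#' are project information,
    and the first remaining line is the column header.
    """
    samples: List[Sample] = []
    header_seen = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and project title/information lines
        if not header_seen:
            header_seen = True  # skip the column header line
            continue
        fields = line.split()
        if len(fields) not in (4, 5):
            raise ValueError(f"expected 4 or 5 columns, got {len(fields)}: {line!r}")
        samples.append(Sample(*fields))
    return samples


example = """#Arabidopsis transcriptome study under high light stress
SampleName Replication Identifier File1 File2
GL6h1 GL6h1 GL6 SRR6767634_001.fastq.gz SRR6767634_002.fastq.gz
GL6h2 GL6h2 GL6 SRR6767635_001.fastq.gz SRR6767635_002.fastq.gz
"""

for sample in read_sample_sheet(example):
    print(sample.name, sample.identifier, sample.file1, sample.file2)
```

Grouping the parsed records by `identifier` gives the replicates that belong to each condition.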

Analysis approach

miPyRNA Small RNA-Seq Data Analysis Workflow

Steps

  1. Quality Control

    • Perform an initial assessment of raw sequencing reads to ensure data quality.
    • Use tools like FastQC or custom scripts to evaluate sequence quality, GC content, and adapter contamination.
  2. Adapter Trimming

    • Remove adapter sequences and low-quality bases from the raw reads using tools like Cutadapt or Trimmomatic.
    • Generate clean, high-quality reads for downstream analysis.
  3. Read Mapping

    • Align trimmed reads to the reference genome or small RNA databases (e.g., miRBase) using tools like Bowtie or HISAT2, optimized for small RNA sequences.
  4. miRNA Identification

    • Use deep learning-based models for identifying known and novel miRNAs in plants and animals.
    • Train and implement neural networks tailored for miRNA recognition, leveraging features such as sequence composition, secondary structure, and evolutionary conservation.
    • Predict secondary structures and validate novel miRNA candidates.
  5. Quantification

    • Calculate expression levels of identified miRNAs in terms of reads per million (RPM) or normalized counts.
  6. Differential Expression Analysis

    • Perform statistical analysis to identify differentially expressed miRNAs between conditions using tools like DESeq2, edgeR, or limma.
  7. Functional Annotation

    • Annotate target genes of miRNAs using target prediction algorithms such as TargetScan or miRanda.
    • Perform enrichment analyses (e.g., Gene Ontology, KEGG) for target genes.
  8. Visualization

    • Generate plots such as expression heatmaps, volcano plots, and scatter plots to interpret results effectively.
    • Provide a graphical summary of significant miRNAs and their targets.
  9. Report Generation

    • Compile results into a detailed, reproducible report, including raw and processed data, figures, and analysis logs.
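Step 5 (Quantification) can be illustrated with a minimal reads-per-million sketch. This is an assumption-laden example rather than miPyRNA's implementation: the function name `reads_per_million` and the miRNA counts are hypothetical, and the denominator here is the sum of the counts passed in, whereas some pipelines scale by the total number of mapped reads instead.

```python
from typing import Dict


def reads_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts so that they sum to one million.

    Assumption: the library size is the sum of the supplied counts.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize a sample with zero reads")
    return {mirna: count * 1_000_000 / total for mirna, count in counts.items()}


# Hypothetical raw counts for one sample
raw_counts = {"miRNA_A": 750, "miRNA_B": 250}
rpm = reads_per_million(raw_counts)
print(rpm)
```

Because RPM only corrects for sequencing depth, differential expression tools such as DESeq2 and edgeR are usually given the raw counts and apply their own normalization.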

This updated workflow incorporates state-of-the-art deep learning models to enhance the accuracy and specificity of miRNA identification in both plants and animals, ensuring robust and reliable analysis with miPyRNA.
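To make "sequence composition" features concrete, here is a minimal sketch that turns a candidate sequence into GC content and k-mer frequencies. It is only an illustration of the kind of input a model could use; the function name `composition_features`, the choice of `k = 2`, and the example sequence are assumptions, and the features actually used by miPyRNA are not described in this document.

```python
from collections import Counter
from itertools import product
from typing import Dict


def composition_features(seq: str, k: int = 2) -> Dict[str, float]:
    """Return GC content and relative k-mer frequencies for an RNA sequence.

    DNA input is accepted: T is converted to U before counting.
    """
    seq = seq.upper().replace("T", "U")
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    features = {"gc_content": (seq.count("G") + seq.count("C")) / len(seq)}
    windows = len(seq) - k + 1  # number of overlapping k-mers
    kmer_counts = Counter(seq[i:i + k] for i in range(windows))
    # Emit every possible k-mer so that all sequences share one feature layout
    for kmer in map("".join, product("ACGU", repeat=k)):
        features[f"kmer_{kmer}"] = kmer_counts[kmer] / windows
    return features


# Hypothetical candidate sequence
features = composition_features("GGCCAATT")
print(features["gc_content"], features["kmer_GG"], len(features))
```

A fixed feature layout like this can be fed to a classifier alongside structural features such as the predicted minimum free energy of the precursor hairpin.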

Development Environment and Prerequisite

This source code was developed on Linux and has been tested on Linux and OS X. The main prerequisite is Python > 3.7. Following are the external dependencies:

Installation

miPyRNA Installation Guide

This guide explains how to install miPyRNA using either a Miniconda environment or Docker for cross-platform compatibility.


1. Create a Dedicated Miniconda3 Environment

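To set up miPyRNA in a Miniconda environment, first install Miniconda if it is not already available. Installers are listed at:

https://docs.conda.io/en/latest/miniconda.html#linux-installers

Then clone the repository from GitHub and change into the project directory:

git clone https://github.com/navduhan/mipyrna.git

cd mipyrna

Create the environment from the provided file:

conda env create -f mipyrna_environment.yaml

Activate the new environment with conda activate, using the environment name defined in mipyrna_environment.yaml, and then install the package:

pip install .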

2. Create a Docker Image from the Dockerfile for Cross-Platform Use

Clone the repository from GitHub, change into the project directory, and build the image:

```bash
git clone https://github.com/navduhan/mipyrna.git

cd mipyrna

docker build -t mipyrna .
```

Run mipyrna

To display the help message, run:

mipyrna -h

Queries and Contact

Written by Naveen Duhan ([email protected]),

Kaundal Bioinformatics Lab, Utah State University,

Released under the terms of the GNU General Public License v3

For technical problems (bugs, etc.), please contact Naveen Duhan ([email protected])

For any questions on the scientific aspects of the miPyRNA-0.2 method, please contact:

Rakesh Kaundal, ([email protected])

Naveen Duhan, ([email protected])