Introduction

lncScore is a Python package for identifying lncRNAs among novel assembled transcripts. It can also be used to calculate the coding potential of a transcript.

Abstract

RNA-Seq based transcriptome assembly has been widely used to identify novel lncRNAs. However, the best-performing transcript reconstruction method identified only 21% of full-length protein-coding transcripts from H. sapiens. The remaining partial-length protein-coding transcripts are more likely to be classified as lncRNAs because their CDS is incomplete, which raises the false positive rate of lncRNA identification. Furthermore, sequencing or assembly errors that gain or abolish stop codons complicate ORF-based prediction of lncRNAs. It therefore remains a challenge to identify lncRNAs among assembled transcripts, particularly the partial-length ones. Here, we present a novel alignment-free tool, lncScore, which uses a logistic regression model with 11 carefully selected features. lncScore outperforms other alignment-free tools at distinguishing lncRNAs from mRNAs, especially partial-length mRNAs, in the human and mouse datasets. In addition, lncScore performed well on transcripts from five other species (Zebrafish, Fly, C. elegans, Rat, and Sheep) using models trained on the human and mouse datasets. To speed up prediction, lncScore is multithreaded: with 12 threads it took only 2 minutes to classify 64,756 transcripts and 54 seconds to train a new model on 21,000 transcripts, which is much faster than other tools.
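To illustrate why a stop-gain error complicates ORF-based prediction, the following minimal sketch measures the longest forward-strand ORF of a toy sequence before and after a single-base substitution. The function `longest_orf_length` and the toy sequences are hypothetical and written for this illustration only. They are not part of lncScore and do not reproduce any of its 11 features.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq):
    """Length in nt of the longest ATG-initiated ORF in the three forward frames.

    A closed ORF is counted including its stop codon. An ORF that runs off
    the 3' end (as in a partial-length transcript) is counted up to the last
    complete codon.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
        if start is not None:
            # No in-frame stop codon was found before the end of the sequence.
            open_len = len(seq) - start
            best = max(best, open_len - open_len % 3)
    return best


# Toy coding sequence: ATG, 3 x GCT, CAA, 6 x GCT, TAA (36 nt in total).
intact = "ATG" + "GCT" * 3 + "CAA" + "GCT" * 6 + "TAA"

# A single C>T substitution turns CAA (position 12-14) into the stop codon TAA.
mutated = intact[:12] + "T" + intact[13:]

print(longest_orf_length(intact))   # 36
print(longest_orf_length(mutated))  # 15
```

A single erroneous base shortens the apparent ORF from 36 nt to 15 nt, so a classifier that relies on ORF length alone would score the mutated transcript as less coding-like.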

Documentation

Documentation for the software is available at http://lncscore.openbioinformatics.org.

Installation

The following software must be installed on your cluster or computer before running lncScore.py:

    Perl (>=5.10.1), https://www.perl.org/get.html.

    Python (>= 2.7), https://www.python.org/downloads/.

    The scikit-learn module, http://scikit-learn.org/stable/install.html.

In most cases, the best way to install Python and the scikit-learn package on your system is with Anaconda (https://www.continuum.io), an easy-to-install, free Python distribution that includes more than 400 of the most popular Python packages. Anaconda provides installers (https://www.continuum.io/downloads) for Windows, OS X, and Linux.

If the input file is in .bed format, the additional Python package 'pysam' must be installed first. After installing Anaconda, you can install pysam with the command 'conda install pysam'.
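Before running lncScore.py, it can help to confirm that the prerequisites above are visible to your Python interpreter. The script below is a minimal sketch and is not shipped with lncScore. The helper names `missing_modules` and `on_path` are hypothetical. It assumes the usual import names `sklearn` (for scikit-learn) and `pysam`, and it looks for an executable named exactly `perl` on PATH, so on Windows the name may need to be changed to `perl.exe`.

```python
import importlib
import os


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def on_path(program):
    """Return True if an executable file called `program` is found on PATH."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return True
    return False


if __name__ == "__main__":
    # pysam is only needed when the input file is in .bed format.
    for name in missing_modules(["sklearn", "pysam"]):
        print("missing Python module: " + name)
    if not on_path("perl"):
        print("perl not found on PATH")
```

If the script prints nothing, all three prerequisites were found. It does not check version numbers.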

Author

lncScore is developed by Jian Zhao ([email protected]). For questions and comments, please contact Jian or submit an issue on GitHub.

Reference

Zhao J, Song X*, Wang K*. lncScore: alignment-free identification of lncRNA from assembled novel transcripts. Submitted, 2016

License

WGLab MIT License.
