An R package for finding non-adenosine poly(A) residues in Oxford Nanopore direct RNA sequencing reads
- It works on Oxford Nanopore direct RNA sequencing reads basecalled by Guppy software
- It requires tail delimitation data produced by Nanopolish software
- It allows both the detection of non-adenosine residues within poly(A) tails and the visual inspection of read signals
Important note! Since version 1.0.2, Ninetails is also compatible with tailfindR.
Currently, Ninetails can distinguish characteristic signatures of four types of nucleotides: adenosines (A), cytosines (C), guanosines (G), and uridines (U).
Note
For detailed documentation, including an explanation of additional data processing and data visualization features, see the Ninetails Wiki.
The software is still under development, so all suggestions for improving it are welcome. Please note that the code contained herein may change frequently, so use it with caution.
Ninetails was tested on Linux Mint 20.3, Ubuntu 20.04.3 and Windows 11 operating systems with R 4.1.2, R 4.2.0 and R 4.2.1.
Currently, Ninetails is not available on CRAN/Bioconductor, so you need to install it using `devtools`.
If you do not have `devtools` installed already, you can install it with the following command in R/RStudio:
install.packages("devtools")
Note
For Windows users:
Before installing `devtools` on Windows, you should install Rtools so that the packages are compiled correctly: https://cran.r-project.org/bin/windows/Rtools/
Once you have `devtools` installed, you can install Ninetails using the command below in R/RStudio:
devtools::install_github('LRB-IIMCB/ninetails')
library(ninetails)
Installing the package takes approx. 20 seconds on a typical PC. Additional time is required to install and configure additional components.
Important info: Ninetails requires additional components/third-party tools to operate. For further info, see the Wiki.
`check_tails()` is the main function. It classifies sequencing reads based on the presence or absence of non-adenosine residues within their poly(A) tails (and on additional conditions, such as the minimal read length and the qc_tag assigned by the Nanopolish polya function).
Below is an example of how to use the `check_tails()` function:
results <- ninetails::check_tails(
nanopolish = system.file('extdata',
'test_data',
'nanopolish_output.tsv',
package = 'ninetails'),
sequencing_summary = system.file('extdata',
'test_data',
'sequencing_summary.txt',
package = 'ninetails'),
workspace = system.file('extdata',
'test_data',
'basecalled_fast5',
package = 'ninetails'),
num_cores = 2,
basecall_group = 'Basecall_1D_000',
pass_only=TRUE,
save_dir = '~/Downloads')
This function returns a list consisting of two tables: read_classes and nonadenosine_residues. In addition, the function saves results to text files in the user-specified directory.
Moreover, the function also creates a log file in the directory specified by the user.
The runtime depends on the hardware resources and sequencing depth. Processing the built-in test dataset should take around 1 minute.
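As a minimal sketch of how the returned list might be inspected, the snippet below builds a mock object with the two tables named above and summarises it. The read names and values are invented for illustration; only the table names (`read_classes`, `nonadenosine_residues`) and the column names come from this README.

```r
# Mock object mimicking the structure described above.
# All values are invented; a real object would come from check_tails().
results <- list(
  read_classes = data.frame(
    readname = c("read-a", "read-b", "read-c", "read-d"),
    class    = c("modified", "unmodified", "unmodified", "unclassified"),
    comments = c("YAY", "MAU", "MPU", "QCF"),
    stringsAsFactors = FALSE
  ),
  nonadenosine_residues = data.frame(
    readname     = "read-a",
    prediction   = "C",
    est_nonA_pos = 12,
    stringsAsFactors = FALSE
  )
)

# Number of reads assigned to each class
class_counts <- table(results$read_classes$class)
print(class_counts)

# Names of reads in which a non-adenosine residue was reported
is_modified    <- results$read_classes$class == "modified"
modified_reads <- results$read_classes$readname[is_modified]
print(modified_reads)
```

The same `$` access should work on the object returned by a real run, provided the list elements are named as described in this section.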
The read_classes table contains the following columns:

column name | content |
---|---|
readname | an identifier of a given read (36 characters) |
contig | reference to which the given read was mapped (inherited from nanopolish) |
polya_length | tail length estimation provided by nanopolish polya function |
qc_tag | quality tag assigned by nanopolish polya function |
class | the crude result of classification |
comments | a code indicating whether the classification criteria were met/unmet |
The `class` column indicates whether a given read was recognized as modified (containing a non-adenosine residue) or not, whereas the `comments` column contains details underlying the classification outcome. The content of these columns is explained below:
class | comments | explanation |
---|---|---|
modified | YAY | move transition present, nonA residue detected |
unmodified | MAU | move transition absent, nonA residue undetected |
unmodified | MPU | move transition present, nonA residue undetected |
unclassified | QCF | nanopolish qc failed |
unclassified | NIN | not included in the analysis (pass_only = TRUE) |
unclassified | IRL | insufficient read length |
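When post-processing results, it can be handy to translate the three-letter codes back into their explanations. The sketch below does this with a plain named vector; `comment_codes` is a helper defined here for illustration and is not part of the Ninetails API.

```r
# Lookup table copied from the class/comments table above.
# `comment_codes` is an illustrative helper, not a Ninetails object.
comment_codes <- c(
  YAY = "move transition present, nonA residue detected",
  MAU = "move transition absent, nonA residue undetected",
  MPU = "move transition present, nonA residue undetected",
  QCF = "nanopolish qc failed",
  NIN = "not included in the analysis (pass_only = TRUE)",
  IRL = "insufficient read length"
)

# Example: translate a vector of codes taken from the `comments` column
comments     <- c("YAY", "IRL", "MPU")
explanations <- unname(comment_codes[comments])
print(explanations)
```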
The nonadenosine_residues table contains the following columns:

column name | content |
---|---|
readname | an identifier of a given read (36 characters) |
prediction | the result of classification (basic model: C, G, U assignment) |
est_nonA_pos | the approximate nucleotide position at which the non-adenosine residue is expected; reported from the 5' end |
polya_length | the tail length estimated according to Nanopolish polya function |
qc_tag | quality tag assigned by nanopolish polya function |
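The following sketch shows one way such a table could be summarised in base R. The data are invented, and it assumes that a read with several detected residues appears in several rows; this README does not state that explicitly, so treat it as an assumption.

```r
# Mock table using the column names listed above (values are invented).
nonadenosine_residues <- data.frame(
  readname     = c("read-a", "read-a", "read-b", "read-c"),
  prediction   = c("C", "G", "U", "C"),
  est_nonA_pos = c(10, 25, 7, 40),
  polya_length = c(60, 60, 35, 80),
  qc_tag       = "PASS",
  stringsAsFactors = FALSE
)

# Count detected residues per nucleotide type; fixing the factor levels
# keeps types with zero hits in the output.
prediction_counts <- table(
  factor(nonadenosine_residues$prediction, levels = c("C", "G", "U"))
)
print(prediction_counts)

# Reads with more than one reported residue (assumes one row per residue)
residues_per_read <- table(nonadenosine_residues$readname)
multi_hit_reads   <- names(residues_per_read)[residues_per_read > 1]
print(multi_hit_reads)
```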
Warning
Current pre-release versions of the package work with Guppy basecaller 6.0.0 and lower. Please make sure to use a compatible version of the basecaller.
Before running the program, it is recommended to verify that the given arguments (Nanopolish output, sequencing summary and directory with fast5 files) correspond with each other. In other words, the records contained in the Nanopolish polyA output file should correspond to the records contained in the sequencing summary file and in the fast5 files stored in the declared directory (workspace). If a complete discrepancy is detected, the program will not perform the analysis and will throw an error instead. If only some records are incompatible, they will be omitted from the result files and the pipeline will end with a warning.
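A quick pre-flight comparison of read identifiers can reveal such mismatches before a long run. The sketch below is not the check Ninetails performs internally; it is an independent illustration on mock data. It assumes the Nanopolish table stores identifiers in `readname` and the sequencing summary in `read_id`, so adjust the names if your files differ.

```r
# Mock inputs (invented). With real data these tables would be read with
# e.g. read.delim() from the Nanopolish output and the sequencing summary.
nanopolish <- data.frame(
  readname     = c("r1", "r2", "r3"),
  polya_length = c(45.2, 80.1, 23.7),
  stringsAsFactors = FALSE
)
sequencing_summary <- data.frame(
  read_id  = c("r1", "r2", "r4"),              # assumed column name
  filename = c("batch_0.fast5", "batch_0.fast5", "batch_1.fast5"),
  stringsAsFactors = FALSE
)

# Reads present in both inputs, and reads lacking a summary record
shared_reads         <- intersect(nanopolish$readname, sequencing_summary$read_id)
missing_from_summary <- setdiff(nanopolish$readname, sequencing_summary$read_id)

if (length(shared_reads) == 0) {
  stop("No read identifiers are shared between the two inputs.")
} else if (length(missing_from_summary) > 0) {
  message(length(missing_from_summary),
          " read(s) from the Nanopolish output are absent from the sequencing summary.")
}
```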
Warning
Please be aware that signal transformations performed during analysis can place a heavy load on memory. This is especially true if your data covers the entire sequencing run.
For the moment, Ninetails does not offer the possibility of processing large data sets in chunks behind the scenes (under development). Therefore, to minimise the risk of unexpected crashes, it is highly recommended to split the output of the Nanopolish polyA function into smaller files to make it easier to process the data in subsets and then merge the final results.
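One possible way to do the splitting in base R is sketched below. The table is mock data, and the chunk size, output directory and file-name pattern are arbitrary choices made for this illustration. Each chunk file would then be passed to `check_tails()` in a separate call (not run here), and the per-chunk tables combined with `rbind()`.

```r
# Mock Nanopolish-like table (invented values); real data would be loaded
# with read.delim("nanopolish_output.tsv").
polya <- data.frame(
  readname     = sprintf("read-%02d", 1:10),
  polya_length = seq(20, 110, by = 10),
  qc_tag       = "PASS",
  stringsAsFactors = FALSE
)

# Assign consecutive rows to chunks of at most `chunk_size` reads
chunk_size <- 4
chunk_id   <- ceiling(seq_len(nrow(polya)) / chunk_size)
chunks     <- split(polya, chunk_id)

# Write each chunk to its own tab-separated file
out_dir <- file.path(tempdir(), "polya_chunks")
dir.create(out_dir, showWarnings = FALSE)
chunk_files <- vapply(names(chunks), function(i) {
  path <- file.path(out_dir, sprintf("nanopolish_chunk_%s.tsv", i))
  write.table(chunks[[i]], path, sep = "\t", quote = FALSE, row.names = FALSE)
  path
}, character(1))

# Merging step, demonstrated here by re-reading the chunk files;
# per-chunk result tables could be combined in the same way.
merged <- do.call(rbind, lapply(chunk_files, read.delim, stringsAsFactors = FALSE))
```

Splitting by row keeps every read in exactly one chunk, so the merged tables should not contain duplicates.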
Note
Currently, Ninetails does not support single fast5 files, as this format is deprecated by ONT. Before running the program on single fast5 files, you should convert them to multi-fast5 with another tool, for instance with `ont-fast5-api`.
Note
Ninetails relies on Nanopolish segmentation and therefore may underestimate terminal modifications (last and penultimate nucleotides of the tail).
Please cite Ninetails as: Gumińska, N., Matylla-Kulińska, K., Krawczyk, P., Maj, M., Orzeł, W., Mackiewicz, Z., Brouze, A., Mroczek, S., & Dziembowski, A. (2024). LRB-IIMCB/ninetails: v.1.0.2_manuscript (v.1.0.2_manuscript). Zenodo. https://doi.org/10.5281/zenodo.13309819
If you encounter a bug, please report it on GitHub. To help diagnose the problem, include a minimal reproducible example (required inputs covering around 5-10 reads plus the corresponding Nanopolish output and sequencing summary), so that I will be able to reproduce the error and fix it for you.
Any issues regarding Ninetails should be addressed to Natalia Gumińska (nguminska (at) iimcb.gov.pl).
Ninetails has been developed in the Laboratory of RNA Biology (Dziembowski Lab) at the International Institute of Molecular and Cell Biology in Warsaw.