RFAthM6A

RFAthM6A is a tool for predicting m6A sites in Arabidopsis thaliana.

Prerequisites to use RFAthM6A

Python3 is installed in the computer. R software is installed in the computer. The R package randomForest is installed.

If randomForest is not installed, you can run this command in R:

install.packages('randomForest')

Use well-trained models to predict m6A sites in Arabidopsis thaliana.

First, enter into the directory where the scripts are placed and prepare the exon sequence file in fasta format and the exon sequence should only contain letters "A", "T", "C" and "G". We used the file 'test.fasta' as an example to illustrate the prediction process. Run the following command to find the "RRACH" motif, which are saved to the file "test_site".

python find_site.py test.fasta test_site

Run the following command to predict m6A sites by RFPSNSP model and the prediction scores are saved to the file "psnsp_score".

Rscript predict_by_psnsp.R test_site psnsp_score

Run the following command to predict m6A sites by RFPSDSP model and the prediction scores are saved to the file "psdsp_score".

Rscript predict_by_psdsp.R test_site psdsp_score

Use the following commands to do the KSNPF encoding and RFKSNPF predicting.

python ks.py test_site test_ks

Rscript predict_by_ksnpf.R test_ks ksnpf_score

Use the following commands to do the KNF encoding and RFKNF predicting.

python knf.py test_site test_knf

Rscript predict_by_knf.R test_knf knf_score

Use the following commands to combine the prediction scores of RFPSDSP, RFKSNPF and RFKNF and save the results to the file "combine_score".

Rscript RFCombine.R psdsp_score ksnpf_score knf_score score

All the score files contain 3 columns. The first column is the name of the test sequence. The second column is the position of the "RRACH" motif in the sequence. The second column is the prediction score. The prediction score is between 0 and 1, and can be considered as the predictive probability to be m6A.

3.Repeating the work of the paper and building your own prediction models

The "dataset" directory includes the datasets and sourcecodes metioned in the paper "RFAthM6A: a new tool for predicting m6A sites in Arabidopsis thaliana" (unpublished). Making use of the files in the directory and according to the introduction of the Readme file in the directory, one can repeat the main work in the paper, including feature encoding, 5-fold cross validation and independent test for different methods. This directory also includes the files of rules we extracted from the random forest models.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RFAthM6A

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
dataset		dataset
README.md		README.md
RFCombine.R		RFCombine.R
find_site.py		find_site.py
knf.RData		knf.RData
knf.py		knf.py
ks.py		ks.py
ksnpf.RData		ksnpf.RData
predict_by_knf.R		predict_by_knf.R
predict_by_ksnpf.R		predict_by_ksnpf.R
predict_by_psdsp.R		predict_by_psdsp.R
predict_by_psnsp.R		predict_by_psnsp.R
psdsp.RData		psdsp.RData
psnsp.RData		psnsp.RData
test.fasta		test.fasta

parsboy66/RFAthM6A

Folders and files

Latest commit

History

Repository files navigation

RFAthM6A

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages