-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
143 lines (105 loc) · 3.74 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# mutationsR
<!-- badges: start -->
<!-- badges: end -->
mutationsR is a tool to manipulate mutation data
## Installation
You can install the development version of mutationsR like so:
``` r
#### installing the dependencies
# install pacman
if (!require("pacman")) install.packages("pacman")
# install dependencies from CRAN
pacman::p_install(c(devtools, ggplot2, dplyr, ggpubr))
# install Biostrings
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Biostrings")
# install graphicsPLr from GitHub
if (!require("graphicsPLr")) devtools::install_github('Phuong-Le/graphicsPLr')
# the package
devtools::install_github('Phuong-Le/mutationsR')
```
## Example
Manipulating mutations as strings, you can get the central mutations from their k-mer context, get the wildtype sequence and get reverse complementary sequences.
```{r strings}
library(mutationsR)
##
## getting the central mutation
get_mut('A[C>T]G')
get_mut('C>T')
## getting the wildtype mutation
get_wt_seq('A[C>T]G')
## getting reverse complementary mutations
rv_context('A[C>G]T')
# this can be expanded so that all mutations from C and T are remained, and all mutations from A and G are converted to their reverse complementary counterparts
strand_symmetric('A[C>G]T')
strand_symmetric('A[A>G]T')
```
If you have a MAF or VCF file (mutation data provided by the ICGC portal), you can compute the k-mer mutations, given the reference genome sequence like so - note that the columns have to be in this order: donor_id, chromosome, chrom_start, chrom_end, reference_genome_allele, mutated_from_base, mutated_to_base
```{r get contexts}
library(mutationsR)
if (requireNamespace("seqinr", quietly = TRUE)) {
mut_dt = data.frame(
donor_id = c('PD1', 'PD2'),
chromosome = c('3', 'X'),
chrom_start = c(5, 7),
chrom_end = c(5, 7),
reference_genome_allele = c('A', 'C'),
mutated_from_base = c('A', 'C'),
mutated_to_base = c('T', 'A')
)
seq = seqinr::s2c('AGCTAGCTGA')
get_context = get_context_param(seq, k = 3)
apply(mut_dt, MARGIN = 1, get_context)
}
```
option to use the simplified version of VCF - note that the columns have to be in this order: donor_id (SampleID), chromosome (Chr), position (Pos), reference_base (Ref), mutated_base (Alt)
```{r get contexts (simplified)}
library(mutationsR)
if (requireNamespace("seqinr", quietly = TRUE)) {
mut_dt = data.frame(
SampleID = c("PD1", "PD2"),
Chr = c("3", "X"),
Pos = c(5, 7),
Ref = c("A", "C"),
Alt = c("T", "A")
)
seq = seqinr::s2c("AGCTAGCTGA")
get_context = get_context_param(seq, k = 3, format_ = 'simplified')
apply(mut_dt, MARGIN = 1, get_context)
}
```
If you need to get all mutation contexts, given a k-mer size, do the following (this is currently strand symmetric only)
```{r gen contexts}
library(mutationsR)
gen_contexts(1)
gen_contexts(3)
```
### Mutation spectra plotting
You can also plot mutation spectra, for example
```{r mutation spectra}
library(mutationsR)
kmer = gen_contexts(3)
value = 1:96
spectra_dt = data.frame(kmer = kmer, value = value)
spectra_plot(spectra_dt)
```
### Mutational signatures and phylogenetics
The package also provides several functions to analyse mutations together with phylogenetic trees they are associated with.
```{r phylogenetics}
library(mutationsR)
plot_signature_by_branches(mut_tree, exposure_by_branch)
```
Besides, it is also possible to reconstruct the mutations on the sample level if the only data available is the branches with `tree_to_samples`