WGCNA analysis needs two input files: an FPKM expression matrix covering all genes for the selected samples, and a sample trait matrix with one row for every sample.
For example, a trait file without biological replicates:
sampleID T01 T02 T03 T04 T05 T06
T01 1 0 0 0 0 0
T02 0 1 0 0 0 0
T03 0 0 1 0 0 0
T04 0 0 0 1 0 0
T05 0 0 0 0 1 0
T06 0 0 0 0 0 1
With biological replicates, here T01, T02, T03 and T04, T05, T06 treated as two groups of three replicates each:
sampleID T01_T02_T03 T04_T05_T06
T01 1 0
T02 1 0
T03 1 0
T04 0 1
T05 0 1
T06 0 1
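The two layouts above follow the same rule: one 0/1 column per group, with a 1 where the sample belongs to that group. A minimal sketch of generating such a table is shown here in Python. The function name `build_trait_matrix` and the space-separated output are assumptions made for illustration and are not part of the original scripts.

```python
def build_trait_matrix(sample_groups):
    """Return trait-file lines with one 0/1 indicator column per group.

    sample_groups: list of (sample_id, group_name) pairs, in file order.
    (Hypothetical helper, not part of the original Perl/R scripts.)
    """
    # Column order follows the first appearance of each group.
    groups = []
    for _, group in sample_groups:
        if group not in groups:
            groups.append(group)

    lines = [" ".join(["sampleID"] + groups)]
    for sample, group in sample_groups:
        flags = ["1" if g == group else "0" for g in groups]
        lines.append(" ".join([sample] + flags))
    return lines


# Two groups of three biological replicates, as in the example above.
samples = [
    ("T01", "T01_T02_T03"), ("T02", "T01_T02_T03"), ("T03", "T01_T02_T03"),
    ("T04", "T04_T05_T06"), ("T05", "T04_T05_T06"), ("T06", "T04_T05_T06"),
]
trait_lines = build_trait_matrix(samples)
print("\n".join(trait_lines))
```

Without replicates, pass each sample as its own group, for example `[("T01", "T01"), ("T02", "T02")]`.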
If you have trait-related data (measured traits), simply add it to the trait file.
Use the Perl scripts, which invoke the R WGCNA package, to analyze the data.
If you do not have the scripts, you can run the analysis step by step with the R WGCNA package.
Comments:
Weighted Gene Co-expression Network Analysis (WGCNA) has a limit on the size of the gene set, which should be fewer than about 10,000 genes.
If your gene set exceeds 10,000 genes, the software will crash and exit, so in that case the gene set must be reduced to under 10,000 genes.
Most importantly, the annotation file must be provided, and an FPKM cutoff must be set and applied as a filter.
Once the work above is done, the analysis can start. (The annotation file is usually found in the workflow results.)
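As a rough illustration of the FPKM cutoff and the gene-count limit described above, here is a minimal Python sketch. The cutoff value of 1.0, the `min_samples` rule, and the choice to keep the most variable genes when the set is still too large are all assumptions of this sketch. The original text does not say which cutoff or reduction rule the scripts use.

```python
from statistics import pvariance


def filter_genes(fpkm, min_fpkm=1.0, min_samples=1, max_genes=10000):
    """Filter an FPKM matrix before network construction.

    fpkm: dict mapping gene_id -> list of FPKM values (one per sample).
    min_fpkm, min_samples: assumed cutoff rule, chosen for illustration.
    max_genes: the approximate gene limit mentioned in the text.
    """
    # Keep genes whose FPKM reaches the cutoff in at least min_samples samples.
    kept = {
        gene: values
        for gene, values in fpkm.items()
        if sum(v >= min_fpkm for v in values) >= min_samples
    }
    # If the set is still too large, keep the most variable genes
    # (an assumption; other ranking rules are possible).
    if len(kept) > max_genes:
        ranked = sorted(kept, key=lambda g: pvariance(kept[g]), reverse=True)
        kept = {gene: kept[gene] for gene in ranked[:max_genes]}
    return kept


# Tiny made-up matrix, for illustration only.
example = {
    "geneA": [0.0, 0.2, 0.1],
    "geneB": [5.0, 50.0, 20.0],
    "geneC": [3.0, 3.1, 2.9],
}
filtered = filter_genes(example, min_fpkm=1.0, max_genes=1)
print(sorted(filtered))
```

In the toy call, `geneA` fails the cutoff, and of the two remaining genes only the more variable one survives the `max_genes=1` cap.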
About WGCNA:
WGCNA predicts modules of co-expressed genes and can pick out the key (hub) genes in a given module. Grouping genes with common behavior helps us quickly identify the genes we are interested in. This is similar to K-means clustering, but with enhanced functionality.
WGCNA partitions genes by their FPKM profiles and predicts each module's hub genes, whereas the K-means method only clusters gene expression profiles.
We therefore recommend WGCNA for comparing different samples (time diff or disp diff); the intersection or the union of the DEG sets is a suitable gene set.
Additionally, keep in mind that the number of samples should exceed 5; 20 would be best.
To use biological traits in the WGCNA analysis, the trait file takes the format below: the first columns define the grouping (T01_T02_T03 is one group of replicates), followed by the trait columns.
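To make the hub gene idea above more concrete, here is a simplified Python sketch. It assumes an unsigned weighted adjacency, |correlation| raised to a soft-threshold power `beta`, and ranks genes by their summed adjacency to the other genes in the module. The value `beta=6`, the function names, and the toy expression values are assumptions for illustration. The real WGCNA package does considerably more (soft-threshold selection, topological overlap, module detection), so this is not its implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hub_gene(module, beta=6):
    """Return (hub_gene_id, connectivity) for one module.

    module: dict mapping gene_id -> expression values across samples.
    Connectivity of a gene is the sum of |cor|**beta to every other gene.
    """
    genes = list(module)
    connectivity = {
        g: sum(abs(pearson(module[g], module[h])) ** beta
               for h in genes if h != g)
        for g in genes
    }
    return max(genes, key=connectivity.get), connectivity


# Made-up expression values: geneH tracks both other genes closely,
# while geneA and geneB are only weakly related to each other.
module = {
    "geneH": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneA": [2.0, 1.0, 3.0, 3.0, 6.0],
    "geneB": [0.0, 3.0, 3.0, 5.0, 4.0],
}
hub, connectivity = hub_gene(module)
print(hub, connectivity)
```

K-means would only assign these genes to a cluster. The connectivity ranking is what singles out one gene as the most central in the module.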
sampleID T01_T02_T03 T04_T05_T06 T07_T08_T09 T10_T11_T12 T13_T14_T15 T16_T17_T18 CAT GSH MDA
T01 1 0 0 0 0 0 34.04 1.72 23
T02 1 0 0 0 0 0 34.04 1.80 33
T03 1 0 0 0 0 0 38.52 1.64 44
T04 0 1 0 0 0 0 45.06 2.51 229
T05 0 1 0 0 0 0 71.29 2.92 233
T06 0 1 0 0 0 0 72.08 2.80 21
T07 0 0 1 0 0 0 90.48 3.25 23.7
T08 0 0 1 0 0 0 70.61 2.74 56.1
T09 0 0 1 0 0 0 89.06 3.31 22
T10 0 0 0 1 0 0 20.47 3.90 22
T11 0 0 0 1 0 0 46.22 4.12 71
T12 0 0 0 1 0 0 01.35 3.80 56
T13 0 0 0 0 1 0 18.61 3.98 78
T14 0 0 0 0 1 0 18.61 3.56 679
T15 0 0 0 0 1 0 95.90 2.83 55
T16 0 0 0 0 0 1 22.36 2.38 77
T17 0 0 0 0 0 1 37.46 2.04 66
T18 0 0 0 0 0 1 72.33 1.97 90
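A trait table like the one above can be checked before it is passed to the analysis. The Python sketch below splits group indicator columns from numeric trait columns and verifies that every sample belongs to exactly one group. Treating any all-0/1 column as a group indicator is an assumption of this sketch, and `parse_trait_file` is a hypothetical helper, not part of the original scripts.

```python
def parse_trait_file(lines):
    """Split a whitespace-separated trait table into groups and traits.

    Returns (groups, traits): sample -> group name, and
    sample -> {trait name: float value}.
    A column counts as a group indicator when all its values are 0 or 1
    (an assumption made for this sketch).
    """
    header = lines[0].split()
    rows = [line.split() for line in lines[1:] if line.strip()]
    names = header[1:]
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"{row[0]}: expected {len(header)} fields, got {len(row)}")

    group_cols = [
        i for i in range(len(names))
        if all(row[i + 1] in ("0", "1") for row in rows)
    ]
    groups, traits = {}, {}
    for row in rows:
        sample = row[0]
        member_of = [names[i] for i in group_cols if row[i + 1] == "1"]
        if len(member_of) != 1:
            raise ValueError(f"{sample}: must belong to exactly one group")
        groups[sample] = member_of[0]
        traits[sample] = {
            names[i]: float(row[i + 1])
            for i in range(len(names)) if i not in group_cols
        }
    return groups, traits


# A cut-down version of the table above (two groups, three samples).
table = [
    "sampleID T01_T02_T03 T04_T05_T06 CAT GSH MDA",
    "T01 1 0 34.04 1.72 23",
    "T02 1 0 34.04 1.80 33",
    "T04 0 1 45.06 2.51 229",
]
groups, traits = parse_trait_file(table)
print(groups["T04"], traits["T04"]["CAT"])
```

A numeric trait that happens to contain only 0 and 1 would be misread as a group column, so a real script should probably take the list of group columns explicitly.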