% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/do.goseq.R
\name{goseq}
\alias{goseq}
\title{Perform goseq Enrichment tests across a GeneSetDb.}
\usage{
goseq(
  gsd,
  selected,
  universe,
  feature.bias,
  method = c("Wallenius", "Sampling", "Hypergeometric"),
  repcnt = 2000,
  use_genes_without_cat = TRUE,
  plot.fit = FALSE,
  do.conform = TRUE,
  as.dt = FALSE,
  .pipelined = FALSE
)
}
\arguments{
\item{gsd}{The \code{GeneSetDb} object to run tests against}
\item{selected}{The ids of the selected features}
\item{universe}{The ids of the universe}
\item{feature.bias}{A named vector, with one entry per feature in
\code{universe}, that holds the "bias" information for the features/genes
tested (e.g. a vector of gene lengths). \code{names(feature.bias)} should
match the ids in \code{universe}. If this is not provided, all feature
lengths are set to 1 (no bias). The goseq package provides a
\code{\link[goseq]{getlength}} function, which can supply default values
if you do not have the lengths used in your analysis.}
\item{method}{The method to use to calculate the unbiased category
enrichment scores}
\item{repcnt}{Number of random samples to be calculated when random sampling
is used. Ignored unless \code{method="Sampling"}.}
\item{use_genes_without_cat}{A boolean indicating whether genes without a
category should still be used. For example, a large number of genes may
have no GO term annotated. If this option is set to \code{FALSE}, those
genes are ignored in the calculation of p-values. If it is set to
\code{TRUE} (the default here), these genes count towards the total
number of genes outside the category being tested.}
\item{plot.fit}{parameter to pass to \code{goseq::nullp()}.}
\item{do.conform}{By default \code{TRUE}: does some gymnastics to conform
the \code{gsd} to the \code{universe} vector. This should never be set
to \code{FALSE}, but this parameter is here so that when this function
is called from the \code{\link[=seas]{seas()}} codepath, we do not have to
reconform the \code{GeneSetDb} object, because it has already been done.}
\item{as.dt}{If \code{FALSE} (default), the result is returned as a
\code{data.frame}. Set this to \code{TRUE} to keep the result as a
\code{data.table}.}
\item{.pipelined}{If \code{FALSE} (default), the function is assumed to be
called outside of a seas pipeline, and some additional cleanup of the output
column names is performed. Otherwise, column renaming and post-processing
are left to the do.goseq caller.}
}
\value{
A \code{data.table} of results (converted to a \code{data.frame} unless
\code{as.dt = TRUE}), similar to goseq output. The output from
\code{\link[goseq]{nullp}} is attached to the result as an attribute
named \code{"pwf"}.
}
\description{
Note that we do not import anything from goseq directly; it is only loaded
when this function is called. I can't figure out a way to selectively
import functions from the goseq package without it having to load its
dependencies, which takes a long time, and I don't want loading sparrow
to take a long time. So the goseq package has been moved to Suggests and
is loaded within this function when necessary.
}
\examples{
vm <- exampleExpressionSet()
gdb <- conform(exampleGeneSetDb(), vm)
# Identify DGE genes
mg <- seas(vm, gdb, design = vm$design)
lfc <- logFC(mg)
# wire up params
selected <- subset(lfc, significant)$feature_id
universe <- rownames(vm)
mylens <- setNames(vm$genes$size, rownames(vm))
degenes <- setNames(integer(length(universe)), universe)
degenes[selected] <- 1L
gostats <- sparrow::goseq(
  gdb, selected, universe, mylens,
  method = "Wallenius", use_genes_without_cat = TRUE)
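
# A minimal sketch, assuming the call above succeeded: the Value section
# states that the output of goseq::nullp() is attached to the result as an
# attribute named "pwf", so it can be pulled back out for inspection.
pwf <- attr(gostats, "pwf")
head(pwf)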
}
\references{
Young, M. D., Wakefield, M. J., Smyth, G. K., Oshlack, A. (2010).
Gene ontology analysis for RNA-seq: accounting for selection bias.
\emph{Genome Biology} 11, R14. http://genomebiology.com/2010/11/2/R14
}