man/goseq.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/do.goseq.R
\name{goseq}
\alias{goseq}
\title{Perform goseq Enrichment tests across a GeneSetDb.}
\usage{
goseq(
  gsd,
  selected,
  universe,
  feature.bias,
  method = c("Wallenius", "Sampling", "Hypergeometric"),
  repcnt = 2000,
  use_genes_without_cat = TRUE,
  plot.fit = FALSE,
  do.conform = TRUE,
  as.dt = FALSE,
  .pipelined = FALSE
)
}
\arguments{
\item{gsd}{The \code{GeneSetDb} object to run tests against}

\item{selected}{The ids of the selected features}

\item{universe}{The ids of the universe}

\item{feature.bias}{A named vector with one entry per feature in
\code{universe} that holds the "bias" information for the features/genes
tested (e.g. a vector of gene lengths). \code{names(feature.bias)} should
match the ids in \code{universe}.
If this is not provided, all feature lengths are set to 1 (no bias).
The goseq package provides a \code{\link[goseq]{getlength}} function, which
can supply default values if you do not have the lengths that were
actually used in your analysis.}

\item{method}{The method to use to calculate the unbiased category
enrichment scores}

\item{repcnt}{Number of random samples to be calculated when random sampling
is used. Ignored unless \code{method="Sampling"}.}

\item{use_genes_without_cat}{A boolean to indicate whether genes without a
category should still be used. For example, a large number of genes may
have no GO term annotated. If this option is set to \code{FALSE}, those
genes will be ignored in the calculation of p-values. If this option is
set to \code{TRUE} (default), then these genes will count towards the total
number of genes outside the category being tested.}

\item{plot.fit}{parameter to pass to \code{goseq::nullp()}.}

\item{do.conform}{By default \code{TRUE}: does some gymnastics to conform
the \code{gsd} to the \code{universe} vector. This should never be set
to \code{FALSE} by the user. The parameter exists so that when this function
is called from the \code{\link[=seas]{seas()}} codepath, we do not have to
reconform the \code{GeneSetDb} object, because it has already been done.}

\item{as.dt}{If \code{FALSE} (default), the data.frame-like object that
this function returns will be converted to a data.frame. Set this to
\code{TRUE} to keep this object as a \code{data.table}.}

\item{.pipelined}{If \code{FALSE} (default), this function is assumed to be
called outside of a seas pipeline, and some additional cleanup of the
output column names is done. Otherwise the column renaming and
post-processing are left to the do.goseq caller.}
}
\value{
A \code{data.frame} of results (or a \code{data.table} when
\code{as.dt = TRUE}), similar to goseq output. The output
from \code{\link[goseq]{nullp}} is attached to the result as
an attribute named \code{"pwf"}.
}
\description{
Note that we do not import things from goseq directly, and only load
it if this function is called. I can't figure out a way to selectively
import functions from the goseq package without it having to load its
dependencies, which take a long time -- and I don't want loading sparrow
to take a long time. So, the goseq package has moved to Suggests and then
is loaded within this function when necessary.
}
\examples{
vm <- exampleExpressionSet()
gdb <- conform(exampleGeneSetDb(), vm)

# Identify DGE genes
mg <- seas(vm, gdb, design = vm$design)
lfc <- logFC(mg)

# wire up params
selected <- subset(lfc, significant)$feature_id
universe <- rownames(vm)
mylens <- setNames(vm$genes$size, rownames(vm))
degenes <- setNames(integer(length(universe)), universe)
degenes[selected] <- 1L

gostats <- sparrow::goseq(
  gdb, selected, universe, mylens,
  method = "Wallenius", use_genes_without_cat = TRUE)
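
# The call above can be followed by a quick look at what came back. This is
# a minimal sketch that assumes only what the Value section describes: the
# result is a table of per-geneset statistics, and the goseq::nullp() output
# is attached to it as an attribute named "pwf".
head(gostats)

# Pull out the probability weighting function fit stored on the result.
pwf <- attr(gostats, "pwf")
head(pwf)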
}
\references{
Young, M. D., Wakefield, M. J., Smyth, G. K., Oshlack, A. (2010).
Gene ontology analysis for RNA-seq: accounting for selection bias.
\emph{Genome Biology} 11, R14. http://genomebiology.com/2010/11/2/R14
}