FacileData

The FacileData package was written to make it easier to analyze large, multi-assay, high-throughput genomics datasets. To this end, it provides two things:

A FacileData Access API, which defines a fluent interface over multi-assay genomics datasets that fits into the tidyverse. This lets analysts query and retrieve data more naturally during general exploratory data analysis; and
A reference implementation of the FacileData Access API, called a FacileDataSet, which provides efficient storage and retrieval of arbitrarily large high-throughput genomics datasets. For example, a single FacileDataSet can store all of the RNA-seq, microarray, RPPA, etc. data from The Cancer Genome Atlas (TCGA), giving analysts easy access to arbitrary subsets of these data without having to load all of them into memory.

Installation

The FacileData suite of packages is currently only available from GitHub. You will want to install three FacileData* packages to appreciate its utility:

# install.packages("devtools")
devtools::install_github("facilebio/FacileData")

Example Usage

As a teaser, we’ll show how to plot HER2 (ERBB2) copy number versus expression across the TCGA bladder and breast cancer indications (“BLCA” and “BRCA”) using a FacileDataSet.

library(ggplot2)
library(FacileData)
library(FacileTCGADataSet)
tcga <- FacileTCGADataSet()

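# Select the ERBB2 (HER2) feature
features <- filter_features(tcga, name == "ERBB2")

# Restrict to bladder and breast samples, then attach normalized RNA-seq
# expression, copy number scores, and sample covariates for each sample
fdat <- tcga |>
  filter_samples(indication %in% c("BLCA", "BRCA")) |>
  with_assay_data(features, assay_name = "rnaseq", normalized = TRUE) |>
  with_assay_data(features, assay_name = "cnv_score") |>
  with_sample_covariates(c("indication", "sex"))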

ggplot(fdat, aes(cnv_score_ERBB2, ERBB2, color = sex)) +
  geom_point() +
  facet_wrap(~ indication)
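Conceptually, each step in the pipeline above takes a table of samples and returns a (wider) table of samples, which is what lets the steps chain with |> and feed straight into ggplot(). The base-R sketch below imitates that pattern on a tiny in-memory store. The store layout, the toy_* helper names, and the column-naming rule are hypothetical illustrations, not FacileData's actual implementation, and all values are made up. It needs R 4.1 or later for the native pipe.

```r
# Hypothetical toy store (NOT FacileData's internals): a table of samples with
# their covariates, plus one features-by-samples matrix per assay (made-up values).
ids <- paste0("s", 1:4)
toy_store <- list(
  samples = data.frame(
    dataset    = "toy",
    sample_id  = ids,
    indication = c("BLCA", "BRCA", "BRCA", "LUAD"),
    sex        = c("f", "m", "f", "m")
  ),
  assays = list(
    rnaseq    = matrix(c(5.1, 9.8, 7.2, 3.3), nrow = 1,
                       dimnames = list("ERBB2", ids)),
    cnv_score = matrix(c(0.1, 1.9, 0.8, -0.2), nrow = 1,
                       dimnames = list("ERBB2", ids))
  )
)

# Keep the samples whose covariates satisfy `cond`; return only the sample keys.
toy_filter_samples <- function(store, cond) {
  s <- store$samples
  keep <- eval(substitute(cond), s, parent.frame())
  s[keep, c("dataset", "sample_id"), drop = FALSE]
}

# Append one column of assay values for `feature`, aligned by sample_id.
# Hypothetical naming rule, chosen to match the plot above: use the feature
# name, and prefix it with the assay name if that column already exists.
toy_with_assay_data <- function(samples, store, feature, assay) {
  values <- store$assays[[assay]][feature, samples$sample_id]
  col <- if (feature %in% names(samples)) paste(assay, feature, sep = "_") else feature
  samples[[col]] <- unname(values)
  samples
}

# Append the requested covariates, matched by sample_id.
toy_with_sample_covariates <- function(samples, store, covariates) {
  idx <- match(samples$sample_id, store$samples$sample_id)
  for (cv in covariates) samples[[cv]] <- store$samples[[cv]][idx]
  samples
}

fdat_toy <- toy_store |>
  toy_filter_samples(indication %in% c("BLCA", "BRCA")) |>
  toy_with_assay_data(toy_store, "ERBB2", "rnaseq") |>
  toy_with_assay_data(toy_store, "ERBB2", "cnv_score") |>
  toy_with_sample_covariates(toy_store, c("indication", "sex"))
print(fdat_toy)
```

Passing the store explicitly to every verb keeps the toy simple; it is only a stand-in for however FacileData links a sample table back to its datastore.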

Let’s compare how you might do the same using data stored in a SummarizedExperiment named se.all that holds RNA-seq (raw and normalized) and copy number assays.

library(ggplot2)
library(SummarizedExperiment)

# load / get `se.all` from somewhere
fidx <- which(mcols(se.all)$name == "ERBB2")
se <- se.all[, se.all$indication %in% c("BLCA", "BRCA")]

# Pull the ERBB2 row out of each assay and pair it with the sample covariates
sdat <- data.frame(
  ERBB2 = assay(se, "rnaseq_norm")[fidx, ],
  cnv_score_ERBB2 = assay(se, "cnv_score")[fidx, ],
  sex = se$sex,
  indication = se$indication)

ggplot(sdat, aes(cnv_score_ERBB2, ERBB2, color = sex)) +
  geom_point() +
  facet_wrap(~ indication)
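Under the hood, the SummarizedExperiment version relies on the assay columns and the sample metadata staying in the same order, which SummarizedExperiment maintains for you. If you assemble such data from plain matrices and a data frame instead, it is safer to index by sample ID than by position. A minimal base-R sketch, with hypothetical object names (expr, cnv, meta) and made-up values:

```r
# Hypothetical stand-ins: features-by-samples matrices (like assay(se, ...))
# and a per-sample metadata table (like colData(se)), deliberately stored in
# a different sample order than the matrix columns.
expr <- matrix(c(5.1, 9.8, 7.2, 3.3,
                 2.0, 2.5, 1.8, 2.2),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("ERBB2", "GAPDH"), c("s1", "s2", "s3", "s4")))
cnv <- matrix(c(0.1, 1.9, 0.8, -0.2,
                0.0, 0.1, 0.0,  0.1),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("ERBB2", "GAPDH"), c("s1", "s2", "s3", "s4")))
meta <- data.frame(sample_id  = c("s3", "s1", "s4", "s2"),
                   indication = c("BRCA", "BLCA", "LUAD", "BRCA"),
                   sex        = c("f", "f", "m", "m"))

# Select samples by metadata, then pull assay columns by sample ID so the
# rows of the result cannot silently drift out of alignment.
keep <- meta[meta$indication %in% c("BLCA", "BRCA"), ]
sdat <- data.frame(
  sample_id       = keep$sample_id,
  ERBB2           = unname(expr["ERBB2", keep$sample_id]),
  cnv_score_ERBB2 = unname(cnv["ERBB2", keep$sample_id]),
  sex             = keep$sex,
  indication      = keep$indication
)
print(sdat)
```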

TODO: Show the same analysis using a MultiAssayExperiment
