The goal of DFplyr is to enable dplyr
and ggplot2
support for
S4Vectors::DataFrame
by providing the appropriate extension methods.
As row names are an important feature of many Bioconductor structures,
these are preserved where possible.
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("jonocarroll/DFplyr")
You can install from Bioconductor with:
if (!require("BiocManager", quietly =TRUE))
install.packages("BiocManager")
# The following initializes usage of Bioc devel
BiocManager::install(version='devel')
BiocManager::install("DFplyr")
First create an S4Vectors DataFrame
, including S4 columns if desired
library(S4Vectors)
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
#> table, tapply, union, unique, unsplit, which.max, which.min
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
m <- mtcars[, c("cyl", "hp", "am", "gear", "disp")]
d <- as(m, "DataFrame")
d$grX <- GenomicRanges::GRanges("chrX", IRanges::IRanges(1:32, width = 10))
d$grY <- GenomicRanges::GRanges("chrY", IRanges::IRanges(1:32, width = 10))
d$nl <- IRanges::NumericList(lapply(d$gear, function(n) round(rnorm(n), 2)))
d
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl
#> <GRanges> <NumericList>
#> Mazda RX4 chrY:1-10 -0.65, 0.90,-0.84,...
#> Mazda RX4 Wag chrY:2-11 0.67,-0.17, 0.23,...
#> Datsun 710 chrY:3-12 -0.91,-0.69, 0.73,...
#> Hornet 4 Drive chrY:4-13 0.65,-0.30, 0.98
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42
#> ... ... ...
#> Lotus Europa chrY:28-37 0.66,0.83,1.76,...
#> Ford Pantera L chrY:29-38 -0.19,-0.83, 1.08,...
#> Ferrari Dino chrY:30-39 -0.47, 1.73,-0.08,...
#> Maserati Bora chrY:31-40 2.07,1.65,0.51,...
#> Volvo 142E chrY:32-41 0.82,-0.38,-0.86,...
This will appear in RStudio’s environment pane as a
Formal class DataFrame (dplyr-compatible)
when using DFplyr
. No interference with the actual object is required,
but this helps identify that dplyr
-compatibility is available.
DataFrame
s can then be used in dplyr
calls the same as data.frame
or tibble
objects. Support for working with S4 columns is enabled
provided they have appropriate functions. Adding multiple columns will
result in the new columns being created in alphabetical order
library(DFplyr)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:S4Vectors':
#>
#> first, intersect, rename, setdiff, setequal, union
#> The following objects are masked from 'package:BiocGenerics':
#>
#> combine, intersect, setdiff, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'DFplyr'
#> The following object is masked from 'package:dplyr':
#>
#> desc
mutate(d, newvar = cyl + hp)
#> DataFrame with 32 rows and 9 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl newvar
#> <GRanges> <CompressedNumericList> <numeric>
#> Mazda RX4 chrY:1-10 -0.65, 0.90,-0.84,... 116
#> Mazda RX4 Wag chrY:2-11 0.67,-0.17, 0.23,... 116
#> Datsun 710 chrY:3-12 -0.91,-0.69, 0.73,... 97
#> Hornet 4 Drive chrY:4-13 0.65,-0.30, 0.98 116
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42 183
#> ... ... ... ...
#> Lotus Europa chrY:28-37 0.66,0.83,1.76,... 117
#> Ford Pantera L chrY:29-38 -0.19,-0.83, 1.08,... 272
#> Ferrari Dino chrY:30-39 -0.47, 1.73,-0.08,... 181
#> Maserati Bora chrY:31-40 2.07,1.65,0.51,... 343
#> Volvo 142E chrY:32-41 0.82,-0.38,-0.86,... 113
mutate(d, nl2 = nl * 2)
#> DataFrame with 32 rows and 9 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl nl2
#> <GRanges> <CompressedNumericList> <CompressedNumericList>
#> Mazda RX4 chrY:1-10 -0.65, 0.90,-0.84,... -1.30, 1.80,-1.68,...
#> Mazda RX4 Wag chrY:2-11 0.67,-0.17, 0.23,... 1.34,-0.34, 0.46,...
#> Datsun 710 chrY:3-12 -0.91,-0.69, 0.73,... -1.82,-1.38, 1.46,...
#> Hornet 4 Drive chrY:4-13 0.65,-0.30, 0.98 1.30,-0.60, 1.96
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42 -1.74,-1.62,-0.84
#> ... ... ... ...
#> Lotus Europa chrY:28-37 0.66,0.83,1.76,... 1.32,1.66,3.52,...
#> Ford Pantera L chrY:29-38 -0.19,-0.83, 1.08,... -0.38,-1.66, 2.16,...
#> Ferrari Dino chrY:30-39 -0.47, 1.73,-0.08,... -0.94, 3.46,-0.16,...
#> Maserati Bora chrY:31-40 2.07,1.65,0.51,... 4.14,3.30,1.02,...
#> Volvo 142E chrY:32-41 0.82,-0.38,-0.86,... 1.64,-0.76,-1.72,...
mutate(d, length_nl = lengths(nl))
#> DataFrame with 32 rows and 9 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl length_nl
#> <GRanges> <CompressedNumericList> <integer>
#> Mazda RX4 chrY:1-10 -0.65, 0.90,-0.84,... 4
#> Mazda RX4 Wag chrY:2-11 0.67,-0.17, 0.23,... 4
#> Datsun 710 chrY:3-12 -0.91,-0.69, 0.73,... 4
#> Hornet 4 Drive chrY:4-13 0.65,-0.30, 0.98 3
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42 3
#> ... ... ... ...
#> Lotus Europa chrY:28-37 0.66,0.83,1.76,... 5
#> Ford Pantera L chrY:29-38 -0.19,-0.83, 1.08,... 5
#> Ferrari Dino chrY:30-39 -0.47, 1.73,-0.08,... 5
#> Maserati Bora chrY:31-40 2.07,1.65,0.51,... 5
#> Volvo 142E chrY:32-41 0.82,-0.38,-0.86,... 4
mutate(d,
chr = GenomeInfoDb::seqnames(grX),
strand_X = BiocGenerics::strand(grX),
end_X = BiocGenerics::end(grX)
)
#> DataFrame with 32 rows and 11 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl chr end_X strand_X
#> <GRanges> <CompressedNumericList> <Rle> <integer> <Rle>
#> Mazda RX4 chrY:1-10 -0.65, 0.90,-0.84,... chrX 10 *
#> Mazda RX4 Wag chrY:2-11 0.67,-0.17, 0.23,... chrX 11 *
#> Datsun 710 chrY:3-12 -0.91,-0.69, 0.73,... chrX 12 *
#> Hornet 4 Drive chrY:4-13 0.65,-0.30, 0.98 chrX 13 *
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42 chrX 14 *
#> ... ... ... ... ... ...
#> Lotus Europa chrY:28-37 0.66,0.83,1.76,... chrX 37 *
#> Ford Pantera L chrY:29-38 -0.19,-0.83, 1.08,... chrX 38 *
#> Ferrari Dino chrY:30-39 -0.47, 1.73,-0.08,... chrX 39 *
#> Maserati Bora chrY:31-40 2.07,1.65,0.51,... chrX 40 *
#> Volvo 142E chrY:32-41 0.82,-0.38,-0.86,... chrX 41 *
the object returned remains a standard DataFrame
, and further calls
can be piped with %>%
mutate(d, newvar = cyl + hp) %>%
pull(newvar)
#> [1] 116 116 97 116 183 111 253 66 99 129 129 188 188 188 213 223 238 70 56
#> [20] 69 101 158 158 253 183 70 95 117 272 181 343 113
Some of the variants of the dplyr
verbs also work
mutate_if(d, is.numeric, ~ .^2)
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 36 12100 1 16 25600 chrX:1-10
#> Mazda RX4 Wag 36 12100 1 16 25600 chrX:2-11
#> Datsun 710 16 8649 1 16 11664 chrX:3-12
#> Hornet 4 Drive 36 12100 0 9 66564 chrX:4-13
#> Hornet Sportabout 64 30625 0 9 129600 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 16 12769 1 25 9044.01 chrX:28-37
#> Ford Pantera L 64 69696 1 25 123201.00 chrX:29-38
#> Ferrari Dino 36 30625 1 25 21025.00 chrX:30-39
#> Maserati Bora 64 112225 1 25 90601.00 chrX:31-40
#> Volvo 142E 16 11881 1 16 14641.00 chrX:32-41
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Mazda RX4 chrY:1-10 -0.65, 0.90,-0.84,...
#> Mazda RX4 Wag chrY:2-11 0.67,-0.17, 0.23,...
#> Datsun 710 chrY:3-12 -0.91,-0.69, 0.73,...
#> Hornet 4 Drive chrY:4-13 0.65,-0.30, 0.98
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42
#> ... ... ...
#> Lotus Europa chrY:28-37 0.66,0.83,1.76,...
#> Ford Pantera L chrY:29-38 -0.19,-0.83, 1.08,...
#> Ferrari Dino chrY:30-39 -0.47, 1.73,-0.08,...
#> Maserati Bora chrY:31-40 2.07,1.65,0.51,...
#> Volvo 142E chrY:32-41 0.82,-0.38,-0.86,...
mutate_if(d, ~ inherits(., "GRanges"), BiocGenerics::start)
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <integer>
#> Mazda RX4 6 110 1 4 160 1
#> Mazda RX4 Wag 6 110 1 4 160 2
#> Datsun 710 4 93 1 4 108 3
#> Hornet 4 Drive 6 110 0 3 258 4
#> Hornet Sportabout 8 175 0 3 360 5
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 28
#> Ford Pantera L 8 264 1 5 351.0 29
#> Ferrari Dino 6 175 1 5 145.0 30
#> Maserati Bora 8 335 1 5 301.0 31
#> Volvo 142E 4 109 1 4 121.0 32
#> grY nl
#> <integer> <CompressedNumericList>
#> Mazda RX4 1 -0.65, 0.90,-0.84,...
#> Mazda RX4 Wag 2 0.67,-0.17, 0.23,...
#> Datsun 710 3 -0.91,-0.69, 0.73,...
#> Hornet 4 Drive 4 0.65,-0.30, 0.98
#> Hornet Sportabout 5 -0.87,-0.81,-0.42
#> ... ... ...
#> Lotus Europa 28 0.66,0.83,1.76,...
#> Ford Pantera L 29 -0.19,-0.83, 1.08,...
#> Ferrari Dino 30 -0.47, 1.73,-0.08,...
#> Maserati Bora 31 2.07,1.65,0.51,...
#> Volvo 142E 32 0.82,-0.38,-0.86,...
Use of tidyselect
helpers is limited to within dplyr::vars()
calls
and using the _at
variants
mutate_at(d, vars(starts_with("c")), ~ .^2)
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 36 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 36 110 1 4 160 chrX:2-11
#> Datsun 710 16 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 36 110 0 3 258 chrX:4-13
#> Hornet Sportabout 64 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 16 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 64 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 36 175 1 5 145.0 chrX:30-39
#> Maserati Bora 64 335 1 5 301.0 chrX:31-40
#> Volvo 142E 16 109 1 4 121.0 chrX:32-41
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Mazda RX4 chrY:1-10 -0.65, 0.90,-0.84,...
#> Mazda RX4 Wag chrY:2-11 0.67,-0.17, 0.23,...
#> Datsun 710 chrY:3-12 -0.91,-0.69, 0.73,...
#> Hornet 4 Drive chrY:4-13 0.65,-0.30, 0.98
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42
#> ... ... ...
#> Lotus Europa chrY:28-37 0.66,0.83,1.76,...
#> Ford Pantera L chrY:29-38 -0.19,-0.83, 1.08,...
#> Ferrari Dino chrY:30-39 -0.47, 1.73,-0.08,...
#> Maserati Bora chrY:31-40 2.07,1.65,0.51,...
#> Volvo 142E chrY:32-41 0.82,-0.38,-0.86,...
select_at(d, vars(starts_with("gr")))
#> DataFrame with 32 rows and 2 columns
#> grX grY
#> <GRanges> <GRanges>
#> Mazda RX4 chrX:1-10 chrY:1-10
#> Mazda RX4 Wag chrX:2-11 chrY:2-11
#> Datsun 710 chrX:3-12 chrY:3-12
#> Hornet 4 Drive chrX:4-13 chrY:4-13
#> Hornet Sportabout chrX:5-14 chrY:5-14
#> ... ... ...
#> Lotus Europa chrX:28-37 chrY:28-37
#> Ford Pantera L chrX:29-38 chrY:29-38
#> Ferrari Dino chrX:30-39 chrY:30-39
#> Maserati Bora chrX:31-40 chrY:31-40
#> Volvo 142E chrX:32-41 chrY:32-41
Importantly, grouped operations are supported. DataFrame
does not
natively support groups (the same way that data.frame
does not) so
these are implemented specifically for DFplyr
group_by(d, cyl, am)
#> DataFrame with 32 rows and 8 columns
#> Groups: cyl, am
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Mazda RX4 chrY:1-10 -0.65, 0.90,-0.84,...
#> Mazda RX4 Wag chrY:2-11 0.67,-0.17, 0.23,...
#> Datsun 710 chrY:3-12 -0.91,-0.69, 0.73,...
#> Hornet 4 Drive chrY:4-13 0.65,-0.30, 0.98
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42
#> ... ... ...
#> Lotus Europa chrY:28-37 0.66,0.83,1.76,...
#> Ford Pantera L chrY:29-38 -0.19,-0.83, 1.08,...
#> Ferrari Dino chrY:30-39 -0.47, 1.73,-0.08,...
#> Maserati Bora chrY:31-40 2.07,1.65,0.51,...
#> Volvo 142E chrY:32-41 0.82,-0.38,-0.86,...
Other verbs are similarly implemented, and preserve row names where possible
select(d, am, cyl)
#> DataFrame with 32 rows and 2 columns
#> am cyl
#> <numeric> <numeric>
#> Mazda RX4 1 6
#> Mazda RX4 Wag 1 6
#> Datsun 710 1 4
#> Hornet 4 Drive 0 6
#> Hornet Sportabout 0 8
#> ... ... ...
#> Lotus Europa 1 4
#> Ford Pantera L 1 8
#> Ferrari Dino 1 6
#> Maserati Bora 1 8
#> Volvo 142E 1 4
arrange(d, desc(hp))
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Maserati Bora 8 335 1 5 301 chrX:31-40
#> Ford Pantera L 8 264 1 5 351 chrX:29-38
#> Duster 360 8 245 0 3 360 chrX:7-16
#> Camaro Z28 8 245 0 3 350 chrX:24-33
#> Chrysler Imperial 8 230 0 3 440 chrX:17-26
#> ... ... ... ... ... ... ...
#> Fiat 128 4 66 1 4 78.7 chrX:18-27
#> Fiat X1-9 4 66 1 4 79.0 chrX:26-35
#> Toyota Corolla 4 65 1 4 71.1 chrX:20-29
#> Merc 240D 4 62 0 4 146.7 chrX:8-17
#> Honda Civic 4 52 1 4 75.7 chrX:19-28
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Maserati Bora chrY:31-40 2.07,1.65,0.51,...
#> Ford Pantera L chrY:29-38 -0.19,-0.83, 1.08,...
#> Duster 360 chrY:7-16 -0.39,-1.09,-0.02
#> Camaro Z28 chrY:24-33 -1.51,-0.63, 0.30
#> Chrysler Imperial chrY:17-26 0.31, 1.26,-1.22
#> ... ... ...
#> Fiat 128 chrY:18-27 -1.15,-0.88,-0.39,...
#> Fiat X1-9 chrY:26-35 -0.35, 1.52, 0.36,...
#> Toyota Corolla chrY:20-29 1.26,-0.56, 0.41,...
#> Merc 240D chrY:8-17 0.76,-0.50,-0.68,...
#> Honda Civic chrY:19-28 0.94, 1.07,-1.33,...
filter(d, am == 0)
#> DataFrame with 19 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Hornet 4 Drive 6 110 0 3 258.0 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360.0 chrX:5-14
#> Valiant 6 105 0 3 225.0 chrX:6-15
#> Duster 360 8 245 0 3 360.0 chrX:7-16
#> Merc 240D 4 62 0 4 146.7 chrX:8-17
#> ... ... ... ... ... ... ...
#> Toyota Corona 4 97 0 3 120.1 chrX:21-30
#> Dodge Challenger 8 150 0 3 318.0 chrX:22-31
#> AMC Javelin 8 150 0 3 304.0 chrX:23-32
#> Camaro Z28 8 245 0 3 350.0 chrX:24-33
#> Pontiac Firebird 8 175 0 3 400.0 chrX:25-34
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Hornet 4 Drive chrY:4-13 0.65,-0.30, 0.98
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42
#> Valiant chrY:6-15 1.12, 0.21,-0.26
#> Duster 360 chrY:7-16 -0.39,-1.09,-0.02
#> Merc 240D chrY:8-17 0.76,-0.50,-0.68,...
#> ... ... ...
#> Toyota Corona chrY:21-30 1.65,-1.04,-1.22
#> Dodge Challenger chrY:22-31 -0.66,-0.76, 0.39
#> AMC Javelin chrY:23-32 -0.61,-0.52, 1.71
#> Camaro Z28 chrY:24-33 -1.51,-0.63, 0.30
#> Pontiac Firebird chrY:25-34 -0.67, 0.35, 0.29
slice(d, 3:6)
#> DataFrame with 4 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> Valiant 6 105 0 3 225 chrX:6-15
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Datsun 710 chrY:3-12 -0.91,-0.69, 0.73,...
#> Hornet 4 Drive chrY:4-13 0.65,-0.30, 0.98
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42
#> Valiant chrY:6-15 1.12, 0.21,-0.26
group_by(d, gear) %>%
slice(1:2)
#> DataFrame with 6 rows and 8 columns
#> Groups: gear
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Hornet Sportabout 8 175 0 3 360.0 chrX:5-14
#> Merc 450SL 8 180 0 3 275.8 chrX:13-22
#> Mazda RX4 6 110 1 4 160.0 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160.0 chrX:2-11
#> Porsche 914-2 4 91 1 5 120.3 chrX:27-36
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42
#> Merc 450SL chrY:13-22 0.43,1.46,0.13
#> Mazda RX4 chrY:1-10 -0.65, 0.90,-0.84,...
#> Mazda RX4 Wag chrY:2-11 0.67,-0.17, 0.23,...
#> Porsche 914-2 chrY:27-36 0.28, 0.94,-0.14,...
#> Ford Pantera L chrY:29-38 -0.19,-0.83, 1.08,...
rename
is itself renamed to rename2
due to conflicts between {dplyr}
and {S4Vectors}, but works in the {dplyr} sense of taking new = old
replacements with NSE syntax
select(d, am, cyl) %>%
rename2(foo = am)
#> DataFrame with 32 rows and 2 columns
#> foo cyl
#> <numeric> <numeric>
#> Mazda RX4 1 6
#> Mazda RX4 Wag 1 6
#> Datsun 710 1 4
#> Hornet 4 Drive 0 6
#> Hornet Sportabout 0 8
#> ... ... ...
#> Lotus Europa 1 4
#> Ford Pantera L 1 8
#> Ferrari Dino 1 6
#> Maserati Bora 1 8
#> Volvo 142E 1 4
Row names are not preserved when there may be duplicates or they don’t
make sense, otherwise the first label (according to the current
de-duplication method, in the case of distinct
, this is via
BiocGenerics::duplicated
). This may have complications for S4 columns.
distinct(d)
#> DataFrame with 32 rows and 8 columns
#> cyl hp am gear disp grX
#> <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Mazda RX4 6 110 1 4 160 chrX:1-10
#> Mazda RX4 Wag 6 110 1 4 160 chrX:2-11
#> Datsun 710 4 93 1 4 108 chrX:3-12
#> Hornet 4 Drive 6 110 0 3 258 chrX:4-13
#> Hornet Sportabout 8 175 0 3 360 chrX:5-14
#> ... ... ... ... ... ... ...
#> Lotus Europa 4 113 1 5 95.1 chrX:28-37
#> Ford Pantera L 8 264 1 5 351.0 chrX:29-38
#> Ferrari Dino 6 175 1 5 145.0 chrX:30-39
#> Maserati Bora 8 335 1 5 301.0 chrX:31-40
#> Volvo 142E 4 109 1 4 121.0 chrX:32-41
#> grY nl
#> <GRanges> <CompressedNumericList>
#> Mazda RX4 chrY:1-10 -0.65, 0.90,-0.84,...
#> Mazda RX4 Wag chrY:2-11 0.67,-0.17, 0.23,...
#> Datsun 710 chrY:3-12 -0.91,-0.69, 0.73,...
#> Hornet 4 Drive chrY:4-13 0.65,-0.30, 0.98
#> Hornet Sportabout chrY:5-14 -0.87,-0.81,-0.42
#> ... ... ...
#> Lotus Europa chrY:28-37 0.66,0.83,1.76,...
#> Ford Pantera L chrY:29-38 -0.19,-0.83, 1.08,...
#> Ferrari Dino chrY:30-39 -0.47, 1.73,-0.08,...
#> Maserati Bora chrY:31-40 2.07,1.65,0.51,...
#> Volvo 142E chrY:32-41 0.82,-0.38,-0.86,...
group_by(d, cyl, am) %>%
tally(gear)
#> DataFrame with 6 rows and 3 columns
#> cyl am n
#> <numeric> <numeric> <numeric>
#> 1 4 0 11
#> 2 4 1 34
#> 3 6 0 14
#> 4 6 1 13
#> 5 8 0 36
#> 6 8 1 10
count(d, gear, am, cyl)
#> DataFrame with 10 rows and 4 columns
#> gear am cyl n
#> <factor> <Rle> <Rle> <integer>
#> 1 3 0 4 1
#> 2 3 0 6 2
#> 3 3 0 8 12
#> 4 4 0 4 2
#> 5 4 0 6 2
#> 6 4 1 4 6
#> 7 4 1 6 2
#> 8 5 1 4 2
#> 9 5 1 6 1
#> 10 5 1 8 2
Most dplyr
functions are implemented with the exception of join
s.
If you find any which are not, please file an issue.