Transpose(dt) allows to return list without promoting elements to maxtype #5805

Merged 27 commits, Mar 16, 2024
Changes from 15 commits
2 changes: 2 additions & 0 deletions NEWS.md
@@ -293,6 +293,8 @@

41. `tables()` is faster by default by excluding the size of character strings in R's global cache (which may be shared) and excluding the size of list column items (which also may be shared). `mb=` now accepts any function which accepts a `data.table` and returns a higher and better estimate of its size in bytes, albeit more slowly; e.g. `mb = utils::object.size`.

+42. `transpose` gains a `list.cols=` argument, [#5639](https://github.com/Rdatatable/data.table/issues/5639). It makes it possible to return output with list columns and avoids promoting every element to the maximum type. The only exception is `factor` columns, which are converted to `character`, so `transpose(, list.cols=TRUE)` and `transpose(, list.cols=FALSE)` handle `factor` consistently. Thanks to @MLopez-Ibanez for the request, and Benjamin Schwendinger for the PR.

## BUG FIXES

1. `by=.EACHI` when `i` is keyed but `on=` different columns than `i`'s key could create an invalidly keyed result, [#4603](https://github.com/Rdatatable/data.table/issues/4603) [#4911](https://github.com/Rdatatable/data.table/issues/4911). Thanks to @myoung3 and @adamaltmejd for reporting, and @ColeMiller1 for the PR. An invalid key is where a `data.table` is marked as sorted by the key columns but the data is not sorted by those columns, leading to incorrect results from subsequent queries.
4 changes: 2 additions & 2 deletions R/transpose.R
@@ -1,4 +1,4 @@
-transpose = function(l, fill=NA, ignore.empty=FALSE, keep.names=NULL, make.names=NULL) {
+transpose = function(l, fill=NA, ignore.empty=FALSE, keep.names=NULL, make.names=NULL, list.cols=FALSE) {
if (!is.null(make.names)) {
stopifnot(length(make.names)==1L)
if (is.character(make.names)) {
@@ -14,7 +14,7 @@ transpose = function(l, fill=NA, ignore.empty=FALSE, keep.names=NULL, make.names
colnames = as.character(l[[make.names]])
l = if (is.data.table(l)) l[,-make.names,with=FALSE] else l[-make.names]
}
-ans = .Call(Ctranspose, l, fill, ignore.empty, keep.names)
+ans = .Call(Ctranspose, l, fill, ignore.empty, keep.names, list.cols)
if (!is.null(make.names)) setattr(ans, "names", c(keep.names, colnames))
else if (is.data.frame(l)) # including data.table but not plain list
setattr(ans, "names", c(keep.names, paste0("V", seq_len(length(ans)-length(keep.names)))))
15 changes: 14 additions & 1 deletion inst/tests/tests.Rraw
@@ -6888,10 +6888,23 @@ ll = sapply(ll, paste, collapse=",")
test(1477.07, transpose(strsplit(ll, ",", fixed=TRUE)), tstrsplit(ll, ",", fixed=TRUE))
test(1477.08, transpose(1:5), error="l must be a list")
test(1477.09, transpose(list(as.complex(c(1, 1+5i)))), error="Unsupported column type")
-test(1477.10, transpose(list(list(1:5))), error="Item 1 of list input is")
+test(1477.10, transpose(list(x~y)), error="Item 1 of list input is") # changed behavior since we also support now list columns #5805
test(1477.11, transpose(as.list(1:5), fill=1:2), error="fill must be a length 1 vector")
test(1477.12, transpose(as.list(1:5), ignore.empty=NA), error="ignore.empty should be logical TRUE/FALSE")
test(1477.13, transpose(list()), list())
+# return list columns #5639
+la = list(as.list(1:3), list("a","b","c"))
+lb = list(list(1L,"a"), list(2L,"b"), list(3L,"c"))
+test(1477.21, transpose(list(1:3, c("a","b","c")), list.cols=TRUE), lb)
+test(1477.22, transpose(list(1:3, c("a","b","c")), list.cols=FALSE), lapply(lb, as.character))
+test(1477.23, transpose(la, list.cols=TRUE), lb)
+test(1477.24, transpose(lb, list.cols=TRUE), la)
+test(1477.25, transpose(list(list(1L,"a"), list(2L), list(3L,"c")), list.cols=TRUE, fill="b"), la)
+test(1477.26, transpose(list(1:2, c("a","b","c")), list.cols=TRUE, fill=3L), lb)
+test(1477.27, transpose(list(factor(letters[1:3])), list.cols=TRUE), list(list("a"), list("b"), list("c")))
+test(1477.28, transpose(list(factor(letters[1:3])), list.cols=FALSE), list("a", "b", "c"))
+test(1477.41, transpose(la, list.cols=NA), error="list.cols should be logical TRUE/FALSE.")
+

# #480 `setDT` and 'lapply'
ll = list(data.frame(a=1), data.frame(x=1, y=2), NULL, list())
11 changes: 10 additions & 1 deletion man/transpose.Rd
@@ -6,14 +6,15 @@
}

\usage{
-transpose(l, fill=NA, ignore.empty=FALSE, keep.names=NULL, make.names=NULL)
+transpose(l, fill=NA, ignore.empty=FALSE, keep.names=NULL, make.names=NULL, list.cols=FALSE)
}
\arguments{
\item{l}{ A list, data.frame or data.table. }
\item{fill}{ Default is \code{NA}. It is used to fill shorter list elements so as to return each element of the transposed result of equal lengths. }
\item{ignore.empty}{Default is \code{FALSE}. \code{TRUE} will ignore length-0 list elements.}
\item{keep.names}{The name of the first column in the result containing the names of the input; e.g. \code{keep.names="rn"}. By default \code{NULL} and the names of the input are discarded.}
\item{make.names}{The name or number of a column in the input to use as names of the output; e.g. \code{make.names="rn"}. By default \code{NULL} and default names are given to the output columns.}
+\item{list.cols}{Default is \code{FALSE}. \code{TRUE} will avoid promoting types and return columns of type \code{list} instead. \code{factor} will always be cast to \code{character}.}
}
\details{
The list elements (or columns of \code{data.frame}/\code{data.table}) should be all \code{atomic}. If list elements are of unequal lengths, the value provided in \code{fill} will be used so that the resulting list always has all elements of identical lengths. The class of input object is also preserved in the transposed result.
@@ -35,6 +36,14 @@ setDT(transpose(ll, fill=0))[]

DT = data.table(x=1:5, y=6:10)
transpose(DT)
+
+DT = data.table(x=1:3, y=c("a","b","c"))
+transpose(DT, list.cols=TRUE)
+
+# base R equivalent of transpose
+l = list(1:3, c("a", "b", "c"))
+lapply(seq(length(l[[1]])), function(x) lapply(l, `[[`, x))
+transpose(l, list.cols=TRUE)
}
\seealso{
\code{\link{data.table}}, \code{\link{tstrsplit}}
2 changes: 1 addition & 1 deletion src/data.table.h
@@ -304,7 +304,7 @@ SEXP lookup(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP);
SEXP overlaps(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP);
SEXP whichwrapper(SEXP, SEXP);
SEXP shift(SEXP, SEXP, SEXP, SEXP);
-SEXP transpose(SEXP, SEXP, SEXP, SEXP);
+SEXP transpose(SEXP, SEXP, SEXP, SEXP, SEXP);
SEXP anyNA(SEXP, SEXP);
SEXP setlevels(SEXP, SEXP, SEXP);
SEXP rleid(SEXP, SEXP);
28 changes: 16 additions & 12 deletions src/transpose.c
@@ -2,7 +2,7 @@
#include <Rdefines.h>
#include <time.h>

-SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg, SEXP keepNamesArg) {
+SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg, SEXP keepNamesArg, SEXP listColsArg) {

int nprotect=0;
if (!isNewList(l))
@@ -18,23 +18,28 @@ SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg, SEXP keepNamesArg) {
if (length(fill) != 1)
error(_("fill must be a length 1 vector, such as the default NA"));
R_len_t ln = LENGTH(l);
+if (!isLogical(listColsArg) || LOGICAL(listColsArg)[0]==NA_LOGICAL)
+error(_("list.cols should be logical TRUE/FALSE."));
+bool listCol = LOGICAL(listColsArg)[0];

// preprocessing
int maxlen=0, zerolen=0;
SEXPTYPE maxtype=0;
for (int i=0; i<ln; ++i) {
SEXP li = VECTOR_ELT(l, i);
-if (!isVectorAtomic(li) && !isNull(li))
-error(_("Item %d of list input is not an atomic vector"), i+1);
+if (!isVectorAtomic(li) && !isNull(li) && !isNewList(li))
+error(_("Item %d of list input is not either an atomic vector, or a list"), i+1);
const int len = length(li);
if (len>maxlen) maxlen=len;
zerolen += (len==0);
SEXPTYPE type = TYPEOF(li);
if (isFactor(li)) type=STRSXP;
if (type>maxtype) maxtype=type;
}
+if (listCol) maxtype=VECSXP; // need to keep preprocessing for zerolen
fill = PROTECT(coerceVector(fill, maxtype)); nprotect++;

+
SEXP ans = PROTECT(allocVector(VECSXP, maxlen+rn)); nprotect++;
int anslen = (ignore) ? (ln - zerolen) : ln;
if (rn) {
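The preprocessing loop in the hunk above can be read as one pass that collects the longest item length, the count of empty items, and the highest type, with `list.cols` overriding the type afterwards. A minimal standalone sketch of that idea follows. It does not use the R API: the `col_type` enum, the `prep_result` struct and the `preprocess` function are hypothetical names introduced only for illustration, the enum merely mimics the relative ordering of R's vector types, and factor handling is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ordering of R's vector types (not R's SEXPTYPE values). */
typedef enum { T_LOGICAL = 0, T_INTEGER, T_DOUBLE, T_STRING, T_LIST } col_type;

/* Summary produced by the single preprocessing pass. */
typedef struct {
    int maxlen;       /* length of the longest input item */
    int zerolen;      /* number of length-0 input items */
    col_type maxtype; /* type every output column will share */
} prep_result;

/* Scan item lengths and types once; list_cols forces list output afterwards. */
prep_result preprocess(const int *lens, const col_type *types, size_t n, bool list_cols) {
    prep_result r = { 0, 0, T_LOGICAL };
    for (size_t i = 0; i < n; ++i) {
        if (lens[i] > r.maxlen) r.maxlen = lens[i];
        r.zerolen += (lens[i] == 0);
        if (types[i] > r.maxtype) r.maxtype = types[i];
    }
    /* The scan is still needed with list_cols because maxlen and zerolen are used later. */
    if (list_cols) r.maxtype = T_LIST;
    return r;
}
```

With `list_cols` set, the scanned types no longer influence the result type, which is why the patch can simply overwrite `maxtype` after the loop instead of branching inside it.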
@@ -54,17 +59,10 @@ SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg, SEXP keepNamesArg) {
const int len = length(li);
if (ignore && len==0) continue;
if (TYPEOF(li) != maxtype) {
-li = PROTECT(isFactor(li) ? asCharacterFactor(li) : coerceVector(li, maxtype));
+li = PROTECT(isFactor(li) ? (listCol ? coerceVector(asCharacterFactor(li), VECSXP) : asCharacterFactor(li)) : coerceVector(li, maxtype));
} else PROTECT(li); // extra PROTECT just to help rchk by avoiding two counter variables
switch (maxtype) {
-case LGLSXP : {
-const int *ili = LOGICAL(li);
-const int ifill = LOGICAL(fill)[0];
-for (int j=0; j<maxlen; ++j) {
-LOGICAL(ansp[j+rn])[k] = j<len ? ili[j] : ifill;
-}
-} break;
-case INTSXP : {
+case INTSXP : case LGLSXP : {
const int *ili = INTEGER(li);
const int ifill = INTEGER(fill)[0];
for (int j=0; j<maxlen; ++j) {
@@ -84,6 +82,12 @@ SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg, SEXP keepNamesArg) {
SET_STRING_ELT(ansp[j+rn], k, j<len ? STRING_ELT(li, j) : sfill);
}
} break;
+case VECSXP : {
+const SEXP vfill = VECTOR_ELT(fill, 0);
+for (int j=0; j<maxlen; ++j) {
+SET_VECTOR_ELT(ansp[j+rn], k, j<len ? VECTOR_ELT(li, j) : vfill);
+}
+} break;
default :
error(_("Unsupported column type '%s'"), type2char(maxtype));
}
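Each per-type case in this switch, including the new `VECSXP` one, follows the same pattern: input item `k` writes position `k` of every output column `j`, and positions past the end of a short item receive the fill value. The sketch below shows that pattern for plain `int` rows only. It is an illustration under simplifying assumptions, not the package's code: `transpose_fill` and its flat `out` layout are hypothetical, and the `ignore.empty` and `keep.names` handling is omitted.

```c
#include <stddef.h>

/*
 * Transpose n ragged int rows into maxlen output columns of length n each.
 * out must hold maxlen * n ints; output column j starts at out + j * n.
 * Positions beyond the end of a short row are set to fill.
 */
void transpose_fill(const int *const *rows, const int *lens, size_t n,
                    int maxlen, int fill, int *out) {
    for (size_t k = 0; k < n; ++k) {        /* k: input row and output position */
        for (int j = 0; j < maxlen; ++j) {  /* j: output column */
            out[(size_t)j * n + k] = j < lens[k] ? rows[k][j] : fill;
        }
    }
}
```

For rows `{1, 2}` and `{10, 20, 30}` with a fill of `-1`, the three output columns are `{1, 10}`, `{2, 20}` and `{-1, 30}`, the same shape as test 1477.26, where the shorter first item is padded with `fill`.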