-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
group_by #49
Comments
I was actually wondering recently whether there'd be any performance benefit to have Maybe that's just how long it takes; I'm not too familiar with |
Ooh I see what you are saying, this is a good idea thanks! For a regular raster, you only need one parent row too. I will explore this |
Here's a direct group_by only for sf in-dev, totally WIP f <- raadtools::sshfiles()$fullname[1]
f <- raadtools::sstfiles()$fullname[1]
x <- tidync::tidync(f)
p <- spex::polygonize(raster::aggregate(raster::raster(f) %/% 5, fact = 10)) %>%
group_by(Daily.sea.surface.temperature) %>% summarize()
p <- st_cast(p)
#p <- spex::polygonize(raster::raster(extent(raster::raster(f)), nrows = 50, ncols = 80))
#p$Daily.sea.surface.temperature <- NULL
# res <- x %>% hyper_filter(longitude = longitude > 147, latitude = latitude < -30) %>%
# group_by(p) %>% summarize(value = mean(sla, na.rm = T))
res <- x %>% hyper_filter(lon = lon > 147, lat = lat < -30) %>%
group_by(p) %>% summarize(value = mean(sst, na.rm = T))
p$value <- NA
p$value[match(res$tidync_group_, 1:nrow(p))] <- res$value
plot(p["value"], reset = F)
maps::map("world2", add = T)
|
Her'es the code for this, ripped out to simplify the current build #' Dplyr verbs for tidync
#'
#' Very experimental
#' @param x tidync object
#' @param .x tidync object
#' @param .data tidync object
#' @param ... passed into dplyr engine
#' @param add add to existing groups
#' @param shape experimental tidync
#' @rdname dplyr-verbs
#' @examples
#' \dontrun{
#' #x <- tidync(raadtools::sshfiles()$fullname[1])
#' # x %>% hyper_filter(longitude = longitude > 147, latitude = latitude < - 30) %>%
#' # group_by(longitude) %>% summarize(adt = mean(adt, na.rm = TRUE))
#' # x %>% hyper_filter(longitude = longitude > 147, latitude = latitude < - 30) %>%
#' #group_by(longitude) %>% summarize_all(mean, na.rm = TRUE)
#' }
tbl_vars.tidync <- function(x) {
c(x$variable$name[x$variable$active],
x$dimension$name[x$dimension$active])
}
groups.tidync <- function(x) {
x$groups
}
#' @rdname dplyr-verbs
summarise.tidync <- function(.data, ...) {
if (inherits(.data$groups, "sf")) {
## 1. create a raster (transform sf if needed)
ns <- 1:2
ax <- active_axis_transforms(.data)[ns]
X <- ax[[1]][[names(ax)[1]]][ax[[1]][["selected"]]]
Y <- ax[[2]][[names(ax)[2]]][ax[[2]][["selected"]]]
## 1a. obtain nominal sf axes from group_by TODO ^^
ex <- c(range(X) + c(-1, 1) * diff(X[1:2])/2,
range(Y) + c(-1, 1) * diff(Y[1:2])/2)
r <- raster::raster(raster::extent(ex),
ncols = sum(ax[[1]][["selected"]]), nrows = sum(ax[[2]][["selected"]]))
## 2. fasterize groups to it
.data$groups$ID <- 1:nrow(.data$groups)
#rcell <- fasterize::fasterize(.data$groups, r, "ID")
cells <- tabularaster::cellnumbers(r, .data$groups)
cells <- dplyr::filter(cells, !is.na(.data$cell_))
ht <- hyper_tibble(.data, na.rm = FALSE)
# browser()
ht[["tidync_group_"]] <- NA
ht[["tidync_cell_"]] <- raster::cellFromXY(r, as.matrix(ht[names(ax)]))
ht[["tidync_group_"]][match(cells$cell_, ht$tidync_cell_)] <- cells$object_
## 3. run actual group_by with sf-ID
## danger, see the flip here ////!!!
#ht[["tidync_group_"]] <- values(flip(rcell, "y"))
## could use setdiff here with cellnumbers ...
ht <- dplyr::filter(ht, !is.na(.data$tidync_group_))
return( ht %>% group_by(.data$tidync_group_) %>% summarise(...))
}
hyper_tibble(.data) %>% group_by(!!!.data$groups) %>% summarise(...)
}
#' @importFrom dplyr group_by ungroup summarise
#' @rdname dplyr-verbs
group_by.tidync <- function(.x, ..., add = FALSE, shape = NULL) {
if (add) stop('groupings cannot be added to')
groups <- rlang::quos(...)
x$groupshape <- FALSE
if(!is.null(shape)) {
if (length(groups) > 0) warning("'shape' is set, so bare grouping names ignored")
groups <- shape
x$groupshape <- TRUE
}
.x$groups <- groups
.x
}
#' @rdname dplyr-verbs
ungroup.tidync <- function(x, ...) {
x$groups <- NULL
x
} |
Discussed with @tremenyi the lazy approach of tidync %>% hyper_filter should also allow
group_by
to accumulate grouping variables. We'll needMaybe we can use tbl_df?
This works (fixed just now in #50 ):
It's pretty slow though, it's actually better to just do this
which shows we are on the right track I guess. You can't use tbl_cube to group_by data variables, it's only by dimension. (You can group by more than one dimension). It depends which dimension are grouped by which approach is faster, and it would depend on other stuff tool.
The text was updated successfully, but these errors were encountered: