Skip to content

Commit

Permalink
Reconcile chunk_limit and single_download_limit (#332)
Browse files Browse the repository at this point in the history
* Set chunk_limit default to single_download_limit

* Also have check_chunk_limit return the chunk limit to avoid repeated calls to bcdc_single_download_limit.
* Use check_chunk_limit in paginated requests

* document

* Fix check_chunk_limit

* Don't use a default
* Return chunk_limit early if chunk_value is NULL

* Deprecate bcdata.single_download_limit option

* wrapper function to consult the option and warn once per session if it is set.
* Use check_chunk_limit throughout to be more efficient in checking both options
* Update documentation

* Document, update NEWS

* Update to testthat edition 3

* Test changes to bcdata.single_download_limit option, use snapshot test

* Update tests for testthat 3e

* remove context()
* update expect_is() to expect_s3_class()
  and expect_type()

* Unset BCDC_KEY so doesn't affect message snapshot

* Update R/bcdc_options.R

Co-authored-by: Stephanie Hazlitt <[email protected]>

* Update R/bcdc_options.R

Co-authored-by: Stephanie Hazlitt <[email protected]>

* Update R/bcdc_options.R

Co-authored-by: Stephanie Hazlitt <[email protected]>

* document

* Comments for clarity and future cleanup

---------

Co-authored-by: Stephanie Hazlitt <[email protected]>
  • Loading branch information
ateucher and stephhazlitt authored Oct 5, 2023
1 parent 8a9aeda commit 64ebee4
Show file tree
Hide file tree
Showing 29 changed files with 239 additions and 223 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ Suggests:
ggplot2,
knitr,
rmarkdown,
testthat,
testthat (>= 3.0.0),
withr
VignetteBuilder:
knitr
Expand Down Expand Up @@ -86,3 +86,4 @@ Collate:
'utils-show-query.R'
'utils.R'
'zzz.R'
Config/testthat/edition: 3
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# bcdata (development version)

* Deprecate the `bcdata.single_download_limit` option, as it was mostly redundant with `bcdata.chunk_limit`, and should always be set by the server. Please set the page size limit for paginated requests via the `bcdata.chunk_limit` option (#332)

# bcdata 0.4.1

* Add `jsonlite::read_json()` as a file read method, so users can now download & read `json` resources in B.C. Data Catalogue records
Expand Down
3 changes: 1 addition & 2 deletions R/bcdc-web-services.R
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,7 @@
#' `"wms"` in the `format` column it is available as a Web
#' Feature Service, and you can query and download it
#' using `bcdc_query_geodata()`. The response will be
#' paginated if the number of features is above the number
#' set by the `bcdata.single_download_limit` option.
#' paginated if the number of features is greater than that allowed by the server.
#' Please see [bcdc_options()] for defaults and more
#' information.
#'
Expand Down
79 changes: 53 additions & 26 deletions R/bcdc_options.R
Original file line number Diff line number Diff line change
Expand Up @@ -12,28 +12,33 @@

#' Retrieve options used in bcdata, their value if set and the default value.
#'
#' This function retrieves bcdata specific options that can be set. These options can be set
#' using `option({name of the option} = {value of the option})`. The default options are purposefully
#' set conservatively to hopefully ensure successful requests. Resetting these options may result in
#' failed calls to the data catalogue. Options in R are reset every time R is re-started. See examples for
#' addition ways to restore your initial state.
#' This function retrieves bcdata specific options that can be set. These
#' options can be set using `option({name of the option} = {value of the
#' option})`. The default options are purposefully set conservatively to
#' hopefully ensure successful requests. Resetting these options may result in
#' failed calls to the data catalogue. Options in R are reset every time R is
#' re-started. See examples for additional ways to restore your initial state.
#'
#' `bcdata.max_geom_pred_size` is the maximum size in bytes of an object used for a geometric operation. Objects
#' that are bigger than this value will have a bounding box drawn and apply the geometric operation
#' on that simpler polygon. The [bcdc_check_geom_size] function can be used to assess whether a given spatial object
#' exceed the value of this option. Users can iteratively try to increase the maximum geometric predicate size and see
#' if the bcdata catalogue accepts the request.
#' `bcdata.max_geom_pred_size` is the maximum size in bytes of an object used
#' for a geometric operation. Objects that are bigger than this value will have
#' a bounding box drawn and apply the geometric operation on that simpler
#' polygon. The [bcdc_check_geom_size] function can be used to assess whether a
#' given spatial object exceeds the value of this option. Users can iteratively
#' try to increase the maximum geometric predicate size and see if the bcdata
#' catalogue accepts the request.
#'
#' `bcdata.chunk_limit` is an option useful when dealing with very large data sets. When requesting large objects
#' from the catalogue, the request is broken up into smaller chunks which are then recombined after they've
#' been downloaded. This is called "pagination". bcdata does this all for you but using this option you can set the size of the chunk
#' requested. On faster internet connections, a bigger chunk limit could be useful while on slower connections,
#' it is advisable to lower the chunk limit. Chunks must be less than 10000.
#' `bcdata.chunk_limit` is an option useful when dealing with very large data
#' sets. When requesting large objects from the catalogue, the request is broken
#' up into smaller chunks which are then recombined after they've been
#' downloaded. This is called "pagination". bcdata does this all for you, however by
#' using this option you can set the size of the chunk requested. On slower
#' connections, or when having problems, it may help to lower the chunk limit.
#'
#' `bcdata.single_download_limit` is the maximum number of records an object can be before forcing a paginated download
#' (see entry for `bcdata.chunk_limit` for details on pagination).
#' Tweaking this option in conjunction with `bcdata.chunk_limit` can often resolve failures in large and complex downloads.
#' The default is 10000 records.
#' `bcdata.single_download_limit` *Deprecated*. This is the maximum number of
#' records an object can be before forcing a paginated download; it is set by
#' querying the server capabilities. This option is deprecated and will be
#' removed in a future release. Use `bcdata.chunk_limit` to set a lower value
#' pagination value.
#'
#' @examples
#' \donttest{
Expand Down Expand Up @@ -69,24 +74,29 @@ bcdc_options <- function() {
ifelse(is.null(x), NA, as.numeric(x))
}

server_single_download_limit <- bcdc_single_download_limit()

dplyr::tribble(
~ option, ~ value, ~default,
"bcdata.max_geom_pred_size", null_to_na(getOption("bcdata.max_geom_pred_size")), 5E5,
"bcdata.chunk_limit", null_to_na(getOption("bcdata.chunk_limit")), 1000,
"bcdata.chunk_limit", null_to_na(getOption("bcdata.chunk_limit")), server_single_download_limit,
"bcdata.single_download_limit",
null_to_na(getOption("bcdata.single_download_limit",
default = bcdc_single_download_limit())), 10000
null_to_na(deprecate_single_download_limit_option()), server_single_download_limit
)
}


check_chunk_limit <- function(){
chunk_value <- getOption("bcdata.chunk_limit")
chunk_limit <- getOption("bcdata.single_download_limit", default = bcdc_single_download_limit())
chunk_limit <- getOption("bcdata.chunk_limit")
single_download_limit <- deprecate_single_download_limit_option()

if (!is.null(chunk_value) && chunk_value >= chunk_limit) {
stop(glue::glue("Your chunk value of {chunk_value} exceed the BC Data Catalogue chunk limit of {chunk_limit}"), call. = FALSE)
if (is.null(chunk_limit)) {
return(single_download_limit)
}
if (chunk_limit > single_download_limit) {
stop(glue::glue("Your chunk value of {chunk_limit} exceeds the BC Data Catalogue chunk limit of {single_download_limit}"), call. = FALSE)
}
chunk_limit
}

bcdc_get_capabilities <- function() {
Expand Down Expand Up @@ -152,3 +162,20 @@ bcdc_single_download_limit <- function() {
count_defaults <- xml2::xml_find_first(doc, count_default_xpath)
xml2::xml_integer(count_defaults)
}

# Used to send a message once per session that the single_download_limit option
# will be deprecated. When we remove it, replace all calls to this function
# with bcdc_single_download_limit() and remove the ._bcdataenv_$single_download_limit_warned
# object from .onLoad.
deprecate_single_download_limit_option <- function() {
x <- getOption("bcdata.single_download_limit")
if (!is.null(x)) {
if (!isTRUE(._bcdataenv_$single_download_limit_warned)) {
warning("The bcdata.single_download_limit option is deprecated. Please use bcdata.chunk_limit instead.",
call. = FALSE)
assign("single_download_limit_warned", TRUE, envir = ._bcdataenv_)
}
return(x)
}
bcdc_single_download_limit()
}
19 changes: 9 additions & 10 deletions R/utils-classes.R
Original file line number Diff line number Diff line change
Expand Up @@ -52,9 +52,7 @@ print.bcdc_promise <- function(x, ...) {

## pagination printing
number_of_records <- bcdc_number_wfs_records(x$query_list, x$cli)
sdl <- getOption("bcdata.single_download_limit", default = bcdc_single_download_limit())
cl <- getOption("bcdata.chunk_limit", default = 1000)
paginate <- number_of_records > sdl
chunk_size <- check_chunk_limit()

if (!is.null(x$query_list$count)) {
# head or tail have updated the count
Expand All @@ -73,8 +71,11 @@ print.bcdc_promise <- function(x, ...) {

cat_bullet(strwrap(glue::glue("Using {col_blue('collect()')} on this object will return {col_green(number_of_records)} features ",
"and {col_green(fields)} fields")))
if (paginate) cat_bullet(strwrap(glue::glue("Accessing this record requires pagination and will make {col_green(ceiling(number_of_records/cl))} separate requests to the WFS. ",
if (number_of_records > chunk_size) { # this triggers pagination
cat_bullet(strwrap(glue::glue("Accessing this record requires pagination and will make {col_green(ceiling(number_of_records/chunk_size))} separate requests to the WFS. ",
"See ?bcdc_options")))
}

cat_bullet(strwrap("At most six rows of the record are printed here"))
cat_rule()
print(parsed)
Expand Down Expand Up @@ -433,7 +434,6 @@ mutate.bcdc_promise <- function(.data, ...){
#' }
#'
collect.bcdc_promise <- function(x, ...){
check_chunk_limit()

x$query_list$CQL_FILTER <- finalize_cql(x$query_list$CQL_FILTER)

Expand All @@ -442,8 +442,9 @@ collect.bcdc_promise <- function(x, ...){

## Determine total number of records for pagination purposes
number_of_records <- bcdc_number_wfs_records(query_list, cli)
chunk_size <- check_chunk_limit()

if (number_of_records < getOption("bcdata.single_download_limit", default = bcdc_single_download_limit())) {
if (number_of_records <= chunk_size) {
cc <- tryCatch(cli$post(body = query_list, encode = "form"),
error = function(e) {
stop("There was an issue processing this request.
Expand All @@ -453,8 +454,7 @@ collect.bcdc_promise <- function(x, ...){
url <- cc$url
full_url <- cli$url_fetch(query = query_list)
} else {
chunk <- getOption("bcdata.chunk_limit", default = 1000)
message(glue::glue("This object has {number_of_records} records and requires {ceiling(number_of_records/chunk)} paginated requests to complete."))
message(glue::glue("This object has {number_of_records} records and requires {ceiling(number_of_records/chunk_size)} paginated requests to complete."))
sorting_col <- pagination_sort_col(x$cols_df)

query_list <- c(query_list, sortby = sorting_col)
Expand All @@ -466,7 +466,7 @@ collect.bcdc_promise <- function(x, ...){
limit_param = "count",
offset_param = "startIndex",
limit = number_of_records,
chunk = chunk,
chunk = chunk_size,
progress = interactive()
)

Expand All @@ -487,7 +487,6 @@ collect.bcdc_promise <- function(x, ...){

as.bcdc_sf(bcdc_read_sf(txt), query_list = query_list, url = url,
full_url = full_url)

}


Expand Down
2 changes: 1 addition & 1 deletion R/zzz.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
.onLoad <- function(...) {
._bcdataenv_$named_get_record_warned <- FALSE # nocov
._bcdataenv_$get_capabilities_xml <- NULL # nocov

._bcdataenv_$single_download_limit_warned <- FALSE # nocov
}

# Define bcdc_sf as a subclass of sf so that it works
Expand Down
42 changes: 23 additions & 19 deletions man/bcdc_options.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 1 addition & 2 deletions man/bcdc_query_geodata.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion scratch/scratch_test_iterating.R
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,6 @@ for(nme in single_arg_functions[6]){
filter(fun(local)) %>%
collect()

expect_is(remote, "sf")
expect_s3_class(remote, "sf")
expect_equal(attr(remote, "sf_column"), "geometry")
}
25 changes: 10 additions & 15 deletions tests/testthat.R
Original file line number Diff line number Diff line change
@@ -1,17 +1,12 @@
# Copyright 2019 Province of British Columbia
# This file is part of the standard setup for testthat.
# It is recommended that you do not modify it.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
# Where should you do additional test configuration?
# Learn more about the roles of various files in:
# * https://r-pkgs.org/testing-design.html#sec-tests-files-overview
# * https://testthat.r-lib.org/articles/special-files.html

library(testthat)
library(bcdata)

if (require("testthat", quietly = TRUE)) {
library(bcdata)
library(sf)
test_check("bcdata")
}
test_check("bcdata")
36 changes: 36 additions & 0 deletions tests/testthat/_snaps/options.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# bcdata.single_download_limit is deprecated but works

Code
bcdc_query_geodata(record = "76b1b7a3-2112-4444-857a-afccf7b20da8")
Condition
Warning:
The bcdata.single_download_limit option is deprecated. Please use bcdata.chunk_limit instead.
Output
Querying 'bc-airports' record
* Using collect() on this object will return 455 features and 41 fields
* Accessing this record requires pagination and will make 455 separate
* requests to the WFS. See ?bcdc_options
* At most six rows of the record are printed here
--------------------------------------------------------------------------------
Simple feature collection with 6 features and 41 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 833323.9 ymin: 381604.1 xmax: 1198292 ymax: 1054950
Projected CRS: NAD83 / BC Albers
# A tibble: 6 x 42
id CUSTODIAN_ORG_DESCRI~1 BUSINESS_CATEGORY_CL~2 BUSINESS_CATEGORY_DE~3
<chr> <chr> <chr> <chr>
1 WHSE_IMA~ "Ministry of Forest, ~ airTransportation Air Transportation
2 WHSE_IMA~ "Ministry of Forest, ~ airTransportation Air Transportation
3 WHSE_IMA~ "Ministry of Forest, ~ airTransportation Air Transportation
4 WHSE_IMA~ "Ministry of Forest, ~ airTransportation Air Transportation
5 WHSE_IMA~ "Ministry of Forest, ~ airTransportation Air Transportation
6 WHSE_IMA~ "Ministry of Forest, ~ airTransportation Air Transportation
# i abbreviated names: 1: CUSTODIAN_ORG_DESCRIPTION,
# 2: BUSINESS_CATEGORY_CLASS, 3: BUSINESS_CATEGORY_DESCRIPTION
# i 38 more variables: OCCUPANT_TYPE_DESCRIPTION <chr>, SOURCE_DATA_ID <chr>,
# SUPPLIED_SOURCE_ID_IND <chr>, AIRPORT_NAME <chr>, DESCRIPTION <chr>,
# PHYSICAL_ADDRESS <chr>, ALIAS_ADDRESS <chr>, STREET_ADDRESS <chr>,
# POSTAL_CODE <chr>, LOCALITY <chr>, CONTACT_PHONE <chr>,
# CONTACT_EMAIL <chr>, CONTACT_FAX <chr>, WEBSITE_URL <chr>, ...

3 changes: 0 additions & 3 deletions tests/testthat/test-bcdc-get-citation.R
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.

context("Testing bcdc_get_citation function")


test_that("bcdc_get_citation take a character and returns a bibentry",{
skip_if_net_down()
skip_on_cran()
Expand Down
2 changes: 0 additions & 2 deletions tests/testthat/test-browse.R
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.

context("confirm browsing ability")

test_that("bcdc_browse returns the correct url", {
skip_if_net_down()
skip_on_cran()
Expand Down
3 changes: 1 addition & 2 deletions tests/testthat/test-cql-string.R
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.

context("Testing ability to create CQL strings")
suppressPackageStartupMessages(library(sf, quietly = TRUE))

the_geom <- st_sf(st_sfc(st_point(c(1,1))))
Expand All @@ -26,7 +25,7 @@ test_that("bcdc_cql_string fails when used on an uncollected (promise) object",
})

test_that("CQL function works", {
expect_is(CQL("SELECT * FROM foo;"), c("CQL", "SQL"))
expect_s3_class(CQL("SELECT * FROM foo;"), c("CQL", "SQL"))
})

test_that("All cql geom predicate functions work", {
Expand Down
Loading

0 comments on commit 64ebee4

Please sign in to comment.