Add documentation clarifying appropriate use of weights in `slice_sample()` (#7052)

* Add documentation clarifying appropriate use of weights in dplyr's `slice_sample()`.

* Add documentation to relevant .Rd file.

* Tweak documentation placement a bit

---------

Co-authored-by: Davis Vaughan <[email protected]>
apeterson91 and DavisVaughan authored Aug 27, 2024
1 parent 85e94fc commit cfb25a0
Showing 2 changed files with 21 additions and 5 deletions.
12 changes: 10 additions & 2 deletions R/slice.R
@@ -19,6 +19,12 @@
#' intrinsic notion of row order. If you want to perform the equivalent
#' operation, use [filter()] and [row_number()].
#'
+#' For `slice_sample()`, note that the weights provided in `weight_by` are
+#' passed through to the `prob` argument of [base::sample.int()]. This means
+#' they cannot be used to reconstruct summary statistics from the underlying
+#' population. See [this discussion](https://stats.stackexchange.com/q/639211/)
+#' for more details.
+#'
#' @family single table verbs
#' @inheritParams args_by
#' @inheritParams arrange
@@ -93,9 +99,9 @@
#' mtcars %>% slice_sample(n = 5)
#' mtcars %>% slice_sample(n = 5, replace = TRUE)
#'
-#' # you can optionally weight by a variable - this code weights by the
+#' # You can optionally weight by a variable - this code weights by the
#' # physical weight of the cars, so heavy cars are more likely to get
-#' # selected
+#' # selected.
#' mtcars %>% slice_sample(weight_by = wt, n = 5)
#'
#' # Group wise operation ----------------------------------------
@@ -293,6 +299,8 @@ slice_max.data.frame <- function(.data, order_by, ..., n, prop, by = NULL, with_
#' @param weight_by <[`data-masking`][rlang::args_data_masking]> Sampling
#' weights. This must evaluate to a vector of non-negative numbers the same
#' length as the input. Weights are automatically standardised to sum to 1.
+#' See the `Details` section for more technical details regarding these
+#' weights.
slice_sample <- function(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE) {
check_dot_by_typo(...)
check_slice_unnamed_n_prop(..., n = n, prop = prop)
14 changes: 11 additions & 3 deletions man/slice.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
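
The Details paragraph added to `R/slice.R` in this commit says that `weight_by` is passed through to the `prob` argument of `base::sample.int()`. The following is a minimal sketch of what that implies in practice, not part of the commit itself. The `population` data frame and its columns `value` and `w` are hypothetical and were chosen only for illustration, and the base R call is an approximation of what the documentation describes rather than the package's actual internals.

```r
library(dplyr)

set.seed(1)

# Hypothetical toy population: `value` is the quantity of interest and
# `w` is a sampling weight that favours rows with large values.
population <- data.frame(
  value = 1:100,
  w = 1:100
)

# Weighted sample without replacement: rows with a larger `w` are more
# likely to be drawn.
weighted <- population %>% slice_sample(n = 20, weight_by = w)

# A comparable draw written directly against base R, using the weights
# as the `prob` argument (assumed equivalent, per the Details note).
idx <- sample.int(nrow(population), size = 20, prob = population$w)
base_draw <- population[idx, ]

# The population mean is 50.5. The unadjusted means of the weighted draws
# tend to sit above it, because heavily weighted rows are over-represented.
mean(population$value)
mean(weighted$value)
mean(base_draw$value)
```

Comparing the three means shows why the note warns against treating these weights like survey weights: they control only the chance of a row being selected, and a plain summary of the resulting sample is not an estimate of the population summary.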
