Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pareto k diagnostics and Pareto smoothing #283

Merged
merged 39 commits into from
Aug 31, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
5f38bee
added pareto-k diagnostics
Aug 22, 2022
9ca1c1a
Begin cleanup of Pareto-smoothing functions
Mar 23, 2023
73380a9
Minor changes to pareto-smooth functions
Mar 23, 2023
11c4068
add left tail pareto-k calculation
Mar 31, 2023
9290c21
fix pareto_khat.default to not convert to draws
Mar 31, 2023
c51bb53
add pareto-khat calculation for "both" tails
Apr 1, 2023
4f95834
improve pareto-k diagnostic message handling
Apr 1, 2023
a1d1486
calculate ndraws_tail for pareto_k in accordance with psis paper
Apr 4, 2023
500fc60
refactor pareto-smoothing functions
Apr 14, 2023
b2d6e96
Merge branch 'stan-dev:master' into pareto_k
n-kall Apr 14, 2023
76cfe41
Improve documentation for pareto smoothing functions
Apr 26, 2023
7ec263f
Merge branch 'pareto_k' of github.com:n-kall/posterior into pareto_k
Apr 26, 2023
cae2893
fix documentation for ndraws_tail in pareto smoothing functions
Apr 27, 2023
30dff2e
update documentation for pareto smoothing
Apr 27, 2023
0d1acf9
return pareto diagnostics as list
Apr 27, 2023
049d8fe
Add tests and cleanup pareto diagnostics
May 3, 2023
75d9c67
Merge branch 'stan-dev:master' into pareto_k
n-kall May 3, 2023
3ae4af4
improvements to pareto smooth functions (fix rvar method, simplify)
May 9, 2023
6e232c3
cleanup and documentation
May 9, 2023
9778a3d
Merge branch 'pareto_k' of github.com:n-kall/posterior into pareto_k
May 9, 2023
ef9a2b3
remove currently unused genralized pareto distribution functions
May 9, 2023
077c975
handle extra diagnostics in pareto_khat.rvar
May 9, 2023
ca58e6d
improve documentation and comments for pareto smoothing functions
May 9, 2023
7da27e3
add input checks for pareto_smooth
May 9, 2023
e2330b3
fix documentation issue in pareto_khat
May 9, 2023
60fe39c
typofix 'convergence_rage' -> 'convergence_rate'
May 9, 2023
7c76adf
Merge branch 'stan-dev:master' into pareto_k
n-kall May 11, 2023
982d408
set minimum ss when pareto-k >=1 to infinity
May 12, 2023
7049233
Merge branch 'pareto_k' of github.com:n-kall/posterior into pareto_k
May 13, 2023
ad68f27
set min_ss based on pareto_k to inf if k >= 1
n-kall May 13, 2023
7ff235b
Merge branch 'pareto_k' of github.com:n-kall/posterior into pareto_k
May 16, 2023
3e823e7
add pareto smoothing reference in docs
May 16, 2023
4c9aa75
add new function pareto_diags and adjust existing functions
May 17, 2023
c383f50
expand pareto diagnostic documentation
May 24, 2023
4af4ede
add further tests for pareto functions
May 25, 2023
54011ec
add documentation file fo pareto_diags
May 25, 2023
bb64c2a
Merge branch 'stan-dev:master' into pareto_k
n-kall Jul 31, 2023
f107908
Merge branch 'stan-dev:master' into pareto_k
n-kall Aug 7, 2023
a85b9b5
use ess_tail for r_eff, documentation fixes, typofixes
Aug 30, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -223,6 +223,12 @@ S3method(order_draws,draws_list)
S3method(order_draws,draws_matrix)
S3method(order_draws,draws_rvars)
S3method(order_draws,rvar)
S3method(pareto_diags,default)
S3method(pareto_diags,rvar)
S3method(pareto_khat,default)
S3method(pareto_khat,rvar)
S3method(pareto_smooth,default)
S3method(pareto_smooth,rvar)
S3method(pillar_shaft,rvar)
S3method(print,draws_array)
S3method(print,draws_df)
Expand Down Expand Up @@ -447,6 +453,9 @@ export(ndraws)
export(niterations)
export(nvariables)
export(order_draws)
export(pareto_diags)
export(pareto_khat)
export(pareto_smooth)
export(quantile2)
export(r_scale)
export(rdo)
Expand Down
81 changes: 81 additions & 0 deletions R/gpd.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
#' Quantile function for generalized pareto
#' @noRd
qgeneralized_pareto <- function(p, mu = 0, sigma = 1, k = 0, lower.tail = TRUE, log.p = FALSE) {
stopifnot(length(mu) == 1 && length(sigma) == 1 && length(k) == 1)
if (is.na(sigma) || sigma <= 0) {
return(rep(NaN, length(p)))
}
if (log.p) {
p <- exp(p)
}
if (!lower.tail) {
p <- 1 - p
}
if (k == 0) {
q <- mu - sigma * log1p(-p)
} else {
q <- mu + sigma * expm1(-k * log1p(-p)) / k
}
q
}

#' Estimate parameters of the Generalized Pareto distribution
#'
#' Given a sample \eqn{x}, Estimate the parameters \eqn{k} and \eqn{\sigma} of
#' the generalized Pareto distribution (GPD), assuming the location parameter is
#' 0. By default the fit uses a prior for \eqn{k} (this is in addition to the prior described by Zhang and Stephens, 2009), which will stabilize
#' estimates for very small sample sizes (and low effective sample sizes in the
#' case of MCMC samples). The weakly informative prior is a Gaussian prior
#' centered at 0.5 (see details in Vehtari et al., 2022).
#'
#' @param x A numeric vector. The sample from which to estimate the parameters.
#' @param wip Logical indicating whether to adjust \eqn{k} based on a weakly
#' informative Gaussian prior centered on 0.5. Defaults to `TRUE`.
#' @param min_grid_pts The minimum number of grid points used in the fitting
#' algorithm. The actual number used is `min_grid_pts + floor(sqrt(length(x)))`.
#' @param sort_x If `TRUE` (the default), the first step in the fitting
#' algorithm is to sort the elements of `x`. If `x` is already
#' sorted in ascending order then `sort_x` can be set to `FALSE` to
#' skip the initial sorting step.
#' @return A named list with components `k` and `sigma`.
#'
#' @details Here the parameter \eqn{k} is the negative of \eqn{k} in Zhang &
#' Stephens (2009).
#'
#'
#' @references
#' Zhang, J., and Stephens, M. A. (2009). A new and efficient estimation method
#' for the generalized Pareto distribution. *Technometrics* **51**, 316-325.
#'
#' @noRd
gpdfit <- function(x, wip = TRUE, min_grid_pts = 30, sort_x = TRUE) {
# see section 4 of Zhang and Stephens (2009)
if (sort_x) {
x <- sort.int(x)
}
N <- length(x)
prior <- 3
M <- min_grid_pts + floor(sqrt(N))
jj <- seq_len(M)
xstar <- x[floor(N / 4 + 0.5)] # first quartile of sample
theta <- 1 / x[N] + (1 - sqrt(M / (jj - 0.5))) / prior / xstar
k <- matrixStats::rowMeans2(log1p(-theta %o% x))
l_theta <- N * (log(-theta / k) - k - 1) # profile log-lik
w_theta <- exp(l_theta - matrixStats::logSumExp(l_theta)) # normalize
theta_hat <- sum(theta * w_theta)
k_hat <- mean.default(log1p(-theta_hat * x))
sigma_hat <- -k_hat / theta_hat

# adjust k_hat based on weakly informative prior, Gaussian centered on 0.5.
# this stabilizes estimates for very small Monte Carlo sample sizes and low neff (see Vehtari et al., 2022 for details)
if (wip) {
k_hat <- (k_hat * N + 0.5 * 10) / (N + 10)
}

if (is.na(k_hat)) {
k_hat <- Inf
sigma_hat <- NaN
}

list(k = k_hat, sigma = sigma_hat)
}
Loading