Skip to content

Commit

Permalink
new options for lekprofile for different grouping schemes of unevalua…
Browse files Browse the repository at this point in the history
…ted variables in each facet
  • Loading branch information
fawda123 committed Sep 8, 2015
1 parent 0b73e89 commit 46c7cd1
Show file tree
Hide file tree
Showing 9 changed files with 126 additions and 59 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
Package: NeuralNetTools
Type: Package
Title: Visualization and Analysis Tools for Neural Networks
Version: 1.3.12.9000
Date: 2015-09-07
Version: 1.3.13.9000
Date: 2015-09-08
Author: Marcus W. Beck [aut, cre]
Maintainer: Marcus W. Beck <[email protected]>
Description: Visualization and analysis tools to aid in the interpretation of
Expand Down
88 changes: 60 additions & 28 deletions R/NeuralNetTools_lek.R
Original file line number Diff line number Diff line change
Expand Up @@ -12,17 +12,23 @@
#' @param ... arguments passed to other methods
#'
#' @details
#' The Lek profile method is described briefly in Lek et al. 1996 and in more detail in Gevrey et al. 2003. The profile method is fairly generic and can be extended to any statistical model in R with a predict method. However, it is one of few methods used to evaluate sensitivity in neural networks. The default method of this function attempts to find variables names from a generic model object.
#' The Lek profile method is described briefly in Lek et al. 1996 and in more detail in Gevrey et al. 2003. The profile method is fairly generic and can be extended to any statistical model in R with a predict method. However, it is one of few methods used to evaluate sensitivity in neural networks.
#'
#' The profile method begins by obtaining model predictions of the response variable across the range of values for the given explanatory variable. All other explanatory variables are held constant at set values (e.g., minimum, 20th percentile, maximum). The final result is a set of response curves for one response variable across the range of values for one explanatory variable, while holding all other explanatory variables constant. This is implemented in in the function by creating a matrix of values for explanatory variables where the number of rows is the number of observations and the number of columns is the number of explanatory variables. All explanatory variables are held at their mean (or other constant value) while the variable of interest is sequenced from its minimum to maximum value across the range of observations. This matrix (or data frame) is then used to predict values of the response variable from a fitted model object. This is repeated for each explanatory variable to obtain all response curves.
#' The profile method can be used to evaluate the effect of explanatory variables by returning a plot of the predicted response across the range of values for each separate variable. The original profile method evaluated the effects of each variable while holding the remaining expalanatory variables at different quantiles (e.g., minimum, 20th percentile, maximum). This is implemented in in the function by creating a matrix of values for explanatory variables where the number of rows is the number of observations and the number of columns is the number of explanatory variables. All explanatory variables are held at their mean (or other constant value) while the variable of interest is sequenced from its minimum to maximum value across the range of observations. This matrix (or data frame) is then used to predict values of the response variable from a fitted model object. This is repeated for each explanatory variable to obtain all response curves. Values passed to \code{split_vals} must range from zero to one to define the quantiles for holding unevalauted explanatory variables.
#'
#' An alternative implementation of the profile method is to group the unevaluated explanatory variables using groupings defined by the statistical properties of the data. Covariance among predictors may present unlikely scenarios if holding all unevaluated variables at the same level. To address this issue, the function provides an option to hold unevalutaed variable at mean values defined by natural clusters in the data. \code{\link[stats]{kmeans}} clustering is used on the input \code{data.frame} of explanatory variables if the argument passed to \code{split_vals} is an integer value greater than one. The centers of the clusters are then used as constant values for the unevaluated variables. An arbitrary grouping scheme can also be passed to \code{split_vals} as a \code{data.frame} where the user can specify exact values for holding each value constant (see the examples).
#'
#' For all plots, the legend with the 'splits' label indicates the colors that correspond to each group. The groups describe the values at which unevaluated explanatory variables were held constant, either as specific quantiles, set mean values based on clustering, or in the arbitrary grouping defined by the user.
#'
#' Note that there is no predict method for neuralnet objects from the nn package. The lekprofile method for nn objects uses the nnet package to recreate the input model, which is then used for the sensitivity predictions. This approach only works for networks with one hidden layer.
#'
#' Finally, an alternative plot of grouping means...
#'
#' @export
#'
#' @import ggplot2 nnet
#'
#' @return A \code{\link[ggplot2]{ggplot}} object for plotting if \code{val_out = FALSE}, otherwise a \code{data.frame} in long form showing the predicted responses at different values of the explanatory varibales.
#' @return A \code{\link[ggplot2]{ggplot}} object for plotting if \code{val_out = FALSE}, otherwise a two-element \code{list} is returned with a \code{data.frame} in long form showing the predicted responses at different values of the explanatory variables and the grouping scheme that was used to hold unevaluated variables constant.
#'
#' @references
#' Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J., Aulagnier, S. 1996. Application of neural networks to modelling nonlinear relationships in Ecology. Ecological Modelling. 90:39-52.
Expand Down Expand Up @@ -79,6 +85,20 @@
#' lekprofile(mod)
#'
#' }
#'
#' ## group by clusters instead of sequencing by quantiles
#'
#' mod <- nnet(Y1 ~ X1 + X2 + X3, data = neuraldat, size = 5)
#'
#' lekprofile(mod, split_vals = 6) # six clusters
#'
#' ## enter an arbitrary grouping scheme for the split values
#' ## i.e. hold all values at 0.5
#' split_vals <- rbind(rep(0.5, length = ncol(x)))
#' split_vals <- data.frame(split_vals)
#' names(split_vals) <- names(split_vals)
#'
#' lekprofile(mod, split_vals = split_vals, xsel = 'X3')
lekprofile <- function(mod_in, ...) UseMethod('lekprofile')

#' @rdname lekprofile
Expand All @@ -90,50 +110,62 @@ lekprofile <- function(mod_in, ...) UseMethod('lekprofile')
#' @method lekprofile default
lekprofile.default <- function(mod_in, xvars, ynms, xsel = NULL, steps = 100, split_vals = seq(0, 1, by = 0.2), val_out = FALSE, ...){


# subset xall if xsel is not empy
if(is.null(xsel)) xsel <- names(xvars)

# stop if only one input variable
if(ncol(xvars) == 1) stop('Lek profile requires greater than one input variable')

# standard lekprofile method using quantile splits or clusters
if(inherits(split_vals, c('numeric', 'integer'))){

# quantile approach
if(all(split_vals <= 1)){

grps <- apply(xvars, 2, quantile, split_vals)
grps <- as.data.frame(rbind(grps))

# kmeans approach
} else {

# sanity checks for integer, one value
if(length(split_vals) > 1) stop('split_vals must have length equal to one if an integer')
if(split_vals%%1 != 0) stop('split_vals must be an integer greater than one')

# get means of cluster centers
grps <- kmeans(xvars, centers = split_vals)$centers

}

# use matrix or data.frame input for constant values
} else {

if(ncol(split_vals) != ncol(xvars)) stop('split_vals as matrix must have ncol same as xvars')
grps <- split_vals
names(grps) <- names(xvars)

}

#use 'pred_fun' to get pred vals of response across range of vals for an exp vars
#loops over all explanatory variables of interest and all split values
lek_vals <- sapply(
xsel,
function(vars){
sapply(
split_vals,
function(splits){
pred_sens(
xvars,
mod_in,
vars,
steps,
function(val) quantile(val, probs = splits),
ynms
)
},
simplify = FALSE
)
},
function(vars) pred_sens(xvars, mod_in, vars, steps, grps, ynms),
simplify = FALSE
)

#melt lek_val list for use with ggplot
lek_vals <- melt(lek_vals, id.vars = 'x_vars')
lek_vals$L2 <- factor(lek_vals$L2, labels = split_vals)
lek_vals$L2 <- factor(lek_vals$L2)#, labels = 1:nrow(grps))
names(lek_vals) <- c('Explanatory', 'resp_name', 'Response', 'Splits', 'exp_name')

#return only values if val_out = TRUE
if(val_out) return(lek_vals)
if(val_out) return(list(lek_vals, grps))

#ggplot object
p <- ggplot2::ggplot(lek_vals, aes_string(x = 'Explanatory', y = 'Response', group = 'Splits')) +
geom_line(aes_string(colour = 'Splits', linetype = 'Splits', size = 'Splits')) +
facet_grid(resp_name ~ exp_name, scales = 'free_x') +
scale_linetype_manual(values = rep('solid', length(split_vals))) +
scale_size_manual(values = rep(1, length(split_vals)))
geom_line(aes_string(colour = 'Splits')) +
facet_grid(resp_name ~ exp_name, scales = 'free_x')

return(p)

Expand Down
55 changes: 34 additions & 21 deletions R/NeuralNetTools_utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -270,15 +270,15 @@ neuralweights.nn <- function(mod_in, rel_rsc = NULL, ...){
#' @param mod_in any model object with a predict method
#' @param var_sel chr string of explanatory variable to select
#' @param step_val number of values to sequence range of selected explanatory variable
#' @param fun_in function defining the method of holding explanatory variables constant
#' @param grps matrix of values for holding explanatory values constant, one column per variable and one row per split
#' @param ynms chr string of response variable names for correct labelling
#'
#'@details
#' Gets predicted output for a model's response variable based on matrix of explanatory variables that are restricted following Lek's profile method. The selected explanatory variable is sequenced across a range of values. All other explanatory variables are held constant at the value specified by \code{fun_in}.
#' Gets predicted output for a model's response variable based on matrix of explanatory variables that are restricted following Lek's profile method. The selected explanatory variable is sequenced across a range of values. All other explanatory variables are held constant at the values in \code{grps}.
#'
#' @seealso lekprofile
#'
#' @return A \code{\link{data.frame}} of predictions and the sequence values of the selected explanatory variable
#' @return A \code{\link[base]{list}} of predictions where each element is a \code{\link[base]{data.frame}} with the predicted value of the response and the values of the explanatory variable defined by \code{var_sel}. Each element of the list corresponds to a group defined by the rows in \code{grps} at which the other explanatory variables were held constant.
#'
#' @export
#'
Expand All @@ -294,25 +294,38 @@ neuralweights.nn <- function(mod_in, rel_rsc = NULL, ...){
#' mod <- nnet(Y1 ~ X1 + X2 + X3, data = neuraldat, size = 5)
#'
#' mat_in <- neuraldat[, c('X1', 'X2', 'X3')]
#' pred_sens(mat_in, mod, 'X1', 100, function(x) quantile(x, 0.5), 'Y1')
pred_sens <- function(mat_in, mod_in, var_sel, step_val, fun_in, ynms){

mat_out <- matrix(nrow = step_val, ncol = ncol(mat_in), dimnames = list(c(1:step_val)))
mat_out <- data.frame(mat_out)
names(mat_out) <- names(mat_in)

mat_cons <- mat_in[, !names(mat_in) %in% var_sel, drop = F]
mat_cons <- apply(mat_cons, 2, fun_in)
mat_cons <- sapply(1:step_val, function(x) mat_cons)
if(!'numeric' %in% class(mat_cons)) mat_cons <- t(mat_cons)
mat_out[, !names(mat_in) %in% var_sel] <- mat_cons

mat_out[, var_sel] <- seq(min(mat_in[, var_sel]), max(mat_in[, var_sel]), length = step_val)
#' grps <- apply(mat_in, 2, quantile, seq(0, 1, by = 0.2))
#'
#' pred_sens(mat_in, mod, 'X1', 100, grps, 'Y1')
pred_sens <- function(mat_in, mod_in, var_sel, step_val, grps, ynms){

# exp variable to evaluate across its range
chngs <- range(mat_in[, var_sel, drop = FALSE], na.rm = TRUE)
chngs <- data.frame(seq(chngs[1], chngs[2], length = step_val))
names(chngs) <- var_sel

# constant values exp variables not to evaluate
const <- grps[, !names(mat_in) %in% var_sel]
rownames(const) <- 1:nrow(const)

# iterate across rows of const, combine with chngs, get preds
out <- apply(const, 1, function(x) {

topred <- as.data.frame(rbind(x))[rep(1, step_val), ]
topred <- cbind(chngs, topred)

preds <- data.frame(predict(mod_in, newdata = topred))
names(preds) <- ynms

x_vars <- topred[, var_sel]
preds <- data.frame(preds, x_vars)
rownames(preds) <- 1:step_val

return(preds)

out <- data.frame(predict(mod_in, newdata = as.data.frame(mat_out)))
names(out) <- ynms
x_vars <- mat_out[, var_sel]
data.frame(out, x_vars)
})

return(out)

}

Expand Down
Binary file modified README_files/figure-html/unnamed-chunk-6-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified README_files/figure-html/unnamed-chunk-7-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified README_files/figure-html/unnamed-chunk-8-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified README_files/figure-html/unnamed-chunk-9-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
26 changes: 23 additions & 3 deletions man/lekprofile.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -40,17 +40,23 @@ lekprofile(mod_in, ...)
\item{val_out}{logical value indicating if actual sensitivity values are returned rather than a plot, default \code{FALSE}}
}
\value{
A \code{\link[ggplot2]{ggplot}} object for plotting if \code{val_out = FALSE}, otherwise a \code{data.frame} in long form showing the predicted responses at different values of the explanatory varibales.
A \code{\link[ggplot2]{ggplot}} object for plotting if \code{val_out = FALSE}, otherwise a two-element \code{list} is returned with a \code{data.frame} in long form showing the predicted responses at different values of the explanatory variables and the grouping scheme that was used to hold unevaluated variables constant.
}
\description{
Conduct a sensitivity analysis of model responses in a neural network to input variables using Lek's profile method
}
\details{
The Lek profile method is described briefly in Lek et al. 1996 and in more detail in Gevrey et al. 2003. The profile method is fairly generic and can be extended to any statistical model in R with a predict method. However, it is one of few methods used to evaluate sensitivity in neural networks. The default method of this function attempts to find variables names from a generic model object.
The Lek profile method is described briefly in Lek et al. 1996 and in more detail in Gevrey et al. 2003. The profile method is fairly generic and can be extended to any statistical model in R with a predict method. However, it is one of few methods used to evaluate sensitivity in neural networks.

The profile method begins by obtaining model predictions of the response variable across the range of values for the given explanatory variable. All other explanatory variables are held constant at set values (e.g., minimum, 20th percentile, maximum). The final result is a set of response curves for one response variable across the range of values for one explanatory variable, while holding all other explanatory variables constant. This is implemented in in the function by creating a matrix of values for explanatory variables where the number of rows is the number of observations and the number of columns is the number of explanatory variables. All explanatory variables are held at their mean (or other constant value) while the variable of interest is sequenced from its minimum to maximum value across the range of observations. This matrix (or data frame) is then used to predict values of the response variable from a fitted model object. This is repeated for each explanatory variable to obtain all response curves.
The profile method can be used to evaluate the effect of explanatory variables by returning a plot of the predicted response across the range of values for each separate variable. The original profile method evaluated the effects of each variable while holding the remaining expalanatory variables at different quantiles (e.g., minimum, 20th percentile, maximum). This is implemented in in the function by creating a matrix of values for explanatory variables where the number of rows is the number of observations and the number of columns is the number of explanatory variables. All explanatory variables are held at their mean (or other constant value) while the variable of interest is sequenced from its minimum to maximum value across the range of observations. This matrix (or data frame) is then used to predict values of the response variable from a fitted model object. This is repeated for each explanatory variable to obtain all response curves. Values passed to \code{split_vals} must range from zero to one to define the quantiles for holding unevalauted explanatory variables.

An alternative implementation of the profile method is to group the unevaluated explanatory variables using groupings defined by the statistical properties of the data. Covariance among predictors may present unlikely scenarios if holding all unevaluated variables at the same level. To address this issue, the function provides an option to hold unevalutaed variable at mean values defined by natural clusters in the data. \code{\link[stats]{kmeans}} clustering is used on the input \code{data.frame} of explanatory variables if the argument passed to \code{split_vals} is an integer value greater than one. The centers of the clusters are then used as constant values for the unevaluated variables. An arbitrary grouping scheme can also be passed to \code{split_vals} as a \code{data.frame} where the user can specify exact values for holding each value constant (see the examples).

For all plots, the legend with the 'splits' label indicates the colors that correspond to each group. The groups describe the values at which unevaluated explanatory variables were held constant, either as specific quantiles, set mean values based on clustering, or in the arbitrary grouping defined by the user.

Note that there is no predict method for neuralnet objects from the nn package. The lekprofile method for nn objects uses the nnet package to recreate the input model, which is then used for the sensitivity predictions. This approach only works for networks with one hidden layer.

Finally, an alternative plot of grouping means...
}
\examples{
## using nnet
Expand Down Expand Up @@ -99,6 +105,20 @@ mod <- train(Y1 ~ X1 + X2 + X3, method = 'nnet', data = neuraldat, linout = TRUE
lekprofile(mod)

}

## group by clusters instead of sequencing by quantiles

mod <- nnet(Y1 ~ X1 + X2 + X3, data = neuraldat, size = 5)

lekprofile(mod, split_vals = 6) # six clusters

## enter an arbitrary grouping scheme for the split values
## i.e. hold all values at 0.5
split_vals <- rbind(rep(0.5, length = ncol(x)))
split_vals <- data.frame(split_vals)
names(split_vals) <- names(split_vals)

lekprofile(mod, split_vals = split_vals, xsel = 'X3')
}
\references{
Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J., Aulagnier, S. 1996. Application of neural networks to modelling nonlinear relationships in Ecology. Ecological Modelling. 90:39-52.
Expand Down
Loading

0 comments on commit 46c7cd1

Please sign in to comment.