Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added mallows cp #42

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

added mallows cp #42

wants to merge 1 commit into from

Conversation

timbook
Copy link

@timbook timbook commented Oct 25, 2019

Added Mallows's Cp metric. Still missing unit tests, although I don't know how useful they would be. Also needs a formula added to the README. I'd be happy to do this, although I don't know how.

Copy link
Owner

@mfrasco mfrasco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for your interest in Metrics and for making this contribution! I really appreciate it. I've left a few comments in the PR that I would love your thoughts on. In addition, there are a few outstanding things to complete before the PR can be accepted

  1. Adding documentation files. (You can generate these with devtools::document())
  2. Adding tests (You can delay doing this until we agree on the appropriate function signature).

#'
#' \deqn{MSE_\text{sub} + 2p_\text{sub}MSE_\text{full}}
#'
#' While the two definitions don't give the same answer, a model with the lowest
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is true, why does the package need to provide both definitions of the metric?

#' @param data The data (only necessary whe supplying formulae)
#' @param alt_definition Whether or not to use the alternate definition for Mallows's Cp
#' @export
mallowsCp <- function(full_model, sub_model, data = NULL, alt_definition = FALSE) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the arguments to this function should be actual, predicted_full, predicted_sub, p_full, and p_sub, instead of requiring the user to pass in model objects or formulae?

The advantages of this approach are

  1. It keeps consistent with the rest of the package
  2. It allows the user to use whatever software they want in order to fit the regression, not necessarily lm,

The disadvantages of this approach are

  1. An uglier function signature with more arguments.

What are your thoughts on this decision? Is there a particular reason that you designed the function the way you did?

#' Cp will have the lowest Cp using both definitions.
#'
#' @param full_model Either a \code{formula} or \code{lm} for the OLS model using all predictors.
#' @param sub_model Either a \code{formula} or \code{lm} for the OLS model with a subset of predictors
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider any variable names other than sub? I'm afraid that it won't be obvious to users that sub means subset of columns? What about candidate or subset?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants