-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added mallows cp #42
base: master
Are you sure you want to change the base?
added mallows cp #42
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for your interest in Metrics
and for making this contribution! I really appreciate it. I've left a few comments in the PR that I would love your thoughts on. In addition, there are a few outstanding things to complete before the PR can be accepted
- Adding documentation files. (You can generate these with
devtools::document()
) - Adding tests (You can delay doing this until we agree on the appropriate function signature).
#' | ||
#' \deqn{MSE_\text{sub} + 2p_\text{sub}MSE_\text{full}} | ||
#' | ||
#' While the two definitions don't give the same answer, a model with the lowest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is true, why does the package need to provide both definitions of the metric?
#' @param data The data (only necessary whe supplying formulae) | ||
#' @param alt_definition Whether or not to use the alternate definition for Mallows's Cp | ||
#' @export | ||
mallowsCp <- function(full_model, sub_model, data = NULL, alt_definition = FALSE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the arguments to this function should be actual
, predicted_full
, predicted_sub
, p_full
, and p_sub
, instead of requiring the user to pass in model objects or formulae?
The advantages of this approach are
- It keeps consistent with the rest of the package
- It allows the user to use whatever software they want in order to fit the regression, not necessarily
lm
,
The disadvantages of this approach are
- An uglier function signature with more arguments.
What are your thoughts on this decision? Is there a particular reason that you designed the function the way you did?
#' Cp will have the lowest Cp using both definitions. | ||
#' | ||
#' @param full_model Either a \code{formula} or \code{lm} for the OLS model using all predictors. | ||
#' @param sub_model Either a \code{formula} or \code{lm} for the OLS model with a subset of predictors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider any variable names other than sub
? I'm afraid that it won't be obvious to users that sub
means subset of columns
? What about candidate
or subset
?
Added Mallows's Cp metric. Still missing unit tests, although I don't know how useful they would be. Also needs a formula added to the README. I'd be happy to do this, although I don't know how.