If there is only one converged run, the sivs will break #2
Comments
Alex and I exchanged a few emails, but I thought this part of his last email should be posted here, as it directly contributes towards addressing this issue:
There is one very rare condition that sivs() does not catch, and it will break without a proper error. In this post/issue I'm going to first dissect the way this issue can happen, then list how it can be addressed, and finally elaborate on my preference for how it should be addressed.
The reasons I'm writing this are:
Dissection of the problem
In the following line we are removing the runs that have failed to converge (due to whatever reason that is relevant to the chosen method)
sivs/R/sivs.R
Lines 1010 to 1011 in 4a57625
and in the following lines we are making sure that we have at least something to proceed with:
sivs/R/sivs.R
Lines 1013 to 1019 in 4a57625
and if we have at least one good run, we proceed with this beautiful and compact part to extract the coefficients in a data.frame format:
sivs/R/sivs.R
Lines 1021 to 1030 in 4a57625
The issue arises when clean.iterative.res has a length of 1, which subsequently causes coef.df to be a data.frame with only two columns. When we move the names column to row.names, the data.frame is left with only one column, which consequently turns into a vector:
sivs/R/sivs.R
Lines 1032 to 1034 in 4a57625
This can be reproduced with this small demo:
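A minimal sketch of this behaviour, assuming a hypothetical two-column data.frame standing in for coef.df (the column names and values are made up for illustration and are not taken from the package):

```r
# Hypothetical stand-in for coef.df when only one run has converged:
# one column of feature names plus a single column of coefficients.
coef.df <- data.frame(names = c("f1", "f2", "f3"),
                      run1  = c(0.5, 0, -1.2))

# Move the feature names to row.names, then drop the names column.
row.names(coef.df) <- coef.df$names
collapsed <- coef.df[, -1]

# With a single remaining column, `[` drops the data.frame structure.
is.data.frame(collapsed)  # FALSE
is.null(dim(collapsed))   # TRUE
```

With two or more converged runs the same subsetting would leave at least two columns, so the result would stay a data.frame.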
And this is the problem, because the following apply() expects a tabular object, but what it receives is just a vector:
sivs/R/sivs.R
Lines 1036 to 1039 in 4a57625
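To illustrate why a plain vector is a problem here, the following sketch calls apply() row-wise on a made-up coefficient vector. The vector and the counting function are assumptions for illustration, not the package's actual code:

```r
# Hypothetical coefficient vector, i.e. what is left after the collapse.
coefs <- c(0.5, 0, -1.2)

# apply() needs an object with dimensions; a bare vector has none,
# so the call fails. The error is captured here instead of halting.
res <- tryCatch(
  apply(coefs, 1, function(x) sum(x != 0)),
  error = function(e) conditionMessage(e)
)

# `res` now holds the error message rather than a result.
print(res)
```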
Possible solutions and my thoughts
There are many ways to solve this programmatically, like converting coef.df to a matrix or even using the tibble package for all tabular data we have in the SIVS package; but the question is more fundamental imho (when nfolds > 1): I would argue that the RFE should not be run, because having one single lucky run goes against reproducible research and against what "Stable Iterative Variable Selection" is all about.
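For completeness, one purely programmatic route could look like the sketch below. It uses base R's drop = FALSE when subsetting, which is a different technique from the matrix or tibble options named above, and it again relies on a hypothetical stand-in for coef.df:

```r
# Hypothetical stand-in for coef.df with a single converged run.
coef.df <- data.frame(names = c("f1", "f2", "f3"),
                      run1  = c(0.5, 0, -1.2))
row.names(coef.df) <- coef.df$names

# drop = FALSE keeps the data.frame structure even with one column left.
kept <- coef.df[, -1, drop = FALSE]
is.data.frame(kept)  # TRUE

# A row-wise apply() now works, e.g. counting non-zero coefficients
# per feature (an illustrative summary, not the package's own).
nonzero.counts <- apply(kept, 1, function(x) sum(x != 0))
```

This only removes the crash; it does not answer whether a single converged run should be trusted at all.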
If we decide not to move to the next step in SIVS (i.e. recursive feature elimination), then the question is how this should be reported to the user and what we should return. Possible options are
I would argue that we should throw an error and not return anything to the user. This is because, in spite of the article stating that SIVS should not be treated as a black-box method, many would still naively run the code and expect magic, and in the off chance of hitting this "lucky" run, they would just go with what SIVS provides and move forward without thinking twice about what has happened and why. Perhaps when #1 is implemented, the user can have access to the list of coefficients and can do some sort of RFE themselves at their own risk if they want to.
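A guard along these lines could be sketched as follows. The helper name check_converged_runs, the error wording, and the exact condition on nfolds are assumptions for illustration and not the package's actual implementation:

```r
# Hypothetical helper: stop early when fewer than two runs converged.
# `clean.iterative.res` is assumed to be the list of converged runs.
check_converged_runs <- function(clean.iterative.res, nfolds) {
  n.ok <- length(clean.iterative.res)
  if (nfolds > 1 && n.ok < 2) {
    stop("Only ", n.ok, " run converged; a single run is not enough ",
         "to assess stability, so no result is returned.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Usage: one converged run triggers the error, two or more pass.
single <- tryCatch(check_converged_runs(list("run1"), nfolds = 10),
                   error = function(e) "error")
several <- check_converged_runs(list("run1", "run2"), nfolds = 10)
```

Raising an error rather than a warning means nothing downstream can silently consume a result built on one run.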
Reports
This issue has been reported on Stackoverflow as well:
Acknowledgment
I would like to thank Alexander Biehl who reported this issue.
I will edit this post when/if I get their consent to put their real name or GitHub ID here.
As I mentioned, this is a very rare case, and Alex responsibly and kindly reported the issue to me via email. The time he invested in writing that email should also be factored in for the resolution of this issue. Alex also pointed me to the Stackoverflow post which I mentioned above.