
API: naming and functions #32

Open
mathurinm opened this issue Nov 9, 2020 · 4 comments

@mathurinm
Collaborator

As we saw in #30 and with @ksehic, a newcomer's life could be made easier.

I would like to discuss a few API changes:

  • Naming: from my (personal) user perspective it's surprising that sparse_ho.models.Lasso has a celer.Lasso as an attribute.
    Since models.Lasso sets the alpha, I would name this class LassoGradSearch, or LassoGS, or LassoAuto, or something else (similarly to sklearn.linear_model.LassoCV)
  • exposure of the estimator: in the newly named LassoGradSearch, estimator should be made private. It's only called for its solver, and I don't think the user should be aware of it.
  • the user should not have to pass estimator when creating LassoGradSearch. It should be created in the init (if possible with sklearn conventions, otherwise on the first call to fit). Conceptually, this makes the instantiation of the class easier.
  • LassoGradSearch should expose a fit and a predict (so the user does not need to access its estimator), and pass sklearn.utils.estimator_checks.check_estimator
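A minimal sketch of what the proposed wrapper could look like, assuming the names `LassoGradSearch` and `_InnerLasso` (both hypothetical; `_InnerLasso` is a trivial stand-in for celer.Lasso so the snippet is self-contained):

```python
class _InnerLasso:
    """Stand-in for celer.Lasso: 'fits' a trivial mean model."""
    def fit(self, X, y):
        self.coef_ = sum(y) / len(y)  # placeholder fit, not a real Lasso
        return self

    def predict(self, X):
        return [self.coef_ for _ in X]


class LassoGradSearch:
    """Hypothetical wrapper: the inner estimator stays private,
    and the wrapper itself exposes fit/predict."""
    def __init__(self, alpha=1.0):
        # only hyperparameters in __init__, per sklearn conventions
        self.alpha = alpha

    def fit(self, X, y):
        # inner estimator created lazily here, not passed by the user
        self._estimator = _InnerLasso().fit(X, y)
        return self

    def predict(self, X):
        # delegate, so the user never touches self._estimator
        return self._estimator.predict(X)


model = LassoGradSearch(alpha=0.5).fit([[0.0], [1.0]], [1.0, 3.0])
print(model.predict([[2.0]]))  # -> [2.0]
```

Keeping `__init__` to hyperparameters only and creating fitted attributes inside `fit` is also what `check_estimator` expects.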

@agramfort @josephsalmon what do you think ?

@agramfort
Collaborator

agramfort commented Nov 9, 2020 via email

@mathurinm
Collaborator Author

mathurinm commented Jan 26, 2021

Given the past work, I think it should be doable to have something along the lines of:

criterion = HeldOutMSE(idx_train, idx_val)  # better yet, train_frac=0.75 if possible with CV
algo = ImplicitForward()
monitor_grad = Monitor()  # this could go inside the model too, I think?
optimizer = LineSearch(n_outer=10)
model = LassoAuto(estimator=estimator, criterion=criterion, alpha=alpha0)
# in my ideal world we would not pass estimator (sklearn LassoCV does not need sklearn Lasso), but with sklearn
# conventions I don't know what's possible. Instantiate it at the first fit call, maybe, like sklearn does for coef_?
model.fit(X, y, algo=algo, optimizer=optimizer, search='grad', callback=None)  # grad_search happens here; do we
# even need to specify it? i.e., is there an alternative? for grid search, use sklearn
model.predict(X_test)

My two issues with the current design are that we have to instantiate six objects before fitting (estimator, model, monitor, algo, criterion, optimizer), and that we must go through estimator to predict at the end, while it should remain hidden from the user (it's not obvious that it is modified by grad_search).
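One way to cut down the object count, sketched here with hypothetical defaults (the class name `LassoAuto` and the string placeholders are illustrative, not the real sparse_ho objects): make criterion, algo, and optimizer optional keyword arguments with sensible defaults, and build the monitor and estimator inside fit.

```python
class LassoAuto:
    """Sketch: defaults collapse six user-built objects into optional kwargs."""
    def __init__(self, criterion=None, algo=None, optimizer=None):
        # string placeholders stand in for HeldOutMSE(), ImplicitForward(),
        # LineSearch() default instances
        self.criterion = criterion if criterion is not None else "HeldOutMSE"
        self.algo = algo if algo is not None else "ImplicitForward"
        self.optimizer = optimizer if optimizer is not None else "LineSearch"

    def fit(self, X, y):
        # monitor and inner estimator are implementation details: built here,
        # never passed by the user
        self._monitor = {"objs": []}
        self._estimator = object()  # stand-in for the inner solver
        return self


model = LassoAuto().fit([[0.0]], [0.0])
print(model.criterion)  # -> HeldOutMSE
```

With this shape, the common case is `LassoAuto().fit(X, y)`, and power users can still inject custom criterion/algo/optimizer objects.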

Does this sound doable the way I propose it, @QB3 @Klopfe? It may help with refitting in #86

@QB3
Owner

QB3 commented Jan 26, 2021

While we are discussing the API, it seems that instantiating HeldOutMSE with idx_train and idx_val is counterintuitive.
What do you think HeldOutMSE should take as arguments? @agramfort @josephsalmon @Klopfe @mathurinm ?
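One option, sketched below under the assumption that a `train_frac` alternative (as floated above) is acceptable: accept either explicit index lists or a fraction that is resolved against the data size at fit time. The `resolve` method name and the seeding are hypothetical.

```python
import random


class HeldOutMSE:
    """Sketch: accept explicit indices OR a train_frac resolved later."""
    def __init__(self, idx_train=None, idx_val=None, train_frac=None):
        if (idx_train is None) != (idx_val is None):
            raise ValueError("pass both idx_train and idx_val, or neither")
        self.idx_train = idx_train
        self.idx_val = idx_val
        self.train_frac = train_frac

    def resolve(self, n_samples, seed=0):
        """Return (train, val) index lists for n_samples points."""
        if self.idx_train is not None:
            return self.idx_train, self.idx_val
        rng = random.Random(seed)  # deterministic split for reproducibility
        idx = list(range(n_samples))
        rng.shuffle(idx)
        cut = int(self.train_frac * n_samples)
        return idx[:cut], idx[cut:]


crit = HeldOutMSE(train_frac=0.75)
train, val = crit.resolve(8)
print(len(train), len(val))  # -> 6 2
```

This keeps backward compatibility with the current (idx_train, idx_val) signature while letting newcomers write `HeldOutMSE(train_frac=0.75)` without precomputing indices.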

@agramfort
Collaborator

agramfort commented Jan 26, 2021 via email
