Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MLJ extension #61

Merged
merged 6 commits into from
Aug 9, 2023
Merged

MLJ extension #61

merged 6 commits into from
Aug 9, 2023

Conversation

tjjarvinen
Copy link
Collaborator

This PR adds support for MLJ interface solvers.

At this state there is only support for MLJScikitLearnInterface.jl and MLJLinearModels.jl. You can look at all available MLJ solvers here.

To use MLJ you just need to create MLJ solver and then use it with linear_fit like any other solver

using ACEfit
using MLJ

# Load ARD solver
ARDRegressor = @load ARDRegressor pkg=MLJScikitLearnInterface

# Create the solver itself and give it parameters
solver = ARDRegressor(
    n_iter = 300,
    tol = 1e-3,
    threshold_lambda = 10000
)

# This is equal to the already existing ARD solver we have

@cortner
Copy link
Member

cortner commented Jul 25, 2023

questions:

  • the docs say it but can you please still add a line to the code snipped to explain how the solvers are used and clarify that the ACEfit.solve is overloaded to be able to use those MLJ solvers.
  • why are there multiple interfaces? I don't understand why there can't be a single interface to MLJ?
  • does this fully replace the current SKLEARN interface?

@cortner
Copy link
Member

cortner commented Jul 25, 2023

@wcwitt -- we thought it would be useful to try out this interface as there are a lot of solvers in MLJ that we haven't considered yet. It should really change anything a bit a strict extension. Btw - once you spin your BLR solvers out of ACEfit you may want to consider making then accessible via MLJ as well.

There is also more functionality in MLJ that we should consider incorporating into our workflow rather than rewrite things from scratch.

@tjjarvinen
Copy link
Collaborator Author

  • why are there multiple interfaces? I don't understand why there can't be a single interface to MLJ?

MLJ has different output depending on solvers. SKLearn solvers e.g. return Python objects, while MLJLinearModels returns Julia Vectors. You will then need to transform the output to our format, so I just found it easier to make two extensions.

  • does this fully replace the current SKLEARN interface?

It is not a replacement, but an addition. Nothing has been removed and everything works like in the past. Now we just have two alternative ways to use SKLearn.

@wcwitt
Copy link
Collaborator

wcwitt commented Jul 26, 2023

Thanks both, I'm supportive of this.

does this fully replace the current SKLEARN interface?

It is not a replacement, but an addition. Nothing has been removed and everything works like in the past. Now we just have two alternative ways to use SKLearn.

But could we go ahead and remove the old way?

@wcwitt
Copy link
Collaborator

wcwitt commented Jul 26, 2023

I have a general question about J1.9 extensions - how should we decide when to use them?

My (weakly held) preference is to keep them to a minimum. It makes sense to use them here because of the Python dependencies, but how do you think about the decision in general?

@cortner
Copy link
Member

cortner commented Jul 26, 2023

I agree, keeping them to a minimum. I would use them for heavy dependencies that take a long time to load or that are rarely used

@tjjarvinen
Copy link
Collaborator Author

But could we go ahead and remove the old way?

We could. But MLJ SKLearn has some issues, like we cannot get the number of iterations the solver used. So, for now I would take it as "a testing status" and then later on when it has been in use for some time, we can drop the old one.

@tjjarvinen
Copy link
Collaborator Author

I agree, keeping them to a minimum. I would use them for heavy dependencies that take a long time to load or that are rarely used

This is the way to go.

@cortner
Copy link
Member

cortner commented Jul 27, 2023

Regarding the old sklearn interface how about we @deprecate it?

@tjjarvinen
Copy link
Collaborator Author

Extensions complicate @deprecate-macro use. We could deprecate the old SKLearn structure, but we could not give any new structure, as an new(x) for the macro because of extensions.

Maybe just adding a @warn there that would give instruction of the MLJ use be a better option?

@cortner
Copy link
Member

cortner commented Jul 27, 2023

I see - a warning sounds good then. And remove at the next minor version.

@cortner
Copy link
Member

cortner commented Jul 27, 2023

I wouldn't call it "better" just that this will be removed in favour of the MLJ interface.

@cortner
Copy link
Member

cortner commented Aug 1, 2023

is this tentatively ready to review and merge?

@cortner
Copy link
Member

cortner commented Aug 1, 2023

Just to note here #64 -- I don't propose to extend this PR but just to be aware that other parts of the MLJ ecosystem could be very useful for us and we should consider integrating them as well.

@cortner cortner mentioned this pull request Aug 1, 2023
@tjjarvinen
Copy link
Collaborator Author

Yes, it should be in working condition now. Next step would be to gather user experience.

@cortner
Copy link
Member

cortner commented Aug 1, 2023

@wcwitt if this doesn't change functionality, should we merge and register and treat any problems as future bugfixes? Are you willing to take a quick look at it?

@wcwitt
Copy link
Collaborator

wcwitt commented Aug 2, 2023

Sounds fine, running out of time to look it over today, but I will by tomorrow.

Edit: might need another day, sorry.

Copy link
Collaborator

@wcwitt wcwitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these features. I made two small comments.


```julia
using ACEfit
using MLJ
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually, does it make sense to recommend that MLJ is loaded first?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loading order does not matter, so I would leave it as it is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example in ext/ACEfit_MLJLinearModels_ext.jl loads MLJ first - can we make them consistent?

test/test_mlj.jl Outdated
@info(" ... MLJLinearModels LinearRegressor")
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels
solver = LinearRegressor()
solver = ACEfit.LSQR(damp = 0, atol = 1e-6)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this line should be deleted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MLJ has different back ends. MLJLinearModels and MLJScikitLearnInterface are the ones that have been implemented. The point here is that these two behave differently on the background, so we need to test them both. So, there is one test for MLJScikitLearnInterface and one for MLJLinearModels. So, removing the part you suggest would leave MLJLinearModels without a test.

We could put here some solver that would be actually used, but all MLJLinearModels have the same interface on the background, so I thought that testing with LSQR would be the fastest.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe I don't understand what's happening then. But it looks like we set solver equal to LinearRegressor() and then immediately overwrite it with a non-MLJ LSQR in the next line. Is that wrong?

By contrast the MLJScikitLearnInterface test makes sense to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, you are correct and I am the one who cannot read...

Fixed the test and added the needed fix for parameter extraction.

@wcwitt wcwitt merged commit 2c61394 into ACEsuit:main Aug 9, 2023
7 checks passed
@wcwitt
Copy link
Collaborator

wcwitt commented Aug 9, 2023

@cortner I just merged, not sure if you want to tag at this stage or not

@cortner
Copy link
Member

cortner commented Aug 9, 2023

I like tagging many versions. This is now registered as 0.1.3.

@tjjarvinen tjjarvinen deleted the mlj branch August 10, 2023 01:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants