Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Adds Linear, Ridge and Lasso regressions #10

Closed
wants to merge 5 commits into from
Closed

[WIP] Adds Linear, Ridge and Lasso regressions #10

wants to merge 5 commits into from

Conversation

Nimpruda
Copy link

see #7

@codecov
Copy link

codecov bot commented Dec 12, 2019

Codecov Report

Merging #10 into master will decrease coverage by 13.35%.
The diff coverage is 0%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #10       +/-   ##
===========================================
- Coverage   96.68%   83.33%   -13.36%     
===========================================
  Files           7        9        +2     
  Lines         181      210       +29     
===========================================
  Hits          175      175               
- Misses          6       35       +29
Impacted Files Coverage Δ
...infa-supervised/src/linear_regression/algorithm.rs 0% <0%> (ø)
linfa-supervised/src/ridge_regression/algorithm.rs 0% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec5d491...74e30f9. Read the comment docs.

@@ -13,6 +13,8 @@ keywords = ["machine-learning", "linfa", "ai", "ml"]
categories = ["algorithms", "mathematics", "science"]

[dependencies]

linfa-supervised = {path = "linfa-supervised"}
Copy link
Contributor

@remram44 remram44 Dec 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add a version number, or publishing the linfa crate to crates.io will fail 😉

Suggested change
linfa-supervised = {path = "linfa-supervised"}
linfa-supervised = { path = "linfa-supervised", version = "0.1" }


[dependencies]
ndarray = { version = "0.13" , features = ["rayon"] }
ndarray-linalg = { version = "0.12", features = ["openblas"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to force the usage of a certain BLAS implementation over the other ones (e.g. intel-mkl) - I would suggest to use a features-strategy similar to what I put down here: https://github.com/rust-ndarray/ndarray-examples/blob/d7d43e018cade1f8bf0d92220081ba66963bab07/linear_regression/Cargo.toml#L8

let x_hat = array![[6.0], [7.0]];
println!("{:#?}", linear_regression.predict(&x_hat));

let mut linear_regression2 = LinearRegression::new(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest splitting this example (and the one below) in two separate examples, with a descriptive name (e.g. with/without intercept).
It would be ideal to have them in separate files as well under the examples folder.

use ndarray::{stack, Array, Array1, ArrayBase, Axis, Data, Ix1, Ix2};
use ndarray_linalg::Solve;
/* I will probably change the implementation for an enum for more type safety.
I have to make sure, it is a great idea when it comes to pyhton interoperability
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is considerable freedom in how to wrap the Rust version for Python consumption - as detailed in #8, we shouldn't let Python move our Rust design in directions which are not idiomatic. The wrapping code can do the bridging when required 😀

So I'd definitely suggest to go with the commented out version, which uses an enum (LinearRegression::new(false) is much more confusing than LinearRegression::new(Intercept::NoIntercept)).

use ndarray::{Array, Array1, ArrayBase, Data, Ix1, Ix2};
use ndarray_linalg::Solve;
/* The difference between a linear regression and a Ridge regression is
that ridge regression has an L2 penalisation term to having some features
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo? avoid having?

}

impl RidgeRegression {
pub fn new(alpha: f64) -> RidgeRegression {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we have an intercept parameter here as well?

@@ -0,0 +1,6 @@
trait GradientDescent {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere?

@@ -0,0 +1,16 @@
[package]
name = "linfa-supervised"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably prefer this to be named linfa-linear - we will have many more algorithm belonging to the family of supervised learning and we don't want to have all of them in the same sub-crate.

let dummy_column: Array<f64, _> = Array::ones((n_samples, 1));
let X = stack(Axis(1), &[dummy_column.view(), X.view()]).unwrap();
match &self.beta {
None => panic!("The linear regression estimator has to be fitted first!"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not ideal - can we refactor the way we build LinearRegression to make sure that beta is always there when a user calls predict?

The easiest way to do this is re-using the same approach I put down in KMeans: a structure to hold the hyperparameters of the model (built with/without builder pattern, depending on how many hyperparameters we have there) and a fit method on it that returns a fitted LinearRegression - basically, using fit as constructor. In this way we can be sure that beta is there and we have one less panic condition 😁

@LukeMathWalker
Copy link
Contributor

Thanks for working on it @Nimpruda!
I left some comments here and there to give you pointers on the next step to make this PR ready to be merged - it's already half-way there 😁

I see we have some issues in CI due to ndarray-linalg requiring a BLAS implementation to build correctly - I'll work in a separate PR to get this issue solved so that your branch can compile correctly on Travis 👍

@LukeMathWalker LukeMathWalker mentioned this pull request Dec 15, 2019
24 tasks
@Nimpruda
Copy link
Author

I'll work on it whenever I have time! Thanks for all the feedback

@bytesnake
Copy link
Member

superseded by #20

@bytesnake bytesnake closed this Jul 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants