Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation for cyclic and mini-batch option #24

Open
wants to merge 26 commits into
base: master
Choose a base branch
from
Open

Conversation

wzzlcss
Copy link
Collaborator

@wzzlcss wzzlcss commented Jun 24, 2019

This pull request adds options to enable cyclic and mini batch saga. We can use these new features by setting batchsize (default = 1) and cyclic (default = FALSE). I calculate epoch = floor(n_samples/batchsize) as the number of updates on coefficients in one iteration. Before a new iteration, an index matrix is renewed that each column contains indices of samples for an update. This implementation pulls fresh batch every iteration.

Correctness check using abalone dataset (4177 samples) and gaussian model

  • Full batch without penalty is expected to get a same fit as (X^T X)^{-1}X^T y
X <- matrix(as.numeric(unlist(abalone$x)), ncol = 9)
y <- as.numeric(unlist(abalone$y))
X2 <- matrix(cbind(rep(1, 4177), X), ncol = 10) 
ols <- solve(t(X2)%*%X2)%*%t(X2)%*%y
fit <- sgdnet(X, y, alpha = 0, lambda = 0, maxit = 10000000, 
              thresh = 0.00000001, batchsize = 4177)

Screen Shot 2019-07-03 at 8 19 10 PM

  • With ridge penalty
glmfit_ridge <- glmnet(X, y, alpha = 0, lambda = 0.4)
sgdfit_ridge <- sgdnet(X, y, alpha = 0, lambda = 0.4)
batch_ridge  <- sgdnet(X, y, alpha = 0, lambda = 0.4, thresh = 0.00001, batchsize = 10)

Screen Shot 2019-07-03 at 8 19 23 PM

  • With lasso penalty
glmfit_lasso <- glmnet(X, y, alpha = 1, lambda = 0.4)
sgdfit_lasso <- sgdnet(X, y, alpha = 1, lambda = 0.4)
batch_lasso  <- sgdnet(X, y, alpha = 1, lambda = 0.4, thresh = 0.00001, batchsize = 10)

Screen Shot 2019-07-03 at 8 19 41 PM

Performance

  • Mini-batch version often requires more times on benchmark dataset to reach the same loss and outperforms saga on randomly generated GLM dataset.

Screen Shot 2019-07-03 at 8 35 35 PM

Screen Shot 2019-07-03 at 8 35 22 PM

@@ -32,5 +32,6 @@ Suggests:
latticeExtra
LinkingTo:
Rcpp (>= 0.12.16),
RcppEigen (>= 0.3.3.4.0)
RcppEigen (>= 0.3.3.4.0),
testthat
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to link to testthat? In R parlance, "LinkingTo" refers to sharing compiled (C / C++) code but I don't see that being used here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I want to add test file for functions from package’s compiled library, it seems that without this link, testtaht::use_catch cannot find <testthat.h>. I will remove this since most functions are in head file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to remove it - it seems like a good thing to use. (I just couldn't find the actual use of it since you deleted the tests in subsequent commits on this branch.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So where is this being used? I don't see it in the test files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi mentor, it is not being used now, I previously thought I was going to formally test c++ helper function.

@@ -1,40 +0,0 @@
context("general family tests")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete this file without a replacement?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tired to isolate problems on Travis. I will add it back.

@michaelweylandt
Copy link
Collaborator

Hi Daisy,

This PR and its commit history are a bit messy. Could you rebase it (create a new more direct history) before we review it? If you need help with this process, let me know. (It's pretty confusing your first time through, but such a useful thing to learn in the long run.)

It also looks like you've been removing most of the test files - is this intentional or just trying to isolate problems on Travis?

@wzzlcss
Copy link
Collaborator Author

wzzlcss commented Jun 25, 2019

Hi Michael, thank you for the comments! I will rebase the commit history. Earlier Travis CI said test-families.R fails (but it pass with my environment) so I will add it back. Testing cpp code via testthat::use_catch works for cpp files, and since our helper functions are in head files and exported main functions do not require that so I delete this infrastructure. I might only add test cases for main functions with cyclic and mini batch.

@jolars jolars self-requested a review June 30, 2019 14:21
@jolars jolars added the enhancement New feature or request label Jun 30, 2019
Copy link
Owner

@jolars jolars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, Daisy. A few pointers:

  1. Please add tests along with any enhancements that you do. It is necessary that any changes are validated before they are merged. Also, please avoid deleting test files unless there is a good reason to do so (reiterating Michael's point). And if you like to do, it is usually always better to just add a skip() call instead or commenting out the tests.
  2. Try to adhere to the coding style that I've tried to use previously for the code. For the C++ side, this is mostly based on google's C++ style guide with a few exceptions (which I apologize for). The only really big exception is that I like to put return type declarations and inline modifiers on separate lines.
  3. I get a warning and a note when I run my r cmd check (and these are on travis too):
* checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'sgdnet'batchsize

and

* checking compiled code ... NOTE
Filesgdnet/libs/sgdnet.so:
  Foundrand’, possibly fromrand’ (C)
    Object:sgdnet.oCompiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs.

Could you try fixing these? I think the NOTE is related to your use of std::random_shuffle in utils.h, which you shouldn't use anyway since, if I'm not mistaken, c++ does not respect or use the random number generators from R, which means that results cannot be controlled by setting the seed on the R side, which is not so good. I found this post in the Rcpp gallery, which is probably exactly what you need.

src/families.h Outdated
{
gradient = linear_predictor - y.array().col(i);
for (unsigned i = 0; i < ind.rows(); ++i){
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (unsigned i = 0; i < ind.rows(); ++i){
for (unsigned i = 0; i < ind.rows(); ++i) {

src/saga-dense.h Outdated
// Unlag and rescale coefficients
w *= wscale;
wscale = 1.0;
Saga(Penalty& penalty,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation seems off here. I am guessing this is a result of copy-pasting code in R studio with auto-indentation on. Could you please undo the indentation?

if (lagged_amount != 0) {
penalty(w, j, wscale, lag_scaling[lagged_amount], g_sum);
lag[j] = k;
for (unsigned m = 0; m < subx.cols(); ++m){
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (unsigned m = 0; m < subx.cols(); ++m){
for (unsigned m = 0; m < subx.cols(); ++m) {

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a space between ) and {

src/utils.h Outdated
const unsigned length)
{
Eigen::ArrayXXi index(1, length);
for (unsigned i = 0; i < length; ++i){
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (unsigned i = 0; i < length; ++i){
for (unsigned i = 0; i < length; ++i) {

src/utils.h Outdated
unsigned s_ind = floor(R::runif(0.0, n_samples));
index.col(i) = s_ind;
}
return(index);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return(index);
return index;

Try to use a consistent coding style. Most of the code currently uses return foo; rather than return(foo);, so please try to stick to this scheme.

src/utils.h Outdated
g_change_col = g_change.col(i);
step += g_change_col.rowwise()*subx.col(i).transpose().array();
}
return(step);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return(step);
return step;

src/utils.h Outdated
const unsigned B,
const bool cyclic)
{
if (B > 1) return(IndexBatch(n_samples, B));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (B > 1) return(IndexBatch(n_samples, B));
if (B > 1)
return IndexBatch(n_samples, B);

src/utils.h Outdated
const bool cyclic)
{
if (B > 1) return(IndexBatch(n_samples, B));
if (cyclic) return(IndexCyclic(n_samples, n_samples));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (cyclic) return(IndexCyclic(n_samples, n_samples));
if (cyclic)
return IndexCyclic(n_samples, n_samples);

src/utils.h Outdated
{
if (B > 1) return(IndexBatch(n_samples, B));
if (cyclic) return(IndexCyclic(n_samples, n_samples));
else return(IndexStochastic(n_samples, n_samples));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
else return(IndexStochastic(n_samples, n_samples));
else
return IndexStochastic(n_samples, n_samples);

src/utils.h Outdated
Eigen::ArrayXXi index(B, n_iter);
Eigen::ArrayXi pool = Eigen::ArrayXi::LinSpaced(n_samples, 0, n_samples);

for (unsigned i = 0; i < n_iter; ++i){
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (unsigned i = 0; i < n_iter; ++i){
for (unsigned i = 0; i < n_iter; ++i) {

@wzzlcss wzzlcss force-pushed the dev branch 2 times, most recently from 487e627 to ca03008 Compare July 4, 2019 07:57
modify test case (roll back)

test case
wzzlcss added 7 commits July 4, 2019 01:06
Add helper function for cyclic and mini-batch [skip ci]

Some changes for Lag update [skip ci]

modify cyclic and mini batch [skip ci]
Delete test-runner.cpp

Delete catch-routine-registration.R

Delete test-cpp.R

Update test-lambda-path.R

Update test-cross-validation.R

Update RcppExports.cpp
delete space

check for integer batch size
Copy link
Owner

@jolars jolars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good now, Daisy. Good job!

I noticed that you are force-pushing your commits; please try to avoid this if you can. I know it's necessary sometimes, but make it a habit to push normally if possible.

@@ -30,9 +36,15 @@ test_that("all weights are zero when lambda > lambda_max", {
lambda_max <- max(abs(crossprod(yy, xx)) * sy)/NROW(x)

fit <- sgdnet(x, y, maxit = 1000, thresh = 0.0001)
fit_batch <- sgdnet(x, y, maxit = 1000, thresh = 0.0001, B = 10)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be batchsize instead of B?

@jolars jolars requested a review from michaelweylandt July 4, 2019 09:08
@wzzlcss
Copy link
Collaborator Author

wzzlcss commented Jul 4, 2019

Hi Johan, thank you! Your comments help me a lot! I tried to delete some unnecessary commits. I will be more careful to keep a clean commit history.

@@ -16,6 +16,7 @@

#' @useDynLib sgdnet, .registration = TRUE
#' @importFrom Rcpp sourceCpp
#' @import methods
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we importing the whole methods' package? AFAIR, we have no S4 anywhere in this code (except possibly sparse matrix support, but we shouldn't need methods` imported for that.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the methods' package. Some earlier travis ci wanted it, but now it is fine.

R/sgdnet.R Outdated
if (batchsize > n_samples)
stop("batch size cannot be larger than sample size.")

if (batchsize%%1 > 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use the is.wholenumber function given in ?is.integer here - this check is a bit opaque to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I am using the method from is.wholenumber instead.

gfit <- glmnet(x, y, family = "mgaussian", standardize.response = TRUE)

expect_equal(sfit$lambda, gfit$lambda)
expect_equivalent(coef(sfit), coef(gfit))
expect_equivalent(coef(bfit), coef(gfit), tolerance = 1e-2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still really loose and seems prone to false negatives: can we tighten this test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tightened the test to have smaller tolerance.

}
})
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing EOL here

src/saga-dense.h Outdated
// Outer loop
unsigned it_outer = 0;
bool converged = false;
do {

// Pull samples
index = Index(n_samples, B, cyclic);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sb Eigen::Index

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about this, I have changed its name to "Ind" now.

src/utils.h Outdated
const unsigned length)
{
Eigen::ArrayXXi index(1, length);
index.row(0) = Eigen::ArrayXi::LinSpaced(n_samples, 0, n_samples);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just return the LinSpaced object here?

I'm also a bit confused on the design - this looks like it will always give the same samples at each iteration.

Copy link
Collaborator Author

@wzzlcss wzzlcss Aug 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Michael, I changed it to give cyclic samples with random start at each iteration.

src/utils.h Outdated

//' wrapper aroud R's RNG such that we get a unifrom distribution over
//' [0,n) as required by the STL algorithm
inline int randWrapper(const int n) { return floor(unif_rand()*n); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an Rcpp version we could use here instead? It's pretty hard to tell without going hunting that unif_rand is from R's C API.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Michael, it seems that c++ does not respect random number generators from R, so I am using this to let the results be controlled by setting the seed on the R side following Johan’s suggestion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean use a C++ RNG: I still meant use the R RNGs via C++.

Maybe Rcpp::as<int>(Rcpp::sample(n, 1)) - 1

My point was that it's not obvious (to me) that unif_rand does use the R RNGs, as opposed to being a C++ standard function or something from Eigen.

@michaelweylandt
Copy link
Collaborator

Hi @wzzlcss: Is this ready for final reviewing / merging?

@wzzlcss
Copy link
Collaborator Author

wzzlcss commented Aug 20, 2019

Hi mentors, I think this is ready for final reviewing.

@michaelweylandt
Copy link
Collaborator

@jolars - Can you take a look at this again? There's been quite a lot of work on it since July.

@michaelweylandt michaelweylandt requested a review from jolars August 21, 2019 22:41
Copy link
Collaborator

@michaelweylandt michaelweylandt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, though I haven't checked all the C++. Will let @jolars give the final thumbs up.


expect_equal(sfit$lambda, gfit$lambda)
expect_equivalent(coef(sfit), coef(gfit))
expect_equivalent(coef(bfit), coef(gfit), tolerance = 1e-3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a looser tolerance here than with the regular sfit?

intercept = FALSE,
thresh = 0.000001,
maxit = 1000,
batchsize = 500)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set this 500 to n or to NROW(x) to make the intent (full gradient) a bit clearer?

Copy link
Owner

@jolars jolars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the issues @michaelweylandt pointed out, I think this looks good.

@michaelweylandt
Copy link
Collaborator

Great - thanks @jolars. @wzzlcss, I think one more commit addressing the simple stuff I noted and we'll be good to merge this.

@michaelweylandt
Copy link
Collaborator

Hi @wzzlcss, Is this ready to merge? (I just noted one test that looks a bit loose, but I think everything else has been addressed.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants