SAM is an R package for high-dimensional predictive modeling (regression and classification) in complex data analysis. SAM is short for sparse additive modeling and adopts the computationally efficient basis spline technique. The optimization problems are solved by several algorithms, including block coordinate descent, the fast iterative soft-thresholding algorithm, and Newton's method. Computation is further accelerated by warm-start and active-set tricks.
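To make the basis spline idea concrete, here is a minimal sketch in base R of how each feature can be expanded into a small group of spline basis functions, together with the standard group soft-thresholding operator that block-wise solvers typically apply to each group. The names `p`, `Z`, `group`, and `group_soft_threshold` are illustrative assumptions and are not taken from the SAM source; the package's internal implementation may differ.

```r
library(splines)  # ships with base R

set.seed(1)
n <- 50; d <- 3; p <- 3            # p basis functions per feature (illustrative choice)
X <- matrix(runif(n * d), n, d)

# Expand each column of X into p natural-spline basis functions.
Z <- do.call(cbind, lapply(seq_len(d), function(j) ns(X[, j], df = p)))
dim(Z)                             # n rows, d * p columns

# Columns with group == j hold the basis expansion of feature j.
group <- rep(seq_len(d), each = p)

# Group soft-thresholding: shrink a coefficient block toward zero and set it
# exactly to zero when its Euclidean norm is at most lambda.
group_soft_threshold <- function(beta, lambda) {
  nrm <- sqrt(sum(beta^2))
  if (nrm <= lambda) return(rep(0, length(beta)))
  (1 - lambda / nrm) * beta
}
```

Zeroing a whole block removes the corresponding feature from the additive model, which is where the sparsity comes from.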
SAM uses OpenMP to speed up matrix multiplication, so to use SAM you must enable OpenMP for your compiler.
For Windows and Linux users, recent versions of GCC fully support OpenMP.
For macOS users, things are a little trickier, since the default llvm on macOS does not support OpenMP. The solution is easy: install llvm with full OpenMP support and direct R to use that version of llvm.
First, install llvm with OpenMP support by typing
brew install llvm
Then append the following lines to ~/.R/Makevars so that R compiles packages with the OpenMP-enabled llvm.
CC = /usr/local/bin/clang-omp
CXX = /usr/local/bin/clang-omp++
CXX98 = /usr/local/bin/clang-omp++
CXX11 = /usr/local/bin/clang-omp++
CXX14 = /usr/local/bin/clang-omp++
CXX17 = /usr/local/bin/clang-omp++
OBJC = /usr/local/bin/clang-omp
OBJCXX = /usr/local/bin/clang-omp++
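As a quick sanity check, the following sketch lists the compiler settings that a Makevars file currently contains, so you can confirm that the lines above were picked up. The helper name `compiler_lines` is hypothetical and is not part of SAM or R; it only reads the file and does not verify that the compiler itself supports OpenMP.

```r
# Return the compiler-related lines (CC, CXX, CXX11, OBJC, ...) of a Makevars file.
# `compiler_lines` is an illustrative helper, not an existing R function.
compiler_lines <- function(path = "~/.R/Makevars") {
  path <- path.expand(path)
  if (!file.exists(path)) return(character(0))
  pattern <- "^[[:space:]]*(CC|CXX[0-9]*|OBJC|OBJCXX)[[:space:]]*="
  grep(pattern, readLines(path), value = TRUE)
}

# Prints the matching lines, or character(0) if the file does not exist.
compiler_lines()
```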
To install SAM from GitHub, first install the devtools package from CRAN. Start R and type
install.packages("devtools")
Then load the devtools package and install SAM
library(devtools)
install_github("HMJiangGatech/sam")
library(SAM)
Windows users: if you encounter an Rtools version issue, (1) make sure you have installed the latest Rtools; (2) try the following code
assignInNamespace("version_info", c(devtools:::version_info, list("3.5" = list(version_min = "3.3.0", version_max = "99.99.99", path = "bin"))), "devtools")
Ideally, you can simply install SAM from CRAN and load it in an R console.
install.packages("SAM")
library(SAM)
With SAM, you can run linear regression, logistic regression, and Poisson regression.
## generating training data
n = 100
d = 500
X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
## generating response
y = -2*sin(X[,1]) + X[,2]^2-1/3 + X[,3]-1/2 + exp(-X[,4])+exp(-1)-1
## Training
out.trn = samQL(X,y)
out.trn
## plotting solution path
plot(out.trn)
## generating testing data
nt = 1000
Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
yt = -2*sin(Xt[,1]) + Xt[,2]^2-1/3 + Xt[,3]-1/2 + exp(-Xt[,4])+exp(-1)-1
## predicting response
out.tst = predict(out.trn,Xt)
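To evaluate the linear regression fit on the test set, one option is to compute the mean squared error separately for each regularization parameter on the solution path. The helper below is a sketch in base R; `mse_per_column` is a hypothetical name, and the toy inputs are made up rather than produced by SAM.

```r
# Mean squared error for each column of a prediction matrix.
# `pred` is nt x K (one column per regularization parameter); `truth` has length nt.
mse_per_column <- function(pred, truth) {
  pred <- as.matrix(pred)
  stopifnot(nrow(pred) == length(truth))
  colMeans((pred - truth)^2)
}

# Toy check with made-up predictions (not SAM output).
truth <- c(1, 2, 3)
pred  <- cbind(c(1, 2, 3), c(2, 3, 4))
mse_per_column(pred, truth)   # 0 for the first column, 1 for the second
```

With the objects from the example above, a call might look like `mse_per_column(out.tst$values, yt)`, assuming the list returned by `predict` stores the fitted values in a component named `values`; check `?predict.samQL` to confirm.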
## generating training data
n = 200
d = 100
X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
y = sign(((X[,1]-0.5)^2 + (X[,2]-0.5)^2)-0.06)
## flipping about 5 percent of y
y = y*sign(runif(n)-0.05)
y = sign(y==1)
## Training
out.trn = samLL(X,y)
out.trn
## plotting solution path
plot(out.trn)
## generating testing data
nt = 1000
Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
yt = sign(((Xt[,1]-0.5)^2 + (Xt[,2]-0.5)^2)-0.06)
## flipping about 5 percent of y
yt = yt*sign(runif(nt)-0.05)
yt = sign(yt==1)
## predicting response
out.tst = predict(out.trn,Xt)
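For the logistic regression example, test accuracy can be computed per regularization parameter in the same spirit. This is a sketch in base R; `accuracy_per_column` is a hypothetical name, and the toy labels are made up rather than produced by SAM.

```r
# Fraction of correctly predicted 0/1 labels for each column of a label matrix.
# `labels` is nt x K (one column per regularization parameter); `truth` has length nt.
accuracy_per_column <- function(labels, truth) {
  labels <- as.matrix(labels)
  stopifnot(nrow(labels) == length(truth))
  colMeans(labels == truth)
}

# Toy check with made-up labels (not SAM output).
truth  <- c(1, 0, 1, 0)
labels <- cbind(c(1, 0, 1, 0), c(1, 1, 1, 0))
accuracy_per_column(labels, truth)   # 1 for the first column, 0.75 for the second
```

With the objects from the example above, a call might look like `accuracy_per_column(out.tst$labels, yt)`, assuming the prediction list has a component named `labels` holding 0/1 predictions; check `?predict.samLL` to confirm.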
## generating training data
n = 200
d = 100
X = 0.5*matrix(runif(n*d),n,d) + matrix(rep(0.5*runif(n),d),n,d)
u = exp(-2*sin(X[,1]) + X[,2]^2-1/3 + X[,3]-1/2 + exp(-X[,4])+exp(-1)-1+1)
y = rep(0,n)
for(i in 1:n) y[i] = rpois(1,u[i])
## Training
out.trn = samEL(X,y)
out.trn
## plotting solution path
plot(out.trn)
## generating testing data
nt = 1000
Xt = 0.5*matrix(runif(nt*d),nt,d) + matrix(rep(0.5*runif(nt),d),nt,d)
ut = exp(-2*sin(Xt[,1]) + Xt[,2]^2-1/3 + Xt[,3]-1/2 + exp(-Xt[,4])+exp(-1)-1+1)
yt = rep(0,nt)
for(i in 1:nt) yt[i] = rpois(1,ut[i])
## predicting response
out.tst = predict(out.trn,Xt)
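For count responses, squared error is less natural than the Poisson deviance, which is zero when the predicted means equal the observations and grows as they diverge. The function below is a sketch in base R of the mean Poisson deviance for a single vector of predicted means; `poisson_deviance` is a hypothetical name, not a SAM function.

```r
# Mean Poisson deviance between predicted means `mu` (> 0) and observed counts `y`.
# The term y * log(y / mu) is taken to be 0 when y == 0.
poisson_deviance <- function(mu, y) {
  stopifnot(length(mu) == length(y), all(mu > 0))
  term <- ifelse(y == 0, 0, y * log(y / mu))
  2 * mean(term - (y - mu))
}

# Toy checks with made-up values (not SAM output).
poisson_deviance(c(1, 2, 3), c(1, 2, 3))   # 0: perfect predictions
poisson_deviance(1, 0)                     # 2
```

With the objects from the example above, one could apply it to each column of the predicted means, for example `apply(out.tst$expectations, 2, poisson_deviance, y = yt)`, assuming the prediction list has a component named `expectations`; check `?predict.samEL` to confirm.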
For complete documentation of SAM, type ?SAM in an R terminal.
The scripts used for the experiments are in the folder tests/. To run them, open an R terminal in the root of the project folder and type
source("tests/test_linear.R")
or
source("tests/test_logis.R")
The machine we ran experiments on is a PC with
RAM: 31.3GB
Processor: Intel Core i7-6700T @ 2.80GHz x8
Operating System: Ubuntu 16.04 LTS
R version: 3.2.3
GCC version: 5.4.0
We compared our results on linear regression and logistic regression with those of other packages, namely grplasso, grpreg, and gglasso. For SAM and grpreg, which both support the MCP regularizer, we ran tests with both the MCP regularizer and the L1 regularizer.
| | SAM with L1 | SAM with MCP | previous SAM | grplasso | grpreg with L1 | grpreg with MCP | gglasso |
|---|---|---|---|---|---|---|---|
| time/s | 13.458 | 13.115 | 14.523 | 42.346 | 34.081 | 35.460 | 22.191 |
| loss | 2.08e-5 | 1.66e-5 | 2.06e-5 | 2.26e-4 | 5.18e-4 | 2.12e-5 | 2.86e-3 |
| | SAM with L1 | SAM with MCP | previous SAM | grplasso | grpreg with L1 | grpreg with MCP |
|---|---|---|---|---|---|---|
| time/s | 7.758 | 9.384 | 96.361 | 18.700 | 92.922 | 28.333 |
| accuracy | 0.9054 | 0.9226 | 0.9109 | 0.9102 | 0.9052 | 0.8843 |
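The two regularizers compared above differ in how strongly they penalize large coefficients. As a rough illustration, the sketch below writes out the scalar L1 penalty and the standard MCP (minimax concave penalty) formula in base R. The function names and the default `gamma = 3` are illustrative assumptions; in the group setting the penalty is typically applied to the norm of each coefficient block, and the packages' own implementations may differ in detail.

```r
# L1 penalty: grows linearly with |t|.
l1_penalty <- function(t, lambda) lambda * abs(t)

# MCP: matches the L1 penalty near zero, then flattens out to the constant
# gamma * lambda^2 / 2 once |t| exceeds gamma * lambda.
mcp_penalty <- function(t, lambda, gamma = 3) {
  a <- abs(t)
  ifelse(a <= gamma * lambda,
         lambda * a - a^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

l1_penalty(10, lambda = 1)    # 10
mcp_penalty(10, lambda = 1)   # 1.5: large coefficients are not penalized further
```

Because MCP stops growing for large coefficients, it shrinks strong signals less than the L1 penalty does.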
[1] Tuo Zhao and Han Liu. Sparse Additive Machine, 2012.
[2] Pradeep Ravikumar, John Lafferty, Han Liu, and Larry Wasserman. Sparse Additive Models, 2009.
[3] Xingguo Li, Jason Ge, Haoming Jiang, Mingyi Hong, Mengdi Wang, and Tuo Zhao. Boosting Pathwise Coordinate Optimization: Sequential Screening and Proximal Subsampled Newton Subroutine, 2016.