A SVM (Support Vector Machine) implementation which picked out from Common Lisp Machine Learning (CLML). Currently, this library support only Soft margin SVM for 2-class classification.
In shell,
cd ~/quicklisp/local-projects/ git clone https://github.com/masatoi/clml-svm.git
In Lisp,
(ql:quickload :clml-svm)
First, you should construct a training vector which must be double-float typed array. A element of training-vector should contain datapoint values and a correct label at last element (+1.0 or -1.0). e.g.
#(#(5.0d0 4.0d0 4.0d0 5.0d0 7.0d0 10.0d0 3.0d0 2.0d0 1.0d0 +1.0d0)
#(6.0d0 8.0d0 8.0d0 1.0d0 3.0d0 4.0d0 3.0d0 7.0d0 1.0d0 +1.0d0)
#(8.0d0 10.0d0 10.0d0 8.0d0 7.0d0 10.0d0 9.0d0 7.0d0 1.0d0 -1.0d0)
...)
You can also use READ-LIBSVM-DATA function to make dateset. This function allows to read sparse expression used in LIBSVM datasets (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/).
e.g., in case of reading ‘a1a data’ (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#a1a),
(defparameter training-vector (read-libsvm-data "/path/to/a1a"))
(defparameter test-vector (read-libsvm-data "/path/to/a1a.t"))
Secondly, you should select or define a kernel-function. Several kernel-functions are prepared (Linear-kernel, RBF-kernel and polynomial-kernel).
(defparameter kernel (make-rbf-kernel :gamma 0.0078125d0))
And then, make model object and train it using ‘make-svm-model’ function. C parameter means penalty for invasion of margin. If C has sufficiently big value, the model is similar to Hard margin SVM.
(defparameter model (make-svm-model training-vector kernel :C 4.0d0))
You can now make prediction or validation for test dataset. ‘cross-validation’ function performs N-fold cross validation.
(discriminate model (aref test-vector 0))
; => 1.0d0
(svm-validation model test-vector)
; => (((-1.0d0 . 1.0d0) . 3305) ((1.0d0 . 1.0d0) . 4141) ((-1.0d0 . -1.0d0) . 21956) ((1.0d0 . -1.0d0) . 1554)),
; 84.30352758754361d0
(cross-validation 3 training-vector kernel :C 4.0d0)
; => 83.17757009345794d0,
; (82.80373831775701d0 82.05607476635514d0 84.67289719626169d0)
When using RBF-kernel, you should determine meta-parameters gamma and C. Although it take a time, ‘grid-search’ function is available for automatic determination of gamma and C.
(grid-search training-vector test-vector)
; => 84.413360899341d0 (best accuracy),
; 0.0078125d0 (gamma),
; 8.0d0 (C)
;; cross-validation version
(grid-search-by-cv 3 training-vector)
; => 83.86292834890965d0,
; 1.220703125d-4,
; 512.0d0
We’re supporting only ANSI Common Lisp and mainly testing by SBCL.
- Allegro CL 9.0 (non-SMP) Enterprise 32 Edition (ANSI mode, any platforms)
- Allegro CL 9.0 (non-SMP) Enterprise 64 Edition (ANSI mode, any platforms)
- lispworks-6-0-0-amd64-linux
- lispworks-6-0-0-x86-linux
- sbcl-1.0.28-x86-64-linux
- Clozure CL 1.9
CLML is licensed under the terms of the Lisp Lesser GNU Public License, known as the LLGPL and distributed with CLML as the file “LICENSE”. The LLGPL consists of a preamble and the LGPL, which is distributed with CLML as the file “LGPL”. Where these conflict, the preamble takes precedence.
The LGPL is also available online at: http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html
The LLGPL is also available online at: http://opensource.franz.com/preamble.html