%
% Usage: [D [model]]=mexTrainDL(X,param[,model]);
% model is optional
%
% Name: mexTrainDL
%
% Description: mexTrainDL is an efficient implementation of the
% dictionary learning technique presented in
%
% "Online Learning for Matrix Factorization and Sparse Coding"
% by Julien Mairal, Francis Bach, Jean Ponce and Guillermo Sapiro
% arXiv:0908.0050
%
% "Online Dictionary Learning for Sparse Coding"
% by Julien Mairal, Francis Bach, Jean Ponce and Guillermo Sapiro
% ICML 2009.
%
% Note that if you use param.mode=1 or 2, and the training set has a
% reasonable size and fits in your computer's memory, you should use
% mexTrainDL_Memory instead.
%
%
% It addresses the dictionary learning problems
% 1) if param.mode=0
% min_{D in C} (1/n) sum_{i=1}^n (1/2)||x_i-Dalpha_i||_2^2 s.t. ...
% ||alpha_i||_1 <= lambda
% 2) if param.mode=1
% min_{D in C} (1/n) sum_{i=1}^n ||alpha_i||_1 s.t. ...
% ||x_i-Dalpha_i||_2^2 <= lambda
% 3) if param.mode=2
% min_{D in C} (1/n) sum_{i=1}^n (1/2)||x_i-Dalpha_i||_2^2 + ...
% lambda||alpha_i||_1 + lambda_2||alpha_i||_2^2
% 4) if param.mode=3, the sparse coding is done with OMP
% min_{D in C} (1/n) sum_{i=1}^n (1/2)||x_i-Dalpha_i||_2^2 s.t. ...
% ||alpha_i||_0 <= lambda
% 5) if param.mode=4, the sparse coding is done with OMP
% min_{D in C} (1/n) sum_{i=1}^n ||alpha_i||_0 s.t. ...
% ||x_i-Dalpha_i||_2^2 <= lambda
% 6) if param.mode=5, the sparse coding is done with OMP
% min_{D in C} (1/n) sum_{i=1}^n (1/2)||x_i-Dalpha_i||_2^2 + lambda||alpha_i||_0
%
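% For modes 3-5, lambda is interpreted differently: with param.mode=3
% it is the maximum number of nonzero coefficients per signal (an
% integer), and with param.mode=4 it bounds the squared residual.
% A sketch (the values below are arbitrary illustrations, not
% recommendations):
%    param.mode=3; param.lambda=10;    % at most 10 nonzeros per alpha_i
%    param.mode=4; param.lambda=0.1;   % ||x_i-Dalpha_i||_2^2 <= 0.1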
%
%% C is a convex set satisfying
% 1) if param.modeD=0
% C={ D in Real^{m x p} s.t. forall j, ||d_j||_2^2 <= 1 }
% 2) if param.modeD=1
% C={ D in Real^{m x p} s.t. forall j, ||d_j||_2^2 + ...
% gamma1||d_j||_1 <= 1 }
% 3) if param.modeD=2
% C={ D in Real^{m x p} s.t. forall j, ||d_j||_2^2 + ...
% gamma1||d_j||_1 + gamma2 FL(d_j) <= 1 }
% 4) if param.modeD=3
% C={ D in Real^{m x p} s.t. forall j, (1-gamma1)||d_j||_2^2 + ...
% gamma1||d_j||_1 <= 1 }
%
% Potentially, n can be very large with this algorithm.
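%
% Example of a typical call (a minimal sketch; the parameter values
% are arbitrary illustrations, not recommendations):
%    X = randn(64,10000);       % 10000 training signals of size 64
%    param.K = 256;             % dictionary with 256 atoms
%    param.mode = 2;            % formulation 3) above
%    param.lambda = 0.15;       % sparsity parameter
%    param.iter = 1000;         % number of iterations
%    D = mexTrainDL(X,param);   % D is a 64 x 256 matrix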
%
% Inputs: X: double m x n matrix (input signals)
% m is the signal size
% n is the number of signals to decompose
% param: struct
% param.D: (optional) double m x p matrix (dictionary)
% p is the number of elements in the dictionary
% When D is not provided, the dictionary is initialized
% with random elements from the training set.
% param.K (size of the dictionary, optional if param.D is provided)
% param.lambda (parameter)
% param.lambda2 (optional, by default 0)
% param.iter (number of iterations). If a negative number is
% provided, the computation runs for the corresponding
% number of seconds; for instance, param.iter=-5 learns
% the dictionary for 5 seconds.
% param.mode (optional, see above, by default 2)
% param.posAlpha (optional, adds positivity constraints on the
% coefficients; false by default; not compatible with
% param.mode=3,4)
% param.modeD (optional, see above, by default 0)
% param.posD (optional, adds positivity constraints on the
% dictionary, false by default, not compatible with
% param.modeD=2)
% param.gamma1 (optional parameter for param.modeD >= 1)
% param.gamma2 (optional parameter for param.modeD = 2)
% param.batchsize (optional, size of the minibatch, by default
% 512)
% param.iter_updateD (optional, number of BCD iterations for the dictionary
% update step, by default 1)
% param.modeParam (optimization mode).
% 1) if param.modeParam=0, the optimization uses the
% parameter free strategy of the ICML paper
% 2) if param.modeParam=1, the optimization uses the
% parameter rho as in arXiv:0908.0050
% 3) if param.modeParam=2, the optimization uses exponential
% decay weights with updates of the form
% A_{t} <- rho A_{t-1} + alpha_t alpha_t^T
% param.rho (optional) tuning parameter (see paper arXiv:0908.0050)
% param.t0 (optional) tuning parameter (see paper arXiv:0908.0050)
% param.clean (optional, true by default; automatically prunes
% unused elements from the dictionary)
% param.verbose (optional, true by default; enables verbose output)
% param.numThreads (optional, number of threads for exploiting
% multi-core / multi-cpus. By default, it takes the value -1,
% which automatically selects all the available CPUs/cores).
%
% Output:
% D: double m x p matrix (learned dictionary)
%
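% The optional second output 'model' stores the internal state of the
% online algorithm; passing it back as the third input resumes
% training where it stopped. A sketch (variable names are
% illustrative):
%    [D model] = mexTrainDL(X,param);         % first pass
%    param.D = D;                             % warm-start from the learned D
%    [D model] = mexTrainDL(X,param,model);   % continue training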
% Note: this function admits a few experimental usages, which have not
% been extensively tested:
% - single precision setting
%
% Author: Julien Mairal, 2009