
Training gets stuck on a specific dataset #225

Closed
fumoboy007 opened this issue Dec 4, 2024 · 14 comments · May be fixed by #228 or #229
fumoboy007 commented Dec 4, 2024

I am trying to train a C-SVC on a specific dataset. The training process gets stuck, never finishing.

Reproduction Steps

  1. Unarchive libsvm-stuck-training-issue.zip, which includes reproduction code and data. (The data has already been scaled.)
  2. Copy svm.h and svm.cpp from LIBSVM v335 (or any older version back to and including v300) to the same directory.
  3. clang++ -std=c++11 -o reproduce_issue reproduce_issue.cpp svm.cpp
  4. ./reproduce_issue

Preliminary Investigation

  • The issue is not reproducible after reverting 1c80a42, though I’m not sure why. Similarly, changing typedef float Qfloat to typedef double Qfloat causes the issue to go away. This could be a red herring.
  • After adding some logging to Solver::select_working_set, I can see that the training process eventually gets stuck on the working set {1525, 2023}, repeatedly selecting these two indices for i and j but with alternating order.

My hunch is that there is a bug in the working set selection or the stopping criteria. I will leave it to the experts to investigate further!

cjlin1 (Owner) commented Dec 4, 2024

Your data is not a normal one as it has only 1 feature.
Using a high degree with a large regularization parameter easily
causes numerical difficulties.

I can reproduce your results:

$ make; ./svm-train -t 1 -e 0.01 -d 4 -c 100 -g 0.9597420397825849 -w0 1.04884106 -w1 0.95550528 ../test3/data1.txt
make: Nothing to be done for 'all'.
........................................................
WARNING: using -h 0 may be faster
*...............
WARNING: using -h 0 may be faster
*.................
WARNING: using -h 0 may be faster
*..........................................................................
WARNING: using -h 0 may be faster
*
WARNING: reaching max number of iterations

optimization finished, #iter = 10000000
obj = -244346.197253, rho = -1.203479
nSV = 2448, nBSV = 2440
Total nSV = 2448

However, if I change

typedef float Qfloat;

to

typedef double Qfloat;

then the problem disappears

$ make; ./svm-train -t 1 -e 0.01 -d 4 -c 100 -g 0.9597420397825849 -w0 1.04884106 -w1 0.95550528 ../test3/data1.txt
make: Nothing to be done for 'all'.
............................
WARNING: using -h 0 may be faster
*............................................
WARNING: using -h 0 may be faster
*.
WARNING: using -h 0 may be faster
*..................
WARNING: using -h 0 may be faster
*...........................
WARNING: using -h 0 may be faster
*.........
WARNING: using -h 0 may be faster
*............
WARNING: using -h 0 may be faster
*.............
WARNING: using -h 0 may be faster
*..............
WARNING: using -h 0 may be faster
*.....
WARNING: using -h 0 may be faster
*.................
WARNING: using -h 0 may be faster
*...........
WARNING: using -h 0 may be faster
*
optimization finished, #iter = 196403
obj = -245104.184532, rho = -1.003508
nSV = 2455, nBSV = 2448
Total nSV = 2455

If you diff 2.91 and 3.0, the main change is

$ diff ./svm.cpp ../libsvm-3.0/svm.cpp
195c195
<     virtual Qfloat *get_QD() const = 0;
---
>     virtual double *get_QD() const = 0;
208c208
<     virtual Qfloat *get_QD() const = 0;
---
>     virtual double *get_QD() const = 0;
415c415
<     const Qfloat *QD;
---
>     const double *QD;
601c601
<         double quad_coef = Q_i[i]+Q_j[j]+2*Q_i[j];
---
>         double quad_coef = QD[i]+QD[j]+2*Q_i[j];
644c644
<         double quad_coef = Q_i[i]+Q_j[j]-2*Q_i[j];
---
>         double quad_coef = QD[i]+QD[j]-2*Q_i[j];
822c822
<             double quad_coef=Q_i[i]+QD[j]-2.0*y[i]*Q_i[j];
---
>             double quad_coef = QD[i]+QD[j]-2.0*y[i]*Q_i[j];
846c846
<             double quad_coef=Q_i[i]+QD[j]+2.0*y[i]*Q_i[j];
---
>             double quad_coef = QD[i]+QD[j]+2.0*y[i]*Q_i[j];
1074c1074
<             double quad_coef = Q_ip[ip]+QD[j]-2*Q_ip[j];
---
>             double quad_coef = QD[ip]+QD[j]-2*Q_ip[j];
1098c1098
<             double quad_coef = Q_in[in]+QD[j]-2*Q_in[j];
---
>             double quad_coef = QD[in]+QD[j]-2*Q_in[j];
1259c1259
<     QD = new Qfloat[prob.l];
---
>     QD = new double[prob.l];
1261c1261
<         QD[i]= (Qfloat)(this->*kernel_function)(i,i);
---
>         QD[i] = (this->*kernel_function)(i,i);
1276c1276
<     Qfloat *get_QD() const
---
>     double *get_QD() const
1298c1298
<     Qfloat *QD;
---
>     double *QD;
1308c1308
<     QD = new Qfloat[prob.l];
---
>     QD = new double[prob.l];
1310c1310
<         QD[i]= (Qfloat)(this->*kernel_function)(i,i);
---
>         QD[i] = (this->*kernel_function)(i,i);
1325c1325
<     Qfloat *get_QD() const
---
>     double *get_QD() const
1344c1344
<     Qfloat *QD;
---
>     double *QD;
1355c1355
<     QD = new Qfloat[2*l];
---
>     QD = new double[2*l];
1364,1365c1364,1365
<         QD[k]= (Qfloat)(this->*kernel_function)(k,k);
<         QD[k+l]=QD[k];
---
>         QD[k] = (this->*kernel_function)(k,k);
>         QD[k+l] = QD[k];
1398c1398
<     Qfloat *get_QD() const
---
>     double *get_QD() const
1419c1419
<     Qfloat *QD;
---
>     double *QD;
1439c1439

I believe for the case you tried, certain numerical issues occur.
According to our log, in 2.91 we made this change:

QD is double now. All Q_i[i] are replaced by QD[i]
to avoid numerical problems.

In summary, numerical issues do occur in libsvm under different circumstances.
Yours happens to be a case where the old setting works, but we made that
change because of situations where the old one does not.

cjlin1 closed this as completed Dec 4, 2024
fumoboy007 (Author) commented Dec 4, 2024

Hi @cjlin1, thanks for taking a look!

Is there a way that LIBSVM can detect numerical difficulties and respond accordingly? I mentioned in #225 (comment) that the working set selection keeps choosing the same two indices without making progress. This seems like a situation that can be detected and mitigated.

Your data is not a normal one as it has only 1 feature.

Using a high degree with a large regularization parameter easily causes numerical difficulties.

I can understand that the data I provided may not seem normal but perhaps it will help if I give some context.

I encountered this issue while developing a sort of “auto ML” platform that automatically selects features and tunes hyperparameters. The feature selection works by starting with one “best” feature, selecting the next best feature, and so on. Hyperparameter tuning is also happening during this process. I ran into the issue during the start of the feature selection process when it was training with only one feature.

If I were doing all of this manually, then yes, maybe it would be reasonable for me to just work around convergence issues. But for an automated platform, I want the underlying learning algorithm implementation (LIBSVM in this case) to be robust enough that it can handle every case, including numerical edge cases.

I hope that makes sense!

cjlin1 (Owner) commented Dec 7, 2024

We have a maximal number of iterations to handle such situations.
Checking for such numerical errors would, first, make the code more
complicated and, second, slow down the program in the majority of situations.
Indeed, we did not even have a maximum number of iterations before;
it was added because of this.
The numerical issues are very rare, so we do not think extra checks are needed.

fumoboy007 (Author)

We have a maximal number of iterations to handle such situations.

Yup, I saw that. However, it’s not great because the training process takes forever to complete. In my case, a regular training process takes less than one second to complete while this problematic one takes hours (I haven’t checked exactly how long).

The numerical issues are very rare, so we do not think extra checks are needed.

Perhaps it’s too early to say whether an extra check is needed? I am trying to investigate the issue more deeply first. Admittedly, it’s difficult for me because I am not an expert here. Would you mind if I ask you some questions about the implementation and my findings?

cjlin1 (Owner) commented Dec 12, 2024 via email

fumoboy007 (Author) commented Dec 13, 2024

Thanks! Here are my findings so far. I added some extra logging and focused on the iterations where the training process starts oscillating between i=1525, j=2023 and i=2023, j=1525.

Findings

Iteration 111360

  1. Select i=1525 and j=2023 with obj_diff_min=-0.01164691862471009.
  2. Calculate quad_coef=0.00000136871642553 and delta=169.13929523467024296.
  3. Update alpha and G as follows:

     Variable    | Previous Value                     | New Value
     ----------- | ---------------------------------- | ----------------------------------
     alpha[1525] | 95.55052799999999991 (upper bound) | 34.06510009316370713
     alpha[2023] | 34.06510009316369292               | 95.55052799999999991 (upper bound)
     G[1525]     | 1.28270253761288888                | 3.23039156859958565
     G[2023]     | 1.28247103388129835                | 3.23109825804289930

Iteration 111361

  1. Select i=2023 and j=1525 with obj_diff_min=-0.04086391544522031.
  2. Calculate quad_coef=0.00000136871642553 and delta=516.31545448807764842.
  3. Update alpha and G as follows:

     Variable    | Previous Value                     | New Value
     ----------- | ---------------------------------- | ----------------------------------
     alpha[1525] | 34.06510009316370713               | 95.55052799999999991 (upper bound)
     alpha[2023] | 95.55052799999999991 (upper bound) | 34.06510009316365029
     G[1525]     | 3.23039156859958565                | 1.28270253760459374
     G[2023]     | 3.23109825804289930                | 1.28247103387388273

Subsequent Iterations

The values in subsequent iterations are similar to the above two iterations, though the last few digits of obj_diff_min, delta, alpha, and G change slightly.

Questions

  • In Iteration 111360, how did both G[1525] and G[2023] increase from about 1.28 to about 3.23? As far as I understand, the gradient for one or both of i and j should get closer to the optimal value of 0 after one iteration.
  • I mentioned that in subsequent iterations, the last few digits of G change slightly. Those small changes are increases. Again, shouldn’t G get closer to 0, even if very slowly?
  • Does the upper bound clipping play any role in the oscillation?
  • Why exactly is this oscillation happening? If it is numerical instability, where is it occurring?

cjlin1 (Owner) commented Dec 15, 2024

At optimum, G may not be zero as SVM dual is a constrained
optimization problem. You can see the optimality condition shown
on page 3 of
https://www.csie.ntu.edu.tw/~cjlin/papers/quadworkset.pdf

In theory, things should converge, but apparently numerical
issues occur here. To precisely check what happens, the
best way is to compare with the version that works.
You mentioned that an earlier version stops without issues.
You can print out information in every iteration and do a
comparison.

fumoboy007 (Author)

At optimum, G may not be zero as SVM dual is a constrained optimization problem.

Ohhh I see, that’s why m(α) ≤ M(α) is the stopping condition. Makes sense, thanks!

Will investigate further~~

fumoboy007 added a commit to fumoboy007/libsvm that referenced this issue Dec 21, 2024
…llate.

Fixes cjlin1#225.

The optimization algorithm has three main calculations:
1. Select the working set `{i, j}` that minimizes the decrease in the objective function.
2. Change `alpha[i]` and `alpha[j]` to minimize the decrease in the objective function while respecting constraints.
3. Update the gradient of the objective function according to the changes to `alpha[i]` and `alpha[j]`.

All three calculations make use of the matrix `Q`, which is represented by the `QMatrix` class. The `QMatrix` class has two main methods:
- `get_Q`, which returns an array of values for a single column of the matrix; and
- `get_QD`, which returns an array of diagonal values.

`Q` values are of type `Qfloat` while `QD` values are of type `double`. `Qfloat` is currently defined as `float`, so there can be inconsistency in the diagonal values returned by `get_Q` and `get_QD`. For example, in cjlin1#225, one of the diagonal values is `181.05748749793070829` as `double` and `180.99411909539512067` as `float`.

The first two calculations of the optimization algorithm access the diagonal values via `get_QD`. However, the third calculation accesses the diagonal values via `get_Q`. This inconsistency between the minimization calculations and the gradient update can cause the optimization algorithm to oscillate, as demonstrated by cjlin1#225.

We change `get_Q` to return a new class called `QColumn` instead of a plain array of values. The `QColumn` class overloads the subscript operator, so accessing individual elements is the same as before. Internally though, the `QColumn` class will return the `QD` value when the diagonal element is accessed. This guarantees that all calculations are using the same values for the diagonal elements, eliminating the inconsistency.

Alternatively, we could change `Qfloat` to be defined as `double`. This would also eliminate the inconsistency; however, it would reduce the cache capacity by half.
fumoboy007 added a commit to fumoboy007/libsvm that referenced this issue Dec 21, 2024
…llate.

Fixes cjlin1#225.
fumoboy007 added a commit to fumoboy007/libsvm that referenced this issue Dec 21, 2024
…llate.

Fixes cjlin1#225.

# Future Changes

The Java code will be updated similarly in a separate commit.
fumoboy007 (Author)

After a long and painful investigation, I have finally found the root cause of the issue! 🎉

I sent pull request #228 for your consideration. Below is the root cause analysis copied from the pull request description.

Background

The optimization algorithm has three main calculations:

  1. Select the working set {i, j} that minimizes the decrease in the objective function.
  2. Change alpha[i] and alpha[j] to minimize the decrease in the objective function while respecting constraints.
  3. Update the gradient of the objective function according to the changes to alpha[i] and alpha[j].

All three calculations make use of the matrix Q, which is represented by the QMatrix class. The QMatrix class has two main methods:

  • get_Q, which returns an array of values for a single column of the matrix; and
  • get_QD, which returns an array of diagonal values.

Problem

Q values are of type Qfloat while QD values are of type double. Qfloat is currently defined as float, so there can be inconsistency in the diagonal values returned by get_Q and get_QD. For example, in #225, one of the diagonal values is 181.05748749793070829 as double and 180.99411909539512067 as float.

The first two calculations of the optimization algorithm access the diagonal values via get_QD. However, the third calculation accesses the diagonal values via get_Q. This inconsistency between the minimization calculations and the gradient update can cause the optimization algorithm to oscillate, as demonstrated by #225.

Solution

We change get_Q to return a new class called QColumn instead of a plain array of values. The QColumn class overloads the subscript operator, so accessing individual elements is the same as before. Internally though, the QColumn class will return the QD value when the diagonal element is accessed. This guarantees that all calculations are using the same values for the diagonal elements, eliminating the inconsistency.

Alternatives Considered

Alternatively, we could change Qfloat to be defined as double. This would also eliminate the inconsistency; however, it would reduce the cache capacity by half.

cjlin1 (Owner) commented Dec 24, 2024

Thanks for the PR. Your change may cause lots of if checks.
I worry the code may become slower in some situations (e.g.,
all kernel elements have been cached and working set
selection is the main task). Redoing the experiments in the JMLR paper
on quadratic working set selection could be a huge task.

Why don't we modify the get_Q function?
After the loop

		for(j=start;j<len;j++)
			data[j] = (Qfloat)(y[i]*y[j]*(this->*kernel_function)(i,j));

we can have a simple assignment to use QD[i]

fumoboy007 (Author) commented Dec 25, 2024

Why don't we modify the get_Q function? After the loop

		for(j=start;j<len;j++)
			data[j] = (Qfloat)(y[i]*y[j]*(this->*kernel_function)(i,j));

we can have a simple assignment to use QD[i]

Unfortunately, this does not change anything. data is an array of Qfloat, so assigning data[i] = QD[i] would cause a cast from double to Qfloat. Thus, the diagonal values according to get_Q would still be different from the diagonal values according to get_QD.

Your change may cause lots of if checks. I worry the code may become slower in some situations (e.g., all kernel elements have been cached and working set selection is the main task). Redoing the experiments in the JMLR paper on quadratic working set selection could be a huge task.

Yup, the additional checks are unfortunate. Alternatively, we could change QD back to an array of Qfloat, which would also solve the inconsistency. However, I am not sure whether we would reintroduce the other issue that the original change 1c80a42 was trying to solve.

The other alternative I mentioned would be changing Qfloat to double. But not sure how you feel about reducing the cache capacity by half.

cjlin1 (Owner) commented Dec 25, 2024 via email

fumoboy007 (Author)

  • The example you gave is a special one with only one feature. Poly kernel is less used and usually C doesn't need to be that large

Hmm doesn’t it also affect the linear kernel? Also, what is the effect of C on this issue?

  • I recall that the example causing us to introduce double QD is a more typical scenario than your example

Is the past issue documented somewhere? I’d like to learn about it.

The problem with my issue is that it was really hard to understand and debug. None of the values were hitting numerical limits, so it wasn’t obvious that it was caused by something related to numerical imprecision. I would classify this issue more as an unexpected bug than a numerical issue: one calculation was using one number but another calculation was using a different number. That seems like something we should prioritize over numerical issues, which are more expected.

  • People can try to change Qfloat to double if they face numerical issues. We have an FAQ for this

Well, changing the code and recompiling involves a lot of friction, especially for users who are using a prebuilt executable or library. But that thought inspired me to think of another solution:

  1. Change the type of QD values back to Qfloat to guarantee consistency between calculations. This avoids unexpected issues like this one.
  2. Allow Qfloat to be changed at runtime. This allows users who face numerical issues to easily try a different precision level.

Both changes should be straightforward and won’t have obvious downsides like the other proposed solutions. I can send a pull request if you’re interested in this approach.

fumoboy007 added a commit to fumoboy007/libsvm that referenced this issue Dec 27, 2024
…llate.

Fixes cjlin1#225.

# Background

The optimization algorithm has three main calculations:
1. Select the working set `{i, j}` that [minimizes](https://github.com/cjlin1/libsvm/blob/35e55962f7f03ce425bada0e6b9db79193e947f8/svm.cpp#L829-L879) the decrease in the objective function.
2. Change `alpha[i]` and `alpha[j]` to [minimize](https://github.com/cjlin1/libsvm/blob/35e55962f7f03ce425bada0e6b9db79193e947f8/svm.cpp#L606-L691) the decrease in the objective function while respecting constraints.
3. [Update](https://github.com/cjlin1/libsvm/blob/35e55962f7f03ce425bada0e6b9db79193e947f8/svm.cpp#L698-L701) the gradient of the objective function according to the changes to `alpha[i]` and `alpha[j]`.

All three calculations make use of the matrix `Q`, which is represented by the `QMatrix` [class](https://github.com/cjlin1/libsvm/blob/35e55962f7f03ce425bada0e6b9db79193e947f8/svm.cpp#L198). The `QMatrix` class has two main methods:
- `get_Q`, which returns an array of values for a single column of the matrix; and
- `get_QD`, which returns an array of diagonal values.

# Problem

`Q` values are of type `Qfloat` while `QD` values are of type `double`. `Qfloat` is currently [defined](https://github.com/cjlin1/libsvm/blob/35e55962f7f03ce425bada0e6b9db79193e947f8/svm.cpp#L16) as `float`, so there can be inconsistency in the diagonal values returned by `get_Q` and `get_QD`. For example, in cjlin1#225, one of the diagonal values is `181.05748749793070829` as `double` and `180.99411909539512067` as `float`.

The first two calculations of the optimization algorithm access the diagonal values via `get_QD`. However, the third calculation accesses the diagonal values via `get_Q`. This inconsistency between the minimization calculations and the gradient update can cause the optimization algorithm to oscillate, as demonstrated by cjlin1#225.

# Solution

We change the type of `QD` values from `double` to `Qfloat`. This guarantees that all calculations are using the same values for the diagonal elements, eliminating the inconsistency.

Note that this reverts the past commit 1c80a42. That commit changed the type of `QD` values from `Qfloat` to `double` to address a numerical issue. In a follow-up commit, we will allow `Qfloat` to be defined as `double` at runtime as a more general fix for numerical issues.

# Future Changes

The Java code will be updated similarly in a separate commit.
fumoboy007 (Author)

Both changes should be straightforward and won’t have obvious downsides like the other proposed solutions. I can send a pull request if you’re interested in this approach.

See #229.
