Training gets stuck on a specific dataset #225
Your data is not a normal one, as it has only 1 feature. I can reproduce your results:

```
$ make; ./svm-train -t 1 -e 0.01 -d 4 -c 100 -g 0.9597420397825849 -w0 1.04884106 -w1 0.95550528 ../test3/data1.txt
optimization finished, #iter = 10000000
```

However, if I change `typedef float Qfloat;` to `typedef double Qfloat;` then the problem disappears:

```
$ make; ./svm-train -t 1 -e 0.01 -d 4 -c 100 -g 0.9597420397825849 -w0 1.04884106 -w1 0.95550528 ../test3/data1.txt
```

If you diff 2.91 and 3.0, the main change is:

```
$ diff ./svm.cpp ../libsvm-3.0/svm.cpp
```
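For anyone trying this, the change in question is the `Qfloat` typedef near the top of `svm.cpp`:

```cpp
// svm.cpp: the type used for cached kernel values. Switching it to double
// doubles the memory per cached entry (halving the effective cache_size)
// but avoids the float rounding discussed later in this thread.
typedef double Qfloat;  // default: typedef float Qfloat;
```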
Hi @cjlin1, thanks for taking a look! Is there a way that LIBSVM can detect numerical difficulties and respond accordingly? I mentioned in #225 (comment) that the working set selection keeps choosing the same two indices without making progress. This seems like a situation that can be detected and mitigated; I sketch one possibility below.
I can understand that the data I provided may not seem normal, but perhaps it will help if I give some context. I encountered this issue while developing a sort of “auto ML” platform that automatically selects features and tunes hyperparameters. The feature selection works by starting with one “best” feature, selecting the next best feature, and so on. Hyperparameter tuning is also happening during this process. I ran into the issue at the start of the feature selection process, when it was training with only one feature. If I were doing all of this manually, then yes, maybe it would be reasonable for me to just work around convergence issues. But for an automated platform, I want the underlying learning algorithm implementation (LIBSVM in this case) to be robust enough to handle every case, including numerical edge cases. I hope that makes sense!
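To make the idea concrete, the kind of guard I have in mind might look roughly like this (a hypothetical sketch, not LIBSVM code; all names are invented):

```cpp
#include <algorithm>
#include <utility>

// Tracks the working set chosen on each iteration. If the solver keeps
// re-selecting the same unordered pair {i, j}, it is likely oscillating
// and can stop early instead of spinning until the iteration limit.
class OscillationGuard {
public:
	explicit OscillationGuard(int repeat_limit) : repeat_limit_(repeat_limit) {}

	// Call once per iteration with the selected indices.
	// Returns true once the same pair has repeated `repeat_limit` times.
	bool stuck(int i, int j) {
		std::pair<int, int> pair(std::min(i, j), std::max(i, j));
		repeats_ = (pair == last_pair_) ? repeats_ + 1 : 0;
		last_pair_ = pair;
		return repeats_ >= repeat_limit_;
	}

private:
	int repeat_limit_;
	int repeats_ = 0;
	std::pair<int, int> last_pair_{-1, -1};
};
```

A real implementation would also want to confirm that the objective value is not improving before bailing out, since a pair can legitimately repeat while still making progress.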
We have a maximal number of iterations to handle such situations. The numerical issues are very rare, so we do not think extra checks are needed.
Yup, I saw that. However, it’s not great because the training process takes forever to complete. In my case, a regular training process takes less than one second to complete while this problematic one takes hours (I haven’t checked exactly how long).
Perhaps it’s too early to say whether an extra check is needed? I am trying to investigate the issue more deeply first. Admittedly, it’s difficult for me because I am not an expert here. Would you mind if I ask you some questions about the implementation and my findings?
Yes, you are welcome to ask me questions.
Thanks! Here are my findings so far. I added some extra logging and focused on the iterations where the training process starts oscillating.

**Findings**

*Iteration 111360*

*Iteration 111361*

*Subsequent iterations:* The values in subsequent iterations are similar to the above two iterations, though the last few digits of the logged values differ.

**Questions**
At optimum, G may not be zero, as the SVM dual is a constrained optimization problem. In theory, things should converge, but apparently numerical issues get in the way.
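For context, this is the standard LIBSVM dual and why the constraints matter here. LIBSVM solves

$$\min_{\alpha} \; \tfrac{1}{2}\alpha^{T} Q \alpha - e^{T}\alpha \quad \text{subject to} \quad y^{T}\alpha = 0, \; 0 \le \alpha_i \le C,$$

with gradient $G = Q\alpha - e$. The KKT conditions only require that there exist a scalar $b$ with

$$G_i + b\,y_i \;\begin{cases} \ge 0 & \text{if } \alpha_i = 0,\\ = 0 & \text{if } 0 < \alpha_i < C,\\ \le 0 & \text{if } \alpha_i = C, \end{cases}$$

so $G$ itself need not vanish at the optimum.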
Ohhh I see, that explains it. Will investigate further~~
…llate. Fixes cjlin1#225.

# Background

The optimization algorithm has three main calculations:

1. Select the working set `{i, j}` that minimizes the decrease in the objective function.
2. Change `alpha[i]` and `alpha[j]` to minimize the decrease in the objective function while respecting constraints.
3. Update the gradient of the objective function according to the changes to `alpha[i]` and `alpha[j]`.

All three calculations make use of the matrix `Q`, which is represented by the `QMatrix` class. The `QMatrix` class has two main methods:

- `get_Q`, which returns an array of values for a single column of the matrix; and
- `get_QD`, which returns an array of diagonal values.

# Problem

`Q` values are of type `Qfloat` while `QD` values are of type `double`. `Qfloat` is currently defined as `float`, so there can be inconsistency in the diagonal values returned by `get_Q` and `get_QD`. For example, in cjlin1#225, one of the diagonal values is `181.05748749793070829` as `double` and `180.99411909539512067` as `float`.

The first two calculations of the optimization algorithm access the diagonal values via `get_QD`. However, the third calculation accesses the diagonal values via `get_Q`. This inconsistency between the minimization calculations and the gradient update can cause the optimization algorithm to oscillate, as demonstrated by cjlin1#225.

# Solution

We change `get_Q` to return a new class called `QColumn` instead of a plain array of values. The `QColumn` class overloads the subscript operator, so accessing individual elements is the same as before. Internally though, the `QColumn` class will return the `QD` value when the diagonal element is accessed. This guarantees that all calculations are using the same values for the diagonal elements, eliminating the inconsistency.

# Alternatives Considered

Alternatively, we could change `Qfloat` to be defined as `double`. This would also eliminate the inconsistency; however, it would reduce the cache capacity by half.

# Future Changes

The Java code will be updated similarly in a separate commit.
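A minimal sketch of the `QColumn` idea described in the solution above (member names and layout are assumptions; the actual pull request may differ):

```cpp
typedef float Qfloat;  // as currently defined in svm.cpp

// Wraps one cached column of Q. Element access forwards to the cached
// Qfloat values, except at the diagonal entry, where the double-precision
// QD value is returned so that all three calculations see the same diagonal.
class QColumn {
public:
	QColumn(const Qfloat *column, const double *QD, int column_index)
		: column_(column), QD_(QD), column_index_(column_index) {}

	double operator[](int row) const
	{
		if (row == column_index_)
			return QD_[column_index_];  // consistent with get_QD
		return column_[row];
	}

private:
	const Qfloat *column_;
	const double *QD_;
	int column_index_;
};
```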
After a long and painful investigation, I have finally found the root cause of the issue! 🎉 I sent pull request #228 for your consideration; the root cause analysis is quoted in the pull request description above. In short: all three of the solver's main calculations use the matrix `Q`, but `get_QD` returns the diagonal values as `double` while `get_Q` returns them as `Qfloat` (`float`). The working set selection and the `alpha` update read the diagonal via `get_QD`, whereas the gradient update reads it via `get_Q`, and that inconsistency is what makes the optimization oscillate. The proposed fix makes `get_Q` return the `QD` value for the diagonal element; the alternative would be defining `Qfloat` as `double`, at the cost of half the cache capacity.
Thanks for the PR. Your change may cause lots of if checks. I worry the code may become slower in some situations (e.g., all kernel elements have been cached and working set selection is the main task). Redoing the experiments in the JMLR paper for quadratic working set selection would be a huge task.

Why don't we modify the `get_Q` function? After the loop `for(j=start;j<len;j++) data[j] = (Qfloat)(y[i]*y[j]*(this->*kernel_function)(i,j));` we can have a simple assignment to use `QD[i]`.
Unfortunately, this does not change anything. `data` is an array of `Qfloat`, so assigning `data[i] = QD[i]` would cause a cast from `double` to `Qfloat`.

Yup, the additional checks are unfortunate. Alternatively, we could change `QD` back to an array of `Qfloat`, which would also solve the inconsistency. However, I am not sure whether we would reintroduce the other issue that the original change 1c80a42 was trying to solve.

The other alternative I mentioned would be changing `Qfloat` to `double`. But I'm not sure how you feel about reducing the cache capacity by half.
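To make the narrowing concrete, here is a tiny self-contained demonstration (the value is the diagonal entry from this issue):

```cpp
#include <cstdio>

typedef float Qfloat;  // as in svm.cpp

int main()
{
	// The diagonal value from this issue, held in double precision like QD[i]...
	double QD_i = 181.05748749793070829;
	// ...is narrowed when assigned into the Qfloat column, like data[i] = QD[i].
	Qfloat data_i = (Qfloat)QD_i;
	printf("double: %.17f\nfloat:  %.17f\n", QD_i, (double)data_i);
	// The two printed values differ, so the diagonal inconsistency remains.
	return 0;
}
```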
Thanks and I see. My suggestion is to keep the current code.

- The example you gave is a special one with only one feature. The poly kernel is less used, and usually C doesn't need to be that large.
- For the most used RBF kernel, K(i,i) = 1, so this should be less of a concern.
- I recall that the example that caused us to introduce double QD is a more typical scenario than your example.
- People can try to change Qfloat to double if they face numerical issues. We have an FAQ for this.
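For the second point, the RBF diagonal is identically one by definition:

$$K(x_i, x_i) = \exp\left(-\gamma \lVert x_i - x_i \rVert^2\right) = \exp(0) = 1,$$

a value that `float` and `double` both represent exactly, so the diagonal inconsistency cannot arise with the RBF kernel.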
Hmm, doesn’t it also affect the linear kernel? Also, what is the effect of …?

Is the past issue documented somewhere? I’d like to learn about it. The problem with my issue is that it was really hard to understand and debug. None of the values were hitting numerical limits, so it wasn’t obvious that it was caused by something related to numerical imprecision. I would classify this issue more as an unexpected bug than a numerical issue: one calculation was using one number but another calculation was using a different number. That seems like something we should prioritize over numerical issues, which are more expected.

Well, changing the code and recompiling involves a lot of friction, especially for users who are using a prebuilt executable or library. But that thought inspired me to think of another solution:

1. Change the type of `QD` values from `double` back to `Qfloat`, which eliminates the inconsistency.
2. Allow `Qfloat` to be defined as `double` at runtime, as a more general fix for numerical issues.

Both changes should be straightforward and won’t have obvious downsides like the other proposed solutions. I can send a pull request if you’re interested in this approach.
…llate. Fixes cjlin1#225.

# Background

The optimization algorithm has three main calculations:

1. Select the working set `{i, j}` that [minimizes](https://github.com/cjlin1/libsvm/blob/35e55962f7f03ce425bada0e6b9db79193e947f8/svm.cpp#L829-L879) the decrease in the objective function.
2. Change `alpha[i]` and `alpha[j]` to [minimize](https://github.com/cjlin1/libsvm/blob/35e55962f7f03ce425bada0e6b9db79193e947f8/svm.cpp#L606-L691) the decrease in the objective function while respecting constraints.
3. [Update](https://github.com/cjlin1/libsvm/blob/35e55962f7f03ce425bada0e6b9db79193e947f8/svm.cpp#L698-L701) the gradient of the objective function according to the changes to `alpha[i]` and `alpha[j]`.

All three calculations make use of the matrix `Q`, which is represented by the `QMatrix` [class](https://github.com/cjlin1/libsvm/blob/35e55962f7f03ce425bada0e6b9db79193e947f8/svm.cpp#L198). The `QMatrix` class has two main methods:

- `get_Q`, which returns an array of values for a single column of the matrix; and
- `get_QD`, which returns an array of diagonal values.

# Problem

`Q` values are of type `Qfloat` while `QD` values are of type `double`. `Qfloat` is currently [defined](https://github.com/cjlin1/libsvm/blob/35e55962f7f03ce425bada0e6b9db79193e947f8/svm.cpp#L16) as `float`, so there can be inconsistency in the diagonal values returned by `get_Q` and `get_QD`. For example, in cjlin1#225, one of the diagonal values is `181.05748749793070829` as `double` and `180.99411909539512067` as `float`.

The first two calculations of the optimization algorithm access the diagonal values via `get_QD`. However, the third calculation accesses the diagonal values via `get_Q`. This inconsistency between the minimization calculations and the gradient update can cause the optimization algorithm to oscillate, as demonstrated by cjlin1#225.

# Solution

We change the type of `QD` values from `double` to `Qfloat`. This guarantees that all calculations are using the same values for the diagonal elements, eliminating the inconsistency.

Note that this reverts the past commit 1c80a42. That commit changed the type of `QD` values from `Qfloat` to `double` to address a numerical issue. In a follow-up commit, we will allow `Qfloat` to be defined as `double` at runtime as a more general fix for numerical issues.

# Future Changes

The Java code will be updated similarly in a separate commit.
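The follow-up "runtime `Qfloat`" idea could be structured along these lines (purely illustrative; the field name `use_double_qfloat` is invented, and the eventual pull request may look different):

```cpp
// Illustrative sketch only, not LIBSVM code. The solver internals would be
// templated on the cached kernel-value type instead of a fixed Qfloat
// typedef, and the type would be chosen per training run from a parameter.
template <typename QfloatT>
static void train_impl()
{
	// ... existing solver code, with Qfloat replaced by QfloatT ...
}

struct parameters {
	bool use_double_qfloat;  // hypothetical new field
	// ... existing fields ...
};

static void train(const parameters &param)
{
	if (param.use_double_qfloat)
		train_impl<double>();  // full precision; half the cache capacity
	else
		train_impl<float>();   // current behavior
}
```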
See #229.
I am trying to train a C-SVC on a specific dataset. The training process gets stuck, never finishing.

**Reproduction Steps**

1. Copy `svm.h` and `svm.cpp` from LIBSVM v335 (or any older version up to and including v300) to the directory containing `reproduce_issue.cpp`.
2. `clang++ -std=c++11 -o reproduce_issue reproduce_issue.cpp svm.cpp`
3. `./reproduce_issue`
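For reference, a minimal sketch of what `reproduce_issue.cpp` presumably does, using the parameters quoted elsewhere in this thread (the inline data is a placeholder; the real attached one-feature dataset is what actually triggers the hang):

```cpp
#include <vector>
#include "svm.h"

int main()
{
	// Placeholder one-feature, two-class problem (labels 0 and 1 to match
	// the -w0/-w1 weights). The real dataset from the issue goes here.
	const int l = 4;
	double labels[l] = {1, 1, 0, 0};
	double features[l] = {0.9, 0.8, -0.7, -0.6};

	std::vector<svm_node> nodes(2 * l);
	std::vector<svm_node *> x(l);
	for (int i = 0; i < l; ++i) {
		nodes[2 * i] = {1, features[i]};  // index 1: the single feature
		nodes[2 * i + 1] = {-1, 0.0};     // index -1 terminates the row
		x[i] = &nodes[2 * i];
	}

	svm_problem prob = {};
	prob.l = l;
	prob.y = labels;
	prob.x = x.data();

	// Matches the svm-train flags from this thread:
	// -t 1 -d 4 -g 0.9597420397825849 -c 100 -e 0.01 -w0 1.04884106 -w1 0.95550528
	svm_parameter param = {};
	param.svm_type = C_SVC;
	param.kernel_type = POLY;
	param.degree = 4;
	param.gamma = 0.9597420397825849;
	param.coef0 = 0;
	param.C = 100;
	param.eps = 0.01;
	param.cache_size = 100;
	param.shrinking = 1;
	int weight_label[2] = {0, 1};
	double weight[2] = {1.04884106, 0.95550528};
	param.nr_weight = 2;
	param.weight_label = weight_label;
	param.weight = weight;

	svm_model *model = svm_train(&prob, &param);  // hangs on the real dataset
	svm_free_and_destroy_model(&model);
	return 0;
}
```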
**Preliminary Investigation**

- Changing `typedef float Qfloat` to `typedef double Qfloat` causes the issue to go away. This could be a red herring.
- In `Solver::select_working_set`, I can see that the training process eventually gets stuck on the working set `{1525, 2023}`, repeatedly selecting these two indices for `i` and `j` but with alternating order.

My hunch is that there is a bug in the working set selection or the stopping criteria. I will leave it to the experts to investigate further!