More-Thuente Linesearch Algorithm Bug: "Search direction must be a descent direction" #400
Here's a snippet of the logs that I am getting from running an optimization problem with the following parameters:

```rust
// Backtracking is a more stable line search, but slow
// let linesearch = BacktrackingLineSearch::new(ArmijoCondition::new(1e-3_f64)?);
// More-Thuente is efficient and converges quickly, but often hits a condition
// where the search direction is not a descent direction and throws an error.
let linesearch = MoreThuenteLineSearch::new().with_bounds(1e-10, 0.01)?;
// Hager-Zhang is chaotic and doesn't make a lot of sense... I haven't gotten
// it to converge nicely.
// let linesearch = HagerZhangLineSearch::new().with_bounds(f64::EPSILON, 0.1)?;
// let solver = SteepestDescent::new(linesearch);
let solver = LBFGS::new(linesearch, 11);
```
As a bit more explanation, I have found that increasing the bounds of the line search changes the behavior. I also have another problem that I run the same optimization setup on, and it converges. I don't see a difference in the size of the gradient vector between the two problems, but the sign does flip.
I investigated this and was able to reproduce the problem with L-BFGS, MoreThuenteLineSearch and the multidimensional Rosenbrock function. At first I thought the problem was an untrustworthy gradient, and to some extent this seems to be true: once the gradient gets small and numerical noise starts to dominate, the gradient points in random directions, which causes this problem. By actually computing the gradient analytically instead of using finite differences, the code does not panic anymore, but it still does not converge to the correct solution. The actual problem seems to be a too-small minimum step size (`step_min`) in MoreThuenteLineSearch:

```rust
use argmin::{
    core::{observers::ObserverMode, CostFunction, Error, Executor, Gradient},
    solver::{linesearch::MoreThuenteLineSearch, quasinewton::LBFGS},
};
use argmin_observer_slog::SlogLogger;
use argmin_testfunctions::{rosenbrock, rosenbrock_derivative};
use finitediff::FiniteDiff;
use ndarray::{array, Array1};

struct Rosenbrock {
    a: f64,
    b: f64,
    finitediff: bool,
}

impl CostFunction for Rosenbrock {
    type Param = Array1<f64>;
    type Output = f64;

    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        Ok(rosenbrock(&p.to_vec(), self.a, self.b))
    }
}

impl Gradient for Rosenbrock {
    type Param = Array1<f64>;
    type Gradient = Array1<f64>;

    fn gradient(&self, p: &Self::Param) -> Result<Self::Gradient, Error> {
        if self.finitediff {
            Ok((*p).forward_diff(&|x| rosenbrock(&x.to_vec(), self.a, self.b)))
        } else {
            Ok(Array1::from(rosenbrock_derivative(
                &p.to_vec(),
                self.a,
                self.b,
            )))
        }
    }
}

fn run() -> Result<(), Error> {
    let case = 1;
    let (finite_diff, step_min, init_param) = match case {
        // This case uses finitediff to compute the gradient and sets the lower bound in
        // MoreThuenteLineSearch to the default value sqrt(EPS).
        // In this case, the individual elements of the gradient are around ~1e-7 and the
        // run panics with "Search direction must be a descent direction".
        // My guess is that numerical errors due to using finitediff cause the gradient
        // to point in random directions.
        1 => (
            // finite_diff
            true,
            // step_min
            std::f64::EPSILON.sqrt(),
            // init_param
            array![-1.2, 1.0, -10.0, 2.0, 3.0, 2.0, 4.0, 9.0],
        ),
        // This case uses the analytically computed gradient of the Rosenbrock function
        // instead of finitediff. No more panics, but it still fails to converge to the
        // result. In particular, the first dimension "gets stuck".
        // Gradient elements are again around 1e-7, but here it doesn't cause panics,
        // probably because the numerical errors don't dominate.
        2 => (
            // finite_diff off, direct computation of gradient
            false,
            // step_min
            std::f64::EPSILON.sqrt(),
            // init_param
            array![-1.2, 1.0, -10.0, 2.0, 3.0, 2.0, 4.0, 9.0],
        ),
        // Funnily, when we add another dimension to the problem and change the
        // second-to-last value of the initial parameter from 9 to 10, it does converge.
        // This is just a side note and probably not really relevant.
        3 => (
            // finite_diff off, direct computation of gradient
            false,
            // step_min
            std::f64::EPSILON.sqrt(),
            // init_param
            array![-1.2, 1.0, -10.0, 2.0, 3.0, 2.0, 4.0, 10.0, 4.0],
        ),
        // Building upon test case _2_: by increasing the minimum step size of
        // MoreThuenteLineSearch, it does converge again.
        4 => (
            // finite_diff off, direct computation of gradient
            false,
            // step_min increased to 1e-4
            1e-4,
            // init_param
            array![-1.2, 1.0, -10.0, 2.0, 3.0, 2.0, 4.0, 9.0],
        ),
        // With the increased step width, even finitediff works.
        // This is a very convoluted way to say that increasing `step_min` probably
        // solves the issue.
        5 => (
            // finite_diff ON again!
            true,
            // step_min increased to 1e-4
            1e-4,
            // init_param
            array![-1.2, 1.0, -10.0, 2.0, 3.0, 2.0, 4.0, 9.0],
        ),
        _ => panic!("Nonexistent case"),
    };

    // Define cost function
    let cost = Rosenbrock {
        a: 1.0,
        b: 100.0,
        finitediff: finite_diff,
    };

    // Set up a line search
    let linesearch = MoreThuenteLineSearch::new()
        .with_c(1e-4, 0.9)?
        .with_bounds(step_min, std::f64::INFINITY)?;

    // Set up solver
    let solver = LBFGS::new(linesearch, 7);

    // Run solver
    let res = Executor::new(cost, solver)
        .configure(|state| state.param(init_param).max_iters(200))
        .add_observer(SlogLogger::term(), ObserverMode::Always)
        .run()?;

    // Print result
    println!("{res}");
    Ok(())
}

fn main() {
    if let Err(ref e) = run() {
        println!("{e}");
        std::process::exit(1);
    }
}
```

How does adapting the lower bound of the step size affect your problem?
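The finite-difference hypothesis above can be illustrated with a small standalone sketch. To be clear about assumptions: `forward_diff` below is a hand-rolled helper (not the finitediff crate), and a 1-D quadratic stands in for the Rosenbrock valley. With the usual step h = sqrt(EPS) ≈ 1.5e-8, a forward difference carries an absolute error of roughly the same size, so once the true gradient shrinks to ~1e-8 the computed gradient is mostly noise:

```rust
// Hand-rolled forward difference with the common step h = sqrt(machine epsilon).
fn forward_diff(f: impl Fn(f64) -> f64, x: f64) -> f64 {
    let h = f64::EPSILON.sqrt();
    (f(x + h) - f(x)) / h
}

fn main() {
    // 1-D stand-in for a Rosenbrock-like valley: f(x) = (1 - x)^2, f'(x) = -2(1 - x).
    let f = |x: f64| (1.0 - x).powi(2);

    // Far from the minimum the gradient is O(1) and the error is negligible.
    let g_far = forward_diff(f, 0.0); // true value: -2.0
    assert!((g_far + 2.0).abs() < 1e-6);

    // Near the minimum the true gradient (~ -2e-8) is the same size as the
    // truncation error (~ f''/2 * h ~ 1.5e-8), so the estimate is mostly noise.
    let g_near = forward_diff(f, 1.0 - 1e-8); // true value: ~ -2e-8
    let rel_err = (g_near / (-2e-8) - 1.0).abs();
    assert!(rel_err > 0.3);
    println!("far: {g_far:e}, near: {g_near:e} (relative error {rel_err:.2})");
}
```

This matches the observation above: gradient elements around 1e-7 are already close to the noise floor of a forward difference, so the estimated direction becomes unreliable exactly when the solver is closing in on the optimum.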
Thank you for the suggestion to tweak the minimum step size on the line search. I modified it as suggested, and then adjusted the minimum step size back down to my original value. Taking a step back, I thought that the line search was meant to determine its optimal step size on its own? Having to constrain the bounds of the line search seems like overfitting the problem, and I am worried that by just changing the problem definition slightly, it will fail to converge or be unstable. What do you think?
I must apologize, I did not read your first post carefully enough and did not notice that you had already changed the bounds. 🤦♂️ I thought you had changed the c parameters with `with_c`.
Yes, it seems reasonable that in that case smaller steps are desirable. However, I'd expect that rapidly varying cost functions don't exhibit this problem at all.
I believe that the effect of these parameters is difficult to judge from a few example settings, because it is likely that the entire trajectory changes and it is unclear when it will end up in an area of the cost function which causes these issues.
As far as I understand it, this is its sole purpose.
I guess it is not uncommon to adjust these settings to a problem, but I have the same concerns as you. It is difficult to trust the algorithm in production when it is so sensitive to changes in parameters. It goes without saying that the algorithm should not panic in that case. An open question is what should happen instead: does the algorithm get stuck at a stationary point and we simply have to terminate, or is it a bug?
Not a waste at all! Thank you for taking the time to suggest something for me to try to diagnose the problem. What I also find very interesting about the configuration parameters is that with the following min/max bounds very close to each other, the first iteration gets much closer to a solution, but then it only goes through a single iteration before hitting the error condition:

```rust
let linesearch = MoreThuenteLineSearch::new()
    .with_c(1e-4, 0.9)?
    .with_bounds(1e-4, 1e-3)?;
```

```
2024-01-24T14:48:52.712Z Cost fn evaluated to 0.6676890118219991
2024-01-24T14:48:52.712Z Using a delta value of 0.000000001 to compute the local gradient
2024-01-24T14:48:52.713Z Computed grad vector: [0, 3.0997862054960024, 0]
2024-01-24T14:48:52.713Z Cost fn evaluated to 0.65808025556374
2024-01-24T14:48:52.713Z Using a delta value of 0.000000001 to compute the local gradient
2024-01-24T14:48:52.714Z Computed grad vector: [0, 3.0998505984314306, 0]
2024-01-24T14:48:52.714Z Using a delta value of 0.000000001 to compute the local gradient
2024-01-24T14:48:52.715Z Computed grad vector: [0, 3.0998505984314306, 0]
```
I think the algorithm should return that it "converged"; in the return data we then have access to the cost function value and the termination condition, which should be set accordingly. So is the conclusion that the algorithm is not able to optimize this problem and I need to switch algorithms?
I'm not sure if I interpret the output correctly, but to me it looks as if it is not really progressing at all.
You do have access to the final cost function value (if it doesn't crash, of course ;)). But it will be difficult to judge whether something is a stationary point or a (local) optimum.
I will think about how I can incorporate this in a reasonable way. I think my first approach would be to handle the error inside L-BFGS.
Maybe; at least I think it would be worth a try. Is this a common problem that is usually solved with L-BFGS?
With #416, the error is now handled slightly better: L-BFGS terminates instead of crashing.
Discussed in #358
Originally posted by itrumper May 24, 2023
I have been using the More-Thuente line search with an L-BFGS solver on an optimization problem. The convergence is typically very efficient (in my opinion); however, I often encounter the error

```
MoreThuenteLineSearch: Search direction must be a descent direction.
```
I have looked at the code for L-BFGS, and it initializes the search direction on this line. The code in `MoreThuenteLineSearch` then evaluates whether the search direction is within 90 degrees of the descent direction on this line. I'm curious if anyone can help me understand why the L-BFGS algorithm may be providing a search direction that violates this condition? Looking at the optimization state prior to this error being thrown, I don't see anything in the gradient or cost function values that stands out to me; it seems like everything is progressing along smoothly.
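For reference, the condition being checked is easy to state: a direction d is a descent direction at x if and only if dᵀ∇f(x) < 0. A minimal sketch of such a check (my own illustration, not argmin's actual implementation):

```rust
/// Sketch of a descent-direction test (illustration only, not argmin's code):
/// `d` descends iff its dot product with the gradient is negative, i.e. the
/// angle between `d` and `grad` exceeds 90 degrees.
fn is_descent_direction(d: &[f64], grad: &[f64]) -> bool {
    d.iter().zip(grad).map(|(di, gi)| di * gi).sum::<f64>() < 0.0
}

fn main() {
    let grad = [3.0, -1.0];
    assert!(is_descent_direction(&[-3.0, 1.0], &grad)); // -grad: steepest descent
    assert!(!is_descent_direction(&[3.0, -1.0], &grad)); // points uphill
    assert!(!is_descent_direction(&[1.0, 3.0], &grad)); // orthogonal: dot product is 0
    println!("descent check ok");
}
```

If numerical noise flips the sign of a tiny gradient, a direction that truly descends can fail this test, which would produce exactly the error in question.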
I have also attempted to adjust the bounds and Wolfe condition parameters, but those do not seem to ever completely remove this behavior, although I am not very experienced with this realm of mathematics, so my investigation has been purely experimental.
It also seems that the check in `MoreThuenteLineSearch` could be improved: if the search direction is anti-parallel to the descent direction, we could flip the search direction's sign to align it. In the perpendicular case, I agree, an error should be thrown. Thank you in advance for any help and suggestions!
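The suggested fallback could look roughly like the following. This is a hypothetical sketch, not an argmin API, and whether flipping is actually sound for quasi-Newton directions is precisely the open question in this thread:

```rust
/// Hypothetical fallback (not part of argmin): keep a descent direction as-is,
/// flip an ascent direction, and only error out when the direction is exactly
/// orthogonal to the gradient.
fn fix_direction(d: Vec<f64>, grad: &[f64]) -> Result<Vec<f64>, &'static str> {
    let dg: f64 = d.iter().zip(grad).map(|(di, gi)| di * gi).sum();
    if dg < 0.0 {
        Ok(d) // already a descent direction
    } else if dg > 0.0 {
        Ok(d.into_iter().map(|x| -x).collect()) // anti-parallel: flip the sign
    } else {
        Err("search direction is orthogonal to the gradient")
    }
}

fn main() {
    let grad = [3.0, -1.0];
    // A descent direction passes through unchanged.
    assert_eq!(fix_direction(vec![-3.0, 1.0], &grad).unwrap(), vec![-3.0, 1.0]);
    // An uphill direction is flipped into a descent direction.
    assert_eq!(fix_direction(vec![3.0, -1.0], &grad).unwrap(), vec![-3.0, 1.0]);
    // An orthogonal direction cannot be repaired by flipping.
    assert!(fix_direction(vec![1.0, 3.0], &grad).is_err());
    println!("fallback sketch ok");
}
```

One caveat with this design: flipping an L-BFGS direction discards the curvature information that produced it, so in practice solvers often reset the quasi-Newton memory instead; the sketch only demonstrates the sign-flip idea raised above.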