Line search
Given current parameters \(x_k\) and a descent direction \(p_k\), the goal of a line search method is to find a step size \(\alpha_k\) such that the one-dimensional function
\[\varphi(\alpha) = f(x_k + \alpha p_k)\]
is minimized or at least a sufficient decrease is guaranteed.
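For concreteness, here is a minimal sketch of this one-dimensional restriction in JAX, assuming a simple quadratic objective and the steepest-descent direction; all names below are illustrative, not part of any library API.

import jax
import jax.numpy as jnp

# Illustrative objective (assumption): a simple quadratic.
def fun(x):
    return jnp.sum(x ** 2)

params = jnp.array([1.0, -2.0])              # current parameters x_k
descent_direction = -jax.grad(fun)(params)   # p_k, here the steepest-descent direction

# One-dimensional restriction phi(alpha) = f(x_k + alpha * p_k).
def phi(alpha):
    return fun(params + alpha * descent_direction)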
Sufficient decrease and curvature conditions
Exactly minimizing \(\varphi\) is often computationally costly. Instead, it is preferable to search for an \(\alpha_k\) satisfying certain conditions. One example of such conditions is the Wolfe conditions
\[f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^\top p_k,\]
\[\nabla f(x_k + \alpha_k p_k)^\top p_k \ge c_2 \nabla f(x_k)^\top p_k,\]
where \(0 < c_1 < c_2 < 1\). These conditions are explained in greater detail in Nocedal and Wright, see equations (3.6a) and (3.6b) there.
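As an illustration, a small helper that tests both conditions for a candidate step size; this is a sketch for exposition, not library code.

import jax
import jax.numpy as jnp

def satisfies_wolfe(fun, x, p, alpha, c1=1e-4, c2=0.9):
    # Wolfe conditions: sufficient decrease plus the curvature condition.
    f_x, g_x = jax.value_and_grad(fun)(x)
    f_new, g_new = jax.value_and_grad(fun)(x + alpha * p)
    slope = jnp.vdot(g_x, p)                       # directional derivative phi'(0)
    sufficient_decrease = f_new <= f_x + c1 * alpha * slope
    curvature = jnp.vdot(g_new, p) >= c2 * slope   # phi'(alpha) >= c2 * phi'(0)
    return sufficient_decrease & curvature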
A step size may satisfy the Wolfe conditions without being particularly close to a minimizer of \(\varphi\) (Nocedal and Wright, Figure 3.5). The curvature condition in the second inequality can be modified to force the step size to lie in at least a broad neighborhood of a stationary point of \(\varphi\). Combined with the sufficient decrease condition in the first inequality, these are known as the strong Wolfe conditions
\[f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k \nabla f(x_k)^\top p_k,\]
\[|\nabla f(x_k + \alpha_k p_k)^\top p_k| \le c_2 |\nabla f(x_k)^\top p_k|,\]
where again \(0 < c_1 < c_2 < 1\). See Nocedal and Wright, equations (3.7a) and (3.7b).
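Continuing the sketch above, only the curvature test changes, taking absolute values on both sides; again an illustrative helper, not part of the library.

def satisfies_strong_wolfe(fun, x, p, alpha, c1=1e-4, c2=0.9):
    f_x, g_x = jax.value_and_grad(fun)(x)
    f_new, g_new = jax.value_and_grad(fun)(x + alpha * p)
    slope = jnp.vdot(g_x, p)
    sufficient_decrease = f_new <= f_x + c1 * alpha * slope
    # Strong curvature condition: |phi'(alpha)| <= c2 * |phi'(0)|.
    strong_curvature = jnp.abs(jnp.vdot(g_new, p)) <= c2 * jnp.abs(slope)
    return sufficient_decrease & strong_curvature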
Algorithms
| BacktrackingLineSearch | Backtracking line search. |
| HagerZhangLineSearch | Hager-Zhang line search. |
The BacktrackingLineSearch algorithm iteratively reduces the step size by some decrease factor until the conditions above are satisfied. Example:
from jaxopt import BacktrackingLineSearch

# fun is the objective function f to minimize.
ls = BacktrackingLineSearch(fun=fun, maxiter=20, condition="strong-wolfe",
                            decrease_factor=0.8)
stepsize, state = ls.run(init_stepsize=1.0, params=params,
                         descent_direction=descent_direction,
                         value=value, grad=grad)
where

- init_stepsize is the first step size value to try,
- params are the current parameters \(x_k\),
- descent_direction is the provided descent direction \(p_k\) (optional, defaults to \(-\nabla f(x_k)\)),
- value is the current value \(f(x_k)\) (optional, recomputed if not provided),
- grad is the current gradient \(\nabla f(x_k)\) (optional, recomputed if not provided).
The returned state contains useful information such as state.params, which contains \(x_k + \alpha_k p_k\), and state.grad, which contains \(\nabla f(x_k + \alpha_k p_k)\).
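As a usage sketch, assuming the ls, fun, and params from the example above, the returned state can drive a simple gradient-descent loop; this only illustrates the documented return values and defaults, not a recommended solver.

import jax

for _ in range(10):
    value, grad = jax.value_and_grad(fun)(params)
    stepsize, state = ls.run(init_stepsize=1.0, params=params,
                             value=value, grad=grad)
    # state.params holds x_k + alpha_k * p_k, with p_k = -grad by default.
    params = state.params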
References:
Numerical Optimization, Jorge Nocedal and Stephen Wright, Second edition.