Line search
===========

Given current parameters :math:`x_k` and a descent direction :math:`p_k`, the
goal of a line search method is to find a step size :math:`\alpha_k` such that
the one-dimensional function

.. math::

  \varphi(\alpha_k) \triangleq f(x_k + \alpha_k p_k)

is minimized, or at least a sufficient decrease is guaranteed.

Sufficient decrease and curvature conditions
--------------------------------------------

Exactly minimizing :math:`\varphi` is often computationally costly. Instead, it
is often preferable to search for a step size :math:`\alpha_k` satisfying
certain conditions. One example is the **Wolfe conditions**

.. math::

  \begin{aligned}
  f(x_k + \alpha_k p_k) &\le f(x_k) + c_1 \alpha_k \nabla f(x_k)^\top p_k \\
  \nabla f(x_k + \alpha_k p_k)^\top p_k &\ge c_2 \nabla f(x_k)^\top p_k
  \end{aligned}

where :math:`0 < c_1 < c_2 < 1`. These conditions are explained in greater
detail in Nocedal and Wright; see equations (3.6a) and (3.6b) there.

A step size may satisfy the Wolfe conditions without being particularly close
to a minimizer of :math:`\varphi` (Nocedal and Wright, Figure 3.5). The
curvature condition in the second inequality can be modified to force the step
size to lie in at least a broad neighborhood of a stationary point of
:math:`\varphi`. Combined with the sufficient decrease condition in the first
inequality, the resulting requirements are known as the **strong Wolfe
conditions**

.. math::

  \begin{aligned}
  f(x_k + \alpha_k p_k) &\le f(x_k) + c_1 \alpha_k \nabla f(x_k)^\top p_k \\
  |\nabla f(x_k + \alpha_k p_k)^\top p_k| &\le c_2 |\nabla f(x_k)^\top p_k|
  \end{aligned}

where again :math:`0 < c_1 < c_2 < 1`. See Nocedal and Wright, equations (3.7a)
and (3.7b).

Algorithms
----------

.. autosummary::
  :toctree: _autosummary

  jaxopt.BacktrackingLineSearch
  jaxopt.HagerZhangLineSearch

The :class:`BacktrackingLineSearch <jaxopt.BacktrackingLineSearch>` algorithm
iteratively reduces the step size by some decrease factor until the conditions
above are satisfied.

Example::

  ls = BacktrackingLineSearch(fun=fun, maxiter=20, condition="strong-wolfe",
                              decrease_factor=0.8)
  stepsize, state = ls.run(init_stepsize=1.0,
                           params=params,
                           descent_direction=descent_direction,
                           value=value,
                           grad=grad)

where

* ``init_stepsize`` is the first step size value to try,
* ``params`` are the current parameters :math:`x_k`,
* ``descent_direction`` is the provided descent direction :math:`p_k`
  (optional, defaults to :math:`-\nabla f(x_k)`),
* ``value`` is the current value :math:`f(x_k)` (optional, recomputed if not
  provided),
* ``grad`` is the current gradient :math:`\nabla f(x_k)` (optional, recomputed
  if not provided).

The returned ``state`` contains useful information, such as ``state.params``,
which contains :math:`x_k + \alpha_k p_k`, and ``state.grad``, which contains
:math:`\nabla f(x_k + \alpha_k p_k)`.

.. topic:: References:

  * Jorge Nocedal and Stephen Wright. *Numerical Optimization*. Second edition.
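To make the strong Wolfe conditions above concrete, the short sketch below (not
part of jaxopt) evaluates both inequalities numerically for a candidate step
size. The toy objective, the constants ``c1 = 1e-4`` and ``c2 = 0.9``, and the
trial step are illustrative choices only::

  import jax
  import jax.numpy as jnp

  def fun(x):
    # Toy strictly convex objective; any differentiable function works here.
    return jnp.sum((x - 1.0) ** 2)

  def satisfies_strong_wolfe(x, p, alpha, c1=1e-4, c2=0.9):
    value, grad = jax.value_and_grad(fun)(x)
    new_value, new_grad = jax.value_and_grad(fun)(x + alpha * p)
    slope = grad @ p  # directional derivative at x; negative for a descent direction
    sufficient_decrease = new_value <= value + c1 * alpha * slope
    curvature = jnp.abs(new_grad @ p) <= c2 * jnp.abs(slope)
    return sufficient_decrease & curvature

  x = jnp.zeros(3)
  p = -jax.grad(fun)(x)  # steepest descent direction
  print(satisfies_strong_wolfe(x, p, alpha=0.5))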
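Similarly, the end-to-end sketch below embeds the ``BacktrackingLineSearch``
call shown above in a plain gradient descent loop, reusing ``state.params`` and
``state.grad`` returned by the search. The toy objective, the iteration budget,
and the stopping tolerance are illustrative choices, not jaxopt defaults::

  import jax
  import jax.numpy as jnp
  from jaxopt import BacktrackingLineSearch

  def fun(x):
    # Smooth toy objective; any differentiable function works here.
    return jnp.sum((x - 1.0) ** 2) + 0.1 * jnp.sum(x ** 4)

  ls = BacktrackingLineSearch(fun=fun, maxiter=20, condition="strong-wolfe",
                              decrease_factor=0.8)

  params = jnp.zeros(5)
  value, grad = jax.value_and_grad(fun)(params)

  for _ in range(100):
    descent_direction = -grad  # steepest descent; quasi-Newton directions also work
    stepsize, state = ls.run(init_stepsize=1.0,
                             params=params,
                             descent_direction=descent_direction,
                             value=value,
                             grad=grad)
    # The state already holds the accepted point and its gradient.
    params, grad = state.params, state.grad
    value = fun(params)  # objective value recomputed here for simplicity
    if jnp.linalg.norm(grad) < 1e-6:
      break

  print(params)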