Acceleration methods: N-GMRES and O-ACCEL
Constructors
NGMRES(;
    alphaguess = LineSearches.InitialStatic(),
    linesearch = LineSearches.HagerZhang(),
    manifold = Flat(),
    wmax::Int = 10,
    ϵ0 = 1e-12,
    nlprecon = GradientDescent(
        alphaguess = LineSearches.InitialPrevious(),
        linesearch = LineSearches.Static(alpha=1e-4, scaled=true),
        manifold = manifold),
    nlpreconopts = Options(iterations = 1, allow_f_increases = true),
)
OACCEL(;
    manifold::Manifold = Flat(),
    alphaguess = LineSearches.InitialStatic(),
    linesearch = LineSearches.HagerZhang(),
    nlprecon = GradientDescent(
        alphaguess = LineSearches.InitialPrevious(),
        linesearch = LineSearches.Static(alpha=1e-4, scaled=true),
        manifold = manifold),
    nlpreconopts = Options(iterations = 1, allow_f_increases = true),
    ϵ0 = 1e-12,
    wmax::Int = 10,
)
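Both constructors accept the same keywords. As a minimal sketch of how one might customize them, the following configures O-ACCEL with an L-BFGS nonlinear preconditioner limited to one iteration per acceleration step; the particular choices are illustrative, not recommendations:

using Optim, LineSearches

# Illustrative: use a single L-BFGS step as the nonlinear preconditioner and
# accelerate over the 5 most recent iterates.
nlprecon = LBFGS(alphaguess = LineSearches.InitialPrevious(),
                 linesearch = LineSearches.Static())
algo = OACCEL(nlprecon = nlprecon,
              nlpreconopts = Optim.Options(iterations = 1, allow_f_increases = true),
              wmax = 5)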
Description
These algorithms take a step given by the nonlinear preconditioner nlprecon and propose an accelerated step on a subspace spanned by the previous wmax iterates.
- N-GMRES accelerates based on a minimization of an approximation to the $\ell_2$ norm of the gradient.
- O-ACCEL accelerates based on a minimization of an approximation to the objective.
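Schematically, and following the notation of [1] and [3] only loosely (this is an illustrative sketch, not the exact update used in the implementation), let $\bar{x}$ denote the step produced by the preconditioner and $x_1, \dots, x_w$ the stored previous iterates. The accelerated iterate is sought in the form

$$x^A = \bar{x} + \sum_{i=1}^{w} \beta_i \, (\bar{x} - x_i),$$

where N-GMRES chooses the coefficients $\beta$ to minimize the linearized gradient norm $\bigl\| \nabla f(\bar{x}) + \sum_{i=1}^{w} \beta_i \, (\nabla f(\bar{x}) - \nabla f(x_i)) \bigr\|_2$, while O-ACCEL instead minimizes an analogous approximation of the objective $f$ itself.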
N-GMRES was originally developed for solving nonlinear systems [2], and reduces to GMRES for linear problems. Application of the algorithm to optimization is covered, for example, in [1]. A description of O-ACCEL and its connection to N-GMRES can be found in [3].
We recommend trying LBFGS on your problem before N-GMRES or O-ACCEL. All three algorithms have similar computational cost and memory requirements; however, L-BFGS is more efficient for many problems.
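A minimal sketch of such a comparison is shown below; the objective here is an arbitrary stand-in for your own problem, and gradients are obtained by Optim's default finite differencing:

using Optim

# Stand-in objective; replace with your own problem.
f(x) = sum((x .- 1).^2) + 0.1 * sum(x[1:end-1] .* x[2:end])
x0 = zeros(10)

# Run all three methods on the same problem and compare the reported
# iteration counts and objective/gradient call counts.
res_lbfgs  = optimize(f, x0, LBFGS())
res_ngmres = optimize(f, x0, NGMRES())
res_oaccel = optimize(f, x0, OACCEL())

Optim.iterations.((res_lbfgs, res_ngmres, res_oaccel))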
Example
This example shows how to accelerate GradientDescent on the Extended Rosenbrock problem. First, we try to optimize using GradientDescent.
using Optim, OptimTestProblems
UP = OptimTestProblems.UnconstrainedProblems
prob = UP.examples["Extended Rosenbrock"]
optimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, GradientDescent())
The algorithm does not converge within the default limit of 1000 iterations.
Results of Optimization Algorithm
 * Algorithm: Gradient Descent
 * Starting Point: [-1.2,1.0, ...]
 * Minimizer: [0.8923389282461412,0.7961268644300445, ...]
 * Minimum: 2.898230e-01
 * Iterations: 1000
 * Convergence: false
   * |x - x'| ≤ 1.0e-32: false
     |x - x'| = 4.02e-04
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 2.38e-03 |f(x)|
   * |g(x)| ≤ 1.0e-08: false
     |g(x)| = 8.23e-02
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: true
 * Objective Calls: 2525
 * Gradient Calls: 2525
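One could raise the iteration limit via Optim.Options to let GradientDescent run longer, as in the sketch below, but as the rest of this example shows, acceleration is a more effective remedy for this problem:

# Illustrative only: allow up to 100_000 iterations for the unaccelerated method
optimize(UP.objective(prob), UP.gradient(prob), prob.initial_x,
         GradientDescent(), Optim.Options(iterations = 100_000))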
Now, we use OACCEL to accelerate GradientDescent.
# Default nonlinear preconditioner for `OACCEL`
nlprecon = GradientDescent(linesearch=LineSearches.Static(alpha=1e-4, scaled=true))
# Default size of subspace that OACCEL accelerates over is `wmax = 10`
oacc10 = OACCEL(nlprecon=nlprecon, wmax=10)
optimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, oacc10)
This drastically improves the GradientDescent algorithm, converging in 87 iterations.
Results of Optimization Algorithm
 * Algorithm: O-ACCEL preconditioned with Gradient Descent
 * Starting Point: [-1.2,1.0, ...]
 * Minimizer: [1.0000000011361219,1.0000000022828495, ...]
 * Minimum: 3.255053e-17
 * Iterations: 87
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false
     |x - x'| = 6.51e-08
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 7.56e+02 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 1.06e-09
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 285
 * Gradient Calls: 285
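The quantities summarized above can also be queried programmatically. A small sketch using Optim's standard accessor functions:

res = optimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, oacc10)
Optim.converged(res)   # true in the run shown above
Optim.iterations(res)  # 87 in the run shown above
Optim.minimum(res)     # objective value at the returned minimizer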
We can improve the acceleration further by changing the acceleration subspace size wmax.
oacc5 = OACCEL(nlprecon=nlprecon, wmax=5)
optimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, oacc5)
Now, the O-ACCEL algorithm has accelerated GradientDescent to converge in 50 iterations.
Results of Optimization Algorithm
 * Algorithm: O-ACCEL preconditioned with Gradient Descent
 * Starting Point: [-1.2,1.0, ...]
 * Minimizer: [0.9999999999392858,0.9999999998784691, ...]
 * Minimum: 9.218164e-20
 * Iterations: 50
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false
     |x - x'| = 2.76e-07
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 5.18e+06 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 4.02e-11
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 181
 * Gradient Calls: 181
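The best wmax is problem dependent. A small sketch of how one might scan a few subspace sizes on this problem (the particular values are illustrative):

# Compare iteration counts for a few acceleration subspace sizes
for w in (2, 5, 10, 20)
    res = optimize(UP.objective(prob), UP.gradient(prob), prob.initial_x,
                   OACCEL(nlprecon=nlprecon, wmax=w))
    println("wmax = $w: ", Optim.iterations(res), " iterations")
end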
As a final comparison, we can do the same with N-GMRES.
ngmres5 = NGMRES(nlprecon=nlprecon, wmax=5)
optimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, ngmres5)
Again, this significantly improves the GradientDescent algorithm, converging in 63 iterations.
Results of Optimization Algorithm
 * Algorithm: Nonlinear GMRES preconditioned with Gradient Descent
 * Starting Point: [-1.2,1.0, ...]
 * Minimizer: [0.9999999998534468,0.9999999997063993, ...]
 * Minimum: 5.375569e-19
 * Iterations: 63
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false
     |x - x'| = 9.94e-09
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 1.29e+03 |f(x)|
   * |g(x)| ≤ 1.0e-08: true
     |g(x)| = 4.94e-11
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 222
 * Gradient Calls: 222
References
[1] De Sterck. Steepest descent preconditioning for nonlinear GMRES optimization. NLAA, 2013.
[2] Washio and Oosterlee. Krylov subspace acceleration for nonlinear multigrid schemes. ETNA, 1997.
[3] Riseth. Objective acceleration for unconstrained optimization. 2018.