Minimizing a multivariate function

To show how the Optim package can be used, we implement the Rosenbrock function, a classic test problem in numerical optimization. We'll assume that you've already installed the Optim package using Julia's package manager. First, we load Optim and define the Rosenbrock function:

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

Once we've defined this function, we can find its minimum using any of the available optimization algorithms: we just specify an initial point x and run:

optimize(f, [0.0, 0.0])

Optim will default to using the Nelder-Mead method in this case, as we did not provide a gradient. This can also be explicitly specified using:

optimize(f, [0.0, 0.0], NelderMead())

Other solvers are available. Below, we use L-BFGS, a quasi-Newton method that requires a gradient. If we pass f alone, Optim will construct an approximate gradient for us using central finite differencing:

optimize(f, [0.0, 0.0], LBFGS())

For better performance and greater precision, you can pass your own gradient function. For the Rosenbrock example, the analytical gradient can be shown to be:

function g!(storage, x)
    storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    storage[2] = 200.0 * (x[2] - x[1]^2)
end

Note that the functions we use to calculate the gradient (and later the Hessian h!) of the Rosenbrock function mutate a fixed-size storage array, which is passed as an additional argument called storage. By mutating a single array over many iterations, this style of function definition removes the sometimes considerable cost of allocating a new array during each call to g! or h!. You can use Optim without manually defining a gradient or Hessian function, but if you do define these functions, they must take these two arguments in this order. Returning to our optimization problem, you simply pass g! together with f from before to use the gradient:

optimize(f, g!, [0.0, 0.0], LBFGS())

For some methods, like simulated annealing, the gradient will be ignored:

optimize(f, g!, [0.0, 0.0], SimulatedAnnealing())

In addition to providing gradients, you can provide a Hessian function h! as well. In our current case this is:

function h!(storage, x)
    storage[1, 1] = 2.0 - 400.0 * x[2] + 1200.0 * x[1]^2
    storage[1, 2] = -400.0 * x[1]
    storage[2, 1] = -400.0 * x[1]
    storage[2, 2] = 200.0
end

Now we can use Newton's method for optimization by running:

optimize(f, g!, h!, [0.0, 0.0])

This defaults to Newton(), since a Hessian was provided. Like the gradient, the Hessian function will be ignored if you use a method that does not require it:

optimize(f, g!, h!, [0.0, 0.0], LBFGS())

Note that Optim will not generate approximate Hessians using finite differencing because of the potentially low accuracy of approximations to the Hessians. Other than Newton's method, none of the algorithms provided by the Optim package employ exact Hessians.

Box minimization

A primal interior-point algorithm for simple "box" constraints (lower and upper bounds) is also available. Reusing our Rosenbrock example from above, boxed minimization is performed as follows:

lower = [1.25, -2.1]
upper = [Inf, Inf]
initial_x = [2.0, 2.0]
od = OnceDifferentiable(f, g!, initial_x)
results = optimize(od, initial_x, lower, upper, Fminbox{GradientDescent}())

This performs optimization with a barrier penalty, successively scaling down the barrier coefficient and using the chosen optimizer (GradientDescent above) for convergence at each step. Notice that the optimizer type, not an instance, should be passed (GradientDescent, not GradientDescent()).

This algorithm uses diagonal preconditioning to improve accuracy, and hence is a good example of how to use ConjugateGradient or LBFGS with preconditioning. Other methods will currently not use preconditioning. Only the box constraints are used to build the preconditioner; if you can analytically compute the diagonal of the Hessian of your objective function, you may want to consider writing your own preconditioner.
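
If you do want to supply your own preconditioner to ConjugateGradient or LBFGS, the sketch below shows one way to do it. It assumes that a plain (here diagonal) matrix can be passed as the preconditioner via the P keyword of the first-order solver constructors, as described in Optim's preconditioning documentation; the matrix used here is simply the diagonal of the Rosenbrock Hessian at the starting point, chosen purely for illustration.

using LinearAlgebra  # for Diagonal

# Diagonal of the Rosenbrock Hessian at [0.0, 0.0]:
# h[1, 1] = 2.0 - 400.0 * x[2] + 1200.0 * x[1]^2 = 2.0 and h[2, 2] = 200.0
P = Diagonal([2.0, 200.0])

optimize(f, g!, [0.0, 0.0], LBFGS(P = P))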

There are two iteration parameters: an outer iterations parameter that controls Fminbox, and an inner iterations parameter that controls the inner optimizer. For this reason, the options syntax is a bit different from the rest of the package. All parameters regarding the outer iterations are passed as keyword arguments, and options for the inner optimizer are passed as an Optim.Options instance via the keyword optimizer_o.

For example, the following restricts the optimization to 2 major (outer) iterations:

od = OnceDifferentiable(f, g!, initial_x)
results = optimize(od, initial_x, lower, upper, Fminbox{GradientDescent}(); iterations = 2)

In contrast, the following sets the maximum number of iterations for each inner GradientDescent optimization to 2:

od = OnceDifferentiable(f, g!, initial_x)
results = Optim.optimize(od, initial_x, lower, upper, Fminbox{GradientDescent}(); optimizer_o = Optim.Options(iterations = 2))

Minimizing a univariate function on a bounded interval

Minimization of univariate functions without derivatives is available through the optimize interface:

f_univariate(x) = 2x^2 + 3x + 1
optimize(f_univariate, -2.0, 1.0)

Two methods are available:

  • Brent's method, the default (can be explicitly selected with Brent()).
  • Golden section search, available with GoldenSection().
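
For example, golden section search can be selected explicitly for the same bounded problem:

optimize(f_univariate, -2.0, 1.0, GoldenSection())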

In addition to the iterations, store_trace, show_trace and extended_trace options, the following options are also available:

  • rel_tol: The relative tolerance used for determining convergence. Defaults to sqrt(eps(T)).
  • abs_tol: The absolute tolerance used for determining convergence. Defaults to eps(T).
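
As a quick sketch (assuming these options are accepted as keyword arguments by the univariate optimize method), a tighter relative tolerance and a printed trace could be requested as follows:

optimize(f_univariate, -2.0, 1.0, Brent(); rel_tol = 1e-12, show_trace = true)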

Obtaining results

After we have our results in res, we can use the API for getting optimization results. This consists of a collection of functions that are not exported, so they have to be qualified with the Optim. prefix (for example, Optim.minimizer). Say we do the following optimization:

res = optimize(x -> dot(x, [1 0. 0; 0 3 0; 0 0 1] * x + [1., 2., 3.]), zeros(3))

If we can't remember what method we used, we simply use

Optim.summary(res)

which will return "Nelder Mead". A bit more useful information is the minimizer and minimum of the objective function, which can be found using

julia> Optim.minimizer(res)
3-element Array{Float64,1}:
 -0.499921
 -0.3333
 -1.49994

julia> Optim.minimum(res)
 -2.8333333205768865

Complete list of functions

A complete list of functions can be found below.

Defined for all methods:

  • summary(res)
  • minimizer(res)
  • minimum(res)
  • iterations(res)
  • iteration_limit_reached(res)
  • trace(res)
  • x_trace(res)
  • f_trace(res)
  • f_calls(res)
  • converged(res)

Defined for univariate optimization:

  • lower_bound(res)
  • upper_bound(res)
  • x_lower_trace(res)
  • x_upper_trace(res)
  • rel_tol(res)
  • abs_tol(res)

Defined for multivariate optimization:

  • g_norm_trace(res)
  • g_calls(res)
  • x_converged(res)
  • f_converged(res)
  • g_converged(res)
  • initial_state(res)
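
For example, continuing with res from the quadratic problem above, a few of the generally available functions can be queried directly:

Optim.iterations(res)   # number of iterations performed
Optim.f_calls(res)      # number of objective function evaluations
Optim.converged(res)    # whether any convergence criterion was satisfied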

Input types

Most users will input a Vector as their initial_x and will get an Optim.minimizer(res) back that is also a vector. For zeroth and first order methods, it is also possible to pass in matrices, or even higher-dimensional arrays. The only restriction imposed by leaving the Vector case is that it is no longer possible to use finite difference approximations or automatic differentiation. Second order methods (variants of Newton's method) do not support this more general input type.
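
As a minimal sketch of the non-Vector case (the objective and target matrix below are made up for illustration), a matrix-valued problem can be solved with a zeroth order method by passing a matrix as the initial point:

# Made-up least-squares-style objective over a 2×2 matrix
f_matrix(X) = sum(abs2, X .- [1.0 2.0; 3.0 4.0])
res_matrix = optimize(f_matrix, zeros(2, 2), NelderMead())
Optim.minimizer(res_matrix)   # a 2×2 matrix close to [1.0 2.0; 3.0 4.0]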

Notes on convergence flags and checks

Currently, it is possible to access a minimizer using Optim.minimizer(result) even if all convergence flags are false. This means that the user has to be a bit careful when using the output from the solvers. It is advised to include checks for convergence if the minimizer or minimum is used to carry out further calculations.
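
A minimal sketch of such a check, reusing the Rosenbrock setup from above (the variable names and the error message are just examples), could look like this:

result = optimize(f, g!, [0.0, 0.0], LBFGS())
if Optim.converged(result)
    x_star = Optim.minimizer(result)   # safe to use in further calculations
else
    error("Optimization did not converge")
end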

A related note is that first and second order methods make a convergence check on the gradient before entering the optimization loop. This is done to prevent line search errors if initial_x is a stationary point. Notice that this is only a first order check: if initial_x is any type of stationary point, g_converged will be true. This includes local minima, saddle points, and local maxima. If iterations is 0 and g_converged is true, the user should keep this in mind.
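
For instance, starting the Rosenbrock problem from above exactly at its minimizer, which is a stationary point, should trigger this pre-loop check; the sketch below shows how such a situation can be detected:

res0 = optimize(f, g!, [1.0, 1.0], LBFGS())   # the gradient is exactly zero at [1.0, 1.0]
if Optim.iterations(res0) == 0 && Optim.g_converged(res0)
    # the starting point already passed the first order check; it could be a
    # minimum, a saddle point, or a maximum, so inspect it before relying on it
end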