macopt-a nippy wee optimizer

What is wrong with the Numerical Recipes Conjugate Gradient optimizers

I used the conjugate gradient code in the Numerical Recipes book for several years to train successful neural networks. But this code is actually very poorly written. It's not just a question of their use of global variables and other ugliness. The optimizers can be very clumsy when applied to real functions.

Summary of the macopt optimizer

I have written my own conjugate gradient algorithm that attempts to improve on the NR code. My algorithm is called macopt. It has the following properties: I find that this algorithm sometimes is ten times faster than the NR code. The Xerion group at University of Toronto have also written their own conjugate gradient optimizers, and they have adaptive conjugate gradient optimizers that only need 1.6 gradient evaluations per line search.

Tar files: | New C version | C++ version - tar file |

| original C version | | older C version |

Other versions

Matlab and Octave: (Mon Jun 28 2004) Iain Murray has made matlab and octave wrappers for macopt. | matlab wrapper | octave wrapper |
Notes from Iain: 1. For instructions, cd macopt_oct; cat README. The command make will download macopt.tar.gz. 2. Then run octave-2.1.57 and help macoptII. 3. I haven't wrapped the covariant version, and I haven't exposed every option to the end user.

Java: Macopt has been put into JAVA in the Phylogenetic Analysis Library | PAL mirror site | local copy of tar file, pal-1.4.tar.gz |

Instructions for the New C version

Most of the original instructions, given below, are accurate. The only difference is that I organize my directories differently (for ease of maintaining executables on multiple platforms). When you tar xvf, you will get a directory called newansi. This directory must contain a directory called bin$ARC, for example if $ARC is i386, it should be called bini386. This directory should inherit its makefile from newansi/_Makefile thus:
 newansi/bini386/makefile -> ../_Makefile
The above example link is created for you by the tar file. All the .o files and executables are put into bini386. To execute an executable, give the path to it (eg bini386/test_macII) or modify your path to include ./bini386.

Apologies for this minor complication, hope it works OK for you!

Instructions for original C version

There is a demonstration program called test_mac which uses macopt to minimize a quadratic function. When you get this tar file, uncompress it, tar xvf it, (Note this creates a directory called ansi), and cd ansi, then type
 make test_macII
Hopefully you will get an executable test_macII which, when executed, minimizes the function 1/2 xAx - bx, where A and b are given by the user interactively. [NB, you must give a positive definite symmetric matrix, e.g.
2 1
1 2        .]

The program also makes use of the maccheckgrad function, to check (visually) that the function that claims to return a gradient really does so. In one column, the gradient is given, and in the other the numerical first difference of the objective function.

The anonymous ftp route for the tar file, if you need it, is: ftp www.inference.phy.cam.ac.uk; cd pub/www/mackay/c; binary.

Messing with the defaults

You may wish to change the following variables.
  1. In the makefile:
    1. Optimization at compile time. Cut out the -O2 -funroll-loops flags to disable optimization. This will make things run slower but will make them easier to debug with many debuggers.
  2. In the macopt_arg structure (default values set by macopt_defaults). [See macopt.h for further information, and see macopt_defaults in macopt.c.]
    end_if_small_step
    Defines whether the end of the optimization occurs when a small step is made, or when the gradient is small. Some users find things work well if you make the termination condition based on the gradient rather than the step size.
    tol
    Defines `small' in the above senses. I recommend using end_if_small_step=1, and setting tol using your knowledge of your parameter space. What is a `really small' change in a typical parameter?

    Note that tol only defines when the optimization ends. It does not have any effect on what happens during the line searches.

    lastx and lastx_default
    defines typical distance in parameter space at which the line minimum is expected; both these should be set. the default is consulted if something goes badly wrong and a reset is demanded. It is not essential to give these good values, as the algorithm is adaptive, but the idea is that you should specify a typical expected step size here. Err on the small side if unsure.
    itmax
    Maximum number of line minimizations performed.
    rich
    Whether a gradient is evaluated at the beginning of every line minimization. If you can get away with this being 0, then the optimizer runs faster.
    verbose
    if verbose = 1 then there is one report for each line minimization. if verbose = 2 then there is an additional report for each step of the line minimization. if verbose = 3 then extra debugging routines kick in.

Error messages

Failure to compile.
If it complains about srand being redefined, don't worry. We don't need srand.
If it complains about the return type of main(), tell it to be less pedantic.
Warning! maclinmin overran- inner product at 0 = 0.09674
This sort of error message is a problem. The inner product at the beginning of a line search should be negative, otherwise the whole method doesn't work at all.

If you get this error message a lot, either there is a bug in your gradient computing software (always make use of the check_grad routine to see if all is well with your gradient) or macopt is being too ambitious and needs to be a bit more conservative: set `rich' to 1.

Questions from users of macopt:

We are using your macopt package on an application that we would like to release under the GNU Lesser General Pulic License (LGPL). Do you offer a license that will permit us to distribute your code under the LGPL?

Yes, as of Thu 15/8/02, the macopt releases in both C and C++ on my website are distributed under the LGPL; the associated example test files are distributed under the GPL.
Note however that three files in the tar file belong to other people:
rand.h is copyright by Radford Neal. (It is not actually used by macopt, so this doesn't matter.)
nrutil.c and nrutil.h are derived from the Numerical Recipes in C software library. These minor routines are used by macopt to handle memory allocation and deallocation.

David MacKay's work is largely supported by donations. If you find macopt useful, please feel free to make a donation.

is there now any need for mynr.h in the release - nothing in it seems to get used anywhere ?
You are right, mynr.h is not needed. Sorry about this untidiness.
I didn't initially realise how inherently macopt depends on gradients being easily computed. Is this really that often the case ?
Yes, macopt is built on the assumption that a gradient evaluation costs only about twice as much as a function evaluation. This is the case in all the statistical models I work with (neural nets, etc.)
If in fact function evaluations are relatively cheap and gradients are very expensive, then you might want to change to a different line minimizer. [A friend told me that lnsrch in the new edition of numerical recipes is a good line minimizer.]
However, Barak Pearlmutter points out that Gradients can always be calculated with at worst 5 times more effort, and typically just 2 times (even if there is an iterate-to-fixedpoint in the function evaluation itself, which complicates things a little.) [references] Thus it makes sense to always make a gradient-finding routine.

References concerning optimization and the ease of computing derivatives:
@PHDTHESIS{Speelpenning1980a,
  author = "Bert Speelpenning",
  month = "January",
  year = 1980,
  title = "Compiling Fast Partial Derivatives of Functions Given by Algorithms",
  address = "Urbana-Champaign, IL",
  school = "Department of Computer Science, University of Illinois at
           Urbana-Champaign",
  key = "Speelpenning1980a",
  abstract = "This is the author's doctoral thesis. It starts by comparing
             previous work in the area of symbolic differentiation of
             algorithms. Specifically it considers the work of Warner in 1975,
             Joss in 1976 and Kedem in 1977. The conclusions reached in this
             discussion are that the work by Joss is the best in terms of
             improvement. The author proceeds to describe how Joss' work can
             be improved in terms of speed, accuracy and space. A package,
             Jake, is described which is a compiler that takes a Fortran 66
             input definition of a function. This input is limited in that
             only one subroutine can be specified, and that certain Fortran 66
             statements are disallowed. Jake is instructed on how to perform
             its task by directives within the subroutine. Timing results are
             provided on the performance of the code produced by Jake over
             those where finite differencing is used.",
  keywords = "point algorithm; precompiler; numerical results."
}

@BOOK{Griewank2000a,
  author = "Andreas Griewank",
  year = 2000,
  title = "Evaluating Derivatives: Principles and Techniques of Algorithmic
          Differentiation",
  series = "Frontiers in Appl. Math.",
  number = 19,
  publisher = "SIAM",
  address = "Philadelphia, PA",
  key = "Griewank2000a",
  isbn = "0--89871--451--6"
}

David MacKay's: home page, publications and code.
Last modified: Tue Jun 29 11:23:41 2004