Adam and AdaMax
This page contains information about Adam and AdaMax. Note that these algorithms do not use a line search, so some tuning of alpha may be necessary to obtain sufficiently fast convergence on your specific problem.
Constructors
Adam(; alpha=0.0001,
       beta_mean=0.9,
       beta_var=0.999,
       epsilon=1e-8)
where alpha is the step length or learning parameter. beta_mean and beta_var are exponential decay parameters for the first and second moment estimates. Setting these closer to 0 causes past iterates to matter less for the current step, while setting them closer to 1 emphasizes past iterates more. epsilon should rarely be changed; it exists only to avoid division by zero.
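As an example of the tuning mentioned above, the following is a minimal sketch assuming the usual Optim.optimize interface; the Rosenbrock test function, the alpha value, and the iteration budget are illustrative choices, not recommendations from this page.

using Optim

# Rosenbrock test function (an illustrative choice, not part of this page).
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Without a line search, per-iteration progress is governed by alpha, so a
# larger-than-default step length and a generous iteration budget often help.
res = optimize(f, zeros(2), Adam(alpha=0.01), Optim.Options(iterations=10_000))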
AdaMax(; alpha=0.002,
         beta_mean=0.9,
         beta_var=0.999,
         epsilon=1e-8)
where alpha is the step length or learning parameter. beta_mean and beta_var are exponential decay parameters for the first and second moment estimates. Setting these closer to 0 causes past iterates to matter less for the current step, while setting them closer to 1 emphasizes past iterates more. As in Adam, epsilon exists only to avoid division by zero and should rarely be changed.
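To illustrate the effect of the decay parameters, here is a small standalone sketch (not Optim's internal code) of the exponential moving average that beta_mean applies to the gradient history; the gradient sequence is made up.

# Standalone sketch of the first-moment update m <- beta_mean*m + (1 - beta_mean)*g.
function first_moment(grads; beta_mean=0.9)
    m = 0.0
    for g in grads
        m = beta_mean * m + (1 - beta_mean) * g
    end
    return m
end

first_moment([1.0, 1.0, -1.0])                 # near 1: changes slowly, remembers older gradients
first_moment([1.0, 1.0, -1.0], beta_mean=0.1)  # near 0: dominated by the most recent gradient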
References
Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).