Developer Guide and Reference

  • 2021.4
  • 09/27/2021
  • Public Content

Adaptive Subgradient Method

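The adaptive subgradient method (AdaGrad) [Duchi2011] follows the algorithmic framework of an iterative solver with the algorithm-specific transformation $T$, the set of intrinsic parameters $S_t$ defined for the learning rate $\eta$, and the algorithm-specific vector $U$ and power $d$ of Lebesgue space defined as follows:

$S_t = \{G_t\}$, where $G_t = (G_{t,i})_{i = 1, \ldots, p}$ and $G_0 \equiv 0$

$T(\theta_{t-1}, g(\theta_{t-1}), S_{t-1})$:
  1. $G_{t,i} = G_{t-1,i} + g_i^2(\theta_{t-1})$, where $g_i(\theta_{t-1})$ is the $i$-th coordinate of the gradient $g(\theta_{t-1})$
  2. $\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{G_t + \varepsilon}} g(\theta_{t-1})$, where
    $\frac{\eta}{\sqrt{G_t + \varepsilon}} g(\theta_{t-1}) = \left\{ \frac{\eta}{\sqrt{G_{t,1} + \varepsilon}} g_1(\theta_{t-1}), \ldots, \frac{\eta}{\sqrt{G_{t,p} + \varepsilon}} g_p(\theta_{t-1}) \right\}$
Convergence check: $U = g(\theta_{t-1})$, $d = 2$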
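
The two steps of the transformation can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes the standard AdaGrad update (accumulate the squared gradient per coordinate, then divide the step by the square root of the accumulated sum plus a small threshold). The function names `adagrad_step` and `gradient`, the quadratic objective, and the chosen learning rate are all hypothetical. The final line computes the Euclidean norm of the gradient, assuming that is the quantity a convergence check would inspect.

```python
import math


def adagrad_step(theta, grad, grad_sq_sum, learning_rate=0.01, eps=1e-8):
    """Apply one AdaGrad transformation; returns (new_theta, new_grad_sq_sum)."""
    # Step 1: accumulate the squared gradient, coordinate by coordinate.
    new_sum = [acc + g * g for acc, g in zip(grad_sq_sum, grad)]
    # Step 2: scale each coordinate's step by learning_rate / sqrt(sum + eps).
    new_theta = [
        t - learning_rate / math.sqrt(s + eps) * g
        for t, g, s in zip(theta, grad, new_sum)
    ]
    return new_theta, new_sum


def gradient(theta):
    # Hypothetical objective f(theta) = sum(theta_i^2); its gradient is 2 * theta_i.
    return [2.0 * t for t in theta]


theta = [1.0, -2.0]
grad_sq_sum = [0.0, 0.0]  # the accumulated sums start at zero
for _ in range(100):
    theta, grad_sq_sum = adagrad_step(
        theta, gradient(theta), grad_sq_sum, learning_rate=0.5
    )

# Euclidean norm of the gradient at the current point (assumed stopping quantity).
grad_norm = math.sqrt(sum(g * g for g in gradient(theta)))
print(theta, grad_norm)
```

Because the accumulated sums only grow, the effective step for each coordinate shrinks over time, and coordinates with historically large gradients take smaller steps than those with small gradients.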

Computation

The adaptive subgradient (AdaGrad) method is a special case of an iterative solver. For parameters, input, and output of iterative solvers, see Computation for Iterative Solver.
Algorithm Input
In addition to the input of the iterative solver, the AdaGrad method accepts the following optional input:
| OptionalDataID | Input |
|---|---|
| gradientSquareSum | A numeric table of size $p \times 1$ with the values of $G_t$. Each value is an accumulated sum of squares of coordinate values of a corresponding gradient. |
Algorithm Parameters
In addition to parameters of the iterative solver, the AdaGrad method has the following parameters:
| Parameter | Default Value | Description |
|---|---|---|
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | defaultDense | Default performance-oriented computation method. |
| batchIndices | NULL | A numeric table of size $nIterations \times batchSize$ for the defaultDense method that represents 32-bit integer indices of terms in the objective function. If no indices are provided, the algorithm generates random indices. |
| batchSize | 128 | The number of batch indices to compute the stochastic gradient. If batchSize equals the number of terms in the objective function, no random sampling is performed, and all terms are used to calculate the gradient. The algorithm ignores this parameter if the batchIndices parameter is provided. |
| learningRate | A numeric table of size $1 \times 1$ that contains the default step length equal to 0.01. | A numeric table of size $1 \times 1$ that contains the value of learning rate $\eta$. This parameter can be an object of any class derived from NumericTable, except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable. |
| degenerateCasesThreshold | $10^{-8}$ | Value $\varepsilon$ needed to avoid degenerate cases when computing square roots. |
| engine | SharedPtr<engines::mt19937::Batch>() | Pointer to the random number generator engine that is used internally for generation of 32-bit integer indices of terms in the objective function. |
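
The interplay between batchIndices, batchSize, and the random number engine can be illustrated with a short Python sketch. It is not the library's code: the helper `select_batch` is hypothetical, and drawing indices uniformly with replacement is an assumption made only for illustration. Python's `random.Random` is itself a Mersenne Twister (MT19937) generator, which makes it a convenient stand-in for the default engine.

```python
import random


def select_batch(n_terms, batch_size=128, batch_indices=None, rng=None):
    """Return the indices of objective-function terms used in one iteration."""
    if batch_indices is not None:
        # Explicit indices take precedence; batch_size is ignored.
        return list(batch_indices)
    if batch_size == n_terms:
        # Full batch: no random sampling, every term contributes to the gradient.
        return list(range(n_terms))
    if rng is None:
        rng = random.Random()
    # Assumption: indices are drawn uniformly with replacement.
    return [rng.randrange(n_terms) for _ in range(batch_size)]


# Explicit indices are used as given.
print(select_batch(10, batch_indices=[3, 1, 4]))
# A batch as large as the objective uses every term in order.
print(select_batch(5, batch_size=5))
# Otherwise indices come from the generator; a fixed seed makes runs repeatable.
print(select_batch(1000, batch_size=4, rng=random.Random(42)))
```

Passing a seeded generator mirrors the purpose of the engine parameter: the caller controls the source of randomness, so the sequence of batches can be reproduced.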
Algorithm Output
In addition to the output of the iterative solver, the AdaGrad method calculates the following optional result:
| OptionalDataID | Output |
|---|---|
| gradientSquareSum | A numeric table of size $p \times 1$ with the values of $G_t$. Each value is an accumulated sum of squares of coordinate values of a corresponding gradient. |
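
Since gradientSquareSum appears as both an optional input and an optional result, one run's accumulated sums can seed the next. The sketch below shows why that matters, under the assumption of the standard AdaGrad update. The function `run_adagrad` and the quadratic objective are hypothetical and do not use the library API.

```python
import math


def run_adagrad(theta, grad_sq_sum, n_iterations, learning_rate=0.1, eps=1e-8):
    """Run AdaGrad on the hypothetical objective f(theta) = sum(theta_i^2)."""
    theta, grad_sq_sum = list(theta), list(grad_sq_sum)
    for _ in range(n_iterations):
        grad = [2.0 * t for t in theta]
        # Accumulate squared gradient coordinates.
        grad_sq_sum = [s + g * g for s, g in zip(grad_sq_sum, grad)]
        # Take a step scaled per coordinate by the accumulated sum.
        theta = [
            t - learning_rate / math.sqrt(s + eps) * g
            for t, g, s in zip(theta, grad, grad_sq_sum)
        ]
    return theta, grad_sq_sum


start = [1.0, -2.0]
zeros = [0.0, 0.0]

# One uninterrupted run of 10 iterations.
full_theta, full_sum = run_adagrad(start, zeros, 10)

# Two runs of 5 iterations, passing the accumulated sums from the first to the second.
half_theta, half_sum = run_adagrad(start, zeros, 5)
resumed_theta, resumed_sum = run_adagrad(half_theta, half_sum, 5)

# Two runs of 5 iterations, discarding the accumulated sums in between.
reset_theta, _ = run_adagrad(half_theta, zeros, 5)

print(resumed_theta == full_theta)
print(reset_theta == full_theta)
```

Carrying the sums over reproduces the uninterrupted run, whereas restarting from zeros resets the per-coordinate step sizes and leads to a different trajectory.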
