Developer Guide and Reference

  • 2022.1
  • 04/11/2022
  • Public Content

Softmax

General

The softmax primitive performs forward or backward softmax or logsoftmax operation along a particular axis on data with arbitrary dimensions. All other axes are treated as independent (batch).
Forward
In general form, the operation is defined by the following formulas (the variable names follow the standard Naming Conventions).
Softmax:

    dst(ou, c, in) = exp(src(ou, c, in) - ν(ou, in)) / Σ_ic exp(src(ou, ic, in) - ν(ou, in))

Logsoftmax:

    dst(ou, c, in) = (src(ou, c, in) - ν(ou, in)) - ln(Σ_ic exp(src(ou, ic, in) - ν(ou, in)))

Above
  • c is the axis over which the operation is computed,
  • ou is the outermost index (to the left of the axis),
  • in is the innermost index (to the right of the axis), and
  • ν is used to produce numerically stable results and is defined as:
    ν(ou, in) = max_ic src(ou, ic, in)
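As a concrete illustration of the formulas above, here is a minimal reference sketch for a single contiguous axis (plain C++, independent of the oneDNN API; the function names are illustrative, not library symbols):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference softmax over one contiguous axis.
// nu = max over the axis is the numerically stable shift from the formula.
std::vector<float> softmax_ref(const std::vector<float> &src) {
    const float nu = *std::max_element(src.begin(), src.end());
    std::vector<float> dst(src.size());
    float sum = 0.f;
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = std::exp(src[i] - nu);
        sum += dst[i];
    }
    for (float &d : dst) d /= sum;
    return dst;
}

// Logsoftmax reuses the same shift: (src - nu) - ln(sum(exp(src - nu))).
std::vector<float> logsoftmax_ref(const std::vector<float> &src) {
    const float nu = *std::max_element(src.begin(), src.end());
    float sum = 0.f;
    for (float s : src) sum += std::exp(s - nu);
    const float log_sum = std::log(sum);
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = (src[i] - nu) - log_sum;
    return dst;
}
```

Note that both variants share the same shift ν, so subtracting it never changes the mathematical result, only the floating-point behavior.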
Difference Between Forward Training and Forward Inference
There is no difference between the dnnl_forward_training and dnnl_forward_inference propagation kinds.
Backward
The backward propagation computes diff_src(ou, c, in), based on diff_dst(ou, c, in) and dst(ou, c, in).
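For reference, the standard gradient identity that a softmax backward pass of this form computes (a well-known result stated here for convenience, not quoted from this guide) is:

```latex
\mathrm{diff\_src}(ou, c, in) =
    \mathrm{dst}(ou, c, in) \cdot \Bigl( \mathrm{diff\_dst}(ou, c, in)
      - \sum_{ic} \mathrm{diff\_dst}(ou, ic, in) \cdot \mathrm{dst}(ou, ic, in) \Bigr)
```

This follows from multiplying diff_dst by the Jacobian of the softmax function, which is why the backward pass needs dst rather than src.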

Execution Arguments

When executed, the inputs and outputs should be mapped to an execution argument index as specified by the following table.
Primitive input/output    Execution argument index
src                       DNNL_ARG_SRC
dst                       DNNL_ARG_DST
diff_src                  DNNL_ARG_DIFF_SRC
diff_dst                  DNNL_ARG_DIFF_DST

Implementation Details

General Notes
  1. Both forward and backward propagation support in-place operations, meaning that src can be used as both input and output for forward propagation, and diff_dst can be used as both input and output for backward propagation. In the case of an in-place operation, the original data will be overwritten. This support is limited to cases when the data types of src/dst or diff_src/diff_dst are identical.
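The in-place convention can be sketched with a plain C++ routine that reads and writes the same buffer (illustrative only; this is not the oneDNN API):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// In-place softmax: `data` serves as both src and dst, so the
// original source values are overwritten by the results.
void softmax_inplace(float *data, std::size_t n) {
    const float nu = *std::max_element(data, data + n);
    float sum = 0.f;
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = std::exp(data[i] - nu); // src value is gone after this line
        sum += data[i];
    }
    for (std::size_t i = 0; i < n; ++i) data[i] /= sum;
}
```

The same buffer-aliasing idea applies in oneDNN when the same memory object is passed for both the source and destination arguments.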
Post-ops and Attributes
Attributes enable you to modify the behavior of the softmax primitive. The following attributes are supported by the softmax primitive:
Propagation    Type         Operation       Description                                             Restrictions
forward        attribute    Output scale    Scales the result of softmax by a given scale factor    int8 softmax only, zero mask only
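The effect of an output scale on an int8 destination can be illustrated as follows. This is a sketch of the semantics under the assumption that the scale is applied to the f32 result before rounding and saturating to the destination type; it is not the library's implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Apply an output scale to f32 softmax results and convert to u8,
// rounding to nearest and saturating to the destination range [0, 255].
std::vector<std::uint8_t> scale_to_u8(const std::vector<float> &softmax_out,
                                      float output_scale) {
    std::vector<std::uint8_t> dst(softmax_out.size());
    for (std::size_t i = 0; i < softmax_out.size(); ++i) {
        float v = std::nearbyint(softmax_out[i] * output_scale);
        v = std::min(255.f, std::max(0.f, v));
        dst[i] = static_cast<std::uint8_t>(v);
    }
    return dst;
}
```

Since softmax outputs lie in [0, 1], a scale such as 256 spreads them across most of the u8 range, which is why the output scale matters for int8 softmax.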
Data Type Support
The softmax primitive supports the following combinations of data types:
Propagation           Source               Destination
forward / backward    f32, bf16            f32, bf16
forward               f16                  f16
forward               f32, bf16, u8, s8    u8, s8
forward               u8, s8               f32, bf16
Data Representation
Source, Destination, and Their Gradients
The softmax primitive works with arbitrary data tensors. There is no special meaning associated with any logical dimensions. However, the softmax axis is typically referred to as channels (hence in formulas we use c).

Implementation Limitations

  1. No primitive-specific limitations. Refer to Data Types for limitations related to data type support.

Performance Tips

  1. Use in-place operations whenever possible.
  2. Currently the softmax primitive is optimized for the cases where the dimension of the softmax axis is physically dense. For instance:
    • Optimized: 2D case, tensor A × B, softmax axis 1 (B), format tag dnnl_ab
    • Optimized: 4D case, tensor A × B × C × D, softmax axis 3 (D), format tag dnnl_abcd
    • Optimized: 4D case, tensor A × B × C × D, softmax axis 1 (B), format tag dnnl_abcd, and C · D = 1
    • Optimized: 4D case, tensor A × B × C × D, softmax axis 1 (B), format tag dnnl_acdb or dnnl_aBcd16b, and C · D ≠ 1
    • Non-optimized: 2D case, tensor A × B, softmax axis 0 (A), format tag dnnl_ab, and B ≠ 1
    • Non-optimized: 2D case, tensor A × B, softmax axis 1 (B), format tag dnnl_ba, and A ≠ 1
    • Non-optimized: 4D case, tensor A × B × C × D, softmax axis 2 (C), format tag dnnl_acdb, and D · B ≠ 1
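A quick way to reason about such cases: for plain (permuted row-major) format tags, the softmax axis is physically dense exactly when its stride is 1 element. The sketch below checks this; the function names are illustrative, and this simple model does not cover blocked tags such as dnnl_aBcd16b:

```cpp
#include <cstddef>
#include <vector>

// Compute element strides for a plain permuted row-major format:
// `order[i]` is the logical dimension stored at memory position i,
// e.g. dnnl_ab -> {0, 1}, dnnl_ba -> {1, 0}, dnnl_acdb -> {0, 2, 3, 1}.
std::vector<std::size_t> strides_for(const std::vector<std::size_t> &dims,
                                     const std::vector<std::size_t> &order) {
    std::vector<std::size_t> strides(dims.size());
    std::size_t stride = 1;
    for (std::size_t i = order.size(); i-- > 0;) {
        strides[order[i]] = stride;
        stride *= dims[order[i]];
    }
    return strides;
}

// The softmax axis is physically dense when consecutive elements
// along it are adjacent in memory (stride of 1 element).
bool axis_is_dense(const std::vector<std::size_t> &dims,
                   const std::vector<std::size_t> &order, std::size_t axis) {
    return strides_for(dims, order)[axis] == 1;
}
```

For example, axis 1 of an A × B tensor in dnnl_ab has stride 1 (optimized), while axis 0 has stride B, which is dense only when B = 1, matching the cases listed above.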

Example

This C++ API example demonstrates how to create and execute a Softmax primitive in forward training propagation mode.
Key optimizations included in this example:
  • In-place primitive execution;
  • Softmax along axis 1 (C) for 2D tensors.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.