Intel® oneAPI Deep Neural Network Developer Guide and Reference
A newer version of this document is available. Customers should click here to go to the newest version.
RNN
General
The RNN primitive computes a stack of unrolled recurrent cells, as depicted in Figure 1.  ,
,  and
 and  are optional parameters (the variable names follow the standard Naming Conventions). If not provided,
 are optional parameters (the variable names follow the standard Naming Conventions). If not provided,  and
 and  will default to 0.
 will default to 0.
 
 
   The RNN primitive supports four modes for evaluation direction:
- left2right will process the input data timestamps by increasing order 
- right2left will process the input data timestamps by decreasing order 
- bidirectional_concat will process all the stacked layers from left2right and from right2left independently, and will concatenate the output in  over the channel dimension. over the channel dimension.
- bidirectional_sum will process all the stacked layers from left2right and from right2left independently, and will sum the two outputs to  . .
Even though the RNN primitive supports passing a different number of channels for  ,
,  ,
,  , and
, and  , we always require the following conditions in order for the dimension to be consistent:
, we always require the following conditions in order for the dimension to be consistent:
 , ,
- when  , , , ,
- when  , , , ,
- when using the bidirectional_concat direction,  . .
The general formula for the execution of a stack of unrolled recurrent cells depends on the current iteration of the previous layer ( and
 and  ) and the previous iteration of the current layer (
) and the previous iteration of the current layer ( ). Here is the exact equation for non-LSTM cells:
). Here is the exact equation for non-LSTM cells:
 
 
   where  are the indices of the timestamp and the layer of the cell being executed.
 are the indices of the timestamp and the layer of the cell being executed.
And here is the equation for LSTM cells:
 
 
   where  are the indices of the timestamp and the layer of the cell being executed.
 are the indices of the timestamp and the layer of the cell being executed.
Cell Functions
The RNN API provides four cell functions:
- Vanilla RNN, a single-gate recurrent cell, 
- LSTM, a four-gate long short-term memory cell, 
- GRU, a three-gate gated recurrent unit cell, 
- Linear-before-reset GRU, a three-gate recurrent unit cell with the linear layer before the reset gate, 
- AUGRU, a three-gate gated recurrent unit cell with the attention update gate, 
- Linear-before-reset AUGRU, a three-gate recurrent unit cell with the linear layer before the reset gate and the attention update gate. 
Vanilla RNN
A single-gate recurrent cell initialized with dnnl::vanilla_rnn_forward::primitive_desc::primitive_desc() or dnnl::vanilla_rnn_backward::primitive_desc::primitive_desc() as in the following example.
auto vanilla_rnn_pd = dnnl::vanilla_rnn_forward::primitive_desc( engine, aprop, activation, direction, src_layer_desc, src_iter_desc, weights_layer_desc, weights_iter_desc, bias_desc, dst_layer_desc, dst_iter_desc);
The Vanilla RNN cell supports the ReLU, Tanh and Sigmoid activation functions. The following equations defines the mathematical operation performed by the Vanilla RNN cell for the forward pass:
 
 
   LSTM
LSTM (or Vanilla LSTM)
A four-gate long short-term memory recurrent cell initialized with dnnl::lstm_forward::primitive_desc::primitive_desc() or dnnl::lstm_backward::primitive_desc::primitive_desc() as in the following example.
auto lstm_pd = lstm_forward::primitive_desc(
    engine, aprop, direction, src_layer_desc, src_iter_h_desc,
    src_iter_c_desc, weights_layer_desc, weights_iter_desc, bias_desc,
    dst_layer_desc, dst_iter_h_desc, dst_iter_c_desc); 
   Note that for all tensors with a dimension depending on the gate number, we implicitly require the order of these gates to be i, f,  , and o. The following equation gives the mathematical description of these gates and output for the forward pass:
, and o. The following equation gives the mathematical description of these gates and output for the forward pass:
 
 
   where  are stored in
 are stored in  ,
,  are stored in
 are stored in  and
 and  are stored in
 are stored in  .
.
 .
. 
   LSTM with Peephole
A four-gate long short-term memory recurrent cell with peephole initialized with dnnl::lstm_forward::primitive_desc::primitive_desc() or dnnl::lstm_backward::primitive_desc::primitive_desc() as in the following example.
auto lstm_pd = dnnl::lstm_forward::primitive_desc( engine, aprop, direction, src_layer_desc, src_iter_h_desc, src_iter_c_desc, weights_layer_desc, weights_iter_desc, weights_peephole_desc, bias_desc, dst_layer_desc, dst_iter_h_desc, dst_iter_c_desc);
Similarly to vanilla LSTM, we implicitly require the order of the gates to be i, f,  , and o for all tensors with a dimension depending on the gates. For peephole weights, the gates order is i, f, o. The following equation gives the mathematical description of these gates and output for the forward pass:
, and o for all tensors with a dimension depending on the gates. For peephole weights, the gates order is i, f, o. The following equation gives the mathematical description of these gates and output for the forward pass:
 
 
   where  are stored in weights_peephole, and the other parameters are the same as in vanilla LSTM.
 are stored in weights_peephole, and the other parameters are the same as in vanilla LSTM.
LSTM with Projection (or LSTMP)
A four-gate long short-term memory recurrent cell with projection initialized with dnnl::lstm_forward::primitive_desc::primitive_desc() or dnnl::lstm_backward::primitive_desc::primitive_desc() as in the following example.
auto lstm_pd = dnnl::lstm_forward::primitive_desc( engine, aprop, direction, src_layer_desc, src_iter_h_desc, src_iter_c_desc, weights_layer_desc, weights_iter_desc, weights_peephole_desc, weights_projection_desc, bias_desc, dst_layer_desc, dst_iter_h_desc, dst_iter_c_desc);
Similarly to vanilla LSTM, we implicitly require the order of the gates to be i, f,  , and o for all tensors with a dimension depending on the gates. The following equation gives the mathematical description of these gates and output for the forward pass (for simplicity, LSTM without peephole is shown):
, and o for all tensors with a dimension depending on the gates. The following equation gives the mathematical description of these gates and output for the forward pass (for simplicity, LSTM without peephole is shown):
 
 
   where  is stored in weights_projection, and the other parameters are the same as in vanilla LSTM.
 is stored in weights_projection, and the other parameters are the same as in vanilla LSTM.
GRU
A three-gate gated recurrent unit cell, initialized with dnnl::gru_forward::primitive_desc::primitive_desc() or dnnl::gru_backward::primitive_desc::primitive_desc() as in the following example.
auto gru_pd = dnnl::gru_forward::primitive_desc( engine, aprop, direction, src_layer_desc, src_iter_desc, weights_layer_desc, weights_iter_desc, bias_desc, dst_layer_desc, dst_iter_desc);
Note that for all tensors with a dimension depending on the gate number, we implicitly require the order of these gates to be u, r, and o. The following equation gives the mathematical definition of these gates.
 
 
   where  are in
 are in  ,
,  are in
 are in  , and
, and  are stored in
 are stored in  .
.
 ,
, 
     and
 and 
     by
 by 
     . This is possible as
. This is possible as 
     , and
, and 
     .
. 
   Linear-Before-Reset GRU
A three-gate gated recurrent unit cell with linear layer applied before the reset gate, initialized with dnnl::lbr_gru_forward::primitive_desc::primitive_desc() or dnnl::lbr_gru_backward::primitive_desc::primitive_desc() as in the following example.
auto lbr_gru_pd = dnnl::lbr_gru_forward::primitive_desc( engine, aprop, direction, src_layer_desc, src_iter_desc, weights_layer_desc, weights_iter_desc, bias_desc, dst_layer_desc, dst_iter_desc);
The following equation describes the mathematical behavior of the Linear-Before-Reset GRU cell.
 
 
   Note that for all tensors with a dimension depending on the gate number, except the bias, we implicitly require the order of these gates to be u, r, and o. For the  tensor, we implicitly require the order of the gates to be u, r, o, and u .
 tensor, we implicitly require the order of the gates to be u, r, o, and u .
 ,
, 
     and
 and 
     by
 by 
     . This is possible as
. This is possible as 
     , and
, and 
     .
. 
   AUGRU
A three-gate gated recurrent unit cell, initialized with dnnl::augru_forward::primitive_desc::primitive_desc() or dnnl::augru_backward::primitive_desc::primitive_desc() as in the following example.
auto augru_pd = dnnl::augru_forward::primitive_desc( engine, aprop, direction, src_layer_desc, src_iter_desc, attention_desc, weights_layer_desc, weights_iter_desc, bias_desc, dst_layer_desc, dst_iter_desc);
Note that for all tensors with a dimension depending on the gate number, we implicitly require the order of these gates to be u, r, and o. The following equation gives the mathematical definition of these gates.
 
 
   where  are in
 are in  ,
,  are in
 are in  , and
, and  are stored in
 are stored in  .
.
Linear-Before-Reset AUGRU
A three-gate gated recurrent unit cell with linear layer applied before the reset gate, initialized with dnnl::lbr_augru_forward::primitive_desc::primitive_desc() or dnnl::lbr_augru_backward::primitive_desc::primitive_desc() as in the following example.
auto lbr_augru_pd = dnnl::lbr_augru_forward::primitive_desc( engine, aprop, direction, src_layer_desc, src_iter_desc, attention_desc, weights_layer_desc, weights_iter_desc, bias_desc, dst_layer_desc, dst_iter_desc);
The following equation describes the mathematical behavior of the Linear-Before-Reset AUGRU cell.
 
 
   Note that for all tensors with a dimension depending on the gate number, except the bias, we implicitly require the order of these gates to be u, r, and o. For the  tensor, we implicitly require the order of the gates to be u, r, o, and u .
 tensor, we implicitly require the order of the gates to be u, r, o, and u .
Considerations for Training
When using the RNN API for training, the forward pass should use the forward_training propagation kind, and a workspace should be passed to both the forward pass and the backward pass. Note that after executing the backward pass, the workspace is no more valid and should be populated once again by another forward pass.
The RNN primitive backward pass accumulates gradients to its weight outputs (namely  ,
,  ,
,  ,
,  ,
,  ). Hence, these tensors should be properly initialized to zero before their first use, and can be reused across calls to accumulate gradients if need be. This behavior can be altered by the RNN flag diff_weights_overwrite. If this flag is set weight gradients will be initialized by zeros by the RNN primitive.
). Hence, these tensors should be properly initialized to zero before their first use, and can be reused across calls to accumulate gradients if need be. This behavior can be altered by the RNN flag diff_weights_overwrite. If this flag is set weight gradients will be initialized by zeros by the RNN primitive.
Execution Arguments
When executed, the inputs and outputs should be mapped to an execution argument index as specified by the following table.
| Primitive input/output | Execution argument index | 
|---|---|
| 
 | DNNL_ARG_SRC_LAYER | 
| 
 | DNNL_ARG_SRC_LAYER_ATTENTION | 
| 
 | DNNL_ARG_SRC_ITER | 
| 
 | DNNL_ARG_SRC_ITER_C | 
| 
 | DNNL_ARG_WEIGHTS_LAYER | 
| 
 | DNNL_ARG_WEIGHTS_ITER | 
| 
 | DNNL_ARG_WEIGHTS_PEEPHOLE | 
| 
 | DNNL_ARG_WEIGHTS_PROJECTION | 
| 
 | DNNL_ARG_BIAS | 
| 
 | DNNL_ARG_DST_LAYER | 
| 
 | DNNL_ARG_DST_ITER | 
| 
 | DNNL_ARG_DST_ITER_C | 
| 
 | DNNL_WORKSPACE | 
| 
 | DNNL_ARG_DIFF_SRC_LAYER | 
| 
 | DNNL_ARG_DIFF_SRC_LAYER_ATTENTION | 
| 
 | DNNL_ARG_DIFF_SRC_ITER | 
| 
 | DNNL_ARG_DIFF_SRC_ITER_C | 
| 
 | DNNL_ARG_DIFF_WEIGHTS_LAYER | 
| 
 | DNNL_ARG_DIFF_WEIGHTS_ITER | 
| 
 | DNNL_ARG_DIFF_WEIGHTS_PEEPHOLE | 
| 
 | DNNL_ARG_DIFF_WEIGHTS_PROJECTION | 
| 
 | DNNL_ARG_DIFF_BIAS | 
| 
 | DNNL_ARG_DIFF_DST_LAYER | 
| 
 | DNNL_ARG_DIFF_DST_ITER | 
| 
 | DNNL_ARG_DIFF_DST_ITER_C | 
Implementation Details
Data Type Support
The following table lists the combination of data types supported by the RNN primitive for each input and output memory object.
| Propagation | Cell Function | Input data | Recurrent data (1) | Weights | Bias | Output Data | 
|---|---|---|---|---|---|---|
| Forward / Backward | All | f32 | f32 | f32 | f32 | f32 | 
| Forward / Backward (2) | All (3) | bf16 | bf16 | bf16 | f32 | bf16 | 
| Forward | All (3) | f16 | f16 | f16 | f16 | f16 | 
| Forward inference | Vanilla LSTM, LSTMP and GRU | u8 | u8 | s8 | f32 | u8, f32 | 
| Forward inference | Vanilla LSTM, LSTMP | s8 | s8 | s8 | f32 | s8, f32 | 
- With LSTM and Peephole LSTM cells, the cell state datatype is f32, except for the f16 configuration. 
- In backward propagation, all diff_* tensors are in f32. 
- Projection LSTM is not supported. 
Data Representation
In the oneDNN programming model, the RNN primitive is one of a few that support the placeholder memory format dnnl::memory::format_tag::any (shortened to any from now on) and can define data and weight memory objects format based on the primitive parameters.
The following table summarizes the data layouts supported by the RNN primitive.
| Propagation | Input/Output Data | Recurrent Data | Layer and Iteration Weights | Peephole Weights and Bias | Projection LSTM Weights | 
|---|---|---|---|---|---|
| Forward / Backward | |||||
| Forward | |||||
| Backward | 
While an RNN primitive can be created with memory formats specified explicitly, the performance is likely to be sub-optimal. When using any it is necessary to first create an RNN primitive descriptor and then query it for the actual data and weight memory objects formats.
Post-Ops and Attributes
Currently post-ops and attributes are only used by the int8 variants of LSTM and GRU. See the markdown RNN int8 inference example for more details on how to use and set these quantization parameters.
Implementation Limitations
- Refer to Data Types for limitations related to data types support. 
- Bias must always be present (that is, the corresponding memory descriptor argument cannot be zero memory descriptor when the RNN primitive descriptor is initialized). 
- CPU - oneDNN supports s8 as input data only on systems with Advanced Matrix Extension(AMX) support. 
- Projection LSTM for bf16 data type is not supported. 
- f16 data type is not supported. 
 
- GPU - No support for AUGRU. 
- No support for Peephole LSTM and Projection LSTM. 
- Int8 support is provided for LSTM only. 
- Bias and cell state of bf16 data type is not supported. 
- No support for diff weights overwrite. 
 
Example
LSTM RNN Primitive Example
This C++ API example demonstrates how to create and execute an LSTM RNN primitive in forward training propagation mode.
Key optimizations included in this example:
- Creation of optimized memory format from the primitive descriptor.