Developer Guide and Reference

  • 2021.6
  • 04/11/2022
  • Public Content

Distributed Processing

The distributed processing mode assumes that the data set R is split into nblocks blocks across computation nodes.

Parameters

In the distributed processing mode, initialization of item factors for the implicit ALS algorithm has the following parameters:
Parameters for Implicit Alternating Least Squares Initialization (Distributed Processing)

| Parameter | Default Value | Description |
|---|---|---|
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | fastCSR | Performance-oriented computation method for CSR numeric tables, the only method supported by the algorithm. |
| nFactors | $10$ | The total number of factors. |
| fullNUsers | $0$ | The total number of users $m$. |
| partition | Not applicable | A numeric table of size either $1 \times 1$, which provides the number of input data parts, or $(nblocks + 1) \times 1$, where nblocks is the number of input data parts and the $i$-th element contains the offset of the transposed $i$-th data part to be computed by the initialization algorithm. |
| engine | SharedPtr<engines::mt19937::Batch>() | Pointer to the random number generator engine that is used internally at the initialization step. |

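The two forms of the partition parameter can be related with a short sketch. The Python below is illustrative only and does not call the library. The helper names partition_offsets and block_sizes are hypothetical, the even split used for the single-value form is an assumption (the source does not state how the library divides users), and the offsets form is assumed to start at 0 and end at fullNUsers.

```python
def partition_offsets(partition, full_n_users):
    """Return nblocks + 1 user offsets for either form of `partition`.

    Hypothetical helper, not part of the library API.
    """
    if len(partition) == 1:
        # Single-value form: only the number of data parts is given.
        # Assumption: users are split as evenly as possible, and the
        # first blocks absorb the remainder.
        nblocks = partition[0]
        if nblocks <= 0:
            raise ValueError("the number of data parts must be positive")
        base, extra = divmod(full_n_users, nblocks)
        offsets = [0]
        for j in range(nblocks):
            offsets.append(offsets[-1] + base + (1 if j < extra else 0))
        return offsets

    # Offsets form: nblocks + 1 values, one boundary per data part.
    offsets = list(partition)
    if offsets[0] != 0 or offsets[-1] != full_n_users:
        raise ValueError("offsets are assumed to span 0..full_n_users")
    if any(a > b for a, b in zip(offsets, offsets[1:])):
        raise ValueError("offsets must be non-decreasing")
    return offsets


def block_sizes(offsets):
    """Number of users in each data part, derived from the offsets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    offsets = partition_offsets([3], 10)
    print(offsets)               # [0, 4, 7, 10]
    print(block_sizes(offsets))  # [4, 3, 3]
```

Under these assumptions, passing a single value and passing the equivalent explicit offsets describe the same split; the explicit form is only needed when the parts are uneven.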
To initialize the implicit ALS algorithm in the distributed processing mode, use the two-step process illustrated by the following diagram for $nblocks = 3$:
Implicit Alternating Least Squares Initialization: General Schema of Distributed Processing

Step 1 - on Local Nodes

Implicit Alternating Least Squares Initialization: Distributed Processing, Step 1 - on Local Nodes
Input
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.
Input for Implicit Alternating Least Squares Initialization (Distributed Processing, Step 1)

| Input ID | Input |
|---|---|
| dataColumnSlice | An $n_i \times m$ numeric table with the part of the input data set. Each node holds $n_i$ rows of the full transposed input data set $R^T$. The input should be an object of the CSRNumericTable class. |

Output
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm calculates the results described below. Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm. Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.
Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the $i$-th node to all local nodes. Keys in this data collection are indices of the nodes, and the value that corresponds to each key $i$ is a numeric table that contains indices of the factors of the items to be transferred to the $i$-th node on Step 3 of the distributed ALS training algorithm.
User Offsets (offsets) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key $i$ is a numeric table of size $1 \times 1$ that contains the value of the starting offset of the user factors stored on the $i$-th node.
For more details, see Algorithms.
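The two key-value collections described above can be pictured with a small sketch. It is plain Python rather than library code, and it rests on stated assumptions: an item factor is taken to be needed by a node when that node's block of users has at least one nonzero entry for the item, item indices are local to the node, and the user blocks are described by a list of nblocks + 1 offsets. The function names items_to_transfer and user_offsets are hypothetical.

```python
from bisect import bisect_right


def items_to_transfer(col_slice_rows, offsets):
    """Map each node index to the local item indices it would need.

    col_slice_rows[k] lists the user indices that have a nonzero entry in
    local item row k (a CSR-like view of the transposed data slice).
    offsets holds nblocks + 1 user offsets. Hypothetical helper.
    """
    nblocks = len(offsets) - 1
    needed = {j: [] for j in range(nblocks)}
    for item, users in enumerate(col_slice_rows):
        # Find the user block that owns each user with a nonzero entry.
        blocks = {bisect_right(offsets, u) - 1 for u in users}
        for j in sorted(blocks):
            needed[j].append(item)
    return needed


def user_offsets(offsets):
    """Map each node index to the starting offset of its user factors."""
    return {j: offsets[j] for j in range(len(offsets) - 1)}


if __name__ == "__main__":
    rows = [[0, 5], [4], [], [9, 1]]   # 4 local items, 10 users in total
    offs = [0, 4, 7, 10]               # 3 user blocks
    print(items_to_transfer(rows, offs))  # {0: [0, 3], 1: [0, 1], 2: [3]}
    print(user_offsets(offs))             # {0: 0, 1: 4, 2: 7}
```

Each value in the second mapping is a single number, which matches the description of one starting offset per node.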
Output for Implicit Alternating Least Squares Initialization (Distributed Processing, Step 1)

| Partial Result ID | Result |
|---|---|
| partialModel | The model with initialized item factors. The result can only be an object of the PartialModel class. |
| outputOfInitForComputeStep3 | A key-value data collection that maps components of the partial model to the local nodes. |
| offsets | A key-value data collection of size nblocks that holds the starting offsets of the factor indices on each node. |
| outputOfStep1ForStep2 | A key-value data collection of size nblocks that contains the parts of the input numeric table: the $j$-th element of this collection is a numeric table of size $m_j \times n_i$, where $m_1 + \ldots + m_{nblocks} = m$ and the values $m_j$ are defined by the partition parameter. |

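How one node's slice of the transposed data might be cut into nblocks parts for Step 2 can be sketched as follows. This is a simplified illustration, not the library's implementation: it uses dense lists of lists instead of CSR numeric tables, the function name split_column_slice is hypothetical, and the user blocks are assumed to be given as nblocks + 1 offsets.

```python
def split_column_slice(col_slice, offsets):
    """Split a transposed data slice into one part per user block.

    col_slice is a dense list of rows: one row per local item, one column
    per user. Part j holds the users offsets[j]..offsets[j + 1] - 1 as rows
    and the local items as columns. Hypothetical helper.
    """
    parts = {}
    for j in range(len(offsets) - 1):
        lo, hi = offsets[j], offsets[j + 1]
        # Select the columns of this user block and transpose them, so the
        # part is oriented like the rows of the original data set.
        parts[j] = [[row[u] for row in col_slice] for u in range(lo, hi)]
    return parts


if __name__ == "__main__":
    col_slice = [
        [1, 0, 2, 0, 3],   # local item 0, ratings by users 0..4
        [0, 4, 0, 5, 0],   # local item 1
    ]
    parts = split_column_slice(col_slice, [0, 2, 5])
    print(parts[0])  # [[1, 0], [0, 4]]
    print(parts[1])  # [[2, 0], [0, 5], [3, 0]]
```

The row counts of the parts add up to the total number of users, and every part keeps the same number of columns as there are local items.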

Step 2 - on Local Nodes

Implicit Alternating Least Squares Initialization: Distributed Processing, Step 2 - on Local Nodes
Input
This step uses the results of the previous step.
Input for Implicit Alternating Least Squares Initialization (Distributed Processing, Step 2)

| Input ID | Input |
|---|---|
| inputOfStep2FromStep1 | A key-value data collection of size nblocks that contains the parts of the input data set: the $i$-th element of this collection is a numeric table of size $m_j \times n_i$. Each numeric table in the collection should be an object of the CSRNumericTable class. |

Output
In this step, implicit ALS initialization calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm. Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.
Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the $i$-th node to all local nodes. Keys in this data collection are indices of the nodes, and the value that corresponds to each key $i$ is a numeric table that contains indices of the user factors to be transferred to the $i$-th node on Step 3 of the distributed ALS training algorithm.
Item Offsets (offsets) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key $i$ is a numeric table of size $1 \times 1$ that contains the value of the starting offset of the item factors stored on the $i$-th node.
For more details, see Algorithms.
Output for Implicit Alternating Least Squares Initialization (Distributed Processing, Step 2)

| Partial Result ID | Result |
|---|---|
| dataRowSlice | An $m_j \times n$ numeric table with the mining data. The $j$-th node gets $m_j$ rows of the full input data set $R$. |
| outputOfInitForComputeStep3 | A key-value data collection that maps components of the partial model to the local nodes. |
| offsets | A key-value data collection of size nblocks that holds the starting offsets of the factor indices on each node. |
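
A minimal sketch of how the parts received in Step 2 could be combined into a row slice is shown below. It is an illustration under assumptions, not the library's code: the parts are dense lists of lists rather than CSR numeric tables, they are concatenated column-wise in node order, and the item offset of a node is taken to be the first column its part occupies. The names assemble_row_slice and item_offsets are hypothetical.

```python
def assemble_row_slice(parts_from_nodes):
    """Concatenate the parts received from all nodes column-wise.

    parts_from_nodes[i] is the part sent by node i: one row per local
    user, one column per item held by node i. Hypothetical helper.
    """
    node_ids = sorted(parts_from_nodes)
    if not node_ids:
        return []
    n_rows = len(parts_from_nodes[node_ids[0]])
    for i in node_ids:
        if len(parts_from_nodes[i]) != n_rows:
            raise ValueError("all parts must have the same number of user rows")
    row_slice = []
    for r in range(n_rows):
        row = []
        for i in node_ids:
            row.extend(parts_from_nodes[i][r])
        row_slice.append(row)
    return row_slice


def item_offsets(parts_from_nodes):
    """Map each node index to the first column its items occupy."""
    offsets, start = {}, 0
    for i in sorted(parts_from_nodes):
        offsets[i] = start
        block = parts_from_nodes[i]
        start += len(block[0]) if block else 0
    return offsets


if __name__ == "__main__":
    received = {0: [[1, 0], [0, 4]], 1: [[7], [8]]}
    print(assemble_row_slice(received))  # [[1, 0, 7], [0, 4, 8]]
    print(item_offsets(received))        # {0: 0, 1: 2}
```

The assembled table has one row per local user and one column per item in the full data set, which is the shape described for dataRowSlice.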
