Intel® oneAPI Data Analytics Library Developer Guide and Reference
Distributed Processing
The distributed processing mode assumes that the data set $R$ is split into nblocks blocks across computation nodes.
Parameters
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm has the following parameters:
| Parameter | Default Value | Description |
|---|---|---|
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | fastCSR | Performance-oriented computation method for CSR numeric tables, the only method supported by the algorithm. |
| nFactors | 10 | The total number of factors. |
| fullNUsers | 0 | The total number of users $m$. |
| partition | Not applicable | A numeric table of size either $1 \times 1$ that provides the number of input data parts, or $(nblocks + 1) \times 1$, where nblocks is the number of input data parts, and the i-th element contains the offset of the transposed i-th data part to be computed by the initialization algorithm. |
| engine | SharedPtr<engines::mt19937::Batch>() | Pointer to the random number generator engine that is used internally at the initialization step. |
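The $(nblocks + 1) \times 1$ form of the partition parameter is a list of boundaries along the user axis. The sketch below is a minimal illustration, not the oneDAL API: it assumes an even split of the fullNUsers = $m$ users into nblocks parts, builds the offset array, and derives each part size $m_j$ as the difference of consecutive offsets. The helpers makeEvenPartition and partSizes are hypothetical, and the code indexes parts from 0 rather than 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of oneDAL): builds an (nblocks + 1)-element
// offset array for an even split of m users. partition[j] is the first user
// of part j, and partition[nblocks] == m.
std::vector<std::size_t> makeEvenPartition(std::size_t m, std::size_t nblocks) {
    assert(nblocks > 0);
    std::vector<std::size_t> partition(nblocks + 1);
    const std::size_t base = m / nblocks;   // minimum part size
    const std::size_t extra = m % nblocks;  // the first `extra` parts get one more user
    partition[0] = 0;
    for (std::size_t j = 0; j < nblocks; ++j) {
        partition[j + 1] = partition[j] + base + (j < extra ? 1 : 0);
    }
    return partition;
}

// Part sizes m_j = partition[j + 1] - partition[j]; they always sum to m.
std::vector<std::size_t> partSizes(const std::vector<std::size_t>& partition) {
    assert(!partition.empty());
    std::vector<std::size_t> sizes(partition.size() - 1);
    for (std::size_t j = 0; j + 1 < partition.size(); ++j) {
        sizes[j] = partition[j + 1] - partition[j];
    }
    return sizes;
}
```

For example, makeEvenPartition(10, 3) yields the offsets 0, 4, 7, 10, so the three parts hold 4, 3 and 3 users. Any other monotone split ending at $m$ would be an equally valid partition; the even split is only a convenient default for the sketch.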
       
To initialize the implicit ALS algorithm in the distributed processing mode, use the two-step process illustrated by the following diagram:
 
Step 1 - on Local Nodes
Input
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.
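| Input ID | Input |
|---|---|
| dataColumnSlice | An $n_i \times m$ numeric table with the part of the input data set. Each node holds $n_i$ rows of the full transposed input data set $R^T$. The input should be an object of the CSRNumericTable class. |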
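To make the shape of dataColumnSlice concrete, the following sketch extracts items itemBegin to itemEnd - 1 of a small dense ratings matrix $R$ ($m$ users by $n$ items) and stores them as CSR rows of $R^T$, so the result has $n_i$ = itemEnd - itemBegin rows and $m$ columns. The CsrSlice struct and the columnSliceAsCsr function are hypothetical illustrations, not the CSRNumericTable API. They use zero-based indices for readability; check the CSRNumericTable documentation for the index base the library expects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain CSR container (not oneDAL's CSRNumericTable).
struct CsrSlice {
    std::size_t nRows = 0;                // n_i: number of items held by this node
    std::size_t nCols = 0;                // m: total number of users
    std::vector<double> values;           // nonzero ratings, row by row
    std::vector<std::size_t> colIndices;  // user index of each nonzero (zero-based)
    std::vector<std::size_t> rowOffsets;  // size nRows + 1; row r spans [rowOffsets[r], rowOffsets[r + 1])
};

// Builds rows itemBegin..itemEnd-1 of R^T from a dense m x n matrix R,
// where R[user][item] is the rating and 0 means "no rating".
CsrSlice columnSliceAsCsr(const std::vector<std::vector<double>>& R,
                          std::size_t itemBegin, std::size_t itemEnd) {
    assert(!R.empty() && itemBegin <= itemEnd && itemEnd <= R[0].size());
    CsrSlice s;
    s.nRows = itemEnd - itemBegin;
    s.nCols = R.size();
    s.rowOffsets.push_back(0);
    for (std::size_t item = itemBegin; item < itemEnd; ++item) {
        // One row of R^T is one column of R: scan all users for this item.
        for (std::size_t user = 0; user < R.size(); ++user) {
            if (R[user][item] != 0.0) {
                s.values.push_back(R[user][item]);
                s.colIndices.push_back(user);
            }
        }
        s.rowOffsets.push_back(s.values.size());
    }
    return s;
}
```

With $R$ = {{5, 0, 1}, {0, 3, 0}, {2, 0, 4}} and items 1 and 2 assigned to a node, the slice has two rows: the first holds the single rating 3 from user 1, the second holds ratings 1 and 4 from users 0 and 2.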
       
Output
In the distributed processing mode, initialization of item factors for the implicit ALS algorithm calculates the results described below. Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm. Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.
Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the i-th node to all local nodes. Keys in this data collection are indices of the nodes and the value that corresponds to each key i is a numeric table that contains indices of the factors of the items to be transferred to the i-th node on Step 3 of the distributed ALS training algorithm.
User Offsets (offsets) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key i is a numeric table of size $1 \times 1$ that contains the value of the starting offset of the user factors stored on the i-th node.
For more details, see Algorithms.
| Partial Result ID | Result |
|---|---|
| partialModel | The model with initialized item factors. The result can only be an object of the PartialModel class. |
| outputOfInitForComputeStep3 | A key-value data collection that maps components of the partial model to the local nodes. |
| offsets | A key-value data collection of size nblocks that holds the starting offsets of the factor indices on each node. |
| outputOfStep1ForStep2 | A key-value data collection of size nblocks that contains the parts of the input numeric table: the j-th element of this collection is a numeric table of size $m_j \times n_i$, where $m_1 + \dots + m_{nblocks} = m$ and the values $m_j$ are defined by the partition parameter. |
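The sketch below illustrates what Step 1 produces on node i, using dense matrices instead of CSR tables for readability. splitForStep2 cuts the local $n_i \times m$ slice of $R^T$ along the user axis at the partition offsets and transposes each piece, giving the nblocks tables of size $m_j \times n_i$ described for outputOfStep1ForStep2. itemsNeededByNode rests on an assumption not stated on this page: that node j needs the factors of the local items rated by at least one of its users, which is one plausible reading of the index lists in outputOfInitForComputeStep3. Both functions and the Dense alias are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Dense = std::vector<std::vector<double>>;  // row-major dense matrix

// Hypothetical illustration of outputOfStep1ForStep2: localSlice is n_i x m
// (rows are items, columns are users). Piece j covers users
// partition[j]..partition[j + 1] - 1 and is returned transposed (m_j x n_i).
std::vector<Dense> splitForStep2(const Dense& localSlice,
                                 const std::vector<std::size_t>& partition) {
    assert(partition.size() >= 2);
    const std::size_t nLocalItems = localSlice.size();
    std::vector<Dense> pieces;
    for (std::size_t j = 0; j + 1 < partition.size(); ++j) {
        const std::size_t mj = partition[j + 1] - partition[j];
        Dense piece(mj, std::vector<double>(nLocalItems, 0.0));
        for (std::size_t u = 0; u < mj; ++u) {
            for (std::size_t t = 0; t < nLocalItems; ++t) {
                piece[u][t] = localSlice[t][partition[j] + u];
            }
        }
        pieces.push_back(std::move(piece));
    }
    return pieces;
}

// Assumption (not confirmed by this page): node j needs the factors of every
// local item that at least one of its users has rated. Returns, per node,
// the local item indices whose factors would be sent to it.
std::vector<std::vector<std::size_t>> itemsNeededByNode(
        const Dense& localSlice, const std::vector<std::size_t>& partition) {
    assert(partition.size() >= 2);
    std::vector<std::vector<std::size_t>> needed(partition.size() - 1);
    for (std::size_t j = 0; j + 1 < partition.size(); ++j) {
        for (std::size_t t = 0; t < localSlice.size(); ++t) {
            for (std::size_t u = partition[j]; u < partition[j + 1]; ++u) {
                if (localSlice[t][u] != 0.0) {
                    needed[j].push_back(t);
                    break;  // one rating from this node's users is enough
                }
            }
        }
    }
    return needed;
}
```

For a node holding two items rated as {{3, 0, 0}, {1, 0, 4}} and the partition 0, 2, 3, the first piece is the $2 \times 2$ table {{3, 1}, {0, 0}} and the second is the $1 \times 2$ table {{0, 4}}; under the stated assumption node 0 would need both items and node 1 only item 1.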
       
Step 2 - on Local Nodes
Input
This step uses the results of the previous step.
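| Input ID | Input |
|---|---|
| inputOfStep2FromStep1 | A key-value data collection of size nblocks that contains the parts of the input data set: the i-th element of this collection is a numeric table of size $m_j \times n_i$. Each numeric table in the collection should be an object of the CSRNumericTable class. |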
       
Output
In this step, implicit ALS initialization calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm. Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.
Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the i-th node to all local nodes. Keys in this data collection are indices of the nodes and the value that corresponds to each key i is a numeric table that contains indices of the user factors to be transferred to the i-th node on Step 3 of the distributed ALS training algorithm.
Item Offsets (offsets) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key i is a numeric table of size $1 \times 1$ that contains the value of the starting offset of the item factors stored on the i-th node.
For more details, see Algorithms.
| Partial Result ID | Result |
|---|---|
| dataRowSlice | An $m_j \times n$ numeric table with the mining data. The j-th node gets $m_j$ rows of the full input data set $R$. |
| outputOfInitForComputeStep3 | A key-value data collection that maps components of the partial model to the local nodes. |
| offsets | A key-value data collection of size nblocks that holds the starting offsets of the factor indices on each node. |
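Continuing the dense illustration, the sketch below shows Step 2 on node j: the $m_j \times n_i$ tables received from every node i are concatenated column-wise in node order, giving the $m_j \times n$ row slice described for dataRowSlice. itemOffsets assumes that items are laid out contiguously by node, so node i's item factors would start at $n_1 + \dots + n_{i-1}$; that layout, assembleRowSlice, itemOffsets and the Dense alias are hypothetical and do not describe oneDAL internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Dense = std::vector<std::vector<double>>;  // row-major dense matrix

// Hypothetical illustration of Step 2 on node j: pieces[i] is the m_j x n_i
// table received from node i. Appending the pieces column-wise in node order
// gives the m_j x n row slice, where n = n_1 + ... + n_nblocks.
Dense assembleRowSlice(const std::vector<Dense>& pieces) {
    assert(!pieces.empty());
    const std::size_t mj = pieces[0].size();
    Dense rowSlice(mj);
    for (const Dense& piece : pieces) {
        assert(piece.size() == mj);  // every piece covers the same m_j users
        for (std::size_t u = 0; u < mj; ++u) {
            rowSlice[u].insert(rowSlice[u].end(), piece[u].begin(), piece[u].end());
        }
    }
    return rowSlice;
}

// Assumption: items are numbered contiguously by node, so node i's item
// factors start at widths[0] + ... + widths[i - 1], where widths[i] is n_i.
std::vector<std::size_t> itemOffsets(const std::vector<std::size_t>& widths) {
    std::vector<std::size_t> offsets(widths.size(), 0);
    for (std::size_t i = 1; i < widths.size(); ++i) {
        offsets[i] = offsets[i - 1] + widths[i - 1];
    }
    return offsets;
}
```

For instance, pieces {{3, 1}, {0, 0}} from node 0 and {{5}, {2}} from node 1 assemble into the row slice {{3, 1, 5}, {0, 0, 2}}, and nodes holding 2, 1 and 3 items get the starting offsets 0, 2 and 3.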
       