Distributed Processing

Intel® oneAPI Data Analytics Library Developer Guide and Reference

ID 772611

Date 11/07/2023

The distributed processing mode assumes that the data set R is split into `nblocks` blocks across computation nodes.

Parameters

In the distributed processing mode, initialization of item factors for the implicit ALS algorithm has the following parameters:

Parameters for Implicit Alternating Least Squares Initialization (Distributed Processing)
Parameter	Default Value	Description
`algorithmFPType`	`float`	The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
`method`	`fastCSR`	Performance-oriented computation method for CSR numeric tables, the only method supported by the algorithm.
`nFactors`	10	The total number of factors.
`fullNUsers`	0	The total number of users m.
`partition`	Not applicable	A numeric table that takes one of two forms: a table of size 1 × 1 that provides the number of input data parts, or a table of size (`nblocks` + 1) × 1, where `nblocks` is the number of input data parts and the i-th element contains the offset of the transposed i-th data part to be computed by the initialization algorithm.
`engine`	`SharedPtr<engines::mt19937::Batch>()`	Pointer to the random number generator engine that is used internally at the initialization step.
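
As a minimal sketch of the offsets form of `partition`, the plain-Python snippet below converts part sizes into offsets and back. It assumes the offsets table holds `nblocks` + 1 entries (a leading 0 and a trailing total); the helper names and the sizes are hypothetical, and no library calls are involved.

```python
from itertools import accumulate


def sizes_to_offsets(block_sizes):
    """Return nblocks + 1 offsets: a leading 0 followed by running totals."""
    return [0] + list(accumulate(block_sizes))


def offsets_to_sizes(offsets):
    """Recover the size of each part from consecutive offsets."""
    return [hi - lo for lo, hi in zip(offsets, offsets[1:])]


# Hypothetical sizes of 3 data parts (assumption, not taken from the library).
block_sizes = [4, 3, 5]
offsets = sizes_to_offsets(block_sizes)

print(offsets)                    # [0, 4, 7, 12]
print(offsets_to_sizes(offsets))  # [4, 3, 5]
```

With this layout, the i-th entry is the index where the i-th part starts, and the last entry equals the total size.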

To initialize the implicit ALS algorithm in the distributed processing mode, use the two-step process illustrated by the following diagram:

Implicit Alternating Least Squares Initialization: General Schema of Distributed Processing

Step 1 - on Local Nodes

Implicit Alternating Least Squares Initialization: Distributed Processing, Step 1 - on Local Nodes

Input

In the distributed processing mode, initialization of item factors for the implicit ALS algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input for Implicit Alternating Least Squares Initialization (Distributed Processing, Step 1)
Input ID	Input
`dataColumnSlice`	An n_i × m numeric table with the part of the input data set. Each node holds n_i rows of the full transposed input data set R^T. The input should be an object of the `CSRNumericTable` class.
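
To make the layout of `dataColumnSlice` concrete, the following plain-Python sketch transposes a small ratings matrix, splits the rows of the transposed matrix across two nodes, and converts each slice to CSR arrays. The matrix, the split points, and the `to_csr` helper are hypothetical. The sketch uses zero-based indices for simplicity, so check the indexing convention of the library's CSR table before reusing the arrays.

```python
def to_csr(rows):
    """Convert dense rows to (values, col_indices, row_offsets), zero-based."""
    values, col_indices, row_offsets = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                values.append(value)
                col_indices.append(col)
        row_offsets.append(len(values))
    return values, col_indices, row_offsets


# Hypothetical ratings matrix R: 3 users (rows) x 4 items (columns).
R = [
    [1, 0, 0, 2],
    [0, 3, 0, 0],
    [4, 0, 5, 0],
]

# Transposed matrix: 4 items (rows) x 3 users (columns).
R_T = [list(col) for col in zip(*R)]

# Assumed split of the item rows: node 0 gets rows 0..1, node 1 gets rows 2..3.
item_offsets = [0, 2, 4]
slices = [R_T[lo:hi] for lo, hi in zip(item_offsets, item_offsets[1:])]
csr_slices = [to_csr(s) for s in slices]

print(csr_slices[0])  # ([1, 4, 3], [0, 2, 1], [0, 2, 3])
print(csr_slices[1])  # ([5, 2], [2, 0], [0, 1, 2])
```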

Output

In the distributed processing mode, initialization of item factors for the implicit ALS algorithm calculates the results described below. Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm. Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.

Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the i-th node to all local nodes. Keys in this data collection are indices of the nodes and the value that corresponds to each key i is a numeric table that contains indices of the factors of the items to be transferred to the i-th node on Step 3 of the distributed ALS training algorithm.
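
One way to picture this collection is sketched below in plain Python. It rests on an assumption that is not confirmed by the text above: that an item's factors must be sent to node i when at least one user assigned to node i has a nonzero entry for that item. The function name, the use of local item indices, and the data are hypothetical.

```python
def items_needed_by_node(local_block, user_offsets):
    """Map each node index to the local item indices that node would need.

    local_block: dense rows of this node's slice (items x users).
    user_offsets: nblocks + 1 offsets that assign user columns to nodes.
    """
    needed = {}
    for node, (lo, hi) in enumerate(zip(user_offsets, user_offsets[1:])):
        needed[node] = [
            item for item, row in enumerate(local_block) if any(row[lo:hi])
        ]
    return needed


# Hypothetical slice held by one node: 2 items x 3 users.
local_block = [
    [1, 0, 4],
    [0, 3, 0],
]
# Assumed user assignment: users 0..1 on node 0, user 2 on node 1.
user_offsets = [0, 2, 3]

print(items_needed_by_node(local_block, user_offsets))  # {0: [0, 1], 1: [0]}
```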

User Offsets (`offsets`) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key i is a numeric table of size 1 × 1 that contains the starting offset of the user factors stored on the i-th node.

For more details, see Algorithms.

Output for Implicit Alternating Least Squares Initialization (Distributed Processing, Step 1)
Partial Result ID	Result
`partialModel`	The model with initialized item factors. The result can only be an object of the `PartialModel` class.
`outputOfInitForComputeStep3`	A key-value data collection that maps components of the partial model to the local nodes.
`offsets`	A key-value data collection of size `nblocks` that holds the starting offsets of the factor indices on each node.
`outputOfStep1ForStep2`	A key-value data collection of size `nblocks` that contains the parts of the input numeric table: the j-th element of this collection is a numeric table of size m_j × n_i, where m_1 + ... + m_nblocks = m and the values m_j are defined by the `partition` parameter.
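
The splitting that produces `outputOfStep1ForStep2` can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: it assumes that the `partition` offsets divide the user columns of the local slice among the nodes and that each part is stored transposed (users as rows). The function name and the data are hypothetical.

```python
def split_for_step2(local_block, user_offsets):
    """Split an items x users block by user columns into one part per node.

    Each returned part is transposed, so its rows correspond to users.
    """
    parts = {}
    for node, (lo, hi) in enumerate(zip(user_offsets, user_offsets[1:])):
        columns = [row[lo:hi] for row in local_block]       # items x users-of-node
        parts[node] = [list(col) for col in zip(*columns)]  # users-of-node x items
    return parts


# Hypothetical slice held by one node: 2 items x 3 users.
local_block = [
    [1, 0, 4],
    [0, 3, 0],
]
# Assumed partition offsets: users 0..1 go to node 0, user 2 goes to node 1.
user_offsets = [0, 2, 3]

parts = split_for_step2(local_block, user_offsets)
print(parts[0])  # [[1, 0], [0, 3]]
print(parts[1])  # [[4, 0]]
```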

Step 2 - on Local Nodes

Implicit Alternating Least Squares Initialization: Distributed Processing, Step 2 - on Local Nodes

Input

This step uses the results of the previous step.

Input for Implicit Alternating Least Squares Initialization (Distributed Processing, Step 2)
Input ID	Input
`inputOfStep2FromStep1`	A key-value data collection of size `nblocks` that contains the parts of the input data set, one numeric table per element. Each numeric table in the collection should be an object of the `CSRNumericTable` class.

Output

In this step, implicit ALS initialization calculates the partial results described below. Pass the Partial Result ID as a parameter to the methods that access the results of your algorithm. Partial results that correspond to the outputOfInitForComputeStep3 and offsets Partial Result IDs should be transferred to Step 3 of the distributed ALS training algorithm.

Output of Initialization for Computing Step 3 (outputOfInitForComputeStep3) is a key-value data collection that maps components of the partial model on the i-th node to all local nodes. Keys in this data collection are indices of the nodes and the value that corresponds to each key i is a numeric table that contains indices of the user factors to be transferred to the i-th node on Step 3 of the distributed ALS training algorithm.

Item Offsets (`offsets`) is a key-value data collection, where the keys are indices of the nodes and the value that corresponds to the key i is a numeric table of size 1 × 1 that contains the starting offset of the item factors stored on the i-th node.

For more details, see Algorithms.

Output for Implicit Alternating Least Squares Initialization (Distributed Processing, Step 2)
Partial Result ID	Result
`dataRowSlice`	An m_j × n numeric table with the mining data. The j-th node gets m_j rows of the full input data set R.
`outputOfInitForComputeStep3`	A key-value data collection that maps components of the partial model to the local nodes.
`offsets`	A key-value data collection of size `nblocks` that holds the starting offsets of the factor indices on each node.
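
As a rough illustration of how a node could assemble `dataRowSlice` from the parts it receives, the plain-Python sketch below concatenates the parts column-wise in the order of the sending nodes. It assumes that every received part has one row per local user and one column per item held by the sender; the function name and the data are hypothetical, and dense lists stand in for CSR tables.

```python
from itertools import chain


def assemble_row_slice(parts_by_node):
    """Join parts received from all nodes into the rows of R owned by this node.

    parts_by_node: {sender index: dense part with one row per local user}.
    """
    ordered = [parts_by_node[sender] for sender in sorted(parts_by_node)]
    return [list(chain.from_iterable(rows)) for rows in zip(*ordered)]


# Hypothetical parts received by node 0, which owns users 0 and 1.
received = {
    0: [[1, 0], [0, 3]],  # from node 0, which holds items 0..1
    1: [[0, 2], [0, 0]],  # from node 1, which holds items 2..3
}

print(assemble_row_slice(received))  # [[1, 0, 0, 2], [0, 3, 0, 0]]
```

In this example the result matches the first two rows of a 3 × 4 data set whose remaining row belongs to another node.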
