Intel® oneAPI Data Analytics Library Developer Guide and Reference
Batch Processing
Input
Centroid initialization for K-Means clustering accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm.
| Input ID | Input |
|---|---|
| data | Pointer to the numeric table with the data to be clustered. |
Parameters
The following table lists the parameters of centroid initialization for K-Means clustering. Which parameters apply depends on the value of the initialization method parameter, method.
| Parameter | method | Default Value | Description |
|---|---|---|---|
| algorithmFPType | any | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | Not applicable | defaultDense | The initialization method for K-Means clustering. The set of available methods differs between CPU and GPU. |
| nClusters | any | Not applicable | The number of clusters. Required. |
| nTrials |  | 1 | The number of trials used to generate all clusters except the first initial cluster. For details, see [Arthur2007], section 5. |
| oversamplingFactor |  | 0.5 | A fraction of nClusters that is sampled in each of the nRounds rounds of parallel K-Means++. L = nClusters * oversamplingFactor points are sampled in a round. For details, see [Bahmani2012], section 3.3. |
| nRounds |  | 5 | The number of rounds for parallel K-Means++. The product L * nRounds must be greater than nClusters. For details, see [Bahmani2012], section 3.3. |
| engine | any | SharedPtr<engines::mt19937::Batch>() | Pointer to the random number generator engine that is used internally for random number generation. |
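The constraint between oversamplingFactor, nRounds, and nClusters can be checked before running the algorithm. The following is a minimal sketch of that arithmetic only. The helper names pointsPerRound and roundsAreSufficient are hypothetical and are not part of the library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a library function): the number of points L
// sampled in one round, L = nClusters * oversamplingFactor.
double pointsPerRound(std::size_t nClusters, double oversamplingFactor) {
    assert(oversamplingFactor > 0.0);
    return static_cast<double>(nClusters) * oversamplingFactor;
}

// Hypothetical helper (not a library function): returns true when
// L * nRounds is strictly greater than nClusters, as the table requires.
bool roundsAreSufficient(std::size_t nClusters, double oversamplingFactor,
                         std::size_t nRounds) {
    const double L = pointsPerRound(nClusters, oversamplingFactor);
    return L * static_cast<double>(nRounds) > static_cast<double>(nClusters);
}
```

With the default values (oversamplingFactor of 0.5 and nRounds of 5), L * nRounds equals 2.5 * nClusters, so the condition holds for any positive nClusters. Lowering nRounds to 2 with the default oversamplingFactor gives L * nRounds equal to nClusters, which does not satisfy the strict inequality.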
Output
Centroid initialization for K-Means clustering calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm.
| Result ID | Result |
|---|---|
| centroids | Pointer to the numeric table with the cluster centroids. |
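To illustrate how the data input, the nClusters and nTrials parameters, and the centroids result relate, the following self-contained sketch implements a generic K-Means++ style seeding in standard C++. It is not the library's implementation and does not use the library's API. It assumes that nTrials means "draw that many candidates by squared-distance sampling for each centroid after the first and keep the candidate that yields the lowest total squared distance", which is one reading of the parameter description. The function name initPlusPlus, the Point alias, and the use of std::mt19937 as the engine are choices made for this example, and the data is assumed to contain at least nClusters distinct points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

using Point = std::vector<double>;  // one observation (illustrative alias)

// Squared Euclidean distance between two points of equal dimension.
double squaredDistance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        const double d = a[j] - b[j];
        sum += d * d;
    }
    return sum;
}

// Hypothetical sketch of K-Means++ style seeding with nTrials candidates
// per centroid. Assumes data holds at least nClusters distinct points.
std::vector<Point> initPlusPlus(const std::vector<Point>& data,
                                std::size_t nClusters, std::size_t nTrials,
                                std::mt19937& engine) {
    assert(nClusters >= 1 && nClusters <= data.size() && nTrials >= 1);

    // The first centroid is drawn uniformly at random from the data.
    std::vector<Point> centroids;
    std::uniform_int_distribution<std::size_t> uniform(0, data.size() - 1);
    centroids.push_back(data[uniform(engine)]);

    // minDist[i]: squared distance from data[i] to its nearest centroid so far.
    std::vector<double> minDist(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
        minDist[i] = squaredDistance(data[i], centroids[0]);

    while (centroids.size() < nClusters) {
        // Sampling weights must not all be zero (requires distinct points).
        assert(std::accumulate(minDist.begin(), minDist.end(), 0.0) > 0.0);
        std::discrete_distribution<std::size_t> d2(minDist.begin(), minDist.end());

        std::size_t best = 0;
        double bestPotential = std::numeric_limits<double>::infinity();
        for (std::size_t t = 0; t < nTrials; ++t) {
            // Draw a candidate with probability proportional to minDist.
            const std::size_t cand = d2(engine);
            // Potential: total squared distance if the candidate were added.
            double potential = 0.0;
            for (std::size_t i = 0; i < data.size(); ++i)
                potential += std::min(minDist[i], squaredDistance(data[i], data[cand]));
            if (potential < bestPotential) {
                bestPotential = potential;
                best = cand;
            }
        }

        // Accept the best candidate and refresh the nearest-centroid distances.
        centroids.push_back(data[best]);
        for (std::size_t i = 0; i < data.size(); ++i)
            minDist[i] = std::min(minDist[i], squaredDistance(data[i], data[best]));
    }
    return centroids;
}
```

The function returns nClusters points, each copied from the input data, which mirrors the shape of the relationship between the data input and the centroids result. Passing the engine by reference lets the caller control the seed, in the same spirit as the engine parameter.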