Visible to Intel only — GUID: GUID-27120A63-4B1A-4F42-98C5-9D714666D214
Visible to Intel only — GUID: GUID-27120A63-4B1A-4F42-98C5-9D714666D214
Density-Based Spatial Clustering of Applications with Noise
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed in [Ester96]. It is a density-based clustering non-parametric algorithm: given a set of observations in some space, it groups together observations that are closely packed together (observations with many nearby neighbors), marking as outliers observations that lie alone in low-density regions (whose nearest neighbors are too far away).
Details
Given the set of np-dimensional feature vectors (further referred as observations), a positive floating-point number epsilon and a positive integer minObservations, the problem is to get clustering assignments for each input observation, based on the definitions below [Ester96]:
- core observation
-
An observation x is called core observation if at least minObservations input observations (including x) are within distance epsilon from observation x;
- directly reachable
-
An observation y is directly reachable from x if y is within distance epsilon from core observationx. Observations are only said to be directly reachable from core observations.
- reachable
-
An observation y is reachable from an observation x if there is a path with and , where each is directly reachable from . This implies that all observations on the path must be core observations, with the possible exception of y.
- noise observation
-
Noise observations are observations that are not reachable from any other observation.
- cluster
-
Two observations x and y are considered to be in the same cluster if there is a core observationz, and x and y are both reachable from z.
Each cluster gets a unique identifier, an integer number from 0 to . Each observation is assigned an identifier of the cluster it belongs to, or -1 if the observation considered to be a noise observation.
Examples
C++ (CPU)
Batch Processing:
Distributed Processing:
Python*
Batch Processing:
Distributed Processing: