Density-Based Spatial Clustering of Applications with Noise
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed in [Ester96].
It is a density-based clustering non-parametric algorithm: given a set of observations in some space,
it groups together observations that are closely packed together (observations with many nearby neighbors),
marking as outliers observations that lie alone in low-density regions (whose nearest neighbors are too far away).
Details
Given the set
of
-dimensional feature vectors (further referred as observations),
a positive floating-point number
epsilon
and a positive integer minObservations
,
the problem is to get clustering assignments for each input observation, based on the definitions below [Ester96]:- core observation
- An observation
is called core observation if at least
minObservationsinput observations (including) are within distance
epsilonfrom observation;
- directly reachable
- An observation
is directly reachable from
if
is within distance
epsilonfrom core observation. Observations are only said to be directly reachable from core observations.
- reachable
- An observation
is reachable from an observation
if there is a path
with
and
, where each
is directly reachable from
. This implies that all observations on the path must be core observations, with the possible exception of
.
- noise observation
- Noise observations are observations that are not reachable from any other observation.
- cluster
- Two observations
and
are considered to be in the same cluster if there is a core observation
, and
and
are both reachable from
.
Each cluster gets a unique identifier, an integer number from
to
.
Each observation is assigned an identifier of the cluster it belongs to,
or
if the observation considered to be a noise observation.
Examples
C++ (CPU)
Batch Processing:
Distributed Processing:
Java*
There is no support for Java on GPU.
Batch Processing:
Distributed Processing:
Python* with DPC++ support
Batch Processing:
Python*
Batch Processing:
Distributed Processing: