Developer Guide and Reference

  • 2021.6
  • 04/11/2022
  • Public Content

Density-Based Spatial Clustering of Applications with Noise

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed in [Ester96]. It is a density-based, non-parametric clustering algorithm: given a set of observations in some space, it groups observations that are closely packed (observations with many nearby neighbors) and marks as outliers the observations that lie alone in low-density regions (those whose nearest neighbors are too far away).


Given the set $X = \{x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\}$ of $n$ $p$-dimensional feature vectors (further referred to as observations), a positive floating-point number epsilon and a positive integer minObservations, the problem is to get clustering assignments for each input observation, based on the definitions below [Ester96]:
core observation
An observation $x$ is called a core observation if at least minObservations input observations (including $x$) are within distance epsilon from observation $x$.
directly reachable
An observation $y$ is directly reachable from $x$ if $y$ is within distance epsilon from core observation $x$. Observations are only said to be directly reachable from core observations.
reachable
An observation $y$ is reachable from an observation $x$ if there is a path $x_1, \ldots, x_m$ with $x_1 = x$ and $x_m = y$, where each $x_{i+1}$ is directly reachable from $x_i$. This implies that all observations on the path must be core observations, with the possible exception of $y$.
noise observation
Noise observations are observations that are not reachable from any other observation.
cluster
Two observations $x$ and $y$ are considered to be in the same cluster if there is a core observation $z$, and $x$ and $y$ are both reachable from $z$.
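
The definitions above can be illustrated with a short pure-Python sketch. It is not the library's implementation: the function names (`neighborhoods`, `core_mask`, `reachable_from`) are hypothetical, the distance threshold and the minimum neighbor count are passed as `epsilon` and `min_observations`, and the sketch assumes Euclidean distance with an inclusive comparison (`<=`).

```python
from collections import deque
from math import dist


def neighborhoods(observations, epsilon):
    """For each observation, list the indices within distance epsilon (itself included)."""
    n = len(observations)
    return [
        [j for j in range(n) if dist(observations[i], observations[j]) <= epsilon]
        for i in range(n)
    ]


def core_mask(observations, epsilon, min_observations):
    """True at index i when observation i is a core observation."""
    return [len(r) >= min_observations for r in neighborhoods(observations, epsilon)]


def reachable_from(observations, start, epsilon, min_observations):
    """Indices of the observations reachable from `start`, excluding `start` itself."""
    regions = neighborhoods(observations, epsilon)
    is_core = [len(r) >= min_observations for r in regions]
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        if not is_core[i]:
            continue  # a path can only be extended from a core observation
        for j in regions[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return sorted(seen - {start})


# Four evenly spaced points on a line plus one isolated point (assumed toy data).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(core_mask(points, epsilon=1.0, min_observations=3))
# [False, True, True, False, False]
print(reachable_from(points, 1, epsilon=1.0, min_observations=3))
# [0, 2, 3]
print(reachable_from(points, 0, epsilon=1.0, min_observations=3))
# []
```

The last two calls show that reachability is not symmetric: observation 0 is reachable from the core observation 1, but nothing is reachable from observation 0 because it is not a core observation.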
Each cluster gets a unique identifier, an integer number from $0$ to $\text{total number of clusters} - 1$. Each observation is assigned the identifier of the cluster it belongs to, or $-1$ if the observation is considered to be a noise observation.
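
As a rough illustration of how such assignments could be produced, the sketch below expands one cluster from each unassigned core observation. It is a minimal brute-force version, not the library's algorithm. The name `dbscan_assignments` is hypothetical, and the sketch assumes Euclidean distance, an inclusive distance comparison, cluster identifiers that start at 0, and -1 as the noise marker.

```python
from collections import deque
from math import dist

NOISE = -1  # assumed marker for noise observations


def dbscan_assignments(observations, epsilon, min_observations):
    """Return (assignments, number_of_clusters) for a list of points."""
    n = len(observations)
    # Indices within distance epsilon of each observation (itself included).
    regions = [
        [j for j in range(n) if dist(observations[i], observations[j]) <= epsilon]
        for i in range(n)
    ]
    is_core = [len(r) >= min_observations for r in regions]

    assignments = [NOISE] * n
    cluster_id = 0
    for seed in range(n):
        # Only an unassigned core observation can start a new cluster.
        if not is_core[seed] or assignments[seed] != NOISE:
            continue
        assignments[seed] = cluster_id
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            if not is_core[i]:
                continue  # non-core members join the cluster but do not extend it
            for j in regions[i]:
                if assignments[j] == NOISE:
                    assignments[j] = cluster_id
                    queue.append(j)
        cluster_id += 1
    return assignments, cluster_id


# Assumed toy data: a chain of four points, one isolated point, one tight group.
points = [
    (0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
    (10.0, 0.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]
assignments, n_clusters = dbscan_assignments(points, epsilon=1.0, min_observations=3)
print(assignments)  # [0, 0, 0, 0, -1, 1, 1, 1]
print(n_clusters)   # 2
```

In this sketch, a non-core observation that is reachable from more than one cluster keeps the identifier of whichever cluster was expanded first, so its assignment depends on the input order.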


The following computation modes are available:

  • Batch Processing
  • Distributed Processing

By interface:

  • C++ (CPU): Batch Processing, Distributed Processing
  • Java: Batch Processing, Distributed Processing. There is no support for Java on GPU.
  • Python* with DPC++ support: Batch Processing
  • Python*: Batch Processing, Distributed Processing
