@Reference(prefix="closely related", authors="M. Patwary, D. Palsetia, A. Agrawal, W. K. Liao, F. Manne, A. Choudhary", title="A new scalable parallel DBSCAN algorithm using the disjoint-set data structure", booktitle="IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis (SC)", url="https://doi.org/10.1109/SC.2012.9", bibkey="DBLP:conf/sc/PatwaryPALMC12") public class ParallelGeneralizedDBSCAN extends AbstractAlgorithm<Clustering<Model>> implements ClusteringAlgorithm<Clustering<Model>>
This is the archetype of a non-linear shared-memory DBSCAN that does not sequentially expand a cluster, but processes points in arbitrary order and merges clusters when neighboring core points occur.
Because of synchronization when labeling points, the speedup will only be sublinear in the number of cores. But in particular without an index and on large data, the majority of the work is finding the neighbors; not in labeling the points.
Reference:
Please cite the latest ELKI version.
Related is the following publication, whose "disjoint set data structure" appears to be a similar union-find approach to ours, and whose DSDBSCAN appears rather similar. The main benefit of our approach is that we avoid using the union-find data structure for every object, but only use it for merging clusters.
M. Patwary, D. Palsetia, A. Agrawal, W. K. Liao, F. Manne, A. Choudhary
A new scalable parallel DBSCAN algorithm using the disjoint-set data
structure
In IEEE Int. Conf. for High Performance Computing, Networking, Storage and
Analysis (SC)
Modifier and Type | Class and Description |
---|---|
static class |
ParallelGeneralizedDBSCAN.Instance<T>
Instance for a particular data set.
|
static class |
ParallelGeneralizedDBSCAN.Parameterizer
Parameterization class
|
Modifier and Type | Field and Description |
---|---|
protected boolean |
coremodel
Track which objects are "core" objects.
|
protected CorePredicate<?> |
corepred
The core predicate factory.
|
private static Logging |
LOG
Get a logger for this algorithm
|
protected NeighborPredicate<?> |
npred
The neighborhood predicate factory.
|
ALGORITHM_ID
Constructor and Description |
---|
ParallelGeneralizedDBSCAN(NeighborPredicate<?> npred,
CorePredicate<?> corepred,
boolean coremodel)
Constructor for parameterized algorithm.
|
Modifier and Type | Method and Description |
---|---|
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
|
protected Logging |
getLogger()
Get the (STATIC) logger for this class.
|
Clustering<Model> |
run(Database database)
Runs the algorithm.
|
private static final Logging LOG
protected NeighborPredicate<?> npred
protected CorePredicate<?> corepred
protected boolean coremodel
public ParallelGeneralizedDBSCAN(NeighborPredicate<?> npred, CorePredicate<?> corepred, boolean coremodel)
npred
- Neighbor predicate.corepred
- Core point predicate.coremodel
- Keep track of core points.public Clustering<Model> run(Database database)
Algorithm
run
in interface Algorithm
run
in interface ClusteringAlgorithm<Clustering<Model>>
run
in class AbstractAlgorithm<Clustering<Model>>
database
- the database to run the algorithm onpublic TypeInformation[] getInputTypeRestriction()
AbstractAlgorithm
getInputTypeRestriction
in interface Algorithm
getInputTypeRestriction
in class AbstractAlgorithm<Clustering<Model>>
protected Logging getLogger()
AbstractAlgorithm
getLogger
in class AbstractAlgorithm<Clustering<Model>>
Copyright © 2019 ELKI Development Team. License information.