Cluster Analysis in ELKI

Version information: Updated for ELKI 0.7.0

Overview

ELKI contains a wide variety of clustering algorithms. These can be roughly divided into the following families:

Recommendations

Hierarchical clustering

For single-linkage, SLINK is the fastest algorithm (quadratic runtime with small constant factors, linear memory).
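To illustrate where the quadratic runtime and linear memory come from, here is a minimal Python sketch of the SLINK pointer representation. It is not ELKI's implementation; the function name `slink` and the `dist` callback are assumptions made for this example. Each object `j` stores `lam[j]`, the height at which it stops being the last object of its cluster, and `pi[j]`, the last object of the cluster it joins at that height.

```python
import math

def slink(points, dist):
    """Single-linkage clustering in pointer representation (illustrative sketch).

    Returns (pi, lam). Only three arrays of length n are kept (linear memory);
    each new point is compared once to all earlier points (quadratic runtime).
    """
    n = len(points)
    pi = [0] * n            # pi[j]: last object of the cluster that j merges into
    lam = [math.inf] * n    # lam[j]: distance at which that merge happens
    m = [0.0] * n           # scratch: distances from the new point to earlier ones
    for i in range(n):
        pi[i] = i
        lam[i] = math.inf
        # Distances from the newly inserted point i to all earlier points.
        for j in range(i):
            m[j] = dist(points[j], points[i])
        # Update the pointer representation with the new point.
        for j in range(i):
            if lam[j] >= m[j]:
                m[pi[j]] = min(m[pi[j]], lam[j])
                lam[j] = m[j]
                pi[j] = i
            else:
                m[pi[j]] = min(m[pi[j]], m[j])
        # Relabel pointers so that pi[j] is again the last object of the target cluster.
        for j in range(i):
            if lam[j] >= lam[pi[j]]:
                pi[j] = i
    return pi, lam

# Usage: four points on a line, forming two tight pairs.
pi, lam = slink([0.0, 1.0, 5.0, 6.0], lambda a, b: abs(a - b))
```

In this usage example, the two pairs merge at height 1 and the resulting clusters join at height 4, so `lam` holds those heights for objects 0, 2 and 1 respectively, while the last object keeps an infinite height.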

For complete-linkage, CLINK is fast but appears to give worse results than the other algorithms.

For other linkages, the Anderberg algorithm is usually the best choice we currently offer.

K-means

The classic Lloyd k-means is not worth trying. The Hamerly, Elkan, and MacQueen algorithms (and, in 0.7.1, Compare and Sort) appear to always outperform the textbook algorithm.
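For reference, the textbook algorithm can be sketched in a few lines of Python. This is only an illustration, not ELKI code; the function name `lloyd_kmeans` and its parameters are assumptions for this example. Note that the assignment step recomputes the distance from every point to every center in each iteration. Variants such as Hamerly and Elkan keep distance bounds so that many of these computations can be skipped.

```python
def lloyd_kmeans(points, centers, max_iter=100):
    """Textbook Lloyd k-means (illustrative sketch).

    points:  list of equal-length numeric tuples
    centers: list of initial centers (same dimensionality)
    Returns (centers, assignment).
    """
    centers = [list(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: n * k squared-distance computations, every iteration.
        new_assignment = [
            min(range(len(centers)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed its cluster
        assignment = new_assignment
        # Update step: move each center to the mean of its assigned points.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

# Usage: two well-separated groups, with one initial center placed in each.
centers, assignment = lloyd_kmeans(
    [(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
```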

X-means can be useful to choose the parameter k.

DBSCAN and OPTICS

Always use an index with DBSCAN. We recommend the SimpleCoverTree index, which works for most data sets and requires no parameters other than the distance function.

MinPts should be chosen larger than the data set dimensionality (e.g. 2*dim). For 2-dimensional data, the absolute minimum to use is 4, but values such as 10 or 20 may be appropriate for noisy data.
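The rule of thumb above can be written as a small helper. This is a hedged sketch rather than anything ELKI provides; the name `suggest_minpts` and the defaults `factor=2` and `floor=4` are assumptions that simply encode the 2*dim heuristic and the minimum of 4 mentioned for 2-dimensional data.

```python
def suggest_minpts(dim, factor=2, floor=4):
    """Starting value for DBSCAN MinPts: factor * dim, but never below floor."""
    if dim < 1:
        raise ValueError("dimensionality must be at least 1")
    return max(floor, factor * dim)

# Usage: starting values for 2-dimensional and 10-dimensional data.
minpts_2d = suggest_minpts(2)
minpts_10d = suggest_minpts(10)
```

Treat the result as a lower starting point only; for noisy data, larger values such as 10 or 20 may still be the better choice.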

If you do not have prior knowledge about a good value for Epsilon, you can try the following: