
Benchmarking with ELKI

Setup

For benchmarking, do NOT set -verbose: verbose output often comes at a cost in computation time, and the amount of verbosity varies a lot from algorithm to algorithm.

Version 0.5: With -time you can get timing information for the algorithm step. You might want to disable automatic evaluation using -evaluator NoAutomaticEvaluation and make sure you have not enabled verbose logging.
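
For example, a complete command line for ELKI 0.5 could look like the following sketch. The input file, the choice of DBSCAN, and its parameter values are placeholders to adapt to your experiment; depending on how you launch ELKI, the application class may need to be given in full as shown here:

  java -cp elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication \
       -dbc.in dataset.csv \
       -algorithm clustering.DBSCAN \
       -dbscan.epsilon 0.01 -dbscan.minpts 20 \
       -evaluator NoAutomaticEvaluation \
       -time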

Version 0.4: to get algorithm timing information, you can use:

-enableDebug de.lmu.ifi.dbs.elki.workflow.AlgorithmStep=INFO

This enables verbose logging only for the wrapper class that invokes the main algorithms.

Versions up to 0.3: set the -time parameter to obtain timing information.

Fair benchmarking

ELKI, much like any other framework (in particular any Java-based one), comes with a certain performance cost compared to highly optimized code. An optimized C implementation will beat any ELKI algorithm by an order of magnitude. Results based on entirely independent implementations are therefore usually misleading.

When comparing different algorithms, you must therefore ensure that they

  • are implemented in the same language and at the same level of optimization
  • use the same backing features (database layer etc.)
  • use the same level of abstraction and modularization (e.g. modular distance functions as opposed to hard-coded Euclidean distance)
  • are properly set up within the framework (e.g. try different index structures, and use the best for each)

Many "reference" algorithms such as DBSCAN benefit immensely of index structures in low dimensional spaces and large data sets!

Therefore, we strongly encourage you to implement all comparison methods in ELKI, too. This is essentially the only way to ensure a fair comparison. If you have found a bottleneck, send us a patch to improve it for everyone!

Of course we try to make ELKI as fast as possible (but we will not sacrifice extensibility or code readability for this).

We also strongly encourage reviewers to pay attention to fair benchmarks, and to outright reject papers where the experiments are not done in a sound way. Unfortunately, many articles fall short when it comes to fair experiments.

Performance of ELKI versions

If you are not convinced, here is an empirical example of how large implementation differences can be.
For this, we compare ELKI with ELKI, using nothing but different versions of the same framework.

For this benchmark we use the ALOI image data set, which has 110,250 objects in just 8 dimensions (a setting where the algorithms, distance functions and index structures can be expected to work well). We tested several workloads, most of them both with and without an R*-Tree index. For this experiment, all R*-Trees were loaded incrementally (bulk loading is faster, but was not yet available in all versions).

The test system is a 2.67 GHz Intel Xeon X5650 (single-threaded, memory limit 32 GB). All runtimes are given in seconds.

Task                   | ELKI 0.1 | ELKI 0.2 | ELKI 0.3 | ELKI 0.4 | ELKI 0.5.0 | ELKI 0.5.5
Load only              | n/a      | 13.0     | 15.4     | 3.2      | 4.0        | 4.0
Load into R*-Tree      | n/a      | 21.9     | 22.0     | 9.5      | 8.6        | 8.6
10NN queries each      | n/a      | 2077.2   | 1579.2   | 657.5    | 656.2      | 584.9
10NN with R*-Tree      | n/a      | 290.9    | 126.0    | 111.0    | 72.2       | 71.5
Single-Link clustering | n/a      | 5099.3   | 4662.1   | 1353.7   | 1383.3     | 875.4
DBSCAN clustering      | 3140.8   | 2368.4   | 1651.6   | 669.1    | 610.6      | 603.2
DBSCAN with R*-Tree    | n/a      | 303.4    | 116.7    | 111.4    | 87.7       | 91.3 (*)
OPTICS clustering      | 3133.1   | 2395.3   | 1884.8   | 704.5    | 586.4      | 546.4
OPTICS with R*-Tree    | n/a      | 448.6    | 235.4    | 179.9    | 142.3      | 141.5
Local Outlier Factor   | n/a      | 2189.0   | 1643.6   | 710.8    | 570.9      | 605.2 (*)
LOF with R*-Tree       | n/a      | 321.4    | 146.4    | 135.8    | 93.0       | 90.8
k-means Lloyd, k=100   | 249.7    | 125.6    | 81.2     | 77.4     | 60.2       | 57.3

As you can see, ELKI 0.3 had a slightly more expensive parser, but the algorithms were already somewhat faster than in ELKI 0.2. In ELKI 0.4 we see major performance gains, which we attribute to the removal of Java auto-boxing and unboxing in various places. There are some regressions, marked with (*), that are under investigation; they may be benchmark fluctuations and are probably not significant.

However, we cannot remove all boxing and unboxing in Java without losing much of the generality of ELKI. We expect that significant further performance gains would be possible with a low-level C implementation (which is, however, not of scientific interest to us). Therefore, benchmarking ELKI against implementations in C (or R, or Matlab, or any other language with embedded high-performance mathematical libraries) is not sound!
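
To illustrate the kind of overhead meant here, consider this minimal standalone Java sketch (not ELKI code): summing values through a List<Double> allocates one Double object per element and unboxes it again on every access, while a primitive double[] works on raw values.

  import java.util.ArrayList;
  import java.util.List;

  // Minimal sketch (not ELKI code) of Java boxing overhead.
  public class BoxingDemo {
    public static void main(String[] args) {
      final int n = 1000000;

      // Boxed: a List cannot hold primitives, so every add() boxes the value.
      List<Double> boxed = new ArrayList<Double>(n);
      for (int i = 0; i < n; i++) {
        boxed.add((double) i);
      }
      double boxedSum = 0;
      for (Double d : boxed) {
        boxedSum += d; // implicit unboxing on every iteration
      }

      // Primitive: a double[] stores the values directly, no per-element objects.
      double[] raw = new double[n];
      for (int i = 0; i < n; i++) {
        raw[i] = i;
      }
      double rawSum = 0;
      for (double d : raw) {
        rawSum += d;
      }

      System.out.println(boxedSum + " = " + rawSum);
    }
  }

In a tight inner loop, such per-element allocations and the resulting garbage collection pressure add up quickly.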

ELKI in comparison to other software

As mentioned above, we advocate not comparing algorithms from different frameworks with each other: implementation details can make a huge difference. For example, k-means from the R "flexclust" package appears to be about half as fast as R's native kmeans. In the most extreme case in this benchmark, the same algorithm is 280x faster in one implementation than in the other (LOF in ELKI vs. LOF in "Data Mining with R"). So from a scientific point of view, even a performance difference of three orders of magnitude can be explained by implementation differences alone.

The test system is the same 2.67 GHz Intel Xeon X5650 (single-threaded, memory limit 32 GB); runtimes are again in seconds.

Task                   | ELKI 0.5 | ELKI 0.5.5 | WEKA 3.7.5            | R
Load only              | 4.00     | 4.01       | 3.6 (filter)          | 5.76
Load into R*-Tree      | 8.61     | 4.57       | n/a                   | n/a
10NN queries each      | 656.22   | 584.86     | n/a                   | 73047.88 (naive R code)
10NN with R*-Tree      | 108.95   | 29.31      | n/a                   | n/a
Single-Link clustering | 1383.28  | 875.34     | (too large)           | (too large)
DBSCAN clustering      | 610.63   | 603.22     | 5090.19               | 4339.88 (fpc)
DBSCAN with R*-Tree    | 122.54   | 50.78      | n/a                   | n/a
OPTICS clustering      | 586.22   | 546.37     | 4667.42 (incomplete)  | n/a
OPTICS with R*-Tree    | 196.06   | 103.50     | n/a                   | n/a
EM clustering          | 698.81   | 766.58     | 835.56                | (too large)
Local Outlier Factor   | 570.94   | 605.15     | 2611.60               | 13402.38 (DMwR lofactor)
LOF with R*-Tree       | 133.54   | 47.24      | n/a                   | n/a
k-means Lloyd, k=100   | 60.19    | 57.27      | 372.49 (with "-fast") | 57.13 (internal), 117.2 (flexclust)

Single-Link clustering could not be run in Weka or R. Weka crashes with an IllegalArgumentException caused by an integer overflow; R detects the overflow and fails with "vector size specified is too large". This is not surprising, as both use an implementation that needs quadratic memory and cubic time. ELKI instead implements the SLINK algorithm, which needs only linear memory and quadratic time, and is thus expected to be significantly faster on a large data set such as this one. (Hierarchical clustering is not sensible for large data sets anyway.) R fails with the same error on EM, and OPTICS in Weka is an incomplete implementation that does not extract clusters from the plot.
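
For scale, here is a minimal sketch of the arithmetic behind these failures (the exact memory layouts used by Weka and R are an assumption; the order of magnitude is what matters): the number of pairwise distances among the 110,250 ALOI objects exceeds what a 32-bit integer can represent.

  // Minimal sketch: the pairwise distance count of the ALOI data set
  // overflows a Java int, matching the integer overflow seen in Weka and R.
  public class OverflowDemo {
    public static void main(String[] args) {
      int n = 110250;                     // size of the ALOI data set
      int bad = n * (n - 1) / 2;          // int arithmetic overflows and turns negative
      long good = (long) n * (n - 1) / 2; // 6,077,476,125 pairwise distances
      System.out.println(bad + " vs. " + good);
    }
  }

Even if the index did not overflow, storing those distances at 8 bytes per double would need about 48 GB, well beyond the 32 GB memory limit used here.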

The runtime of Weka's DBSCAN and OPTICS decreased 8x with extension version 1.0.3, by removing unnecessary safety checks, and ELKI's DBSCAN has become 5x faster across versions. Do not benchmark runtimes of code that you did not profile and optimize to the same extent: the results will be meaningless!