ELKI comes with a simple GUI that helps with parameterization by offering input assistance.
Since release 0.3, the GUI is the default operation when launching the .jar file:
java -jar mypath/elki.jar
Here, we provide just a few examples of using ELKI with some algorithms. Hopefully, from here you can easily extend to other algorithms and data sets.
Throughout all examples, we assume you have the executable jar-archive elki.jar in some directory locally reachable from your console as mypath, and that you have downloaded the example data file from http://www.dbs.ifi.lmu.de/research/KDD/ELKI/datasets/example/exampledata.txt to a location reachable from your console as mydata/exampledata.txt.
java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbscan.epsilon 20 -dbscan.minpts 10

This requests the algorithm DBSCAN to cluster the data set using the DBSCAN parameters epsilon=20 and minpts=10. The clustering result is just printed to the console.
java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbscan.epsilon 20 -dbscan.minpts 10 -out myresults/DBSCANeps20min10

Same as before, but this time a directory for collecting the output is explicitly specified. This results in one file per cluster found by DBSCAN within the specified directory myresults/DBSCANeps20min10.
Each file starts with metadata and information on the parameters used, before listing the data points contained in the cluster.
For example, in this case, the file for cluster 0 starts as follows:
###############################################################
# Settings:
# de.lmu.ifi.dbs.elki.workflow.InputStep
# -db StaticArrayDatabase
#
# de.lmu.ifi.dbs.elki.database.StaticArrayDatabase
# -dbc FileBasedDatabaseConnection
#
# de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection
# -dbc.in mydata/exampledata.txt
# -dbc.parser DoubleVectorLabelParser
#
# de.lmu.ifi.dbs.elki.datasource.parser.DoubleVectorLabelParser
# -parser.colsep \s+
# -parser.quote "
# -parser.labelIndices [unset]
#
# de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection
# -dbc.filter [unset]
#
# de.lmu.ifi.dbs.elki.database.StaticArrayDatabase
# -db.index [unset]
#
# de.lmu.ifi.dbs.elki.workflow.AlgorithmStep
# -algorithm clustering.DBSCAN
#
# de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN
# -algorithm.distancefunction EuclideanDistanceFunction
# -dbscan.epsilon 20.0
# -dbscan.minpts 10
#
# de.lmu.ifi.dbs.elki.workflow.EvaluationStep
# -evaluator [unset]
###############################################################
# Cluster: Cluster 0
Most of the parameters shown here are set implicitly to default values or not used ([unset] or false).
To get a list of additional parameters, add -help to the command line. There you will also find options that do not affect the algorithm result, such as -verbose, which often provides progress information.
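For example, appending -help to the earlier DBSCAN invocation (a sketch that only reuses flags already shown above) should print the available options instead of running the algorithm:

java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -algorithm clustering.DBSCAN -help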
So far, we have also not used the possibility of normalizing the data. Normalization is available as a filter for the input step via the -dbc.filter option and is applied while the data set is loaded. The option value is a comma-separated list of filter classes; ELKI provides, for example, AttributeWiseMinMaxNormalization. Other normalization procedures can easily be provided by any user by implementing the interface de.lmu.ifi.dbs.elki.datasource.filter.ObjectFilter, as sketched below. Note that the resulting files will contain the normalized data vectors, since ELKI by default does not keep a copy of the original, unnormalized data.
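As a minimal sketch of such a custom filter, the following pass-through filter assumes the single-method ObjectFilter interface (filter(MultipleObjectsBundle)) as found in recent ELKI releases; the exact interface may differ between versions, and the package and class names used here are purely illustrative:

package my.custom.filters; // hypothetical package, not part of ELKI

import de.lmu.ifi.dbs.elki.datasource.bundle.MultipleObjectsBundle;
import de.lmu.ifi.dbs.elki.datasource.filter.ObjectFilter;

// Illustrative pass-through filter: it receives the loaded data bundle
// and returns it unchanged. A real normalization filter would locate the
// vector column in the bundle, rescale its values, and return a bundle
// containing the rescaled column instead.
public class MyNoOpFilter implements ObjectFilter {
  @Override
  public MultipleObjectsBundle filter(MultipleObjectsBundle objects) {
    // Transform or replace columns here; this sketch simply passes the data through.
    return objects;
  }
}

Such a filter could then be requested during loading via -dbc.filter my.custom.filters.MyNoOpFilter, provided the class is on the classpath.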
With the AttributeWiseMinMaxNormalization filter, the DBSCAN example becomes:

java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbc.filter AttributeWiseMinMaxNormalization -dbscan.epsilon 0.02 -dbscan.minpts 10 -out myresults/DBSCANeps20min10 -evaluator paircounting.EvaluatePairCountingFMeasure -verbose -enableDebug de.lmu.ifi.dbs.elki.workflow.AlgorithmStep

Note that the value for dbscan.epsilon is decreased considerably to suit the normalized data (AttributeWiseMinMaxNormalization normalizes all attribute values to the range [0:1]).
In addition, the option -evaluator paircounting.EvaluatePairCountingFMeasure requests an evaluation of the clustering, whose result is written to the file pair-fmeasure.txt in the output directory, while -verbose and -enableDebug produce additional progress and debugging output on the console.
For notes about fair benchmarking with ELKI, please read the comments on Benchmarking in the Wiki. Do not benchmark ELKI against other software, since there is an obvious cost in the generality of the implementation, and you for example do not want to benchmark Java versus C. To benchmark the performance of actual algorithms, you must implement them within the ELKI framework to get sound results.
To get a description of the usage of a specific algorithm, use the option -description. For example, here we request a description of how to use the algorithm clustering.correlation.FourC:

java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -description de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.FourC

The output describes the parameters available for FourC with their default values. Setting, for example, a different distance function may in turn introduce additional parameters.
Note that here we gave the full name of the class FourC (i.e., including the complete package name), while we omitted the prefix de.lmu.ifi.dbs.elki.algorithm. for clustering.DBSCAN above.
The reason for this difference is as follows: if a class name is expected as a parameter value, usually a restriction class is also known, i.e., an interface or class which must be implemented or extended by the specified parameter value. For example, the restriction class for -algorithm is de.lmu.ifi.dbs.elki.algorithm.Algorithm, the restriction class for -algorithm.distancefunction is de.lmu.ifi.dbs.elki.distance.distancefunction.DistanceFunction, and the restriction class for -description is simply java.lang.Object. If the given value is not a complete class name, ELKI tries to complete it with the package prefix of the restriction class. Thus, as a value for -algorithm, clustering.DBSCAN (which is not a valid class name per se) will be automatically completed with the prefix de.lmu.ifi.dbs.elki.algorithm.. For -description, however, clustering.correlation.FourC would be automatically completed with the prefix java.lang., which does not result in a valid class name. Hence, for -description, we are required to specify the complete class name in the first place. On the other hand, if we wanted to use FourC as the algorithm, the specification clustering.correlation.FourC as the parameter value for -algorithm would suffice.
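For illustration only (a sketch: FourC's own required parameters, as listed by -description, are omitted here, and the output directory name is hypothetical), such a call could look like:

java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -algorithm clustering.correlation.FourC -dbc.in mydata/exampledata.txt -out myresults/FourC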
The restriction class and the already available implementations (suitable as possible values for the parameter) are listed in the parameter description. See, e.g., the description of -algorithm (as provided after using -description as above, or using -help):
-algorithm <object_1|class_1,...,object_n|class_n>
   Algorithm to run.
   Implementing de.lmu.ifi.dbs.elki.algorithm.Algorithm
   Known classes (default package de.lmu.ifi.dbs.elki.algorithm.):
   -> NullAlgorithm
   -> clustering.DBSCAN
   -> clustering.DeLiClu
   -> clustering.EM
   -> clustering.KMeans
   -> clustering.OPTICSXi
   -> clustering.OPTICS
   -> clustering.SLINK
   -> clustering.SNNClustering
   -> clustering.correlation.CASH
   -> clustering.correlation.COPAC
   -> clustering.correlation.ERiC
   -> clustering.correlation.FourC
   -> clustering.correlation.HiCO
   -> clustering.correlation.ORCLUS
   -> clustering.subspace.CLIQUE
   -> clustering.subspace.DiSH
   -> clustering.subspace.HiSC
   -> clustering.subspace.PreDeCon
   -> clustering.subspace.PROCLUS
   -> clustering.subspace.SUBCLU
   -> clustering.trivial.ByLabelClustering
   -> clustering.trivial.ByLabelHierarchicalClustering
   -> clustering.trivial.TrivialAllInOne
   -> clustering.trivial.TrivialAllNoise
   -> outlier.ABOD
   -> outlier.AggarwalYuEvolutionary
   -> outlier.AggarwalYuNaive
   -> outlier.DBOutlierDetection
   -> outlier.DBOutlierScore
   -> outlier.EMOutlier
   -> outlier.GaussianModel
   -> outlier.GaussianUniformMixture
   -> outlier.INFLO
   -> outlier.KNNOutlier
   -> outlier.KNNWeightOutlier
   -> outlier.LDOF
   -> outlier.LOCI
   -> outlier.LOF
   -> outlier.LoOP
   -> outlier.OPTICSOF
   -> outlier.ReferenceBasedOutlierDetection
   -> outlier.SOD
   -> outlier.OnlineLOF
   -> outlier.spatial.CTLuGLSBackwardSearchAlgorithm
   -> outlier.spatial.CTLuMeanMultipleAttributes
   -> outlier.spatial.CTLuMedianAlgorithm
   -> outlier.spatial.CTLuMedianMultipleAttributes
   -> outlier.spatial.CTLuMoranScatterplotOutlier
   -> outlier.spatial.CTLuRandomWalkEC
   -> outlier.spatial.CTLuScatterplotOutlier
   -> outlier.spatial.CTLuZTestOutlier
   -> outlier.spatial.SLOM
   -> outlier.spatial.SOF
   -> outlier.spatial.TrimmedMeanApproach
   -> outlier.meta.ExternalDoubleOutlierScore
   -> outlier.meta.FeatureBagging
   -> outlier.meta.RescaleMetaOutlierAlgorithm
   -> outlier.trivial.ByLabelOutlier
   -> outlier.trivial.TrivialAllOutlier
   -> outlier.trivial.TrivialNoOutlier
   -> statistics.EvaluateRankingQuality
   -> statistics.RankingQualityHistogram
   -> statistics.DistanceStatisticsWithClasses
   -> APRIORI
   -> DependencyDerivator
   -> KNNDistanceOrder
   -> KNNJoin
   -> MaterializeDistances