ELKI comes with a simple GUI that helps with parameterization by offering input assistance.
Since release 0.3, the GUI is the default operation when launching the .jar file:
java -jar mypath/elki.jar
Here, we provide just a few examples of using ELKI with some algorithms. Hopefully, from here you can easily extend to other algorithms and data sets.
Throughout all examples, we assume you have the executable jar-archive elki.jar in some directory locally reachable from your console as mypath, and that you have downloaded the example data file from http://www.dbs.ifi.lmu.de/research/KDD/ELKI/datasets/example/exampledata.txt to a location reachable from your console as mydata/exampledata.txt.
java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbscan.epsilon 20 -dbscan.minpts 10

This requests the algorithm DBSCAN to cluster the data set using the DBSCAN parameters epsilon=20 and minpts=10. The clustering result is just printed to the console.
java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbscan.epsilon 20 -dbscan.minpts 10 -out myresults/DBSCANeps20min10

Same as before, but this time a directory for collecting the output is explicitly specified. This results in one file per cluster found by DBSCAN within the specified directory myresults/DBSCANeps20min10.
Each file starts with metadata and information on the parameters used, before listing the data points contained in the cluster.
For example, in this case, the file for cluster 0 starts as follows:
###############################################################
# Settings:
# de.lmu.ifi.dbs.elki.workflow.InputStep
# -db StaticArrayDatabase
#
# de.lmu.ifi.dbs.elki.database.StaticArrayDatabase
# -dbc FileBasedDatabaseConnection
#
# de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection
# -dbc.in mydata/exampledata.txt
# -dbc.parser DoubleVectorLabelParser
#
# de.lmu.ifi.dbs.elki.datasource.parser.DoubleVectorLabelParser
# -parser.colsep \s+
# -parser.quote "
# -parser.labelIndices [unset]
#
# de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection
# -dbc.filter [unset]
#
# de.lmu.ifi.dbs.elki.database.StaticArrayDatabase
# -db.index [unset]
#
# de.lmu.ifi.dbs.elki.workflow.AlgorithmStep
# -algorithm clustering.DBSCAN
#
# de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN
# -algorithm.distancefunction EuclideanDistanceFunction
# -dbscan.epsilon 20.0
# -dbscan.minpts 10
#
# de.lmu.ifi.dbs.elki.workflow.EvaluationStep
# -evaluator [unset]
###############################################################
# Cluster: Cluster 0
Most of the parameters shown here are set implicitly to default values or not used ([unset] or false).
To get a list of additional parameters, add -help to the command line. There you will also find options that do not affect the algorithm result, such as -verbose, which often provides progress information.
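For example, appending -help to the earlier DBSCAN invocation (a sketch that only reuses flags already shown above) should print the available options instead of running the algorithm:

java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -algorithm clustering.DBSCAN -help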
So far, we have also not used the possibility of normalizing the data. Normalization is available as a filter for the input step via the -dbc.filter option and is applied while the data set is loaded. The option value is a comma-separated list of filter classes; ELKI provides, for example, AttributeWiseMinMaxNormalization. Other normalization procedures can easily be provided by any user by implementing the interface de.lmu.ifi.dbs.elki.datasource.filter.ObjectFilter, as sketched below. Note that the resulting files will contain the normalized data vectors, since ELKI by default does not keep a copy of the original, unnormalized data.
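As a minimal sketch of such a custom filter, the following pass-through filter assumes the single-method ObjectFilter interface (filter(MultipleObjectsBundle)) as found in recent ELKI releases; the exact interface may differ between versions, and the package and class names used here are purely illustrative:

package my.custom.filters; // hypothetical package, not part of ELKI

import de.lmu.ifi.dbs.elki.datasource.bundle.MultipleObjectsBundle;
import de.lmu.ifi.dbs.elki.datasource.filter.ObjectFilter;

// Illustrative pass-through filter: it receives the loaded data bundle
// and returns it unchanged. A real normalization filter would locate the
// vector column in the bundle, rescale its values, and return a bundle
// containing the rescaled column instead.
public class MyNoOpFilter implements ObjectFilter {
  @Override
  public MultipleObjectsBundle filter(MultipleObjectsBundle objects) {
    // Transform or replace columns here; this sketch simply passes the data through.
    return objects;
  }
}

Such a filter could then be requested during loading via -dbc.filter my.custom.filters.MyNoOpFilter, provided the class is on the classpath.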
With the AttributeWiseMinMaxNormalization filter, the DBSCAN example becomes:

java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -algorithm clustering.DBSCAN -dbc.in mydata/exampledata.txt -dbc.filter AttributeWiseMinMaxNormalization -dbscan.epsilon 0.02 -dbscan.minpts 10 -out myresults/DBSCANeps20min10 -evaluator paircounting.EvaluatePairCountingFMeasure -verbose -enableDebug de.lmu.ifi.dbs.elki.workflow.AlgorithmStep

Note that the value for dbscan.epsilon is decreased considerably to suit the normalized data (AttributeWiseMinMaxNormalization normalizes all attribute values to the range [0:1]).
In addition, the option -evaluator paircounting.EvaluatePairCountingFMeasure requests an evaluation of the clustering, whose result is written to the file pair-fmeasure.txt in the output directory, while -verbose and -enableDebug produce additional progress and debugging output on the console.
For notes about fair benchmarking with ELKI, please read the comments on Benchmarking in the Wiki. Do not benchmark ELKI against other software, since there is an obvious cost in the generality of the implementation, and you for example do not want to benchmark Java versus C. To benchmark the performance of actual algorithms, you must implement them within the ELKI framework to get sound results.
To get a description of the usage of a specific algorithm, use the option -description. For example, here we request a description of how to use the algorithm clustering.correlation.FourC:

java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -description de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.FourC

The output describes the parameters available for FourC with their default values. Setting, for example, a different distance function may in turn introduce additional parameters.
Note that here we gave the full name of the class FourC (i.e., including the complete package name), while we omitted the prefix de.lmu.ifi.dbs.elki.algorithm. for clustering.DBSCAN above.
The reason for this difference is as follows: if a class name is expected as a parameter value, usually a restriction class is also known, i.e., an interface or class which must be implemented or extended by the specified parameter value. For example, the restriction class for -algorithm is de.lmu.ifi.dbs.elki.algorithm.Algorithm, the restriction class for -algorithm.distancefunction is de.lmu.ifi.dbs.elki.distance.distancefunction.DistanceFunction, and the restriction class for -description is simply java.lang.Object. If the given value is not a complete class name, ELKI tries to complete it with the package prefix of the restriction class. Thus, as a value for -algorithm, clustering.DBSCAN (which is not a valid class name per se) will be automatically completed with the prefix de.lmu.ifi.dbs.elki.algorithm.. For -description, however, clustering.correlation.FourC would be automatically completed with the prefix java.lang., which does not result in a valid class name. Hence, for -description, we are required to specify the complete class name in the first place. On the other hand, if we wanted to use FourC as the algorithm, the specification clustering.correlation.FourC as the parameter value for -algorithm would suffice.
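For illustration only (a sketch: FourC's own required parameters, as listed by -description, are omitted here, and the output directory name is hypothetical), such a call could look like:

java -cp mypath/elki.jar de.lmu.ifi.dbs.elki.application.KDDCLIApplication -algorithm clustering.correlation.FourC -dbc.in mydata/exampledata.txt -out myresults/FourC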
The restriction class and the already available implementations (suitable as possible values for the parameter) are listed in the parameter description. See, e.g., the description of -algorithm (as provided after using -description as above, or using -help):
-algorithm <object_1|class_1,...,object_n|class_n>
   Algorithm to run.
   Implementing de.lmu.ifi.dbs.elki.algorithm.Algorithm
   Known classes (default package de.lmu.ifi.dbs.elki.algorithm.):
   -> NullAlgorithm
   -> clustering.DBSCAN
   -> clustering.DeLiClu
   -> clustering.EM
   -> clustering.KMeans
   -> clustering.OPTICSXi
   -> clustering.OPTICS
   -> clustering.SLINK
   -> clustering.SNNClustering
   -> clustering.correlation.CASH
   -> clustering.correlation.COPAC
   -> clustering.correlation.ERiC
   -> clustering.correlation.FourC
   -> clustering.correlation.HiCO
   -> clustering.correlation.ORCLUS
   -> clustering.subspace.CLIQUE
   -> clustering.subspace.DiSH
   -> clustering.subspace.HiSC
   -> clustering.subspace.PreDeCon
   -> clustering.subspace.PROCLUS
   -> clustering.subspace.SUBCLU
   -> clustering.trivial.ByLabelClustering
   -> clustering.trivial.ByLabelHierarchicalClustering
   -> clustering.trivial.TrivialAllInOne
   -> clustering.trivial.TrivialAllNoise
   -> outlier.ABOD
   -> outlier.AggarwalYuEvolutionary
   -> outlier.AggarwalYuNaive
   -> outlier.DBOutlierDetection
   -> outlier.DBOutlierScore
   -> outlier.EMOutlier
   -> outlier.GaussianModel
   -> outlier.GaussianUniformMixture
   -> outlier.INFLO
   -> outlier.KNNOutlier
   -> outlier.KNNWeightOutlier
   -> outlier.LDOF
   -> outlier.LOCI
   -> outlier.LOF
   -> outlier.LoOP
   -> outlier.OPTICSOF
   -> outlier.ReferenceBasedOutlierDetection
   -> outlier.SOD
   -> outlier.OnlineLOF
   -> outlier.spatial.CTLuGLSBackwardSearchAlgorithm
   -> outlier.spatial.CTLuMeanMultipleAttributes
   -> outlier.spatial.CTLuMedianAlgorithm
   -> outlier.spatial.CTLuMedianMultipleAttributes
   -> outlier.spatial.CTLuMoranScatterplotOutlier
   -> outlier.spatial.CTLuRandomWalkEC
   -> outlier.spatial.CTLuScatterplotOutlier
   -> outlier.spatial.CTLuZTestOutlier
   -> outlier.spatial.SLOM
   -> outlier.spatial.SOF
   -> outlier.spatial.TrimmedMeanApproach
   -> outlier.meta.ExternalDoubleOutlierScore
   -> outlier.meta.FeatureBagging
   -> outlier.meta.RescaleMetaOutlierAlgorithm
   -> outlier.trivial.ByLabelOutlier
   -> outlier.trivial.TrivialAllOutlier
   -> outlier.trivial.TrivialNoOutlier
   -> statistics.EvaluateRankingQuality
   -> statistics.RankingQualityHistogram
   -> statistics.DistanceStatisticsWithClasses
   -> APRIORI
   -> DependencyDerivator
   -> KNNDistanceOrder
   -> KNNJoin
   -> MaterializeDistances