de.lmu.ifi.dbs.elki.algorithm.clustering
Class KMeans<V extends NumberVector<V,?>,D extends Distance<D>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<R>
      extended by de.lmu.ifi.dbs.elki.algorithm.AbstractPrimitiveDistanceBasedAlgorithm<V,D,Clustering<MeanModel<V>>>
          extended by de.lmu.ifi.dbs.elki.algorithm.clustering.KMeans<V,D>
Type Parameters:
D - a type of Distance as returned by the used distance function
V - a type of NumberVector as a suitable datatype for this algorithm
All Implemented Interfaces:
Algorithm, ClusteringAlgorithm<Clustering<MeanModel<V>>>, InspectionUtilFrequentlyScanned, Parameterizable

@Title(value="K-Means")
@Description(value="Finds a partitioning into k clusters.")
@Reference(authors="J. MacQueen",
           title="Some Methods for Classification and Analysis of Multivariate Observations",
           booktitle="5th Berkeley Symp. Math. Statist. Prob., Vol. 1, 1967, pp 281-297",
           url="http://projecteuclid.org/euclid.bsmsp/1200512992")
public class KMeans<V extends NumberVector<V,?>,D extends Distance<D>>
extends AbstractPrimitiveDistanceBasedAlgorithm<V,D,Clustering<MeanModel<V>>>
implements ClusteringAlgorithm<Clustering<MeanModel<V>>>

Provides the k-means algorithm.

Reference: J. MacQueen: Some Methods for Classification and Analysis of Multivariate Observations.
In 5th Berkeley Symp. Math. Statist. Prob., Vol. 1, 1967, pp 281-297.


Nested Class Summary
static class KMeans.Parameterizer<V extends NumberVector<V,?>,D extends Distance<D>>
          Parameterization class.
 
Field Summary
private  int k
          Holds the value of K_ID.
static OptionID K_ID
          Parameter to specify the number of clusters to find, must be an integer greater than 0.
private static Logging logger
          The logger for this class.
private  int maxiter
          Holds the value of MAXITER_ID.
static OptionID MAXITER_ID
          Parameter to specify the number of clusters to find, must be an integer greater or equal to 0, where 0 means no limit.
private  Long seed
          Holds the value of SEED_ID.
static OptionID SEED_ID
          Parameter to specify the random generator seed.
 
Constructor Summary
KMeans(PrimitiveDistanceFunction<? super V,D> distanceFunction, int k, int maxiter, Long seed)
          Constructor.
 
Method Summary
 TypeInformation[] getInputTypeRestriction()
          Get the input type restriction used for negotiating the data query.
protected  Logging getLogger()
          Get the (STATIC) logger for this class.
protected  List<V> means(List<? extends ModifiableDBIDs> clusters, List<V> means, Relation<V> database)
          Returns the mean vectors of the given clusters in the given database.
 Clustering<MeanModel<V>> run(Database database, Relation<V> relation)
          Run k-means
protected  List<? extends ModifiableDBIDs> sort(List<V> means, Relation<V> database)
          Returns a list of clusters.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractPrimitiveDistanceBasedAlgorithm
getDistanceFunction
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
makeParameterDistanceFunction, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm
run
 

Field Detail

logger

private static final Logging logger
The logger for this class.


K_ID

public static final OptionID K_ID
Parameter to specify the number of clusters to find, must be an integer greater than 0.


MAXITER_ID

public static final OptionID MAXITER_ID
Parameter to specify the number of clusters to find, must be an integer greater or equal to 0, where 0 means no limit.


SEED_ID

public static final OptionID SEED_ID
Parameter to specify the random generator seed.


k

private int k
Holds the value of K_ID.


maxiter

private int maxiter
Holds the value of MAXITER_ID.


seed

private Long seed
Holds the value of SEED_ID.

Constructor Detail

KMeans

public KMeans(PrimitiveDistanceFunction<? super V,D> distanceFunction,
              int k,
              int maxiter,
              Long seed)
Constructor.

Parameters:
distanceFunction - distance function
k - k parameter
maxiter - Maxiter parameter
seed - Random generator seed
Method Detail

run

public Clustering<MeanModel<V>> run(Database database,
                                    Relation<V> relation)
                                                       throws IllegalStateException
Run k-means

Parameters:
database - Database
relation - relation to use
Returns:
result
Throws:
IllegalStateException

means

protected List<V> means(List<? extends ModifiableDBIDs> clusters,
                        List<V> means,
                        Relation<V> database)
Returns the mean vectors of the given clusters in the given database.

Parameters:
clusters - the clusters to compute the means
means - the recent means
database - the database containing the vectors
Returns:
the mean vectors of the given clusters in the given database

sort

protected List<? extends ModifiableDBIDs> sort(List<V> means,
                                               Relation<V> database)
Returns a list of clusters. The kth cluster contains the ids of those FeatureVectors, that are nearest to the kth mean.

Parameters:
means - a list of k means
database - the database to cluster
Returns:
list of k clusters

getInputTypeRestriction

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.

Specified by:
getInputTypeRestriction in interface Algorithm
Specified by:
getInputTypeRestriction in class AbstractAlgorithm<Clustering<MeanModel<V extends NumberVector<V,?>>>>
Returns:
Type restriction

getLogger

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.

Specified by:
getLogger in class AbstractAlgorithm<Clustering<MeanModel<V extends NumberVector<V,?>>>>
Returns:
the static logger

Release 0.4.0 (2011-09-20_1324)