
D - a type of Distance as returned by the used distance function
V - a type of NumberVector as a suitable datatype for this algorithm

@Title(value="K-Means")
@Description(value="Finds a partitioning into k clusters.")
@Reference(authors="J. MacQueen", title="Some Methods for Classification and Analysis of Multivariate Observations", booktitle="5th Berkeley Symp. Math. Statist. Prob., Vol. 1, 1967, pp 281-297", url="http://projecteuclid.org/euclid.bsmsp/1200512992")
public class KMeans<V extends NumberVector<V,?>,D extends Distance<D>>
extends AbstractPrimitiveDistanceBasedAlgorithm<V,D,Clustering<MeanModel<V>>>
implements ClusteringAlgorithm<Clustering<MeanModel<V>>>

Reference: J. MacQueen: Some Methods for Classification and Analysis of
Multivariate Observations.
In 5th Berkeley Symp. Math. Statist. Prob., Vol. 1, 1967, pp 281-297.
| Modifier and Type | Class and Description |
|---|---|
| static class | KMeans.Parameterizer<V extends NumberVector<V,?>,D extends Distance<D>>: Parameterization class. |
| Modifier and Type | Field and Description |
|---|---|
| private int | k: Holds the value of K_ID. |
| static OptionID | K_ID: Parameter to specify the number of clusters to find, must be an integer greater than 0. |
| private static Logging | logger: The logger for this class. |
| private int | maxiter: Holds the value of MAXITER_ID. |
| static OptionID | MAXITER_ID: Parameter to specify the maximum number of iterations, must be an integer greater than or equal to 0, where 0 means no limit. |
| private Long | seed: Holds the value of SEED_ID. |
| static OptionID | SEED_ID: Parameter to specify the random generator seed. |
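
The following is a minimal sketch of how the three parameters k, maxiter and seed could drive a k-means iteration. It is not the code of this class: it works on plain scalar values instead of NumberVector objects, and the names KMeansParameterSketch and cluster are hypothetical. Treating a null seed as "use an unseeded generator" and picking the initial means from the data are assumptions of the sketch.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative 1-D k-means loop; a sketch, not the ELKI implementation. */
final class KMeansParameterSketch {
  /**
   * Clusters scalar values and returns the cluster index of each value.
   * k must be in [1, data.length]; a maxiter of 0 means no iteration limit;
   * a null seed falls back to an unseeded Random (an assumption of this sketch).
   */
  static int[] cluster(double[] data, int k, int maxiter, Long seed) {
    if (k <= 0 || k > data.length) {
      throw new IllegalArgumentException("k must be in [1, data.length]");
    }
    if (maxiter < 0) {
      throw new IllegalArgumentException("maxiter must be >= 0");
    }
    Random rnd = (seed != null) ? new Random(seed) : new Random();

    // Initial means: values at k distinct positions (partial Fisher-Yates shuffle).
    double[] shuffled = data.clone();
    for (int c = 0; c < k; c++) {
      int j = c + rnd.nextInt(shuffled.length - c);
      double tmp = shuffled[c];
      shuffled[c] = shuffled[j];
      shuffled[j] = tmp;
    }
    double[] means = Arrays.copyOf(shuffled, k);

    int[] assignment = new int[data.length];
    Arrays.fill(assignment, -1);
    // maxiter == 0 disables the iteration limit; the loop then ends on convergence.
    for (int iter = 0; maxiter == 0 || iter < maxiter; iter++) {
      // Assignment step: attach every value to its nearest mean.
      boolean changed = false;
      for (int i = 0; i < data.length; i++) {
        int best = 0;
        for (int c = 1; c < k; c++) {
          if (Math.abs(data[i] - means[c]) < Math.abs(data[i] - means[best])) {
            best = c;
          }
        }
        if (best != assignment[i]) {
          assignment[i] = best;
          changed = true;
        }
      }
      if (!changed) {
        break; // converged: no value switched clusters
      }
      // Update step: recompute each mean; an empty cluster keeps its old mean.
      double[] sum = new double[k];
      int[] count = new int[k];
      for (int i = 0; i < data.length; i++) {
        sum[assignment[i]] += data[i];
        count[assignment[i]]++;
      }
      for (int c = 0; c < k; c++) {
        if (count[c] > 0) {
          means[c] = sum[c] / count[c];
        }
      }
    }
    return assignment;
  }
}
```

Calling `KMeansParameterSketch.cluster(data, 2, 0, 42L)` twice on the same data returns the same assignment, which is the practical reason for exposing a seed parameter.
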
| Constructor and Description |
|---|
| KMeans(PrimitiveDistanceFunction<? super V,D> distanceFunction, int k, int maxiter, Long seed): Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
| protected Logging | getLogger(): Get the (STATIC) logger for this class. |
| protected List<V> | means(List<? extends ModifiableDBIDs> clusters, List<V> means, Relation<V> database): Returns the mean vectors of the given clusters in the given database. |
| Clustering<MeanModel<V>> | run(Database database, Relation<V> relation): Run k-means. |
| protected List<? extends ModifiableDBIDs> | sort(List<V> means, Relation<V> database): Returns a list of clusters. |
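
The two protected helpers correspond to the two steps of a k-means iteration: sort groups objects by their nearest mean, and means recomputes one mean vector per cluster. The sketch below illustrates these steps on plain double[] vectors with squared Euclidean distance. The class KMeansStepsSketch and the methods assign and updateMeans are hypothetical names, not part of this API. Keeping the previous mean for an empty cluster is an assumption about why the update step also receives the current means.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative assignment and update steps on double[] vectors; not ELKI code. */
final class KMeansStepsSketch {
  /** Squared Euclidean distance between two vectors of equal length. */
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return sum;
  }

  /** Assignment step: clusters.get(c) holds the indices of the points nearest to means.get(c). */
  static List<List<Integer>> assign(List<double[]> means, List<double[]> data) {
    List<List<Integer>> clusters = new ArrayList<>();
    for (int c = 0; c < means.size(); c++) {
      clusters.add(new ArrayList<>());
    }
    for (int i = 0; i < data.size(); i++) {
      int best = 0;
      double bestDist = squaredDistance(data.get(i), means.get(0));
      for (int c = 1; c < means.size(); c++) {
        double dist = squaredDistance(data.get(i), means.get(c));
        if (dist < bestDist) {
          best = c;
          bestDist = dist;
        }
      }
      clusters.get(best).add(i);
    }
    return clusters;
  }

  /** Update step: arithmetic mean of each cluster; an empty cluster keeps its previous mean. */
  static List<double[]> updateMeans(List<List<Integer>> clusters, List<double[]> means,
      List<double[]> data) {
    List<double[]> updated = new ArrayList<>();
    for (int c = 0; c < clusters.size(); c++) {
      List<Integer> members = clusters.get(c);
      if (members.isEmpty()) {
        updated.add(means.get(c).clone());
        continue;
      }
      double[] mean = new double[means.get(c).length];
      for (int i : members) {
        double[] p = data.get(i);
        for (int d = 0; d < mean.length; d++) {
          mean[d] += p[d];
        }
      }
      for (int d = 0; d < mean.length; d++) {
        mean[d] /= members.size();
      }
      updated.add(mean);
    }
    return updated;
  }
}
```

Alternating assign and updateMeans until the clusters stop changing, or until an iteration limit is reached, gives the usual batch form of the algorithm.
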
Inherited methods:
getDistanceFunction
makeParameterDistanceFunction, run
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait (from class java.lang.Object)
run

private static final Logging logger
public static final OptionID K_ID
public static final OptionID MAXITER_ID
public static final OptionID SEED_ID
private int k
Holds the value of K_ID.
private int maxiter
Holds the value of MAXITER_ID.

public KMeans(PrimitiveDistanceFunction<? super V,D> distanceFunction, int k, int maxiter, Long seed)
distanceFunction - distance function
k - k parameter
maxiter - Maxiter parameter
seed - Random generator seed

public Clustering<MeanModel<V>> run(Database database, Relation<V> relation) throws IllegalStateException
database - Database
relation - relation to use
Throws: IllegalStateException

protected List<V> means(List<? extends ModifiableDBIDs> clusters, List<V> means, Relation<V> database)
clusters - the clusters to compute the means for
means - the recent means
database - the database containing the vectors

protected List<? extends ModifiableDBIDs> sort(List<V> means, Relation<V> database)
means - a list of k means
database - the database to cluster

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
getInputTypeRestriction in interface Algorithm
getInputTypeRestriction in class AbstractAlgorithm<Clustering<MeanModel<V extends NumberVector<V,?>>>>

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
getLogger in class AbstractAlgorithm<Clustering<MeanModel<V extends NumberVector<V,?>>>>