D - a type of Distance as returned by the used distance function
V - a type of NumberVector as a suitable datatype for this algorithm
@Title(value="K-Means")
@Description(value="Finds a partitioning into k clusters.")
@Reference(authors="J. MacQueen", title="Some Methods for Classification and Analysis of Multivariate Observations", booktitle="5th Berkeley Symp. Math. Statist. Prob., Vol. 1, 1967, pp 281-297", url="http://projecteuclid.org/euclid.bsmsp/1200512992")
public class KMeans<V extends NumberVector<V,?>,D extends Distance<D>>
extends AbstractPrimitiveDistanceBasedAlgorithm<V,D,Clustering<MeanModel<V>>>
implements ClusteringAlgorithm<Clustering<MeanModel<V>>>
Reference: J. MacQueen: Some Methods for Classification and Analysis of
Multivariate Observations.
In 5th Berkeley Symp. Math. Statist. Prob., Vol. 1, 1967, pp 281-297.
Modifier and Type | Class and Description |
---|---|
static class | KMeans.Parameterizer<V extends NumberVector<V,?>,D extends Distance<D>> Parameterization class. |
Modifier and Type | Field and Description |
---|---|
private int | k Holds the value of K_ID. |
static OptionID | K_ID Parameter to specify the number of clusters to find, must be an integer greater than 0. |
private static Logging | logger The logger for this class. |
private int | maxiter Holds the value of MAXITER_ID. |
static OptionID | MAXITER_ID Parameter to specify the maximum number of iterations, must be an integer greater than or equal to 0, where 0 means no limit. |
private Long | seed Holds the value of SEED_ID. |
static OptionID | SEED_ID Parameter to specify the random generator seed. |
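The parameter constraints listed above can be illustrated with a small standalone sketch. It is not taken from the KMeans source: the class `KMeansParamsSketch` and its methods are hypothetical names, and treating a null seed as "use an unseeded generator" is an assumption suggested by the `Long` field type.

```java
import java.util.Random;

/** Hypothetical helper illustrating the documented parameter rules; not part of KMeans. */
final class KMeansParamsSketch {
  /** K_ID: the number of clusters must be an integer greater than 0. */
  static int checkK(int k) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be greater than 0, got " + k);
    }
    return k;
  }

  /** MAXITER_ID: must be >= 0, and 0 means there is no iteration limit. */
  static boolean mayContinue(int iteration, int maxiter) {
    if (maxiter < 0) {
      throw new IllegalArgumentException("maxiter must be >= 0, got " + maxiter);
    }
    return maxiter == 0 || iteration < maxiter;
  }

  /** SEED_ID: assumption, a null seed means a non-reproducible generator. */
  static Random makeRandom(Long seed) {
    return (seed != null) ? new Random(seed) : new Random();
  }

  public static void main(String[] args) {
    System.out.println("k accepted: " + checkK(3));
    System.out.println("unlimited run may continue: " + mayContinue(1000, 0));
    System.out.println("limited run may continue: " + mayContinue(5, 5));
    System.out.println("seeded draw: " + makeRandom(42L).nextInt(100));
  }
}
```

Passing the same non-null seed twice yields the same random sequence, which is the usual reason for exposing a seed parameter on a randomized clustering algorithm.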
Constructor and Description |
---|
KMeans(PrimitiveDistanceFunction<? super V,D> distanceFunction, int k, int maxiter, Long seed) Constructor. |
Modifier and Type | Method and Description |
---|---|
TypeInformation[] | getInputTypeRestriction() Get the input type restriction used for negotiating the data query. |
protected Logging | getLogger() Get the (STATIC) logger for this class. |
protected List<V> | means(List<? extends ModifiableDBIDs> clusters, List<V> means, Relation<V> database) Returns the mean vectors of the given clusters in the given database. |
Clustering<MeanModel<V>> | run(Database database, Relation<V> relation) Run k-means. |
protected List<? extends ModifiableDBIDs> | sort(List<V> means, Relation<V> database) Returns a list of clusters. |
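The two protected helpers in the table correspond to the two steps of a k-means iteration: `sort` groups objects by their nearest mean, and `means` recomputes the mean vector of each group. The following is a minimal sketch of those steps on plain `double[]` arrays, not the class's actual code. The class `KMeansStepsSketch` and its method names are hypothetical, squared Euclidean distance stands in for the configurable distance function, and keeping the previous mean for an empty cluster is an assumption about why `means` receives the recent means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the assignment and update steps; not part of KMeans. */
final class KMeansStepsSketch {
  /** Assignment step: each point index goes into the cluster of its nearest mean. */
  static List<List<Integer>> assign(double[][] means, double[][] data) {
    List<List<Integer>> clusters = new ArrayList<List<Integer>>();
    for (int c = 0; c < means.length; c++) {
      clusters.add(new ArrayList<Integer>());
    }
    for (int i = 0; i < data.length; i++) {
      int best = 0;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < means.length; c++) {
        double d = squaredEuclidean(data[i], means[c]);
        if (d < bestDist) { // ties go to the lower cluster index
          bestDist = d;
          best = c;
        }
      }
      clusters.get(best).add(i);
    }
    return clusters;
  }

  /** Update step: recompute each mean; an empty cluster keeps its previous mean (assumption). */
  static double[][] means(List<List<Integer>> clusters, double[][] oldMeans, double[][] data) {
    double[][] result = new double[oldMeans.length][];
    for (int c = 0; c < oldMeans.length; c++) {
      List<Integer> members = clusters.get(c);
      if (members.isEmpty()) {
        result[c] = oldMeans[c].clone();
        continue;
      }
      double[] sum = new double[oldMeans[c].length];
      for (int i : members) {
        for (int j = 0; j < sum.length; j++) {
          sum[j] += data[i][j];
        }
      }
      for (int j = 0; j < sum.length; j++) {
        sum[j] /= members.size();
      }
      result[c] = sum;
    }
    return result;
  }

  /** Squared Euclidean distance between two vectors of equal length. */
  static double squaredEuclidean(double[] a, double[] b) {
    double d = 0;
    for (int j = 0; j < a.length; j++) {
      double diff = a[j] - b[j];
      d += diff * diff;
    }
    return d;
  }

  public static void main(String[] args) {
    double[][] data = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
    double[][] initial = {{0, 0}, {10, 10}};
    List<List<Integer>> clusters = assign(initial, data);
    System.out.println(clusters);
    System.out.println(Arrays.deepToString(means(clusters, initial, data)));
  }
}
```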
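Methods inherited from class AbstractPrimitiveDistanceBasedAlgorithm: getDistanceFunction
Methods inherited from class AbstractAlgorithm: makeParameterDistanceFunction, run
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ClusteringAlgorithm: run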
private static final Logging logger
public static final OptionID K_ID
public static final OptionID MAXITER_ID
public static final OptionID SEED_ID
private int k
Holds the value of K_ID.
private int maxiter
Holds the value of MAXITER_ID.
public KMeans(PrimitiveDistanceFunction<? super V,D> distanceFunction, int k, int maxiter, Long seed)
distanceFunction - distance function
k - k parameter
maxiter - Maxiter parameter
seed - Random generator seed
public Clustering<MeanModel<V>> run(Database database, Relation<V> relation) throws IllegalStateException
database - Database
relation - relation to use
Throws: IllegalStateException
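For orientation, here is a rough sketch of how a k-means run typically uses the four constructor parameters. It is a generic batch-style iteration on plain arrays, not the body of `run` in this class. The class `KMeansRunSketch` is a hypothetical name, the random choice of k distinct initial points and the squared Euclidean distance are assumptions, and the return value is a plain cluster index per point rather than a `Clustering<MeanModel<V>>`.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical end-to-end sketch of a k-means run; not the KMeans.run implementation. */
final class KMeansRunSketch {
  /** Returns, for each point, the index of the cluster it ends up in. */
  static int[] run(double[][] data, int k, int maxiter, Long seed) {
    if (k <= 0 || k > data.length) {
      throw new IllegalArgumentException("need 0 < k <= number of points");
    }
    Random rnd = (seed != null) ? new Random(seed) : new Random();
    int dim = data[0].length;

    // Assumed initialization: k distinct points picked by a partial shuffle.
    int[] order = new int[data.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    double[][] means = new double[k][];
    for (int c = 0; c < k; c++) {
      int j = c + rnd.nextInt(order.length - c);
      int tmp = order[c];
      order[c] = order[j];
      order[j] = tmp;
      means[c] = data[order[c]].clone();
    }

    int[] assignment = new int[data.length];
    Arrays.fill(assignment, -1);
    // maxiter == 0 means no limit; the loop then ends only on convergence.
    for (int iter = 0; maxiter <= 0 || iter < maxiter; iter++) {
      boolean changed = false;
      // Assignment step: nearest mean by squared Euclidean distance.
      for (int i = 0; i < data.length; i++) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < k; c++) {
          double d = 0;
          for (int j = 0; j < dim; j++) {
            double diff = data[i][j] - means[c][j];
            d += diff * diff;
          }
          if (d < bestDist) {
            bestDist = d;
            best = c;
          }
        }
        if (assignment[i] != best) {
          assignment[i] = best;
          changed = true;
        }
      }
      if (!changed) {
        break; // converged: no point switched clusters
      }
      // Update step: recompute means; an empty cluster keeps its old mean.
      double[][] sums = new double[k][dim];
      int[] counts = new int[k];
      for (int i = 0; i < data.length; i++) {
        counts[assignment[i]]++;
        for (int j = 0; j < dim; j++) {
          sums[assignment[i]][j] += data[i][j];
        }
      }
      for (int c = 0; c < k; c++) {
        if (counts[c] > 0) {
          for (int j = 0; j < dim; j++) {
            means[c][j] = sums[c][j] / counts[c];
          }
        }
      }
    }
    return assignment;
  }

  public static void main(String[] args) {
    double[][] data = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
    System.out.println(Arrays.toString(run(data, 2, 0, 1L)));
  }
}
```

Which of the two groups receives index 0 depends on the seed, so callers should compare cluster membership rather than the index values themselves.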
protected List<V> means(List<? extends ModifiableDBIDs> clusters, List<V> means, Relation<V> database)
clusters - the clusters to compute the means
means - the recent means
database - the database containing the vectors
protected List<? extends ModifiableDBIDs> sort(List<V> means, Relation<V> database)
means - a list of k means
database - the database to cluster
public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
getInputTypeRestriction in interface Algorithm
getInputTypeRestriction in class AbstractAlgorithm<Clustering<MeanModel<V extends NumberVector<V,?>>>>
protected Logging getLogger()
Description copied from class: AbstractAlgorithm
getLogger in class AbstractAlgorithm<Clustering<MeanModel<V extends NumberVector<V,?>>>>