V - Vector type
M - Cluster model type

public abstract class AbstractKMeans<V extends NumberVector,M extends Model> extends AbstractNumberVectorDistanceBasedAlgorithm<V,Clustering<M>> implements KMeans<V,M>, ClusteringAlgorithm<Clustering<M>>
Modifier and Type | Class and Description |
---|---|
static class | AbstractKMeans.Parameterizer<V extends NumberVector>: Parameterization class. |
Modifier and Type | Field and Description |
---|---|
protected KMeansInitialization<? super V> | initializer: Method to choose initial means. |
protected int | k: Number of cluster centers to initialize. |
protected int | maxiter: Maximum number of iterations. |
Inherited fields: distanceFunction; INIT_ID, K_ID, MAXITER_ID, SEED_ID; DISTANCE_FUNCTION_ID
Constructor and Description |
---|
AbstractKMeans(NumberVectorDistanceFunction<? super V> distanceFunction, int k, int maxiter, KMeansInitialization<? super V> initializer): Constructor. |
Modifier and Type | Method and Description |
---|---|
protected boolean | assignToNearestCluster(Relation<? extends V> relation, List<? extends NumberVector> means, List<? extends ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, double[] varsum): Assign each object to the cluster of its nearest mean. |
TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
protected void | incrementalUpdateMean(Vector mean, V vec, int newsize, double op): Compute an incremental update for the mean. |
protected void | logVarstat(DoubleStatistic varstat, double[] varsum): Log statistics on the variance sum. |
protected boolean | macQueenIterate(Relation<V> relation, List<Vector> means, List<ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, double[] varsum): Perform a MacQueen style iteration. |
protected List<Vector> | means(List<? extends DBIDs> clusters, List<? extends NumberVector> means, Relation<V> database): Returns the mean vectors of the given clusters in the given database. |
protected List<Vector> | medians(List<? extends DBIDs> clusters, List<Vector> medians, Relation<V> database): Returns the median vectors of the given clusters in the given database. |
void | setDistanceFunction(NumberVectorDistanceFunction<? super V> distanceFunction): Set the distance function to use. |
void | setK(int k): Set the value of k. |
private boolean | updateMeanAndAssignment(List<ModifiableDBIDs> clusters, List<Vector> means, int minIndex, V fv, DBIDIter iditer, WritableIntegerDataStore assignment): Try to update the cluster assignment. |
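As a rough illustration of what `means(...)` and `medians(...)` compute, the following stand-alone sketch works on plain `double[]` points instead of ELKI's `Relation`, `DBIDs` and `Vector` types. The class `CenterSketch` and its methods are hypothetical. Reusing the previous center for an empty cluster and averaging the two middle values for an even-sized cluster are assumptions made for this sketch, not confirmed ELKI behavior.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-alone sketch; not part of ELKI. */
final class CenterSketch {
  /** Coordinate-wise mean of one cluster. */
  static double[] mean(List<double[]> cluster, double[] previous) {
    if (cluster.isEmpty()) {
      return previous.clone(); // assumption: keep the old center for an empty cluster
    }
    double[] sum = new double[previous.length];
    for (double[] point : cluster) {
      for (int d = 0; d < sum.length; d++) {
        sum[d] += point[d];
      }
    }
    for (int d = 0; d < sum.length; d++) {
      sum[d] /= cluster.size();
    }
    return sum;
  }

  /** Coordinate-wise median of one cluster. */
  static double[] median(List<double[]> cluster, double[] previous) {
    if (cluster.isEmpty()) {
      return previous.clone(); // assumption: keep the old center for an empty cluster
    }
    double[] result = new double[previous.length];
    double[] column = new double[cluster.size()];
    for (int d = 0; d < result.length; d++) {
      // Collect dimension d of every point, then sort to find the middle.
      for (int i = 0; i < column.length; i++) {
        column[i] = cluster.get(i)[d];
      }
      Arrays.sort(column);
      int mid = column.length / 2;
      // Assumption: average the two middle values when the size is even.
      result[d] = (column.length % 2 == 1) ? column[mid] : 0.5 * (column[mid - 1] + column[mid]);
    }
    return result;
  }
}
```

The median variant is less sensitive to outliers in a single dimension, which is the usual reason to prefer it over the mean when recomputing centers.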
Inherited methods: getDistanceFunction, getLogger, makeParameterDistanceFunction, run
Methods inherited from java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait
protected int k
protected int maxiter
protected KMeansInitialization<? super V> initializer
public AbstractKMeans(NumberVectorDistanceFunction<? super V> distanceFunction, int k, int maxiter, KMeansInitialization<? super V> initializer)
distanceFunction - distance function
k - k parameter
maxiter - Maxiter parameter
initializer - Function to generate the initial means

protected boolean assignToNearestCluster(Relation<? extends V> relation, List<? extends NumberVector> means, List<? extends ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, double[] varsum)
relation - the database to cluster
means - a list of k means
clusters - cluster assignment
assignment - Current cluster assignment
varsum - Variance sum output

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
getInputTypeRestriction in interface Algorithm
getInputTypeRestriction in class AbstractAlgorithm<Clustering<M extends Model>>
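The parameter list of `assignToNearestCluster` suggests a standard nearest-center assignment pass. A minimal sketch of such a pass follows, using arrays and `java.util` collections in place of ELKI's `Relation`, `ModifiableDBIDs` and `WritableIntegerDataStore`. The class `AssignSketch`, the use of squared Euclidean distance, the convention that `-1` marks an unassigned point, and the return value meaning "at least one point changed cluster" are all assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-alone sketch; not part of ELKI. */
final class AssignSketch {
  /** Squared Euclidean distance (assumed distance function). */
  static double squaredEuclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return sum;
  }

  /**
   * Assigns every point to its nearest mean.
   *
   * @param assignment current cluster index per point, -1 if unassigned (assumption)
   * @param varsum output: sum of distances to the assigned mean, per cluster
   * @return true if at least one point changed its cluster
   */
  static boolean assignToNearest(double[][] data, double[][] means,
      List<List<Integer>> clusters, int[] assignment, double[] varsum) {
    Arrays.fill(varsum, 0.0);
    boolean changed = false;
    for (int i = 0; i < data.length; i++) {
      // Find the index of the closest mean.
      int best = 0;
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < means.length; c++) {
        double dist = squaredEuclidean(data[i], means[c]);
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      varsum[best] += bestDist;
      int old = assignment[i];
      if (old != best) {
        if (old >= 0) {
          clusters.get(old).remove(Integer.valueOf(i)); // remove by value, not by index
        }
        clusters.get(best).add(i);
        assignment[i] = best;
        changed = true;
      }
    }
    return changed;
  }
}
```

A caller would typically repeat this pass, recomputing the centers in between, until it returns `false` or the iteration limit `maxiter` is reached.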
protected List<Vector> means(List<? extends DBIDs> clusters, List<? extends NumberVector> means, Relation<V> database)
clusters - the clusters to compute the means
means - the recent means
database - the database containing the vectors

protected List<Vector> medians(List<? extends DBIDs> clusters, List<Vector> medians, Relation<V> database)
clusters - the clusters to compute the medians
medians - the recent medians
database - the database containing the vectors

protected void incrementalUpdateMean(Vector mean, V vec, int newsize, double op)
mean - Mean to update
vec - Object vector
newsize - (New) size of cluster
op - Cluster size change / Weight change

protected boolean macQueenIterate(Relation<V> relation, List<Vector> means, List<ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, double[] varsum)
relation - Relation
means - Means
clusters - Clusters
assignment - Current cluster assignment
varsum - Variance sum output

private boolean updateMeanAndAssignment(List<ModifiableDBIDs> clusters, List<Vector> means, int minIndex, V fv, DBIDIter iditer, WritableIntegerDataStore assignment)
clusters - Current clusters
means - Means to update
minIndex - Cluster to assign to
fv - Vector
iditer - Object ID
assignment - Current cluster assignment
Returns: true when assignment changed

public void setK(int k)
Description copied from interface: KMeans
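The parameters of `incrementalUpdateMean` (`mean`, `vec`, `newsize`, `op`) match the usual running-mean update used in MacQueen style iterations, where a center is adjusted as soon as a single point joins or leaves its cluster. The sketch below shows that update on plain `double[]` arrays. The class name `IncrementalMeanSketch`, the exact formula, and the choice to leave the mean untouched when `newsize` is 0 are assumptions, not taken from the ELKI source.

```java
/** Hypothetical stand-alone sketch; not part of ELKI. */
final class IncrementalMeanSketch {
  /**
   * Updates a mean in place after one point was added or removed.
   *
   * @param mean mean to update (modified in place)
   * @param vec the point that was added or removed
   * @param newsize cluster size after the change
   * @param op +1 when the point was added, -1 when it was removed
   */
  static void incrementalUpdateMean(double[] mean, double[] vec, int newsize, double op) {
    if (newsize == 0) {
      return; // assumption: an emptied cluster keeps its old mean
    }
    for (int d = 0; d < mean.length; d++) {
      // Adding:   mean + (vec - mean) / newsize
      // Removing: mean - (vec - mean) / newsize
      mean[d] += (vec[d] - mean[d]) * op / newsize;
    }
  }
}
```

Moving a point from cluster A to cluster B then takes two calls: one with `op = -1` and the reduced size of A, and one with `op = +1` and the increased size of B. This avoids a full pass over both clusters.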
public void setDistanceFunction(NumberVectorDistanceFunction<? super V> distanceFunction)
Description copied from interface: KMeans
setDistanceFunction in interface KMeans<V extends NumberVector,M extends Model>
distanceFunction - Distance function.

protected void logVarstat(DoubleStatistic varstat, double[] varsum)
varstat - Statistics log instance
varsum - Variance sum per cluster

Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München.