@Reference(authors="C. Elkan", title="Using the triangle inequality to accelerate k-means", booktitle="Proc. 20th International Conference on Machine Learning, ICML 2003", url="http://www.aaai.org/Library/ICML/2003/icml03-022.php")
public class KMeansElkan<V extends NumberVector>
extends AbstractKMeans<V,KMeansModel>
Elkan's variant of k-means, which uses the triangle inequality to skip distance computations that cannot change the cluster assignment.
Type Parameters:
V - vector datatype
See Also:
KMeansHamerly, for a close variant that only uses O(2n) additional memory for bounds.
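For a rough sense of the memory difference: judging from the bound parameters documented below (one `upper` value and one `double[]` of lower bounds per object, presumably of length k), Elkan's method keeps k + 1 bounds per object, while Hamerly's variant keeps two (one upper, one lower). Counting stored doubles for n objects and k clusters, and ignoring the small per-center arrays:

```latex
\underbrace{n}_{\text{upper}} + \underbrace{n\,k}_{\text{lower}} = n\,(k+1) \quad \text{(Elkan)}
\qquad\text{vs.}\qquad
\underbrace{n}_{\text{upper}} + \underbrace{n}_{\text{lower}} = 2n \quad \text{(Hamerly)}
```

For example, n = 10^6 and k = 100 give 1.01 × 10^8 versus 2 × 10^6 doubles, roughly 808 MB versus 16 MB at 8 bytes per double.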
Reference:
C. Elkan: Using the triangle inequality to accelerate k-means. In: Proc. 20th International Conference on Machine Learning, ICML 2003.
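The pruning in the cited paper rests on two consequences of the triangle inequality. The following is a minimal sketch, assuming Euclidean distance on plain `double[]` vectors; the class `ElkanPruning` and its methods are hypothetical illustrations, not part of ELKI.

```java
/**
 * Hypothetical helper illustrating Elkan's two pruning tests (Euclidean distance).
 * Not part of ELKI.
 */
final class ElkanPruning {
  private ElkanPruning() {}

  /** Euclidean distance between two vectors of equal length. */
  static double dist(double[] a, double[] b) {
    double s = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      s += d * d;
    }
    return Math.sqrt(s);
  }

  /**
   * Test 1: if u >= d(x, c_a) and u <= 0.5 * d(c_a, c_j) for every other center c_j, then
   * d(x, c_j) >= d(c_a, c_j) - d(x, c_a) >= 2u - u >= d(x, c_a), so no center is closer.
   */
  static boolean canSkipPoint(double upper, double halfNearestCenterDist) {
    return upper <= halfNearestCenterDist;
  }

  /**
   * Test 2: center c_j cannot be closer than c_a if the upper bound on d(x, c_a) is at most
   * a lower bound on d(x, c_j), or at most half of d(c_a, c_j).
   */
  static boolean canSkipCenter(double upper, double lowerJ, double centerDistAJ) {
    return upper <= lowerJ || upper <= 0.5 * centerDistAJ;
  }

  public static void main(String[] args) {
    double[] x = {0.0, 0.0}, ca = {1.0, 0.0}, cb = {4.0, 0.0};
    double u = dist(x, ca); // exact distance to the assigned center
    double half = 0.5 * dist(ca, cb); // half the distance between the two centers
    System.out.println(canSkipPoint(u, half)); // true
  }
}
```

Test 1 skips an object entirely; test 2 is applied per candidate center when test 1 fails. Both only ever skip distances that provably cannot change the assignment.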
Modifier and Type | Class and Description |
---|---|
static class | KMeansElkan.Parameterizer<V extends NumberVector>: Parameterization class. |
Modifier and Type | Field and Description |
---|---|
private static String | KEY: Key for statistics logging. |
private static Logging | LOG: The logger for this class. |
private boolean | varstat: Flag whether to compute the final variance statistic. |
Inherited fields: initializer, k, maxiter; distanceFunction; INIT_ID, K_ID, MAXITER_ID, SEED_ID; DISTANCE_FUNCTION_ID
Constructor and Description |
---|
KMeansElkan(NumberVectorDistanceFunction<? super V> distanceFunction, int k, int maxiter, KMeansInitialization<? super V> initializer, boolean varstat): Constructor. |
Modifier and Type | Method and Description |
---|---|
private int | assignToNearestCluster(Relation<V> relation, List<Vector> means, List<Vector> sums, List<ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, double[] sep, double[][] cdist, WritableDoubleDataStore upper, WritableDataStore<double[]> lower): Reassign objects, but only if their bounds indicate it is necessary to do so. |
protected Logging | getLogger(): Get the (STATIC) logger for this class. |
private int | initialAssignToNearestCluster(Relation<V> relation, List<Vector> means, List<Vector> sums, List<ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, WritableDoubleDataStore upper, WritableDataStore<double[]> lower): Reassign objects, but only if their bounds indicate it is necessary to do so. |
private double | maxMoved(List<Vector> means, List<Vector> newmeans, double[] dists): Maximum distance moved. |
private void | recomputeSeperation(List<Vector> means, double[] sep, double[][] cdist): Recompute the separation of cluster means. |
Clustering<KMeansModel> | run(Database database, Relation<V> relation): Run the clustering algorithm. |
private void | updateBounds(Relation<V> relation, WritableIntegerDataStore assignment, WritableDoubleDataStore upper, WritableDataStore<double[]> lower, double[] move): Update the bounds for k-means. |
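The bookkeeping methods in the table above (`recomputeSeperation`, `maxMoved`, `updateBounds`) can be sketched on plain arrays. This is a hedged illustration, assuming Euclidean distance, `double[][]` means instead of `List<Vector>`, and an `int[]` assignment plus arrays instead of ELKI's data stores. The class `ElkanBounds` and the correctly spelled `recomputeSeparation` are hypothetical names, and this sketch stores full center-to-center distances in `cdist`, which may differ from ELKI's own convention.

```java
import java.util.Arrays;

/**
 * Hypothetical array-based sketch of Elkan bound bookkeeping (Euclidean distance).
 * Mirrors the roles of recomputeSeperation, maxMoved and updateBounds; not ELKI code.
 */
final class ElkanBounds {
  private ElkanBounds() {}

  /** Euclidean distance between two vectors of equal length. */
  static double dist(double[] a, double[] b) {
    double s = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      s += d * d;
    }
    return Math.sqrt(s);
  }

  /** cdist[i][j] = d(c_i, c_j); sep[i] = half the distance from c_i to its nearest other center. */
  static void recomputeSeparation(double[][] means, double[] sep, double[][] cdist) {
    final int k = means.length;
    Arrays.fill(sep, Double.POSITIVE_INFINITY); // stays infinite when k == 1
    for (int i = 0; i < k; i++) {
      cdist[i][i] = 0.0;
      for (int j = 0; j < i; j++) {
        double d = dist(means[i], means[j]);
        cdist[i][j] = d;
        cdist[j][i] = d;
        sep[i] = Math.min(sep[i], 0.5 * d);
        sep[j] = Math.min(sep[j], 0.5 * d);
      }
    }
  }

  /** Stores how far each mean moved in dists and returns the largest movement. */
  static double maxMoved(double[][] means, double[][] newmeans, double[] dists) {
    double max = 0.0;
    for (int i = 0; i < means.length; i++) {
      dists[i] = dist(means[i], newmeans[i]);
      max = Math.max(max, dists[i]);
    }
    return max;
  }

  /**
   * Keeps the bounds valid after the centers moved: by the triangle inequality the upper
   * bound grows by the movement of the object's own center, and each lower bound shrinks
   * by the movement of its center (clamped at zero).
   */
  static void updateBounds(int[] assignment, double[] upper, double[][] lower, double[] move) {
    for (int x = 0; x < assignment.length; x++) {
      upper[x] += move[assignment[x]];
      for (int j = 0; j < move.length; j++) {
        lower[x][j] = Math.max(0.0, lower[x][j] - move[j]);
      }
    }
  }

  public static void main(String[] args) {
    double[][] means = {{0.0, 0.0}, {3.0, 4.0}, {6.0, 8.0}};
    double[] sep = new double[3];
    double[][] cdist = new double[3][3];
    recomputeSeparation(means, sep, cdist);
    System.out.println(Arrays.toString(sep)); // [2.5, 2.5, 2.5]
  }
}
```

Loosening the bounds this way costs O(nk) arithmetic but no distance computations, which is the trade the algorithm relies on.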
Inherited methods: assignToNearestCluster, getInputTypeRestriction, incrementalUpdateMean, logVarstat, macQueenIterate, means, medians, setDistanceFunction, setK; getDistanceFunction; makeParameterDistanceFunction, run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Further inherited methods: run; getDistanceFunction
private static final Logging LOG
private static final String KEY
private boolean varstat
public KMeansElkan(NumberVectorDistanceFunction<? super V> distanceFunction, int k, int maxiter, KMeansInitialization<? super V> initializer, boolean varstat)
Constructor.
Parameters:
distanceFunction - distance function
k - number of clusters k
maxiter - maximum number of iterations
initializer - initialization method
varstat - whether to compute the variance statistic

public Clustering<KMeansModel> run(Database database, Relation<V> relation)
See interface KMeans.
Parameters:
database - Database to run on.
relation - Relation to process.

private void recomputeSeperation(List<Vector> means, double[] sep, double[][] cdist)
Parameters:
means - Means
sep - Output array of separation
cdist - Center-to-center distances

private int initialAssignToNearestCluster(Relation<V> relation, List<Vector> means, List<Vector> sums, List<ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, WritableDoubleDataStore upper, WritableDataStore<double[]> lower)
Parameters:
relation - Data
means - Current means
sums - New means
clusters - Current clusters
assignment - Cluster assignment
upper - Upper bounds
lower - Lower bounds

private int assignToNearestCluster(Relation<V> relation, List<Vector> means, List<Vector> sums, List<ModifiableDBIDs> clusters, WritableIntegerDataStore assignment, double[] sep, double[][] cdist, WritableDoubleDataStore upper, WritableDataStore<double[]> lower)
Parameters:
relation - Data
means - Current means
sums - New means
clusters - Current clusters
assignment - Cluster assignment
sep - Separation of means
cdist - Center-to-center distances
upper - Upper bounds
lower - Lower bounds

private double maxMoved(List<Vector> means, List<Vector> newmeans, double[] dists)
Parameters:
means - Old means
newmeans - New means
dists - Distances moved

private void updateBounds(Relation<V> relation, WritableIntegerDataStore assignment, WritableDoubleDataStore upper, WritableDataStore<double[]> lower, double[] move)
Parameters:
relation - Relation
assignment - Cluster assignment
upper - Upper bounds
lower - Lower bounds
move - Movement of centers

protected Logging getLogger()
See getLogger in class AbstractAlgorithm<Clustering<KMeansModel>>.
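Taken together, the methods documented above make up one Elkan iteration: recompute the means, loosen the bounds by how far each center moved, recompute center separations, then reassign only the objects whose bounds cannot rule out a closer center. The following is a compact end-to-end sketch, assuming Euclidean distance, `double[][]` data, and caller-supplied initial centers. `ElkanSketch` is hypothetical and omits ELKI's `Relation`/data-store machinery, the variance statistic, and logging.

```java
import java.util.Arrays;

/** Hypothetical end-to-end sketch of Elkan's k-means (Euclidean distance, plain arrays).
 *  Not ELKI's implementation; initial centers are supplied by the caller. */
final class ElkanSketch {
  private ElkanSketch() {}

  static double dist(double[] a, double[] b) {
    double s = 0.0;
    for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
    return Math.sqrt(s);
  }

  /** Returns the cluster assignment; rows of centers are replaced by the updated means. */
  static int[] run(double[][] data, double[][] centers, int maxiter) {
    final int n = data.length, k = centers.length, dim = data[0].length;
    int[] assign = new int[n];
    double[] upper = new double[n];
    double[][] lower = new double[n][k];
    // Initial assignment: compute all n*k distances once; they serve as exact bounds.
    for (int x = 0; x < n; x++) {
      int best = 0;
      for (int j = 0; j < k; j++) {
        lower[x][j] = dist(data[x], centers[j]);
        if (lower[x][j] < lower[x][best]) best = j;
      }
      assign[x] = best;
      upper[x] = lower[x][best];
    }
    double[][] cdist = new double[k][k];
    double[] sep = new double[k];
    for (int iter = 0; iter < maxiter; iter++) {
      // Recompute means from the current assignment; empty clusters keep their center.
      double[][] sums = new double[k][dim];
      int[] counts = new int[k];
      for (int x = 0; x < n; x++) {
        counts[assign[x]]++;
        for (int d = 0; d < dim; d++) sums[assign[x]][d] += data[x][d];
      }
      double[] move = new double[k];
      for (int j = 0; j < k; j++) {
        if (counts[j] == 0) continue;
        for (int d = 0; d < dim; d++) sums[j][d] /= counts[j];
        move[j] = dist(centers[j], sums[j]);
        centers[j] = sums[j];
      }
      // Loosen the bounds by the distance each center moved (triangle inequality).
      for (int x = 0; x < n; x++) {
        upper[x] += move[assign[x]];
        for (int j = 0; j < k; j++) lower[x][j] = Math.max(0.0, lower[x][j] - move[j]);
      }
      // Center-to-center distances and half the distance to the nearest other center.
      Arrays.fill(sep, Double.POSITIVE_INFINITY);
      for (int i = 0; i < k; i++) {
        for (int j = 0; j < i; j++) {
          double d = dist(centers[i], centers[j]);
          cdist[i][j] = cdist[j][i] = d;
          sep[i] = Math.min(sep[i], 0.5 * d);
          sep[j] = Math.min(sep[j], 0.5 * d);
        }
      }
      // Reassign, computing distances only where the bounds cannot rule out a change.
      int changed = 0;
      for (int x = 0; x < n; x++) {
        int a = assign[x];
        if (upper[x] <= sep[a]) continue; // no other center can be closer
        boolean exact = false; // whether upper[x] currently equals d(x, c_a)
        for (int j = 0; j < k; j++) {
          if (j == a || upper[x] <= lower[x][j] || upper[x] <= 0.5 * cdist[a][j]) continue;
          if (!exact) {
            upper[x] = dist(data[x], centers[a]);
            lower[x][a] = upper[x];
            exact = true;
            if (upper[x] <= lower[x][j] || upper[x] <= 0.5 * cdist[a][j]) continue;
          }
          double dj = dist(data[x], centers[j]);
          lower[x][j] = dj;
          if (dj < upper[x]) { a = j; upper[x] = dj; } // upper stays exact for the new center
        }
        if (a != assign[x]) { assign[x] = a; changed++; }
      }
      // Converged: the means computed in this iteration match the final assignment.
      if (changed == 0) break;
    }
    return assign;
  }

  public static void main(String[] args) {
    double[][] data = {{0}, {1}, {2}, {10}, {11}, {12}};
    double[][] centers = {{0}, {1}};
    System.out.println(Arrays.toString(run(data, centers, 100))); // [0, 0, 0, 1, 1, 1]
  }
}
```

Because every skipped distance is provably not smaller than the current one, the result should match plain Lloyd iterations from the same initial centers (up to tie-breaking); only the number of distance computations changes. If `maxiter` is reached before convergence, the centers in this sketch reflect the assignment from before the last reassignment.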
Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München. License information.