Type Parameters:
V - Data type

@Reference(authors="L. Kaufman, P. J. Rousseeuw", title="Clustering Large Data Sets", booktitle="Pattern Recognition in Practice", url="https://doi.org/10.1016/B978-0-444-87877-9.50039-X", bibkey="doi:10.1016/B978-0-444-87877-9.50039-X")
@Reference(authors="L. Kaufman, P. J. Rousseeuw", title="Clustering Large Applications (Program CLARA)", booktitle="Finding Groups in Data: An Introduction to Cluster Analysis", url="https://doi.org/10.1002/9780470316801.ch3", bibkey="doi:10.1002/9780470316801.ch3")
public class CLARA<V>
extends KMedoidsPAM<V>

CLARA (Clustering Large Applications) is a clustering method for large data sets, based on partitioning around medoids (KMedoidsPAM) and sampling.
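The sampling scheme described above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the ELKI implementation: points are 1-D doubles with absolute distance, the names `ClaraSketch`, `clara`, `kMedoids` and `cost` are hypothetical, and the per-sample clustering uses a naive SWAP-only k-medoids search in place of KMedoidsPAM (no BUILD phase, no keepmed option). It does show CLARA's core idea: cluster several small random samples, evaluate each resulting set of medoids on the whole data set, and keep the best.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical, simplified CLARA on 1-D points with absolute distance; not the ELKI implementation.
final class ClaraSketch {
    static double dist(double a, double b) {
        return Math.abs(a - b);
    }

    // Total distance of every point to its nearest medoid (the k-medoids cost).
    static double cost(double[] pts, double[] medoids) {
        double sum = 0.0;
        for (double p : pts) {
            double best = Double.POSITIVE_INFINITY;
            for (double m : medoids) {
                best = Math.min(best, dist(p, m));
            }
            sum += best;
        }
        return sum;
    }

    // Naive SWAP-only k-medoids (stand-in for PAM): start from the first k sample
    // points, then apply the best cost-reducing swap until no swap improves the cost.
    static double[] kMedoids(double[] sample, int k) {
        double[] med = Arrays.copyOf(sample, k);
        double cur = cost(sample, med);
        while (true) {
            double bestCost = cur;
            int bestPos = -1;
            double bestCand = 0.0;
            for (int i = 0; i < k; i++) {
                double old = med[i];
                for (double cand : sample) {
                    med[i] = cand; // trial swap
                    double c = cost(sample, med);
                    if (c < bestCost) {
                        bestCost = c;
                        bestPos = i;
                        bestCand = cand;
                    }
                }
                med[i] = old; // undo the trial swap
            }
            if (bestPos < 0) {
                return med; // local optimum: no swap lowers the cost
            }
            med[bestPos] = bestCand;
            cur = bestCost;
        }
    }

    // CLARA outer loop: cluster `numsamples` random samples and keep the medoids
    // whose cost on the FULL data set is lowest.
    static double[] clara(double[] data, int k, int numsamples, int sampleSize, long seed) {
        Random rnd = new Random(seed);
        double[] best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        int[] idx = new int[data.length];
        for (int s = 0; s < numsamples; s++) {
            for (int i = 0; i < idx.length; i++) {
                idx[i] = i;
            }
            // Partial Fisher-Yates shuffle: sampleSize distinct indexes, no replacement.
            double[] sample = new double[sampleSize];
            for (int i = 0; i < sampleSize; i++) {
                int j = i + rnd.nextInt(data.length - i);
                int t = idx[i];
                idx[i] = idx[j];
                idx[j] = t;
                sample[i] = data[idx[i]];
            }
            double[] med = kMedoids(sample, k);
            double c = cost(data, med); // evaluate on all objects, not just the sample
            if (c < bestCost) {
                bestCost = c;
                best = med;
            }
        }
        return best;
    }
}
```

For example, `ClaraSketch.clara(new double[] {1, 2, 3, 100, 101, 102}, 2, 3, 6, 42L)` returns the medoids 2.0 and 101.0 (in some order). Scoring candidate medoids on all objects, rather than only on the sample, is what makes the results of different samples comparable.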
TODO: use a triangular distance matrix rather than a hash-map based cache, for somewhat better performance and lower memory use.
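A triangular matrix of the kind the TODO mentions could look roughly like the following hypothetical sketch (`TriangularDistanceMatrix` and `IndexDistance` are illustrative names, not ELKI classes). For a symmetric distance on a sample of n objects it stores only the pairs i > j in one packed array of n(n-1)/2 doubles, about half of a full n×n matrix, and it assumes d(x, x) = 0.

```java
// Hypothetical packed lower-triangular distance matrix for a sample of n objects.
final class TriangularDistanceMatrix {
    // Hypothetical callback computing the distance between sample indexes i and j.
    interface IndexDistance {
        double distance(int i, int j);
    }

    private final int n;
    private final double[] data;

    TriangularDistanceMatrix(int n, IndexDistance d) {
        this.n = n;
        // n * (n - 1) overflows int for n above roughly 46,000.
        this.data = new double[n * (n - 1) / 2];
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < i; j++) {
                data[index(i, j)] = d.distance(i, j); // each unordered pair computed once
            }
        }
    }

    // Offset of pair (i, j) with i > j: rows 1..i-1 hold 1 + 2 + ... + (i-1) = i*(i-1)/2 entries.
    private static int index(int i, int j) {
        return i * (i - 1) / 2 + j;
    }

    double get(int i, int j) {
        if (i == j) {
            return 0.0; // assumes d(x, x) = 0
        }
        return i > j ? data[index(i, j)] : data[index(j, i)]; // symmetric lookup
    }

    int size() {
        return n;
    }
}
```

Compared with a hash map, the packed array avoids per-entry boxing and hashing, at the price of computing all pairs up front; for the small samples CLARA works on, that trade-off is usually acceptable.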
References:

L. Kaufman, P. J. Rousseeuw
Clustering Large Data Sets
Pattern Recognition in Practice

L. Kaufman, P. J. Rousseeuw
Clustering Large Applications (Program CLARA)
Finding Groups in Data: An Introduction to Cluster Analysis
| Modifier and Type | Class | Description |
|---|---|---|
| (package private) static class | CLARA.CachedDistanceQuery<V> | Cached distance query. |
| static class | CLARA.Parameterizer<V> | Parameterization class. |
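CLARA.CachedDistanceQuery is documented only as a cached distance query, and the TODO above refers to a hash-map based cache. The hypothetical sketch below (`CachedDistance` and `IndexDistance` are illustrative names; this is not the ELKI class) shows the general idea in plain Java: distances are memoized in a `HashMap` keyed by the unordered pair of object indexes, so each pair is computed at most once even when a swap-based k-medoids search on the sample requests it many times.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hash-map based distance cache; not the ELKI CachedDistanceQuery.
final class CachedDistance {
    // Hypothetical underlying (possibly expensive) distance on object indexes.
    interface IndexDistance {
        double distance(int i, int j);
    }

    private final IndexDistance inner;
    private final Map<Long, Double> cache = new HashMap<>();
    private int computations = 0;

    CachedDistance(IndexDistance inner) {
        this.inner = inner;
    }

    // Symmetric key: order the pair so that (i, j) and (j, i) share one entry.
    private static long key(int i, int j) {
        int lo = Math.min(i, j);
        int hi = Math.max(i, j);
        return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
    }

    double distance(int i, int j) {
        long k = key(i, j);
        Double d = cache.get(k);
        if (d == null) { // cache miss: compute once and remember
            d = inner.distance(i, j);
            computations++;
            cache.put(k, d);
        }
        return d;
    }

    // Number of times the underlying distance was actually evaluated.
    int computations() {
        return computations;
    }
}
```

Each cached entry costs a boxed `Long` key, a boxed `Double` value, and a map node, which is the memory overhead the TODO's triangular matrix would avoid.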
Nested classes/interfaces inherited from class KMedoidsPAM: KMedoidsPAM.Instance

| Modifier and Type | Field | Description |
|---|---|---|
| (package private) boolean | keepmed | Keep the previous medoids in the sample (see page 145). |
| private static Logging | LOG | Class logger. |
| (package private) int | numsamples | Number of samples to draw (i.e., iterations). |
| (package private) RandomFactory | random | Random factory for initialization. |
| (package private) double | sampling | Sampling rate. |
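The sampling field, documented for the constructor as a rate that may be "absolute or relative", has to be turned into a concrete sample size. A minimal sketch, assuming that values up to 1 denote a fraction of the data set and larger values an absolute count; both the threshold rule and the `SampleSize` helper are assumptions for illustration, not taken from this page.

```java
// Hypothetical helper: one way to read an "absolute or relative" sampling parameter.
// ASSUMPTION: sampling <= 1 is a fraction of n, sampling > 1 an absolute object count.
final class SampleSize {
    static int sampleSize(int n, double sampling) {
        if (!(sampling > 0)) { // also rejects NaN
            throw new IllegalArgumentException("sampling must be positive: " + sampling);
        }
        int size = sampling <= 1.0 ? (int) (sampling * n) : (int) sampling;
        return Math.min(n, size); // never request more objects than the data set holds
    }
}
```

Under this rule `sampleSize(1000, 0.25)` gives 250, `sampleSize(1000, 40)` gives 40, and `sampleSize(10, 50)` is capped at 10. A value of exactly 1 is read as the whole data set.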
Fields inherited from class KMedoidsPAM: initializer, k, maxiter
Other inherited fields: ALGORITHM_ID, DISTANCE_FUNCTION_ID

| Constructor | Description |
|---|---|
| CLARA(DistanceFunction<? super V> distanceFunction, int k, int maxiter, KMedoidsInitialization<V> initializer, int numsamples, double sampling, boolean keepmed, RandomFactory random) | Constructor. |
| Modifier and Type | Method | Description |
|---|---|---|
| (package private) static double | assignRemainingToNearestCluster(ArrayDBIDs means, DBIDs ids, DBIDs rids, WritableIntegerDataStore assignment, DistanceQuery<?> distQ) | Assign the remaining objects (those not in the already assigned sample) to their nearest medoid. |
| (package private) static DBIDs | randomSample(DBIDs ids, int samplesize, java.util.Random rnd, DBIDs previous) | Draw a random sample of the desired size. |
| Clustering<MedoidModel> | run(Database database, Relation<V> relation) | Run k-medoids. |
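The two package-private helpers can be mimicked with plain Java collections. In this hypothetical sketch, `ClaraHelpers`, `assignRemaining`, and the use of int indexes in place of ELKI's DBIDs are illustrative assumptions; in particular, the `double` returned by the assignment step is modelled as the summed distance of the newly assigned objects, which this page does not state. `randomSample` always keeps the previous medoids (the keepmed behavior) and fills the rest of the sample from a shuffled list of the other ids.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical plain-Java analogues of CLARA's two package-private helpers.
// Object ids are int indexes into a 1-D data array; this is not ELKI's DBID API.
final class ClaraHelpers {
    // Draw up to `samplesize` distinct ids; every id in `previous` (the medoids of
    // the last round, when keepmed is set) is always included.
    static Set<Integer> randomSample(List<Integer> ids, int samplesize, Random rnd, Set<Integer> previous) {
        Set<Integer> sample = new LinkedHashSet<>(previous);
        List<Integer> rest = new ArrayList<>();
        for (Integer id : ids) {
            if (!previous.contains(id)) {
                rest.add(id);
            }
        }
        Collections.shuffle(rest, rnd); // uniform random order of the other candidates
        for (int i = 0; sample.size() < samplesize && i < rest.size(); i++) {
            sample.add(rest.get(i));
        }
        return sample;
    }

    // Assign every object NOT in `sampled` to its nearest medoid (stored as an index
    // into `medoids`) and return the summed distance of these assignments (assumption).
    static double assignRemaining(double[] data, int[] medoids, Set<Integer> sampled, int[] assignment) {
        double sum = 0.0;
        for (int id = 0; id < data.length; id++) {
            if (sampled.contains(id)) {
                continue; // already assigned while clustering the sample
            }
            int bestIdx = 0;
            double best = Double.POSITIVE_INFINITY;
            for (int m = 0; m < medoids.length; m++) {
                double d = Math.abs(data[id] - data[medoids[m]]);
                if (d < best) {
                    best = d;
                    bestIdx = m;
                }
            }
            assignment[id] = bestIdx;
            sum += best;
        }
        return sum;
    }
}
```

Keeping the previous medoids in each new sample means a later round can only start from a set of candidates at least as good as the best one found so far, rather than from scratch.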
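Methods inherited from class KMedoidsPAM: getInputTypeRestriction, getLogger, initialMedoids, run
Methods inherited from other superclasses: getDistanceFunction, run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interfaces: run

Field Detail:

private static final Logging LOG
double sampling
int numsamples
boolean keepmed
RandomFactory random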
Constructor Detail:

public CLARA(DistanceFunction<? super V> distanceFunction, int k, int maxiter, KMedoidsInitialization<V> initializer, int numsamples, double sampling, boolean keepmed, RandomFactory random)
Parameters:
distanceFunction - Distance function to use
k - Number of clusters to produce
maxiter - Maximum number of iterations
initializer - Initialization function
numsamples - Number of samples (sampling iterations)
sampling - Sampling rate (absolute or relative)
keepmed - Keep the previous medoids in the next sample
random - Random generator

Method Detail:

public Clustering<MedoidModel> run(Database database, Relation<V> relation)
Description copied from class: KMedoidsPAM
Overrides:
run in class KMedoidsPAM<V>
Parameters:
database - Database
relation - relation to use

static DBIDs randomSample(DBIDs ids, int samplesize, java.util.Random rnd, DBIDs previous)
Parameters:
ids - IDs to sample from
samplesize - Sample size
rnd - Random generator
previous - Previous medoids to always include in the sample.

static double assignRemainingToNearestCluster(ArrayDBIDs means, DBIDs ids, DBIDs rids, WritableIntegerDataStore assignment, DistanceQuery<?> distQ)
Parameters:
means - Object centroids
ids - Object ids
rids - Sample that was already assigned
assignment - cluster assignment
distQ - distance query

Copyright © 2019 ELKI Development Team. License information.