Type Parameters:
V - Data type

@Reference(authors="L. Kaufman, P. J. Rousseeuw", title="Clustering Large Data Sets", booktitle="Pattern Recognition in Practice", url="https://doi.org/10.1016/B978-0-444-87877-9.50039-X", bibkey="doi:10.1016/B978-0-444-87877-9.50039-X")
@Reference(authors="L. Kaufman, P. J. Rousseeuw", title="Clustering Large Applications (Program CLARA)", booktitle="Finding Groups in Data: An Introduction to Cluster Analysis", url="https://doi.org/10.1002/9780470316801.ch3", bibkey="doi:10.1002/9780470316801.ch3")
public class CLARA<V>
extends KMedoidsPAM<V>
Clustering Large Applications (CLARA), an extension of PAM (KMedoidsPAM) based on sampling.
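The sampling idea can be illustrated with a small, self-contained sketch. It is not the ELKI implementation: it works on 1-D double values with the absolute difference as distance, replaces the initialization and SWAP of KMedoidsPAM with a simplified swap-only local search started from the first k sample objects, and the names ClaraSketch, swapKMedoids and cost are hypothetical. In each of numsamples rounds, only a random sample of samplesize objects is clustered; the resulting medoids are then scored on the full data set, and the cheapest set is kept. With keepmed, the best medoids found so far are placed into the next sample.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of the CLARA sampling loop on 1-D data (not the ELKI code). */
final class ClaraSketch {
  // Sum of distances from each object in 'objects' to its nearest medoid.
  static double cost(double[] data, List<Integer> objects, List<Integer> medoids) {
    double sum = 0.0;
    for (int o : objects) {
      double nearest = Double.POSITIVE_INFINITY;
      for (int m : medoids) {
        nearest = Math.min(nearest, Math.abs(data[o] - data[m]));
      }
      sum += nearest;
    }
    return sum;
  }

  // Simplified PAM-style SWAP on the sample only: accept any swap that lowers the sample cost.
  static List<Integer> swapKMedoids(double[] data, List<Integer> sample, List<Integer> init) {
    List<Integer> medoids = new ArrayList<>(init);
    double best = cost(data, sample, medoids);
    boolean improved = true;
    while (improved) {
      improved = false;
      for (int i = 0; i < medoids.size(); i++) {
        for (int o : sample) {
          if (medoids.contains(o)) {
            continue;
          }
          int old = medoids.set(i, o);
          double c = cost(data, sample, medoids);
          if (c < best) {
            best = c;
            improved = true;
          } else {
            medoids.set(i, old); // undo the swap
          }
        }
      }
    }
    return medoids;
  }

  // Requires k <= samplesize <= data.length.
  static List<Integer> run(double[] data, int k, int numsamples, int samplesize, boolean keepmed, long seed) {
    Random rnd = new Random(seed);
    List<Integer> all = new ArrayList<>();
    for (int i = 0; i < data.length; i++) {
      all.add(i);
    }
    List<Integer> bestMedoids = null;
    double bestCost = Double.POSITIVE_INFINITY;
    for (int s = 0; s < numsamples; s++) {
      // Draw the sample; with keepmed, the best medoids so far are always included.
      List<Integer> sample = new ArrayList<>();
      if (keepmed && bestMedoids != null) {
        sample.addAll(bestMedoids);
      }
      Collections.shuffle(all, rnd);
      for (int idx : all) {
        if (sample.size() >= samplesize) {
          break;
        }
        if (!sample.contains(idx)) {
          sample.add(idx);
        }
      }
      // Cluster the sample only, starting from its first k objects.
      List<Integer> medoids = swapKMedoids(data, sample, sample.subList(0, k));
      // Score the medoids on the full data set and keep the cheapest set.
      double c = cost(data, all, medoids);
      if (c < bestCost) {
        bestCost = c;
        bestMedoids = medoids;
      }
    }
    return bestMedoids;
  }
}
```

For example, ClaraSketch.run(new double[] {1, 2, 3, 101, 102, 103}, 2, 5, 4, true, 42L) returns two object indices, one from each of the two well-separated groups: any sample of four objects contains both groups, and the swap search never stops with both medoids in the same group.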
TODO: use a triangular distance matrix, rather than a hash-map based cache, for a bit better performance and less memory.
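A minimal sketch of the triangular layout mentioned in the TODO, assuming the sample objects have been renumbered 0..n-1 (TriangularDistances and its PairDistance interface are hypothetical, not ELKI classes). Only pairs with i > j are stored, in a single double[] of n(n-1)/2 entries. Pair (i, j) sits at offset i(i-1)/2 + j, because rows 1..i-1 hold 1 + 2 + ... + (i-1) entries.

```java
/** Hypothetical packed lower-triangular distance matrix for n objects numbered 0..n-1. */
final class TriangularDistances {
  /** The distance to precompute; assumed symmetric with d(i, i) = 0. */
  interface PairDistance {
    double apply(int i, int j);
  }

  private final double[] values; // n(n-1)/2 entries; the zero diagonal is not stored

  // Note: the int arithmetic overflows for n above about 46,000.
  TriangularDistances(int n, PairDistance dist) {
    values = new double[n * (n - 1) / 2];
    for (int i = 1; i < n; i++) {
      for (int j = 0; j < i; j++) {
        values[index(i, j)] = dist.apply(i, j);
      }
    }
  }

  // Offset of the pair (i, j) with i > j.
  static int index(int i, int j) {
    return i * (i - 1) / 2 + j;
  }

  double get(int i, int j) {
    if (i == j) {
      return 0.0;
    }
    return i > j ? values[index(i, j)] : values[index(j, i)];
  }
}
```

Compared with a hash map keyed by object pairs, this avoids boxed keys and values and per-entry overhead, but it computes every pair up front. Because CLARA only needs distances within one sample, n here would be the sample size rather than the data set size.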
References:

L. Kaufman, P. J. Rousseeuw. Clustering Large Data Sets. In: Pattern Recognition in Practice.

L. Kaufman, P. J. Rousseeuw. Clustering Large Applications (Program CLARA). In: Finding Groups in Data: An Introduction to Cluster Analysis.
Modifier and Type | Class and Description |
---|---|
(package private) static class | CLARA.CachedDistanceQuery<V>: Cached distance query. |
static class | CLARA.Parameterizer<V>: Parameterization class. |
Nested classes inherited from class KMedoidsPAM: KMedoidsPAM.Instance
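CLARA.CachedDistanceQuery is documented here only as a "cached distance query", and the TODO describes the current cache as hash-map based. The following hypothetical sketch (CachedDistance is not the ELKI class, and it uses int indices instead of DBIDs) shows the general memoization idea. The two indices are packed into one long key in (min, max) order, so d(a, b) and d(b, a) share an entry and the wrapped distance is evaluated at most once per pair. It assumes a symmetric distance with d(a, a) = 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical memoizing distance lookup on int indices (not ELKI's CachedDistanceQuery). */
final class CachedDistance {
  /** The wrapped, expensive distance; assumed symmetric. */
  interface PairDistance {
    double apply(int a, int b);
  }

  private final PairDistance inner;
  private final Map<Long, Double> cache = new HashMap<>();
  private int computations = 0; // number of calls to the wrapped distance

  CachedDistance(PairDistance inner) {
    this.inner = inner;
  }

  // Pack (min, max) into one long so that (a, b) and (b, a) map to the same key.
  private static long key(int a, int b) {
    int lo = Math.min(a, b);
    int hi = Math.max(a, b);
    return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
  }

  double distance(int a, int b) {
    if (a == b) {
      return 0.0; // assumption: d(a, a) = 0, so it is never cached
    }
    return cache.computeIfAbsent(key(a, b), unused -> {
      computations++;
      return inner.apply(a, b);
    });
  }

  int computations() {
    return computations;
  }
}
```

Every cached pair costs a boxed Long key, a boxed Double value and a map entry, which is the memory overhead the TODO refers to.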
Modifier and Type | Field and Description |
---|---|
(package private) boolean | keepmed: Keep the previous medoids in the sample (see page 145). |
private static Logging | LOG: Class logger. |
(package private) int | numsamples: Number of samples to draw (i.e., iterations). |
(package private) RandomFactory | random: Random factory for initialization. |
(package private) double | sampling: Sampling rate (absolute or relative). |
Fields inherited from class KMedoidsPAM: initializer, k, maxiter
Fields inherited from further superclasses: ALGORITHM_ID, DISTANCE_FUNCTION_ID
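The sampling field holds either an absolute sample size or a relative rate, as the constructor's sampling parameter states. This page does not say how the two cases are told apart. The hypothetical helper below (SampleSize is not part of ELKI) assumes that values up to 1 are fractions of the data set size n and larger values are absolute counts. It then clamps the result to [k, n] so that the sample can hold k medoids.

```java
/** Hypothetical helper (not the ELKI API): turn a sampling rate into a sample size. */
final class SampleSize {
  // Assumption: sampling <= 1 is a fraction of n, larger values are an absolute count.
  static int of(double sampling, int n, int k) {
    if (k > n) {
      throw new IllegalArgumentException("k must not exceed the number of objects");
    }
    int size = sampling <= 1.0 ? (int) Math.ceil(sampling * n) : (int) sampling;
    // Clamp: every medoid needs a sample object, and the sample cannot exceed the data.
    return Math.max(k, Math.min(n, size));
  }
}
```

Under this assumption, SampleSize.of(0.5, 1000, 5) is 500, SampleSize.of(50, 1000, 5) is 50, and a rate of exactly 1.0 means the whole data set.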
Constructor and Description |
---|
CLARA(DistanceFunction<? super V> distanceFunction, int k, int maxiter, KMedoidsInitialization<V> initializer, int numsamples, double sampling, boolean keepmed, RandomFactory random): Constructor. |
Modifier and Type | Method and Description |
---|---|
(package private) static double | assignRemainingToNearestCluster(ArrayDBIDs means, DBIDs ids, DBIDs rids, WritableIntegerDataStore assignment, DistanceQuery<?> distQ): Assign the objects that are not in the already assigned sample (rids) to their nearest medoid, storing the result in assignment. |
(package private) static DBIDs | randomSample(DBIDs ids, int samplesize, java.util.Random rnd, DBIDs previous): Draw a random sample of the desired size. |
Clustering<MedoidModel> | run(Database database, Relation<V> relation): Run k-medoids. |
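Methods inherited from class KMedoidsPAM: getInputTypeRestriction, getLogger, initialMedoids, run
Methods inherited from further superclasses and interfaces: getDistanceFunction, run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait (three overloads)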
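randomSample draws a sample of the desired size that always contains the previous medoids, which is what keepmed relies on. Below is a sketch of that contract on integer IDs. It assumes plain java.util collections in place of ELKI's DBIDs, and RandomSampleSketch is a hypothetical name. The previous medoids are added first. A partial Fisher-Yates shuffle of a copy of the IDs then supplies further distinct objects until the sample is full.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of randomSample's contract on int IDs (not the ELKI code). */
final class RandomSampleSketch {
  static List<Integer> randomSample(List<Integer> ids, int samplesize, Random rnd, List<Integer> previous) {
    Set<Integer> chosen = new HashSet<>();
    List<Integer> sample = new ArrayList<>();
    // The previous medoids are always part of the sample.
    if (previous != null) {
      for (int p : previous) {
        if (chosen.add(p)) {
          sample.add(p);
        }
      }
    }
    // Partial Fisher-Yates shuffle of a copy; stop as soon as the sample is full.
    List<Integer> pool = new ArrayList<>(ids);
    for (int i = 0; i < pool.size() && sample.size() < samplesize; i++) {
      int j = i + rnd.nextInt(pool.size() - i);
      Collections.swap(pool, i, j);
      int candidate = pool.get(i);
      if (chosen.add(candidate)) { // skip IDs already taken from 'previous'
        sample.add(candidate);
      }
    }
    return sample;
  }
}
```

Only the first positions of the copy are shuffled, so drawing a small sample from many IDs needs only about samplesize random numbers. If previous is larger than samplesize, this sketch returns all previous medoids and nothing else.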
private static final Logging LOG
double sampling
int numsamples
boolean keepmed
RandomFactory random
public CLARA(DistanceFunction<? super V> distanceFunction, int k, int maxiter, KMedoidsInitialization<V> initializer, int numsamples, double sampling, boolean keepmed, RandomFactory random)
Parameters:
distanceFunction - Distance function to use
k - Number of clusters to produce
maxiter - Maximum number of iterations
initializer - Initialization function
numsamples - Number of samples (sampling iterations)
sampling - Sampling rate (absolute or relative)
keepmed - Keep the previous medoids in the next sample
random - Random generator

public Clustering<MedoidModel> run(Database database, Relation<V> relation)
Description copied from class: KMedoidsPAM
Run k-medoids.
Overrides:
run in class KMedoidsPAM<V>
Parameters:
database - Database
relation - Relation to use

static DBIDs randomSample(DBIDs ids, int samplesize, java.util.Random rnd, DBIDs previous)
Parameters:
ids - IDs to sample from
samplesize - Sample size
rnd - Random generator
previous - Previous medoids to always include in the sample

static double assignRemainingToNearestCluster(ArrayDBIDs means, DBIDs ids, DBIDs rids, WritableIntegerDataStore assignment, DistanceQuery<?> distQ)
Parameters:
means - Object centroids (here, the medoids)
ids - Object IDs
rids - Sample that was already assigned
assignment - Cluster assignment
distQ - Distance query

Copyright © 2019 ELKI Development Team. License information.