Type Parameters:
V - a subtype of NumberVector, used as the data type for this algorithm

@Title(value="EM-Clustering: Clustering by Expectation Maximization")
@Description(value="Provides k Gaussian mixtures maximizing the probability of the given data")
@Reference(authors="A. P. Dempster, N. M. Laird, D. B. Rubin", title="Maximum Likelihood from Incomplete Data via the EM algorithm", booktitle="Journal of the Royal Statistical Society, Series B, 39(1), 1977, pp. 1-31", url="http://www.jstor.org/stable/2984875")
public class EM<V extends NumberVector<?>>
extends AbstractAlgorithm<Clustering<EMModel<V>>>
implements ClusteringAlgorithm<Clustering<EMModel<V>>>
Reference: A. P. Dempster, N. M. Laird, D. B. Rubin: Maximum Likelihood from Incomplete Data via the EM algorithm. In Journal of the Royal Statistical Society, Series B, 39(1), 1977, pp. 1-31.
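The annotations above summarize EM as fitting k Gaussian mixtures that maximize the probability of the data. A minimal one-dimensional sketch of that loop follows; it is not this class's implementation. It assumes scalar data, caller-supplied initial means (standing in for the KMeansInitialization step), unit starting variances and equal starting weights, and it takes E(M) to be the mean log-likelihood. The names `Em1D`, `fit` and `Result` are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical 1-D EM for a k-component Gaussian mixture (illustration only). */
final class Em1D {
    /** Fitted means, variances, weights, last mean log-likelihood, iterations used. */
    static final class Result {
        final double[] means, vars, weights;
        final double logLikelihood;
        final int iterations;
        Result(double[] m, double[] v, double[] w, double ll, int it) {
            means = m; vars = v; weights = w; logLikelihood = ll; iterations = it;
        }
    }

    /** Normal density with the given mean and variance. */
    static double gauss(double x, double mean, double var) {
        double d = x - mean;
        return Math.exp(-0.5 * d * d / var) / Math.sqrt(2 * Math.PI * var);
    }

    static Result fit(double[] x, double[] initMeans, double delta, int maxiter) {
        int n = x.length, k = initMeans.length;
        double[] means = initMeans.clone();
        double[] vars = new double[k];
        double[] weights = new double[k];
        Arrays.fill(vars, 1.0);          // assumed starting variance
        Arrays.fill(weights, 1.0 / k);   // equal starting weights
        double[][] resp = new double[n][k];
        double prev = Double.NEGATIVE_INFINITY, ll = prev;
        int it = 0;
        while (it < maxiter) {
            it++;
            // E-step: responsibilities P(cluster i | x) and mean log-likelihood.
            ll = 0;
            for (int j = 0; j < n; j++) {
                double total = 0;
                for (int i = 0; i < k; i++) {
                    resp[j][i] = weights[i] * gauss(x[j], means[i], vars[i]);
                    total += resp[j][i];
                }
                for (int i = 0; i < k; i++) resp[j][i] /= total;
                ll += Math.log(total);
            }
            ll /= n;
            // M-step: re-estimate weights, means and variances from responsibilities.
            for (int i = 0; i < k; i++) {
                double sw = 0, sx = 0;
                for (int j = 0; j < n; j++) { sw += resp[j][i]; sx += resp[j][i] * x[j]; }
                means[i] = sx / sw;
                double sv = 0;
                for (int j = 0; j < n; j++) { double d = x[j] - means[i]; sv += resp[j][i] * d * d; }
                vars[i] = sv / sw + 1e-9; // tiny floor keeps variances positive
                weights[i] = sw / n;
            }
            // Stop once the improvement E(M) - E(M') falls below delta.
            if (ll - prev < delta) break;
            prev = ll;
        }
        return new Result(means, vars, weights, ll, it);
    }
}
```

For two well-separated groups around 0 and 10 with initial means 1 and 9, the means settle near 0 and 10 within a few iterations and the loop stops on the delta test rather than on maxiter.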
Modifier and Type | Class and Description |
---|---|
static class | EM.Parameterizer<V extends NumberVector<?>>: Parameterization class. |
Modifier and Type | Field and Description |
---|---|
private double | delta: Holds the value of DELTA_ID. |
static OptionID | DELTA_ID: Parameter to specify the termination criterion for maximization of E(M): E(M) - E(M') < em.delta; must be a double equal to or greater than 0. |
static OptionID | INIT_ID: Parameter to specify the initialization method. |
private KMeansInitialization<V> | initializer: Class to choose the initial means. |
private int | k: Holds the value of K_ID. |
static OptionID | K_ID: Parameter to specify the number of clusters to find; must be an integer greater than 0. |
private static Logging | LOG: The logger for this class. |
private int | maxiter: Maximum number of iterations to allow. |
private static double | MIN_LOGLIKELIHOOD |
private WritableDataStore<double[]> | probClusterIGivenX: Stores the individual cluster probabilities, for use by EMOutlierDetection etc. |
private static double | SINGULARITY_CHEAT: Small value added to the diagonal of a matrix to avoid singularity before computing its inverse. |
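SINGULARITY_CHEAT is described only as a small value added to the diagonal before inversion. As a hedged illustration, the hypothetical helpers below add such a value to a 2x2 covariance matrix with identical rows (determinant 0), after which a closed-form inverse exists. The value 1e-9 is an assumption chosen for the example, not the constant EM uses.

```java
/** Hypothetical sketch of diagonal regularization before matrix inversion. */
final class CovRegularization {
    // Assumed value for illustration; the field summary does not state it.
    static final double SINGULARITY_CHEAT = 1e-9;

    /** Returns a copy of the square matrix m with eps added to every diagonal entry. */
    static double[][] addToDiagonal(double[][] m, double eps) {
        double[][] r = new double[m.length][];
        for (int i = 0; i < m.length; i++) {
            r[i] = m[i].clone();
            r[i][i] += eps;
        }
        return r;
    }

    /** Inverts a 2x2 matrix via its adjugate; throws if the determinant is exactly 0. */
    static double[][] invert2x2(double[][] m) {
        double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
        if (det == 0.0) throw new ArithmeticException("singular matrix");
        return new double[][] {
            { m[1][1] / det, -m[0][1] / det },
            { -m[1][0] / det, m[0][0] / det } };
    }
}
```

The regularized inverse is very large (the matrix is still badly conditioned), but it is finite. That is what matters when the inverse feeds a Mahalanobis distance in the density computation.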
Constructor and Description |
---|
EM(int k, double delta, KMeansInitialization<V> initializer, int maxiter): Constructor. |
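The constructor takes the settings constrained by the K_ID and DELTA_ID descriptions. A hypothetical settings holder (`EmSettings` is not part of this API) could enforce those documented constraints and apply the DELTA_ID stopping rule, reading M as the new model and M' as the previous one. The initializer is left out, and no bound on maxiter is assumed because none is documented here.

```java
/** Hypothetical holder mirroring the k, delta and maxiter arguments of EM. */
final class EmSettings {
    final int k;
    final double delta;
    final int maxiter;

    EmSettings(int k, double delta, int maxiter) {
        // K_ID: the number of clusters must be an integer greater than 0.
        if (k <= 0) throw new IllegalArgumentException("k must be > 0, got " + k);
        // DELTA_ID: the threshold must be a double >= 0; the negated test also rejects NaN.
        if (!(delta >= 0.0)) throw new IllegalArgumentException("delta must be >= 0, got " + delta);
        this.k = k;
        this.delta = delta;
        this.maxiter = maxiter; // no constraint on maxiter is documented here
    }

    /** DELTA_ID stopping rule: converged once E(M) - E(M') < delta. */
    boolean converged(double previousExpectation, double currentExpectation) {
        return currentExpectation - previousExpectation < delta;
    }
}
```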
Modifier and Type | Method and Description |
---|---|
protected double | assignProbabilitiesToInstances(Relation<V> database, double[] normDistrFactor, List<Vector> means, List<Matrix> invCovMatr, double[] clusterWeights, WritableDataStore<double[]> probClusterIGivenX): Assigns the current probability values to the instances in the database and computes the expectation value of the current mixture of distributions. |
TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
protected Logging | getLogger(): Get the (STATIC) logger for this class. |
double[] | getProbClusterIGivenX(DBIDRef index): Get the probabilities for a given point. |
Clustering<EMModel<V>> | run(Database database, Relation<V> relation): Performs the EM clustering algorithm on the given database. |
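assignProbabilitiesToInstances performs the E-step. The array-based analogue below is a sketch, not this method's code. It assumes that normDistrFactor[i] = 1 / sqrt((2*pi)^d * det(Sigma_i)), that a cluster's density is that factor times exp(-0.5 * (x - mu)^T Sigma^-1 (x - mu)), and that MIN_LOGLIKELIHOOD acts as a lower clamp on each point's log-likelihood. The role and value of MIN_LOGLIKELIHOOD are not documented above; -100000 is chosen for illustration. The returned "expectation value" is taken to be the mean log-likelihood.

```java
/** Hypothetical array-based analogue of assignProbabilitiesToInstances. */
final class EStep {
    // Assumed clamp value, for illustration only.
    static final double MIN_LOGLIKELIHOOD = -100000;

    /** Fills probs[j][i] = P(cluster i | x_j) and returns the mean log-likelihood. */
    static double assign(double[][] data, double[] normDistrFactor, double[][] means,
                         double[][][] invCovMatr, double[] clusterWeights, double[][] probs) {
        int k = means.length;
        double sum = 0;
        for (int j = 0; j < data.length; j++) {
            double total = 0;
            for (int i = 0; i < k; i++) {
                double maha = mahalanobis(data[j], means[i], invCovMatr[i]);
                probs[j][i] = clusterWeights[i] * normDistrFactor[i] * Math.exp(-0.5 * maha);
                total += probs[j][i];
            }
            // Normalize to posteriors; if every density underflowed, the row stays zero.
            for (int i = 0; i < k; i++) probs[j][i] = total > 0 ? probs[j][i] / total : 0;
            // Math.log(0) is -Infinity, so the clamp keeps the sum finite.
            sum += Math.max(Math.log(total), MIN_LOGLIKELIHOOD);
        }
        return sum / data.length;
    }

    /** Squared Mahalanobis distance (x - mu)^T * inv * (x - mu). */
    static double mahalanobis(double[] x, double[] mu, double[][] inv) {
        double r = 0;
        for (int a = 0; a < x.length; a++)
            for (int b = 0; b < x.length; b++)
                r += (x[a] - mu[a]) * inv[a][b] * (x[b] - mu[b]);
        return r;
    }

    /** Normalization factor 1 / sqrt((2 pi)^d * det) for a d-dimensional Gaussian. */
    static double normFactor(int d, double det) {
        return 1.0 / Math.sqrt(Math.pow(2 * Math.PI, d) * det);
    }
}
```

A point midway between two equally weighted unit-variance clusters gets probability 0.5 for each. A point far from every cluster gets an all-zero row and contributes exactly the clamp value to the sum.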
Methods inherited from class AbstractAlgorithm: makeParameterDistanceFunction, run
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from the implemented interfaces: run

private static final Logging LOG
private static final double SINGULARITY_CHEAT
public static final OptionID K_ID
private int k
Holds the value of K_ID.
public static final OptionID DELTA_ID
public static final OptionID INIT_ID
private static final double MIN_LOGLIKELIHOOD
private double delta
Holds the value of DELTA_ID.
private WritableDataStore<double[]> probClusterIGivenX
private KMeansInitialization<V extends NumberVector<?>> initializer
private int maxiter

public EM(int k, double delta, KMeansInitialization<V> initializer, int maxiter)
Parameters:
k - number of clusters to find (k parameter)
delta - termination threshold for the improvement of E(M) (delta parameter)
initializer - Class to choose the initial means
maxiter - Maximum number of iterations

public Clustering<EMModel<V>> run(Database database, Relation<V> relation)
Parameters:
database - Database
relation - Relation

protected double assignProbabilitiesToInstances(Relation<V> database, double[] normDistrFactor, List<Vector> means, List<Matrix> invCovMatr, double[] clusterWeights, WritableDataStore<double[]> probClusterIGivenX)
Parameters:
database - the database used for assignment to instances
normDistrFactor - normalization factor for the density function, based on the current covariance matrix
means - the current means
invCovMatr - the inverse covariance matrices
clusterWeights - the weights of the current clusters

public double[] getProbClusterIGivenX(DBIDRef index)
Parameters:
index - Point ID

public TypeInformation[] getInputTypeRestriction()
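Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.
Specified by: getInputTypeRestriction in interface Algorithm
Specified by: getInputTypeRestriction in class AbstractAlgorithm<Clustering<EMModel<V extends NumberVector<?>>>>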

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.
Specified by: getLogger in class AbstractAlgorithm<Clustering<EMModel<V extends NumberVector<?>>>>
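getProbClusterIGivenX(DBIDRef) returns a point's soft cluster memberships. If a single label is wanted, one common choice is the most probable cluster. The helper below is a hypothetical sketch of that choice; this page does not say how EM itself turns the probabilities into the clusters of its Clustering result. Ties go to the lower cluster index, and an all-zero vector yields -1.

```java
/** Hypothetical helper: hard label from a vector of cluster probabilities. */
final class HardAssignment {
    /** Index of the most probable cluster, or -1 if no entry is positive. */
    static int mostProbableCluster(double[] probClusterIGivenX) {
        int best = -1;
        double bestP = 0.0;
        for (int i = 0; i < probClusterIGivenX.length; i++) {
            // Strict '>' keeps the first index on ties.
            if (probClusterIGivenX[i] > bestP) {
                best = i;
                bestP = probClusterIGivenX[i];
            }
        }
        return best;
    }
}
```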