V - vector type to analyze
M - model type to produce

@Title(value="EM-Clustering: Clustering by Expectation Maximization")
@Description(value="Cluster data via Gaussian mixture modeling and the EM algorithm")
@Reference(authors="A. P. Dempster, N. M. Laird, D. B. Rubin", title="Maximum Likelihood from Incomplete Data via the EM algorithm", booktitle="Journal of the Royal Statistical Society, Series B, 39(1)", url="http://www.jstor.org/stable/2984875", bibkey="journals/jroyastatsocise2/DempsterLR77")
@Reference(title="Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering", authors="C. Fraley, A. E. Raftery", booktitle="J. Classification 24(2)", url="https://doi.org/10.1007/s00357-007-0004-5", bibkey="DBLP:journals/classification/FraleyR07")
@Alias(value="de.lmu.ifi.dbs.elki.algorithm.clustering.EM")
@Priority(value=200)
public class EM<V extends NumberVector,M extends MeanModel>
extends AbstractAlgorithm<Clustering<M>>
implements ClusteringAlgorithm<Clustering<M>>
Reference:
A. P. Dempster, N. M. Laird, D. B. Rubin:
Maximum Likelihood from Incomplete Data via the EM algorithm.
Journal of the Royal Statistical Society, Series B, 39(1), 1977, pp. 1-31
The MAP estimation is derived from:
C. Fraley, A. E. Raftery:
Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering.
J. Classification 24(2)
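For orientation, the textbook form of one EM iteration for a Gaussian mixture with k components is sketched below. This is the plain maximum-likelihood case (corresponding to prior = 0); the MAP variant from Fraley and Raftery modifies the M-step and is not reproduced here. The symbols n, x_i, \pi_j, \mu_j, \Sigma_j and \gamma_{ij} are notation introduced for this sketch and are not identifiers from the class.

```latex
% E-step: responsibility of component j for point x_i
\gamma_{ij} = \frac{\pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
                   {\sum_{l=1}^{k} \pi_l \, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}

% M-step: re-estimate weights, means and covariances from the responsibilities
N_j = \sum_{i=1}^{n} \gamma_{ij}, \qquad
\pi_j = \frac{N_j}{n}, \qquad
\mu_j = \frac{1}{N_j} \sum_{i=1}^{n} \gamma_{ij} \, x_i, \qquad
\Sigma_j = \frac{1}{N_j} \sum_{i=1}^{n} \gamma_{ij} \, (x_i - \mu_j)(x_i - \mu_j)^{\top}
```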
Modifier and Type | Class and Description |
---|---|
static class | EM.Parameterizer<V extends NumberVector,M extends MeanModel> - Parameterization class. |
Modifier and Type | Field and Description |
---|---|
private double | delta - Delta parameter |
private int | k - Number of clusters |
private static java.lang.String | KEY - Key for statistics logging. |
private static Logging | LOG - The logger for this class. |
private int | maxiter - Maximum number of iterations to allow |
private EMClusterModelFactory<V,M> | mfactory - Factory for producing the initial cluster model. |
private static double | MIN_LOGLIKELIHOOD - Minimum loglikelihood to avoid -infinity. |
private double | prior - Prior to enable MAP estimation (use 0 for MLE) |
private boolean | soft - Retain soft assignments. |
static SimpleTypeInformation<double[]> | SOFT_TYPE - Soft assignment result type. |
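As an illustration of what the soft flag and SOFT_TYPE imply: assuming a retained soft assignment is a double[] holding one probability per cluster (so of length k), a hard assignment can be recovered by picking the most probable cluster. This is a minimal sketch under that assumption; the names SoftAssignmentSketch and hardAssignment are hypothetical and not part of ELKI.

```java
// Hypothetical helper, not part of ELKI: turns one object's soft assignment
// (assumed to be one probability per cluster) into a hard cluster index.
final class SoftAssignmentSketch {
  /** Returns the index of the largest probability; ties go to the lowest index. */
  static int hardAssignment(double[] probabilities) {
    int best = 0;
    for (int i = 1; i < probabilities.length; i++) {
      if (probabilities[i] > probabilities[best]) {
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[] soft = { 0.1, 0.7, 0.2 }; // made-up probabilities for k = 3
    System.out.println("most probable cluster: " + hardAssignment(soft));
  }
}
```

Keeping the full vector is useful when objects near a cluster boundary should not be forced into a single cluster.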
Constructor and Description |
---|
EM(int k, double delta, EMClusterModelFactory<V,M> mfactory) - Constructor. |
EM(int k, double delta, EMClusterModelFactory<V,M> mfactory, int maxiter, boolean soft) - Constructor. |
EM(int k, double delta, EMClusterModelFactory<V,M> mfactory, int maxiter, double prior, boolean soft) - Constructor. |
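The constructor parameters delta and maxiter suggest the usual stopping rule for EM. The following is a minimal sketch of such a loop, assuming delta bounds the change in log-likelihood between two iterations and maxiter caps the number of iterations; the actual criterion used by run may differ. EmLoopSketch, Iteration and runUntilConverged are hypothetical names, not ELKI API.

```java
// Hypothetical sketch, not ELKI code: a generic "iterate until the
// log-likelihood stops changing" loop.
final class EmLoopSketch {
  /** One full EM iteration (E-step plus M-step); returns the new log-likelihood. */
  interface Iteration {
    double step();
  }

  /**
   * Runs iterations until the log-likelihood changes by at most delta
   * (assumed meaning of delta) or maxiter iterations have been performed.
   *
   * @return the number of iterations performed
   */
  static int runUntilConverged(Iteration em, double delta, int maxiter) {
    double previous = Double.NEGATIVE_INFINITY;
    int iterations = 0;
    while (iterations < maxiter) {
      double current = em.step();
      iterations++;
      if (Math.abs(current - previous) <= delta) {
        break; // change is small enough: treat as converged
      }
      previous = current;
    }
    return iterations;
  }

  public static void main(String[] args) {
    // Toy stand-in for EM: the "log-likelihood" halves its distance to 0 each step.
    double[] ll = { -200.0 };
    Iteration toy = () -> {
      ll[0] /= 2;
      return ll[0];
    };
    System.out.println("iterations: " + runUntilConverged(toy, 1.0, 100));
  }
}
```

A smaller delta trades more iterations for a tighter fit, while maxiter guards against runs that converge very slowly.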
Modifier and Type | Method and Description |
---|---|
static double | assignProbabilitiesToInstances(Relation<? extends NumberVector> relation, java.util.List<? extends EMClusterModel<?>> models, WritableDataStore<double[]> probClusterIGivenX) - Assigns the current probability values to the instances in the database and computes the expectation value of the current mixture of distributions. |
TypeInformation[] | getInputTypeRestriction() - Get the input type restriction used for negotiating the data query. |
protected Logging | getLogger() - Get the (STATIC) logger for this class. |
boolean | isSoft() |
private static double | logSumExp(double[] x) - Compute log(sum(exp(x_i))), with attention to numerical issues. |
static void | recomputeCovarianceMatrices(Relation<? extends NumberVector> relation, WritableDataStore<double[]> probClusterIGivenX, java.util.List<? extends EMClusterModel<?>> models, double prior) - Recompute the covariance matrices. |
Clustering<M> | run(Database database, Relation<V> relation) - Performs the EM clustering algorithm on the given database. |
void | setSoft(boolean soft) |
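The descriptions of logSumExp and assignProbabilitiesToInstances can be illustrated with a stand-alone sketch of an E-step. It is a simplified illustration, not the ELKI implementation: it assumes one-dimensional Gaussian components given as plain arrays, and it assumes the returned value is the average log-likelihood per point. EStepSketch, logNormal and eStep are hypothetical names; only logSumExp mirrors a method name from this class, and its body here is the standard max-shift formulation rather than ELKI's code.

```java
// Hypothetical sketch, not ELKI code: E-step for a 1-d Gaussian mixture.
final class EStepSketch {
  /** Numerically stable log(sum_i exp(x_i)): subtract the maximum before exponentiating. */
  static double logSumExp(double[] x) {
    double max = Double.NEGATIVE_INFINITY;
    for (double v : x) {
      max = Math.max(max, v);
    }
    if (Double.isInfinite(max)) {
      return max; // all terms are -infinity, or one term is +infinity
    }
    double sum = 0.0;
    for (double v : x) {
      sum += Math.exp(v - max);
    }
    return max + Math.log(sum);
  }

  /** Log density of a 1-d normal distribution. */
  static double logNormal(double x, double mean, double stddev) {
    double z = (x - mean) / stddev;
    return -0.5 * z * z - Math.log(stddev) - 0.5 * Math.log(2 * Math.PI);
  }

  /**
   * Fills resp[i][j] with P(cluster j | data[i]) and returns the average
   * log-likelihood per point (assumed return convention).
   */
  static double eStep(double[] data, double[] weight, double[] mean,
      double[] stddev, double[][] resp) {
    double total = 0.0;
    for (int i = 0; i < data.length; i++) {
      double[] logp = resp[i];
      for (int j = 0; j < weight.length; j++) {
        logp[j] = Math.log(weight[j]) + logNormal(data[i], mean[j], stddev[j]);
      }
      double norm = logSumExp(logp); // log of the mixture density at data[i]
      for (int j = 0; j < weight.length; j++) {
        logp[j] = Math.exp(logp[j] - norm); // normalize in log space
      }
      total += norm;
    }
    return total / data.length;
  }

  public static void main(String[] args) {
    double[] data = { -1.2, 0.0, 0.9 }; // made-up sample
    double[][] resp = new double[data.length][2];
    double ll = eStep(data, new double[] { 0.5, 0.5 },
        new double[] { -1.0, 1.0 }, new double[] { 1.0, 1.0 }, resp);
    System.out.println("average log-likelihood: " + ll);
  }
}
```

Working in log space matters because component densities for far-away points underflow to 0 in double precision, which would otherwise produce a division by zero during normalization.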
private static final Logging LOG
private static final java.lang.String KEY
private int k
private double delta
private EMClusterModelFactory<V extends NumberVector,M extends MeanModel> mfactory
private int maxiter
private double prior
private boolean soft
private static final double MIN_LOGLIKELIHOOD
public static final SimpleTypeInformation<double[]> SOFT_TYPE
public EM(int k, double delta, EMClusterModelFactory<V,M> mfactory)
k - k parameter
delta - delta parameter
mfactory - EM cluster model factory

public EM(int k, double delta, EMClusterModelFactory<V,M> mfactory, int maxiter, boolean soft)
k - k parameter
delta - delta parameter
mfactory - EM cluster model factory
maxiter - Maximum number of iterations
soft - Include soft assignments

public EM(int k, double delta, EMClusterModelFactory<V,M> mfactory, int maxiter, double prior, boolean soft)
k - k parameter
delta - delta parameter
mfactory - EM cluster model factory
maxiter - Maximum number of iterations
prior - MAP prior
soft - Include soft assignments

public Clustering<M> run(Database database, Relation<V> relation)
database - Database
relation - Relation

public static void recomputeCovarianceMatrices(Relation<? extends NumberVector> relation, WritableDataStore<double[]> probClusterIGivenX, java.util.List<? extends EMClusterModel<?>> models, double prior)
relation - Vector data
probClusterIGivenX - Object probabilities
models - Cluster models to update
prior - MAP prior (use 0 for MLE)

public static double assignProbabilitiesToInstances(Relation<? extends NumberVector> relation, java.util.List<? extends EMClusterModel<?>> models, WritableDataStore<double[]> probClusterIGivenX)
relation - the database used for assignment to instances
models - Cluster models
probClusterIGivenX - Output storage for cluster probabilities

private static double logSumExp(double[] x)
x - Input

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
getInputTypeRestriction in interface Algorithm
getInputTypeRestriction in class AbstractAlgorithm<Clustering<M extends MeanModel>>
protected Logging getLogger()
Description copied from class: AbstractAlgorithm
getLogger in class AbstractAlgorithm<Clustering<M extends MeanModel>>
public boolean isSoft()
public void setSoft(boolean soft)
soft - whether to retain soft assignments

Copyright © 2019 ELKI Development Team. License information.