java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<V,Clustering<EMModel<V>>>
          extended by de.lmu.ifi.dbs.elki.algorithm.clustering.EM<V>
Type Parameters:
V - a type of NumberVector as a suitable datatype for this algorithm
@Title(value="EM-Clustering: Clustering by Expectation Maximization")
@Description(value="Provides k Gaussian mixtures maximizing the probability of the given data")
@Reference(authors="A. P. Dempster, N. M. Laird, D. B. Rubin", title="Maximum Likelihood from Incomplete Data via the EM algorithm", booktitle="Journal of the Royal Statistical Society, Series B, 39(1), 1977, pp. 1-31", url="http://www.jstor.org/stable/2984875")
public class EM<V extends NumberVector<V,?>>
Provides the EM algorithm (clustering by expectation maximization).
Initialization draws the means at random, uniformly distributed within the attribute ranges of the given database, and starts every covariance matrix as the identity matrix (all covariances 0, all variances 1).
Reference: A. P. Dempster, N. M. Laird, D. B. Rubin: Maximum Likelihood from Incomplete Data via the EM algorithm. In Journal of the Royal Statistical Society, Series B, 39(1), 1977, pp. 1-31.
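To illustrate the expectation and maximization steps described above, here is a deliberately simplified sketch: a one-dimensional Gaussian mixture, initialized as described for this class (means uniform within the data range, variance 1). Everything in it is an assumption for illustration and not ELKI code: the class name SimpleEm1D, scalar variances in place of covariance matrices, equal initial weights, the variance floor VAR_FLOOR, the hypothetical maxIter cap, and stopping once the absolute change in log-likelihood is at most delta.

```java
import java.util.Arrays;
import java.util.Random;

// Simplified 1-D EM for a Gaussian mixture (illustration only, not ELKI's implementation).
// Scalar variances stand in for covariance matrices; maxIter is a hypothetical safety cap.
class SimpleEm1D {
    static final double VAR_FLOOR = 1e-6; // assumed lower bound that keeps variances positive

    /** Returns {means, variances, weights}. */
    static double[][] fit(double[] x, int k, double delta, int maxIter, long seed) {
        int n = x.length;
        Random rnd = new Random(seed);
        double min = Arrays.stream(x).min().getAsDouble();
        double max = Arrays.stream(x).max().getAsDouble();
        double[] mean = new double[k], var = new double[k], w = new double[k];
        for (int j = 0; j < k; j++) {
            mean[j] = min + rnd.nextDouble() * (max - min); // uniform within the attribute range
            var[j] = 1.0;                                  // initial variance 1
            w[j] = 1.0 / k;                                // equal initial weights (assumption)
        }
        double[][] post = new double[n][k]; // post[i][j] = P(cluster j | x_i)
        double[] logp = new double[k];
        double oldLl = Double.NEGATIVE_INFINITY;
        for (int iter = 0; iter < maxIter; iter++) {
            // E-step: posteriors via log-sum-exp (avoids underflow) and the mixture log-likelihood.
            double ll = 0;
            for (int i = 0; i < n; i++) {
                double m = Double.NEGATIVE_INFINITY;
                for (int j = 0; j < k; j++) {
                    double d = x[i] - mean[j];
                    logp[j] = Math.log(w[j]) - 0.5 * Math.log(2 * Math.PI * var[j]) - d * d / (2 * var[j]);
                    m = Math.max(m, logp[j]);
                }
                double sum = 0;
                for (int j = 0; j < k; j++) sum += Math.exp(logp[j] - m);
                for (int j = 0; j < k; j++) post[i][j] = Math.exp(logp[j] - m) / sum;
                ll += m + Math.log(sum);
            }
            // M-step: re-estimate weights, means and variances from the posteriors.
            for (int j = 0; j < k; j++) {
                double nj = 0, s = 0;
                for (int i = 0; i < n; i++) { nj += post[i][j]; s += post[i][j] * x[i]; }
                w[j] = nj / n;
                if (nj == 0) continue; // empty component: keep its previous mean and variance
                mean[j] = s / nj;
                double ss = 0;
                for (int i = 0; i < n; i++) { double d = x[i] - mean[j]; ss += post[i][j] * d * d; }
                var[j] = Math.max(ss / nj, VAR_FLOOR);
            }
            // Stop once the log-likelihood changes by no more than delta.
            if (Math.abs(ll - oldLl) <= delta) break;
            oldLl = ll;
        }
        return new double[][] {mean, var, w};
    }

    public static void main(String[] args) {
        double[] x = {-0.2, 0.0, 0.1, 0.2, -0.1, 9.8, 10.0, 10.1, 10.2, 9.9};
        double[][] model = fit(x, 2, 1e-9, 500, 42L);
        System.out.println("means   = " + Arrays.toString(model[0]));
        System.out.println("weights = " + Arrays.toString(model[2]));
    }
}
```

Computing the posteriors in log space is a design choice of this sketch: with variance 1 and data spread over a wide range, the plain densities can underflow to 0 for every component.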
Field Summary | |
---|---|
private double | delta: Holds the value of DELTA_PARAM. |
static OptionID | DELTA_ID: OptionID for DELTA_PARAM. |
private DoubleParameter | DELTA_PARAM: Parameter to specify the termination criterion for maximization of E(M): E(M) - E(M') < em.delta; must be a double equal to or greater than 0. |
private int | k: Holds the value of K_PARAM. |
static OptionID | K_ID: OptionID for K_PARAM. |
private IntParameter | K_PARAM: Parameter to specify the number of clusters to find; must be an integer greater than 0. |
private static double | MIN_LOGLIKELIHOOD |
private HashMap<Integer,double[]> | probClusterIGivenX: Stores the individual probabilities, for use by EMOutlierDetection etc. |
private static double | SINGULARITY_CHEAT: Small value added to the diagonal of a matrix to avoid singularity before computing its inverse. |
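The SINGULARITY_CHEAT field describes adding a small value to the diagonal of a matrix before inverting it, so that, for example, a covariance matrix of two perfectly correlated attributes still has an inverse. A minimal sketch of that idea, assuming plain double[][] matrices and Gauss-Jordan elimination with partial pivoting; the class name and the value 1e-9 are hypothetical (the document does not state the field's value), and this is not ELKI's Matrix API.

```java
import java.util.Arrays;

// Sketch only: plain double[][] matrices, not ELKI's Matrix class.
class RegularizedInverse {
    // Hypothetical value; the document does not state SINGULARITY_CHEAT's actual value.
    static final double SINGULARITY_CHEAT = 1e-9;

    /** Returns (m + eps * I)^-1 via Gauss-Jordan elimination with partial pivoting; m is not modified. */
    static double[][] invert(double[][] m, double eps) {
        int d = m.length;
        // Augmented matrix [m + eps * I | I].
        double[][] a = new double[d][2 * d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) a[i][j] = m[i][j] + (i == j ? eps : 0.0);
            a[i][d + i] = 1.0;
        }
        for (int col = 0; col < d; col++) {
            // Partial pivoting: use the row with the largest absolute entry in this column.
            int piv = col;
            for (int r = col + 1; r < d; r++) if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            if (a[piv][col] == 0.0) throw new ArithmeticException("matrix is singular");
            double[] tmp = a[col]; a[col] = a[piv]; a[piv] = tmp;
            double p = a[col][col];
            for (int j = 0; j < 2 * d; j++) a[col][j] /= p;
            // Eliminate this column from all other rows.
            for (int r = 0; r < d; r++) {
                if (r == col) continue;
                double f = a[r][col];
                if (f == 0.0) continue;
                for (int j = 0; j < 2 * d; j++) a[r][j] -= f * a[col][j];
            }
        }
        double[][] inv = new double[d][d];
        for (int i = 0; i < d; i++) System.arraycopy(a[i], d, inv[i], 0, d);
        return inv;
    }

    public static void main(String[] args) {
        // Two perfectly correlated attributes give a singular covariance matrix.
        double[][] cov = {{1, 1}, {1, 1}};
        try {
            invert(cov, 0.0);
        } catch (ArithmeticException e) {
            System.out.println("without regularization: " + e.getMessage());
        }
        System.out.println("with regularization: " + Arrays.deepToString(invert(cov, SINGULARITY_CHEAT)));
    }
}
```

The regularized inverse of a nearly singular matrix has very large entries; the sketch only guarantees that the inversion succeeds, not that the result is well conditioned.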
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable |
---|
debug, logger |
Constructor Summary | |
---|---|
EM(Parameterization config) | Constructor, adhering to Parameterizable. |
Method Summary | |
---|---|
protected double | assignProbabilitiesToInstances(Database<V> database, List<Double> normDistrFactor, List<V> means, List<Matrix> invCovMatr, List<Double> clusterWeights, HashMap<Integer,double[]> probClusterIGivenX): Assigns the current probability values to the instances in the database and computes the expectation value of the current mixture of distributions. |
double[] | getProbClusterIGivenX(Integer index): Gets the probabilities for a given point. |
protected List<V> | initialMeans(Database<V> database): Creates k random points distributed uniformly within the attribute ranges of the given database. |
protected Clustering<EMModel<V>> | runInTime(Database<V> database): Performs the EM clustering algorithm on the given database. |
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm |
---|
isTime, isVerbose, run, setTime, setVerbose |
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable |
---|
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.clustering.ClusteringAlgorithm |
---|
run |
Methods inherited from interface de.lmu.ifi.dbs.elki.algorithm.Algorithm |
---|
setTime, setVerbose |
Field Detail |
---|
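private static final double SINGULARITY_CHEAT
Small value added to the diagonal of a matrix to avoid singularity before computing its inverse.
public static final OptionID K_ID
OptionID for K_PARAM.
private final IntParameter K_PARAM
Parameter to specify the number of clusters to find; must be an integer greater than 0.
Key: -em.k
private int k
Holds the value of K_PARAM.
public static final OptionID DELTA_ID
OptionID for DELTA_PARAM.
private static final double MIN_LOGLIKELIHOOD
private final DoubleParameter DELTA_PARAM
Parameter to specify the termination criterion for maximization of E(M): E(M) - E(M') < em.delta; must be a double equal to or greater than 0.
Default value: 0.0
Key: -em.delta
private double delta
Holds the value of DELTA_PARAM.
private HashMap<Integer,double[]> probClusterIGivenX
Stores the individual probabilities, for use by EMOutlierDetection etc.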
Constructor Detail |
---|
public EM(Parameterization config)
Constructor, adhering to Parameterizable.
Parameters:
config - Parameterization
Method Detail |
---|
protected Clustering<EMModel<V>> runInTime(Database<V> database) throws IllegalStateException
Performs the EM clustering algorithm on the given database.
Specified by:
runInTime in class AbstractAlgorithm<V extends NumberVector<V,?>,Clustering<EMModel<V extends NumberVector<V,?>>>>
Parameters:
database - the database to run the algorithm on
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method was not called).
protected double assignProbabilitiesToInstances(Database<V> database, List<Double> normDistrFactor, List<V> means, List<Matrix> invCovMatr, List<Double> clusterWeights, HashMap<Integer,double[]> probClusterIGivenX)
Assigns the current probability values to the instances in the database and computes the expectation value of the current mixture of distributions.
Parameters:
database - the database used for assignment to instances
normDistrFactor - normalization factor for the density function, based on the current covariance matrix
means - the current means
invCovMatr - the inverse covariance matrices
clusterWeights - the weights of the current clusters
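The expectation step that assignProbabilitiesToInstances describes can be sketched as follows, under stated assumptions: each cluster density is taken to be normFactor[j] * exp(-0.5 * (x - mean_j)^T * invCov_j * (x - mean_j)) with the usual Gaussian constant 1 / sqrt((2*pi)^d * det(cov)), and the returned "expectation value" is taken to be the total log-likelihood. Plain arrays stand in for ELKI's Database, V and Matrix types; the class name EStepSketch, its helper methods, and the constant LOG_FLOOR are hypothetical (the document lists a MIN_LOGLIKELIHOOD field but does not say how it is used).

```java
import java.util.Arrays;

// Hypothetical E-step sketch; double arrays stand in for ELKI's Database, V and Matrix types.
class EStepSketch {
    static final double LOG_FLOOR = -1e5; // hypothetical log-likelihood contribution when all densities underflow

    /**
     * Fills post[i][j] = P(cluster j | x_i) and returns sum_i log(sum_j w_j * f_j(x_i)),
     * where f_j(x) = normFactor[j] * exp(-0.5 * (x - mean_j)^T * invCov_j * (x - mean_j)).
     */
    static double assignProbabilities(double[][] points, double[] normFactor, double[][] means,
                                      double[][][] invCov, double[] weights, double[][] post) {
        int k = means.length;
        double logLikelihood = 0;
        for (int i = 0; i < points.length; i++) {
            double sum = 0;
            for (int j = 0; j < k; j++) {
                double maha = mahalanobisSq(points[i], means[j], invCov[j]);
                post[i][j] = weights[j] * normFactor[j] * Math.exp(-0.5 * maha);
                sum += post[i][j];
            }
            if (sum > 0) {
                for (int j = 0; j < k; j++) post[i][j] /= sum; // normalize to posteriors
                logLikelihood += Math.log(sum);
            } else { // every weighted density underflowed to 0: fall back to uniform posteriors
                Arrays.fill(post[i], 1.0 / k);
                logLikelihood += LOG_FLOOR;
            }
        }
        return logLikelihood;
    }

    /** Squared Mahalanobis distance (x - mu)^T * inv * (x - mu). */
    static double mahalanobisSq(double[] x, double[] mu, double[][] inv) {
        double r = 0;
        for (int a = 0; a < x.length; a++)
            for (int b = 0; b < x.length; b++)
                r += (x[a] - mu[a]) * inv[a][b] * (x[b] - mu[b]);
        return r;
    }

    /** Gaussian normalization constant 1 / sqrt((2*pi)^d * det(cov)). */
    static double normFactor(int d, double detCov) {
        return 1.0 / Math.sqrt(Math.pow(2 * Math.PI, d) * detCov);
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {10, 10}, {5, 5}};
        double[][] means = {{0, 0}, {10, 10}};
        double[][] identity = {{1, 0}, {0, 1}};
        double[] norm = {normFactor(2, 1.0), normFactor(2, 1.0)};
        double[][] post = new double[points.length][means.length];
        double ll = assignProbabilities(points, norm, means, new double[][][] {identity, identity},
                new double[] {0.5, 0.5}, post);
        System.out.println("log-likelihood = " + ll);
        System.out.println("posteriors = " + Arrays.deepToString(post));
    }
}
```

Passing the inverse covariance matrices and precomputed normalization factors, as the documented signature does, means the expensive inversion and determinant happen once per cluster per iteration rather than once per point.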
protected List<V> initialMeans(Database<V> database)
Creates k random points distributed uniformly within the attribute ranges of the given database.
Parameters:
database - the database; it must contain enough points to determine the range of attribute values (fewer than two points would make no sense). The content of the database is not otherwise touched.
Returns:
k random points distributed uniformly within the attribute ranges of the given database
public double[] getProbClusterIGivenX(Integer index)
Gets the probabilities for a given point.
Parameters:
index - Point ID
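The initialization performed by initialMeans, together with the identity covariance matrices mentioned in the class description, can be sketched as below. This is a minimal illustration assuming plain double[] rows instead of ELKI's Database<V> and NumberVector types; the class name InitialMeansSketch and the explicit Random argument are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the documented initialization; plain arrays stand in for Database<V>.
class InitialMeansSketch {
    /** Draws k points uniformly within the per-attribute [min, max] ranges; data is not modified. */
    static List<double[]> initialMeans(double[][] data, int k, Random rnd) {
        if (data.length < 2) {
            throw new IllegalArgumentException("need at least two points to determine attribute ranges");
        }
        int d = data[0].length;
        double[] min = data[0].clone(), max = data[0].clone();
        for (double[] row : data) {
            for (int a = 0; a < d; a++) {
                min[a] = Math.min(min[a], row[a]);
                max[a] = Math.max(max[a], row[a]);
            }
        }
        List<double[]> means = new ArrayList<>(k);
        for (int j = 0; j < k; j++) {
            double[] m = new double[d];
            for (int a = 0; a < d; a++) m[a] = min[a] + rnd.nextDouble() * (max[a] - min[a]);
            means.add(m);
        }
        return means;
    }

    /** Initial covariance: all covariances 0 and all variances 1, i.e. the d x d identity matrix. */
    static double[][] initialCovariance(int d) {
        double[][] c = new double[d][d];
        for (int a = 0; a < d; a++) c[a][a] = 1.0;
        return c;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 5}, {2, 9}, {1, 7}};
        for (double[] m : initialMeans(data, 3, new Random(1))) System.out.println(Arrays.toString(m));
        System.out.println(Arrays.deepToString(initialCovariance(2)));
    }
}
```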