de.lmu.ifi.dbs.elki.algorithm
Class DependencyDerivator<V extends NumberVector<V,?>,D extends Distance<D>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<R>
      extended by de.lmu.ifi.dbs.elki.algorithm.AbstractPrimitiveDistanceBasedAlgorithm<V,D,CorrelationAnalysisSolution<V>>
          extended by de.lmu.ifi.dbs.elki.algorithm.DependencyDerivator<V,D>
Type Parameters:
V - the type of FeatureVector handled by this Algorithm
D - the type of Distance used by this Algorithm
All Implemented Interfaces:
Algorithm, InspectionUtilFrequentlyScanned, Parameterizable

@Title(value="Dependency Derivator: Deriving numerical inter-dependencies on data")
@Description(value="Derives an equality-system describing dependencies between attributes in a correlation-cluster")
@Reference(authors="E. Achtert, C. B\u00f6hm, H.-P. Kriegel, P. Kr\u00f6ger, A. Zimek",
           title="Deriving Quantitative Dependencies for Correlation Clusters",
           booktitle="Proc. 12th Int. Conf. on Knowledge Discovery and Data Mining (KDD \'06), Philadelphia, PA 2006.",
           url="http://dx.doi.org/10.1145/1150402.1150408")
public class DependencyDerivator<V extends NumberVector<V,?>,D extends Distance<D>>
extends AbstractPrimitiveDistanceBasedAlgorithm<V,D,CorrelationAnalysisSolution<V>>

The dependency derivator computes quantitative linear dependencies among the attributes of a given dataset, based on a linear correlation PCA.

Reference:
E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, A. Zimek: Deriving Quantitative Dependencies for Correlation Clusters.
In Proc. 12th Int. Conf. on Knowledge Discovery and Data Mining (KDD '06), Philadelphia, PA 2006.
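
As a rough illustration of what "deriving a linear dependency via PCA" means, the following sketch works on plain 2D arrays and does not use any ELKI class. The class name DependencySketch and the method deriveDependency are hypothetical. It assumes the simplest case: the eigenvector of the covariance matrix with the smallest eigenvalue (the "weak" direction) is taken as the normal vector of a line through the centroid. The actual class delegates this to a PCAFilteredRunner and may select several weak eigenvectors.

```java
import java.util.Locale;

public class DependencySketch {

  /**
   * Returns {a, b, c} such that a*x + b*y = c approximately holds for all
   * points. (a, b) is the unit eigenvector of the 2x2 covariance matrix that
   * belongs to the smallest eigenvalue. Hypothetical helper, not ELKI API.
   */
  public static double[] deriveDependency(double[][] points) {
    int n = points.length;

    // Centroid of the data.
    double mx = 0, my = 0;
    for (double[] p : points) {
      mx += p[0];
      my += p[1];
    }
    mx /= n;
    my /= n;

    // Entries of the (population) covariance matrix.
    double sxx = 0, sxy = 0, syy = 0;
    for (double[] p : points) {
      double dx = p[0] - mx, dy = p[1] - my;
      sxx += dx * dx;
      sxy += dx * dy;
      syy += dy * dy;
    }
    sxx /= n;
    sxy /= n;
    syy /= n;

    // Smallest eigenvalue of a symmetric 2x2 matrix in closed form.
    double disc = Math.sqrt((sxx - syy) * (sxx - syy) + 4 * sxy * sxy);
    double lambdaMin = (sxx + syy - disc) / 2;

    // Eigenvector belonging to lambdaMin.
    double a, b;
    if (sxy != 0) {
      a = sxy;
      b = lambdaMin - sxx;
    } else if (sxx <= syy) {
      a = 1; // x has the smaller variance
      b = 0;
    } else {
      a = 0; // y has the smaller variance
      b = 1;
    }
    double norm = Math.hypot(a, b);
    a /= norm;
    b /= norm;

    // The line passes through the centroid: a*(x - mx) + b*(y - my) = 0.
    return new double[] { a, b, a * mx + b * my };
  }

  public static void main(String[] args) {
    // Points that lie exactly on y = 2x + 1.
    double[][] pts = { { 0, 1 }, { 1, 3 }, { 2, 5 }, { 3, 7 }, { 4, 9 } };
    double[] eq = deriveDependency(pts);
    System.out.printf(Locale.US, "%.4f * x + %.4f * y = %.4f%n", eq[0], eq[1], eq[2]);
  }
}
```

For points lying exactly on a line, the smallest eigenvalue is zero and the resulting equation is satisfied by every point; for noisy data it only holds approximately.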


Nested Class Summary
static class DependencyDerivator.Parameterizer<V extends NumberVector<V,?>,D extends Distance<D>>
          Parameterization class.
 
Field Summary
static OptionID DEPENDENCY_DERIVATOR_RANDOM_SAMPLE
          Flag to use a random sample (if the flag is not set, a kNN query around the centroid is used).
private static Logging logger
          The logger for this class.
 NumberFormat NF
          Number format for output of solution.
static OptionID OUTPUT_ACCURACY_ID
          Parameter to specify the number of fraction digits used for output accuracy; must be an integer greater than or equal to 0.
private  PCAFilteredRunner<V> pca
          Holds the object performing the PCA.
private  boolean randomsample
          Flag for random sampling vs. kNN.
static OptionID SAMPLE_SIZE_ID
          Optional parameter to specify the threshold for the size of the random sample to use; must be an integer greater than 0.
private  int sampleSize
          Holds the value of SAMPLE_SIZE_ID.
 
Constructor Summary
DependencyDerivator(PrimitiveDistanceFunction<V,D> distanceFunction, NumberFormat nf, PCAFilteredRunner<V> pca, int sampleSize, boolean randomsample)
          Constructor.
 
Method Summary
 CorrelationAnalysisSolution<V> generateModel(Relation<V> db, DBIDs ids)
          Runs the PCA on the given set of IDs.
 CorrelationAnalysisSolution<V> generateModel(Relation<V> db, DBIDs ids, V centroidDV)
          Runs the PCA on the given set of IDs and for the given centroid.
 TypeInformation[] getInputTypeRestriction()
          Get the input type restriction used for negotiating the data query.
protected  Logging getLogger()
          Get the (STATIC) logger for this class.
 CorrelationAnalysisSolution<V> run(Database database, Relation<V> relation)
          Computes quantitative linear dependencies among the attributes of the given database, based on a linear correlation PCA.
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractPrimitiveDistanceBasedAlgorithm
getDistanceFunction
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
makeParameterDistanceFunction, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

private static final Logging logger
The logger for this class.


DEPENDENCY_DERIVATOR_RANDOM_SAMPLE

public static final OptionID DEPENDENCY_DERIVATOR_RANDOM_SAMPLE
Flag to use a random sample (if the flag is not set, a kNN query around the centroid is used).


OUTPUT_ACCURACY_ID

public static final OptionID OUTPUT_ACCURACY_ID
Parameter to specify the number of fraction digits used for output accuracy; must be an integer greater than or equal to 0.
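
To make the effect of a fraction-digit setting concrete, here is a minimal sketch using only java.text.NumberFormat. The class OutputAccuracySketch and the method withFractionDigits are hypothetical, and it is an assumption that the NumberFormat passed to this class is configured in a comparable way (fixed number of fraction digits, US locale).

```java
import java.text.NumberFormat;
import java.util.Locale;

public class OutputAccuracySketch {

  /**
   * Builds a NumberFormat that always prints exactly the given number of
   * fraction digits. Hypothetical helper, not part of ELKI.
   */
  public static NumberFormat withFractionDigits(int digits) {
    if (digits < 0) {
      throw new IllegalArgumentException("fraction digits must be >= 0");
    }
    NumberFormat nf = NumberFormat.getInstance(Locale.US);
    nf.setMinimumFractionDigits(digits);
    nf.setMaximumFractionDigits(digits);
    return nf;
  }

  public static void main(String[] args) {
    NumberFormat nf = withFractionDigits(4);
    System.out.println(nf.format(0.123456)); // prints "0.1235"
    System.out.println(nf.format(1.5));      // prints "1.5000"
  }
}
```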


SAMPLE_SIZE_ID

public static final OptionID SAMPLE_SIZE_ID
Optional parameter to specify the threshold for the size of the random sample to use; must be an integer greater than 0.

Default value: the size of the complete dataset
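
The two sampling modes described by the fields above (random sample vs. kNN around the centroid) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the ELKI implementation: objects are identified by their array index, squared Euclidean distance stands in for the configured distance function, and the names SampleSelectionSketch and selectIds are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class SampleSelectionSketch {

  /**
   * Selects up to sampleSize object indices, either at random or as the
   * nearest neighbors of the given centroid. Hypothetical helper.
   */
  public static List<Integer> selectIds(double[][] data, double[] centroid,
      int sampleSize, boolean randomSample, Random rnd) {
    List<Integer> ids = new ArrayList<>();
    for (int i = 0; i < data.length; i++) {
      ids.add(i);
    }
    int k = Math.min(sampleSize, data.length);
    if (randomSample) {
      // Random sample: shuffle and keep the first k indices.
      Collections.shuffle(ids, rnd);
    } else {
      // kNN around the centroid: sort by distance and keep the k closest.
      ids.sort(Comparator.comparingDouble(i -> squaredDistance(data[i], centroid)));
    }
    return new ArrayList<>(ids.subList(0, k));
  }

  /** Squared Euclidean distance (assumed stand-in for the distance function). */
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return sum;
  }

  public static void main(String[] args) {
    double[][] data = { { 0, 0 }, { 10, 10 }, { 1, 1 }, { 9, 9 }, { 0.5, 0.5 } };
    double[] centroid = { 0, 0 };
    System.out.println(selectIds(data, centroid, 3, false, new Random(0)));
    System.out.println(selectIds(data, centroid, 3, true, new Random(0)));
  }
}
```

With the kNN mode, the selection is deterministic for a fixed centroid; with the random mode, it depends on the random generator passed in.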


sampleSize

private final int sampleSize
Holds the value of SAMPLE_SIZE_ID.


pca

private final PCAFilteredRunner<V extends NumberVector<V,?>> pca
Holds the object performing the PCA.


NF

public final NumberFormat NF
Number format for output of solution.


randomsample

private final boolean randomsample
Flag for random sampling vs. kNN.

Constructor Detail

DependencyDerivator

public DependencyDerivator(PrimitiveDistanceFunction<V,D> distanceFunction,
                           NumberFormat nf,
                           PCAFilteredRunner<V> pca,
                           int sampleSize,
                           boolean randomsample)
Constructor.

Parameters:
distanceFunction - distance function
nf - Number format
pca - PCA runner
sampleSize - sample size
randomsample - flag for random sampling
Method Detail

run

public CorrelationAnalysisSolution<V> run(Database database,
                                          Relation<V> relation)
                                                             throws IllegalStateException
Computes quantitative linear dependencies among the attributes of the given database, based on a linear correlation PCA.

Parameters:
database - the database to run this DependencyDerivator on
relation - the relation to use
Returns:
the CorrelationAnalysisSolution computed by this DependencyDerivator
Throws:
IllegalStateException

generateModel

public CorrelationAnalysisSolution<V> generateModel(Relation<V> db,
                                                    DBIDs ids)
Runs the PCA on the given set of IDs. The centroid is computed from the given IDs.

Parameters:
db - the database
ids - the set of ids
Returns:
a matrix of equations describing the dependencies
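
Computing the centroid from a set of IDs amounts to averaging the selected vectors attribute by attribute. The sketch below shows this on plain arrays, where an ID is assumed to be an array index; CentroidSketch and centroid are hypothetical names, not ELKI API.

```java
public class CentroidSketch {

  /**
   * Computes the attribute-wise mean of the vectors selected by ids.
   * Hypothetical helper; IDs are assumed to be indices into data.
   */
  public static double[] centroid(double[][] data, int[] ids) {
    if (ids.length == 0) {
      throw new IllegalArgumentException("cannot compute the centroid of an empty set");
    }
    int dim = data[ids[0]].length;
    double[] c = new double[dim];
    for (int id : ids) {
      for (int d = 0; d < dim; d++) {
        c[d] += data[id][d];
      }
    }
    for (int d = 0; d < dim; d++) {
      c[d] /= ids.length;
    }
    return c;
  }

  public static void main(String[] args) {
    double[][] data = { { 0, 0 }, { 2, 4 }, { 4, 8 }, { 100, 100 } };
    double[] c = centroid(data, new int[] { 0, 1, 2 });
    System.out.println(c[0] + ", " + c[1]); // prints "2.0, 4.0"
  }
}
```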

generateModel

public CorrelationAnalysisSolution<V> generateModel(Relation<V> db,
                                                    DBIDs ids,
                                                    V centroidDV)
Runs the PCA on the given set of IDs and for the given centroid.

Parameters:
db - the database
ids - the set of ids
centroidDV - the centroid
Returns:
a matrix of equations describing the dependencies

getInputTypeRestriction

public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.

Specified by:
getInputTypeRestriction in interface Algorithm
Specified by:
getInputTypeRestriction in class AbstractAlgorithm<CorrelationAnalysisSolution<V extends NumberVector<V,?>>>
Returns:
Type restriction

getLogger

protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.

Specified by:
getLogger in class AbstractAlgorithm<CorrelationAnalysisSolution<V extends NumberVector<V,?>>>
Returns:
the static logger

Release 0.4.0 (2011-09-20_1324)