
Type Parameters:
V - the type of FeatureVector handled by this Algorithm
D - the type of Distance used by this Algorithm

@Title(value="Dependency Derivator: Deriving numerical inter-dependencies on data")
@Description(value="Derives an equality-system describing dependencies between attributes in a correlation-cluster")
@Reference(authors="E. Achtert, C. B\u00f6hm, H.-P. Kriegel, P. Kr\u00f6ger, A. Zimek", title="Deriving Quantitative Dependencies for Correlation Clusters", booktitle="Proc. 12th Int. Conf. on Knowledge Discovery and Data Mining (KDD \'06), Philadelphia, PA 2006.", url="http://dx.doi.org/10.1145/1150402.1150408")
public class DependencyDerivator<V extends NumberVector<?>,D extends Distance<D>>
extends AbstractPrimitiveDistanceBasedAlgorithm<V,D,CorrelationAnalysisSolution<V>>
Dependency derivator computes quantitative linear dependencies among the attributes of a given dataset, based on a linear correlation PCA.
Reference:
E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, A. Zimek: Deriving Quantitative Dependencies for Correlation Clusters. In Proc. 12th Int. Conf. on Knowledge Discovery and Data Mining (KDD '06), Philadelphia, PA, 2006.
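The core idea can be illustrated in two dimensions: after a PCA of the cluster, a direction with (near-)zero variance is a "weak" eigenvector e, and every point p of the cluster then approximately satisfies e · p = e · centroid, which is one equation of the dependency system. This is a simplified sketch under stated assumptions, not ELKI's implementation: `DependencySketch` is a hypothetical class, it handles only 2-D points, it uses the closed-form eigenvalues of a 2x2 symmetric matrix instead of a general eigensolver, and it scales the equation so that its first non-zero coefficient is 1. In higher dimensions, each weak eigenvector would contribute one such equation.

```java
import java.util.Locale;

/** Hypothetical 2-D illustration of turning a weak PCA direction into an equation. */
class DependencySketch {

    /**
     * Returns {a, b, c} such that a*x + b*y = c (approximately) holds for every point.
     * The equation is scaled so that its first non-zero coefficient is 1.
     */
    static double[] deriveDependency(double[][] points) {
        int n = points.length;

        // 1. Centroid of the cluster.
        double mx = 0, my = 0;
        for (double[] p : points) {
            mx += p[0];
            my += p[1];
        }
        mx /= n;
        my /= n;

        // 2. Covariance matrix [[sxx, sxy], [sxy, syy]] around the centroid.
        double sxx = 0, sxy = 0, syy = 0;
        for (double[] p : points) {
            double dx = p[0] - mx, dy = p[1] - my;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        sxx /= n;
        sxy /= n;
        syy /= n;

        // 3. Smallest eigenvalue of the symmetric 2x2 matrix (closed form).
        double half = (sxx + syy) / 2;
        double det = sxx * syy - sxy * sxy;
        double lambdaMin = half - Math.sqrt(Math.max(half * half - det, 0));

        // 4. Its eigenvector is the weak direction: (almost) no variance along it.
        double a, b;
        if (Math.abs(sxy) > 1e-12) {
            a = sxy;
            b = lambdaMin - sxx;
        } else if (sxx <= syy) {
            a = 1; // diagonal matrix, x has the smaller variance
            b = 0;
        } else {
            a = 0; // diagonal matrix, y has the smaller variance
            b = 1;
        }

        // 5. Every point satisfies e . p ~= e . centroid; scale the leading coefficient to 1.
        double lead = Math.abs(a) > 1e-12 ? a : b;
        a /= lead;
        b /= lead;
        return new double[] {a, b, a * mx + b * my};
    }

    public static void main(String[] args) {
        // Points on the line y = 2x + 1.
        double[][] pts = {{0, 1}, {1, 3}, {2, 5}, {3, 7}};
        double[] e = deriveDependency(pts);
        System.out.println(String.format(Locale.US, "%.4f*x %+.4f*y = %.4f", e[0], e[1], e[2]));
        // prints 1.0000*x -0.5000*y = -0.5000
    }
}
```

The derived equation x - 0.5y = -0.5 is the line y = 2x + 1 rewritten in the scaled form.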

| Modifier and Type | Class | Description |
|---|---|---|
| static class | DependencyDerivator.Parameterizer<V extends NumberVector<?>,D extends Distance<D>> | Parameterization class. |

| Modifier and Type | Field | Description |
|---|---|---|
| static OptionID | DEPENDENCY_DERIVATOR_RANDOM_SAMPLE | Flag to use a random sample (if the flag is not set, a kNN query around the centroid is used instead). |
| private static Logging | LOG | The logger for this class. |
| private NumberFormat | nf | Number format for the output of the solution. |
| static OptionID | OUTPUT_ACCURACY_ID | Parameter to specify the threshold for the number of fraction digits in the output; must be an integer greater than or equal to 0. |
| private PCAFilteredRunner<V> | pca | Holds the object performing the PCA. |
| private boolean | randomsample | Flag for random sampling vs. kNN. |
| static OptionID | SAMPLE_SIZE_ID | Optional parameter to specify the threshold for the size of the random sample to use; must be an integer greater than 0. |
| private int | sampleSize | Holds the value of SAMPLE_SIZE_ID. |
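
The random-sample flag and SAMPLE_SIZE_ID decide which objects enter the PCA: either a random sample of sampleSize objects or, when the flag is not set, the sampleSize objects nearest to the centroid. A self-contained sketch of both strategies over plain `double[]` points with Euclidean distance; `SampleSelectionSketch` and its methods are hypothetical (ELKI works on DBIDs and its own query API), and treating `sampleSize <= 0` as "use the complete dataset" is an assumption based on the documented default.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of the two sampling strategies; not ELKI's API. */
class SampleSelectionSketch {

    static List<Integer> allIndices(int n) {
        List<Integer> all = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            all.add(i);
        }
        return all;
    }

    /** Indices of a uniform random sample of size k, drawn without replacement. */
    static List<Integer> randomSample(int n, int k, Random rnd) {
        List<Integer> all = allIndices(n);
        Collections.shuffle(all, rnd);
        return new ArrayList<>(all.subList(0, Math.min(k, n)));
    }

    /** Indices of the k points nearest to the centroid (Euclidean distance). */
    static List<Integer> knnAroundCentroid(double[][] points, int k) {
        double[] centroid = new double[points[0].length];
        for (double[] p : points) {
            for (int j = 0; j < centroid.length; j++) {
                centroid[j] += p[j] / points.length;
            }
        }
        List<Integer> idx = allIndices(points.length);
        idx.sort(Comparator.comparingDouble(i -> distance(points[i], centroid)));
        return new ArrayList<>(idx.subList(0, Math.min(k, points.length)));
    }

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Assumption: sampleSize <= 0 (or >= n) means "use the complete dataset". */
    static List<Integer> selectSample(double[][] points, int sampleSize, boolean randomsample, Random rnd) {
        if (sampleSize <= 0 || sampleSize >= points.length) {
            return allIndices(points.length);
        }
        return randomsample
                ? randomSample(points.length, sampleSize, rnd)
                : knnAroundCentroid(points, sampleSize);
    }
}
```

The kNN variant keeps the sample compact around the cluster center, while the random variant preserves the spread of the whole cluster.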

| Constructor | Description |
|---|---|
| DependencyDerivator(PrimitiveDistanceFunction<V,D> distanceFunction, NumberFormat nf, PCAFilteredRunner<V> pca, int sampleSize, boolean randomsample) | Constructor. |

| Modifier and Type | Method | Description |
|---|---|---|
| CorrelationAnalysisSolution<V> | generateModel(Relation<V> db, DBIDs ids) | Runs the PCA on the given set of IDs. |
| CorrelationAnalysisSolution<V> | generateModel(Relation<V> db, DBIDs ids, Vector centroid) | Runs the PCA on the given set of IDs and for the given centroid. |
| TypeInformation[] | getInputTypeRestriction() | Get the input type restriction used for negotiating the data query. |
| protected Logging | getLogger() | Get the (STATIC) logger for this class. |
| CorrelationAnalysisSolution<V> | run(Database database, Relation<V> relation) | Computes quantitative linear dependencies among the attributes of the given database, based on a linear correlation PCA. |
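
The two generateModel overloads differ in where the centroid comes from. A minimal sketch of that first step, assuming (not confirmed by this page) that the overload without a centroid computes the arithmetic mean of the selected objects and then delegates to the centroid overload, and that the covariance matrix is normalized by the number of objects. `CovarianceSketch` and its methods are hypothetical; plain arrays stand in for ELKI's Relation, DBIDs, and Vector types, and the subsequent PCA filtering is omitted.

```java
import java.util.Arrays;

/** Hypothetical sketch of the first step shared by both generateModel variants. */
class CovarianceSketch {

    /** Arithmetic mean of the selected points. */
    static double[] centroid(double[][] points, int[] ids) {
        double[] c = new double[points[ids[0]].length];
        for (int id : ids) {
            for (int j = 0; j < c.length; j++) {
                c[j] += points[id][j];
            }
        }
        for (int j = 0; j < c.length; j++) {
            c[j] /= ids.length;
        }
        return c;
    }

    /** Covariance matrix of the selected points, centered on the given centroid. */
    static double[][] covariance(double[][] points, int[] ids, double[] centroid) {
        int d = centroid.length;
        double[][] cov = new double[d][d];
        for (int id : ids) {
            for (int r = 0; r < d; r++) {
                double dr = points[id][r] - centroid[r];
                for (int s = 0; s < d; s++) {
                    cov[r][s] += dr * (points[id][s] - centroid[s]);
                }
            }
        }
        for (double[] row : cov) {
            for (int s = 0; s < d; s++) {
                row[s] /= ids.length;
            }
        }
        return cov;
    }

    /** Variant without a centroid: compute the mean first, then delegate. */
    static double[][] covariance(double[][] points, int[] ids) {
        return covariance(points, ids, centroid(points, ids));
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 1}, {1, 3}, {2, 5}, {3, 7}};
        int[] ids = {0, 1, 2, 3};
        System.out.println(Arrays.deepToString(covariance(pts, ids)));
        // prints [[1.25, 2.5], [2.5, 5.0]]
        System.out.println(Arrays.deepToString(covariance(pts, ids, new double[] {0, 0})));
        // prints [[3.5, 8.5], [8.5, 21.0]]
    }
}
```

Passing an explicit centroid changes the matrix whenever it differs from the mean, which is why the second overload exists: the PCA can be anchored at a centroid computed elsewhere.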

Inherited methods: getDistanceFunction, makeParameterDistanceFunction, run
private static final Logging LOG
public static final OptionID DEPENDENCY_DERIVATOR_RANDOM_SAMPLE
public static final OptionID OUTPUT_ACCURACY_ID
public static final OptionID SAMPLE_SIZE_ID
Default value: the size of the complete dataset
private final int sampleSize
Holds the value of SAMPLE_SIZE_ID.
private final PCAFilteredRunner<V extends NumberVector<?>> pca
private final NumberFormat nf
private final boolean randomsample
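The nf field is the format that OUTPUT_ACCURACY_ID controls. One plausible way to build it with the standard `java.text` API is to pin both the minimum and the maximum number of fraction digits to the requested accuracy; `OutputFormatSketch` is hypothetical, and the choices of `Locale.US` and disabled grouping are assumptions rather than documented behavior of this class.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical helper: a NumberFormat printing exactly 'accuracy' fraction digits. */
class OutputFormatSketch {

    static NumberFormat makeFormat(int accuracy) {
        // OUTPUT_ACCURACY_ID must be an integer >= 0.
        if (accuracy < 0) {
            throw new IllegalArgumentException("accuracy must be >= 0, got " + accuracy);
        }
        // Fixed locale so the decimal separator is always '.', regardless of the JVM default.
        NumberFormat nf = NumberFormat.getInstance(Locale.US);
        nf.setMinimumFractionDigits(accuracy);
        nf.setMaximumFractionDigits(accuracy);
        // Assumption: no thousands separators, so coefficients read as plain numbers.
        nf.setGroupingUsed(false);
        return nf;
    }

    public static void main(String[] args) {
        System.out.println(makeFormat(4).format(-0.5));      // prints -0.5000
        System.out.println(makeFormat(2).format(1234.5678)); // prints 1234.57
    }
}
```

Pinning the minimum as well as the maximum keeps trailing zeros, so all coefficients of a printed equation system have the same width after the decimal point.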
public DependencyDerivator(PrimitiveDistanceFunction<V,D> distanceFunction, NumberFormat nf, PCAFilteredRunner<V> pca, int sampleSize, boolean randomsample)
Parameters:
distanceFunction - distance function
nf - Number format
pca - PCA runner
sampleSize - sample size
randomsample - flag for random sampling
public CorrelationAnalysisSolution<V> run(Database database, Relation<V> relation)
Parameters:
database - the database to run this DependencyDerivator on
relation - the relation to use
public CorrelationAnalysisSolution<V> generateModel(Relation<V> db, DBIDs ids)
Parameters:
db - the database
ids - the set of IDs
public CorrelationAnalysisSolution<V> generateModel(Relation<V> db, DBIDs ids, Vector centroid)
Parameters:
db - the database
ids - the set of IDs
centroid - the centroid
public TypeInformation[] getInputTypeRestriction()
Description copied from class: AbstractAlgorithm
Specified by: getInputTypeRestriction in interface Algorithm
Specified by: getInputTypeRestriction in class AbstractAlgorithm<CorrelationAnalysisSolution<V extends NumberVector<?>>>
protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Specified by: getLogger in class AbstractAlgorithm<CorrelationAnalysisSolution<V extends NumberVector<?>>>