Environment for DeveLoping KDD-Applications Supported by Index-Structures

de.lmu.ifi.dbs.elki.algorithm.statistics
Class DistanceStatisticsWithClasses<V extends DatabaseObject,D extends NumberDistance<D,?>>

java.lang.Object
  extended by de.lmu.ifi.dbs.elki.logging.AbstractLoggable
      extended by de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm<O,R>
          extended by de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm<V,D,CollectionResult<DoubleVector>>
              extended by de.lmu.ifi.dbs.elki.algorithm.statistics.DistanceStatisticsWithClasses<V,D>
Type Parameters:
V - Vector type
D - Distance type
All Implemented Interfaces:
Algorithm<V,CollectionResult<DoubleVector>>, Parameterizable

@Title(value="Distance Histogram")
@Description(value="Computes a histogram over the distances occurring in the data set.")
public class DistanceStatisticsWithClasses<V extends DatabaseObject,D extends NumberDistance<D,?>>
extends DistanceBasedAlgorithm<V,D,CollectionResult<DoubleVector>>

Algorithm to gather statistics over the distance distribution in the data set.
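
As a rough illustration of what a distance histogram involves, the sketch below counts all pairwise Euclidean distances of a small point set in equal-width bins. It does not use the ELKI API: the class name DistanceHistogramSketch, the plain double[] vectors, the Euclidean distance and the clamping of edge values are all assumptions made for this example, not the behavior of this class.

```java
import java.util.Arrays;

final class DistanceHistogramSketch {
  /** Euclidean distance between two vectors of equal length. */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Counts pairwise distances in numbin equal-width bins over [min, max]. */
  static long[] histogram(double[][] points, int numbin, double min, double max) {
    long[] bins = new long[numbin];
    double width = (max - min) / numbin;
    for (int i = 0; i < points.length; i++) {
      for (int j = i + 1; j < points.length; j++) {
        double d = distance(points[i], points[j]);
        int bin = width > 0 ? (int) ((d - min) / width) : 0;
        // Clamp so that d == max lands in the last bin and
        // out-of-range values land in the edge bins.
        bin = Math.max(0, Math.min(numbin - 1, bin));
        bins[bin]++;
      }
    }
    return bins;
  }

  public static void main(String[] args) {
    // Three 1-dimensional points with pairwise distances 1, 4 and 3.
    double[][] points = { { 0 }, { 1 }, { 4 } };
    System.out.println(Arrays.toString(histogram(points, 2, 0.0, 4.0)));
  }
}
```

The value range [min, max] has to be known before binning, which is presumably why the class offers both an exact and a sampled way of determining it.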

Author:
Erich Schubert

Field Summary
private  boolean exact
          Whether to compute the exact distance range (minimum and maximum).
private  Flag EXACT_FLAG
          Flag to enable exact computation of the distance range. Key: -diststat.exact
static OptionID EXACT_ID
          OptionID for EXACT_FLAG
static OptionID HISTOGRAM_BINS_ID
          OptionID for HISTOGRAM_BINS_OPTION
private  IntParameter HISTOGRAM_BINS_OPTION
          Option to configure the number of bins to use.
private  int numbin
          Number of histogram bins to use.
private  boolean sampling
          Whether to estimate the distance range by sampling.
private  Flag SAMPLING_FLAG
          Flag to enable sampling. Key: -diststat.sampling
static OptionID SAMPLING_ID
          OptionID for SAMPLING_FLAG
 
Fields inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
DISTANCE_FUNCTION_ID, DISTANCE_FUNCTION_PARAM
 
Fields inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debug, logger
 
Constructor Summary
DistanceStatisticsWithClasses(Parameterization config)
          Constructor, adhering to Parameterizable
 
Method Summary
private  DoubleMinMax exactMinMax(Database<V> database, DistanceFunction<V,D> distFunc)
           
protected  CollectionResult<DoubleVector> runInTime(Database<V> database)
          Iterates over all points in the database.
private  DoubleMinMax sampleMinMax(Database<V> database, DistanceFunction<V,D> distFunc)
           
private  void shrinkHeap(TreeSet<FCPair<Double,Integer>> hotset, int k)
           
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm
getDistanceFactory, getDistanceFunction
 
Methods inherited from class de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm
isTime, isVerbose, run, setTime, setVerbose
 
Methods inherited from class de.lmu.ifi.dbs.elki.logging.AbstractLoggable
debugFine, debugFiner, debugFinest, exception, progress, verbose, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EXACT_ID

public static final OptionID EXACT_ID
OptionID for EXACT_FLAG


EXACT_FLAG

private final Flag EXACT_FLAG
Flag to enable exact computation of the distance range (minimum and maximum).

Key: -diststat.exact


SAMPLING_ID

public static final OptionID SAMPLING_ID
OptionID for SAMPLING_FLAG


SAMPLING_FLAG

private final Flag SAMPLING_FLAG
Flag to enable sampling

Key: -diststat.sampling


HISTOGRAM_BINS_ID

public static final OptionID HISTOGRAM_BINS_ID
OptionID for HISTOGRAM_BINS_OPTION


HISTOGRAM_BINS_OPTION

private final IntParameter HISTOGRAM_BINS_OPTION
Option to configure the number of bins to use.

Key: -diststat.bins


numbin

private int numbin
Number of histogram bins to use.


sampling

private boolean sampling
Whether to estimate the distance range by sampling.


exact

private boolean exact
Whether to compute the exact distance range (minimum and maximum).

Constructor Detail

DistanceStatisticsWithClasses

public DistanceStatisticsWithClasses(Parameterization config)
Constructor, adhering to Parameterizable

Parameters:
config - Parameterization
Method Detail

runInTime

protected CollectionResult<DoubleVector> runInTime(Database<V> database)
                                            throws IllegalStateException
Iterates over all points in the database.

Specified by:
runInTime in class AbstractAlgorithm<V extends DatabaseObject,CollectionResult<DoubleVector>>
Parameters:
database - the database to run the algorithm on
Returns:
the Result computed by this algorithm
Throws:
IllegalStateException - if the algorithm has not been initialized properly (e.g. the setParameters(String[]) method was not called).

sampleMinMax

private DoubleMinMax sampleMinMax(Database<V> database,
                                  DistanceFunction<V,D> distFunc)
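
This page gives no description for sampleMinMax. One plausible reading of the name is sketched below: draw a random subset of the objects and take the smallest and largest distance among the sampled pairs. The class SampleMinMaxSketch, the use of plain double values with absolute difference as the distance, and the sampling scheme are assumptions for illustration only and may differ from what ELKI does.

```java
import java.util.Random;

final class SampleMinMaxSketch {
  /**
   * Returns {min, max} of |a - b| over all pairs of distinct sampled values.
   * With fewer than two sampled values the result is {+Infinity, -Infinity}.
   */
  static double[] sampleMinMax(double[] values, int sampleSize, long seed) {
    Random rnd = new Random(seed);
    int n = values.length;
    int m = Math.min(sampleSize, n);
    // Partial Fisher-Yates shuffle: afterwards the first m entries of idx
    // form a uniform random sample without replacement.
    int[] idx = new int[n];
    for (int i = 0; i < n; i++) {
      idx[i] = i;
    }
    for (int i = 0; i < m; i++) {
      int j = i + rnd.nextInt(n - i);
      int t = idx[i];
      idx[i] = idx[j];
      idx[j] = t;
    }
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (int a = 0; a < m; a++) {
      for (int b = a + 1; b < m; b++) {
        double d = Math.abs(values[idx[a]] - values[idx[b]]);
        min = Math.min(min, d);
        max = Math.max(max, d);
      }
    }
    return new double[] { min, max };
  }
}
```

A sampled range can only be as wide as the true range or narrower, so a histogram built on it has to cope with distances that fall outside the estimated bounds.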

exactMinMax

private DoubleMinMax exactMinMax(Database<V> database,
                                 DistanceFunction<V,D> distFunc)
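
No description is given for exactMinMax either. Judging by its name and return type, it scans the distances between all objects for their minimum and maximum. A minimal sketch of that idea follows, with ToDoubleBiFunction standing in for ELKI's DistanceFunction. The class ExactMinMaxSketch and the choice to skip self-pairs are assumptions, not the actual implementation.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class ExactMinMaxSketch {
  /** Returns {min, max} of dist over all unordered pairs of distinct objects. */
  static <T> double[] exactMinMax(List<T> objects, ToDoubleBiFunction<T, T> dist) {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < objects.size(); i++) {
      // Start at i + 1: self-pairs are skipped (assumption) and each
      // unordered pair is visited once, which presumes a symmetric distance.
      for (int j = i + 1; j < objects.size(); j++) {
        double d = dist.applyAsDouble(objects.get(i), objects.get(j));
        min = Math.min(min, d);
        max = Math.max(max, d);
      }
    }
    return new double[] { min, max };
  }
}
```

This pass costs a number of distance computations quadratic in the database size, in addition to the pass that fills the histogram.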

shrinkHeap

private void shrinkHeap(TreeSet<FCPair<Double,Integer>> hotset,
                        int k)
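
The page does not document shrinkHeap. Its signature suggests that it trims a sorted set of (distance, id) pairs down to at most k entries. The sketch below shows one way to do that with a TreeSet. The Pair class is a hypothetical stand-in for ELKI's FCPair, and dropping the largest entries first is an assumption; the real method may keep a different end of the set or apply further rules.

```java
import java.util.TreeSet;

final class ShrinkHeapSketch {
  /** Hypothetical stand-in for FCPair: ordered by first, then by second. */
  static final class Pair implements Comparable<Pair> {
    final double first;
    final int second;

    Pair(double first, int second) {
      this.first = first;
      this.second = second;
    }

    @Override
    public int compareTo(Pair o) {
      int c = Double.compare(first, o.first);
      return c != 0 ? c : Integer.compare(second, o.second);
    }
  }

  /** Removes the largest entries until at most k remain. */
  static void shrinkHeap(TreeSet<Pair> hotset, int k) {
    while (hotset.size() > k) {
      hotset.pollLast(); // assumption: the largest distances are discarded
    }
  }
}
```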

Release 0.3 (2010-03-31_1612)