Type Parameters:
O - Object type
D - Distance type

@Title(value="Distance Histogram")
@Description(value="Computes a histogram over the distances occurring in the data set.")
public class DistanceStatisticsWithClasses<O,D extends NumberDistance<D,?>>
extends AbstractDistanceBasedAlgorithm<O,D,CollectionResult<DoubleVector>>
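The class description states only that a histogram is computed over the distances occurring in the data set. The following is a minimal, self-contained sketch of that idea in plain Java. It is not ELKI's implementation, and it makes several assumptions: Euclidean distance over double[] points (instead of a DistanceFunction and a Relation), equal-width bins, and an exact value range obtained from a first full pass over all pairs. The class name DistanceHistogramSketch and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch, not ELKI's implementation: histogram of all pairwise
// Euclidean distances using an exact value range and equal-width bins.
class DistanceHistogramSketch {

  // Euclidean distance between two points of equal dimensionality (assumed metric).
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  // Counts every unordered pair (i < j) into one of numbins equal-width bins.
  static int[] histogram(double[][] points, int numbins) {
    if (numbins < 1) {
      throw new IllegalArgumentException("numbins must be positive");
    }
    int n = points.length;
    int[] counts = new int[numbins];
    if (n < 2) {
      return counts; // no pairs, so every bin stays empty
    }
    // Pass 1: exact minimum and maximum over all pairs (quadratic cost).
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < n; i++) {
      for (int j = i + 1; j < n; j++) {
        double d = euclidean(points[i], points[j]);
        min = Math.min(min, d);
        max = Math.max(max, d);
      }
    }
    double width = (max - min) / numbins;
    // Pass 2: recompute each distance and assign it to its bin.
    for (int i = 0; i < n; i++) {
      for (int j = i + 1; j < n; j++) {
        double d = euclidean(points[i], points[j]);
        // If all distances are equal (width 0), everything goes into bin 0.
        int bin = width > 0 ? (int) ((d - min) / width) : 0;
        // d == max maps to index numbins; fold it into the last bin.
        counts[Math.min(bin, numbins - 1)]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    double[][] points = { {0, 0}, {1, 0}, {3, 0} };
    // Pairwise distances are 1, 3 and 2, so the exact range is [1, 3].
    System.out.println(Arrays.toString(histogram(points, 2))); // prints [1, 2]
  }
}
```

Determining the exact range costs an extra full pass over all pairs before any binning happens, which is consistent with the exact flag being documented as "Compute exactly (slower)". An estimated range avoids that pass but can leave some distances outside the range, a case this sketch does not handle.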
Modifier and Type | Class and Description |
---|---|
static class | DistanceStatisticsWithClasses.Parameterizer<O,D extends NumberDistance<D,?>>: Parameterization class. |
Modifier and Type | Field and Description |
---|---|
private boolean | exact: Compute exactly (slower). |
static OptionID | EXACT_ID: Flag to compute exact value range for binning. |
static OptionID | HISTOGRAM_BINS_ID: Option to configure the number of bins to use. |
private static Logging | LOG: The logger for this class. |
private int | numbin: Number of bins to use in sampling. |
private boolean | sampling: Sampling flag. |
static OptionID | SAMPLING_ID: Flag to enable sampling. |
Fields inherited from class AbstractDistanceBasedAlgorithm: DISTANCE_FUNCTION_ID
Constructor and Description |
---|
DistanceStatisticsWithClasses(DistanceFunction<? super O,D> distanceFunction, int numbins, boolean exact, boolean sampling): Constructor. |
Modifier and Type | Method and Description |
---|---|
private DoubleMinMax | exactMinMax(Relation<O> relation, DistanceQuery<O,D> distFunc): Compute the exact maximum and minimum. |
TypeInformation[] | getInputTypeRestriction(): Get the input type restriction used for negotiating the data query. |
protected Logging | getLogger(): Get the (STATIC) logger for this class. |
HistogramResult<DoubleVector> | run(Database database): Runs the algorithm. |
private DoubleMinMax | sampleMinMax(Relation<O> relation, DistanceQuery<O,D> distFunc): Estimate minimum and maximum via sampling. |
private static void | shrinkHeap(TreeSet<DoubleDBIDPair> hotset, int k): Shrink the heap of "hot" (extreme) items. |
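The method summary describes shrinkHeap only as shrinking a TreeSet<DoubleDBIDPair> of "hot" (extreme) items to a target size k. Below is a minimal sketch of one way to do this, under stated assumptions: DistIdPair and HotSetSketch are hypothetical stand-ins (DistIdPair plays the role of DoubleDBIDPair), the set is ordered so that the most extreme entries come first, and later entries that repeat an already kept object id are dropped. That duplicate rule is an assumption of this sketch, not something the documentation above specifies.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

class HotSetSketch {
  // Keeps at most k entries of hotset, walking it in its own iteration order
  // (most extreme first). An entry is removed once k entries are kept, or if an
  // earlier kept entry already refers to the same object id (assumed rule).
  static void shrinkHeap(TreeSet<DistIdPair> hotset, int k) {
    Set<Integer> seenIds = new HashSet<>();
    int kept = 0;
    for (Iterator<DistIdPair> it = hotset.iterator(); it.hasNext();) {
      DistIdPair p = it.next();
      if (kept >= k || !seenIds.add(p.id())) {
        it.remove(); // TreeSet iterators support removal during iteration
      } else {
        kept++;
      }
    }
  }

  public static void main(String[] args) {
    TreeSet<DistIdPair> minHot = new TreeSet<>();
    minHot.add(new DistIdPair(0.5, 1));
    minHot.add(new DistIdPair(0.7, 1)); // same object as the 0.5 entry
    minHot.add(new DistIdPair(0.9, 2));
    minHot.add(new DistIdPair(1.2, 3));
    shrinkHeap(minHot, 2);
    System.out.println(minHot);
  }
}

// Hypothetical stand-in for a (distance, object id) pair such as DoubleDBIDPair.
// Ordered by distance, then by id, so distinct pairs never compare as equal.
record DistIdPair(double distance, int id) implements Comparable<DistIdPair> {
  @Override
  public int compareTo(DistIdPair other) {
    int c = Double.compare(distance, other.distance);
    return c != 0 ? c : Integer.compare(id, other.id);
  }
}
```

For a set that tracks the largest distances, constructing the TreeSet with Collections.reverseOrder() puts the largest pairs first, so the same method then keeps the k largest. The sketch requires Java 16 or later because of the record.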
Inherited methods: getDistanceFunction, makeParameterDistanceFunction
private static final Logging LOG
public static final OptionID EXACT_ID
public static final OptionID SAMPLING_ID
public static final OptionID HISTOGRAM_BINS_ID
private int numbin
private boolean sampling
private boolean exact
public DistanceStatisticsWithClasses(DistanceFunction<? super O,D> distanceFunction, int numbins, boolean exact, boolean sampling)
Parameters:
distanceFunction - Distance function to use
numbins - Number of bins
exact - Exactness flag
sampling - Sampling flag

public HistogramResult<DoubleVector> run(Database database)
Description copied from interface: Algorithm
Runs the algorithm.
Specified by:
run in interface Algorithm
Overrides:
run in class AbstractAlgorithm<CollectionResult<DoubleVector>>
Parameters:
database - the database to run the algorithm on

private DoubleMinMax sampleMinMax(Relation<O> relation, DistanceQuery<O,D> distFunc)
Parameters:
relation - Relation to process
distFunc - Distance function to use

private DoubleMinMax exactMinMax(Relation<O> relation, DistanceQuery<O,D> distFunc)
Parameters:
relation - Relation to process
distFunc - Distance function

private static void shrinkHeap(TreeSet<DoubleDBIDPair> hotset, int k)
Parameters:
hotset - Set of hot items
k - Target size

public TypeInformation[] getInputTypeRestriction()
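Description copied from class: AbstractAlgorithm
Get the input type restriction used for negotiating the data query.
Specified by:
getInputTypeRestriction in interface Algorithm
Specified by:
getInputTypeRestriction in class AbstractAlgorithm<CollectionResult<DoubleVector>>
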
protected Logging getLogger()
Description copied from class: AbstractAlgorithm
Get the (STATIC) logger for this class.
Specified by:
getLogger in class AbstractAlgorithm<CollectionResult<DoubleVector>>