@Reference(authors="E. Hellinger",title="Neue Begr\u00fcndung der Theorie quadratischer Formen von unendlichvielen Ver\u00e4nderlichen",booktitle="Journal f\u00fcr die reine und angewandte Mathematik",url="http://resolver.sub.uni-goettingen.de/purl?GDZPPN002166941",bibkey="journals/mathematik/Hellinger1909") @Reference(authors="M.-M. Deza, E. Deza",title="Dictionary of distances",booktitle="Dictionary of distances",url="https://doi.org/10.1007/978-3-642-00234-2",bibkey="doi:10.1007/978-3-642-00234-2") @Alias(value={"hellinger","bhattacharyya"}) public class HellingerDistanceFunction extends AbstractNumberVectorDistanceFunction implements SpatialPrimitiveDistanceFunction<NumberVector>, NormalizedPrimitiveSimilarityFunction<NumberVector>
We assume vectors represent normalized probability distributions. Then \[\text{Hellinger}(\vec{x},\vec{y}):= \sqrt{\tfrac12\sum\nolimits_i \left(\sqrt{x_i}-\sqrt{y_i}\right)^2 } \]
The corresponding kernel / similarity is \[ K_{\text{Hellinger}}(\vec{x},\vec{y}) := \sum\nolimits_i \sqrt{x_i y_i} \]
If we have normalized probability distributions, we have the nice property that \( K_{\text{Hellinger}}(\vec{x},\vec{x}) = \sum\nolimits_i x_i = 1\). and therefore \( K_{\text{Hellinger}}(\vec{x},\vec{y}) \in [0:1] \).
Furthermore, we have the following relationship between this variant of the distance and this kernel: \[ \text{Hellinger}^2(\vec{x},\vec{y}) = \tfrac12\sum\nolimits_i \left(\sqrt{x_i}-\sqrt{y_i}\right)^2 = \tfrac12\sum\nolimits_i x_i + y_i - 2 \sqrt{x_i y_i} \] \[ \text{Hellinger}^2(\vec{x},\vec{y}) = \tfrac12K_{\text{Hellinger}}(\vec{x},\vec{x}) + \tfrac12K_{\text{Hellinger}}(\vec{y},\vec{y}) - K_{\text{Hellinger}}(\vec{x},\vec{y}) = 1 - K_{\text{Hellinger}}(\vec{x},\vec{y}) \] which implies \(\text{Hellinger}(\vec{x},\vec{y}) \in [0;1]\), and is very similar to the Euclidean distance and the linear kernel.
From this, it follows trivially that Hellinger distance corresponds to the kernel transformation \(\phi:\vec{x}\mapsto(\tfrac12\sqrt{x_1},\ldots,\tfrac12\sqrt{x_d})\).
Deza and Deza unfortunately also give a second definition, as: \[\text{Hellinger-Deza}(\vec{x},\vec{y}):=\sqrt{2\sum\nolimits_i \left(\sqrt{\tfrac{x_i}{\bar{x}}}-\sqrt{\tfrac{y_i}{\bar{y}}}\right)^2}\] which has a built-in normalization, and a different scaling that is no longer bound to $[0;1]$. The 2 in this definition likely should be a \(\frac12\).
This distance is well suited for histograms, but it is then more efficient to once normalize the histograms, apply the square roots, and then use Euclidean distance (i.e., use the "kernel trick" in reverse, materializing the transformation \(\phi\) given above).
Reference:
E. Hellinger
Neue Begründung der Theorie quadratischer Formen von unendlichvielen
Veränderlichen
Journal für die reine und angewandte Mathematik
M.-M. Deza, E. Deza
Dictionary of distances
TODO: support acceleration for sparse vectors.
TODO: add a second variant, with built-in normalization.
Modifier and Type | Class and Description |
---|---|
static class |
HellingerDistanceFunction.Parameterizer
Parameterization class.
|
Modifier and Type | Field and Description |
---|---|
private static java.lang.String |
NON_NEGATIVE
Assertion error message.
|
static HellingerDistanceFunction |
STATIC
Static instance.
|
Constructor and Description |
---|
HellingerDistanceFunction()
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
double |
distance(NumberVector fv1,
NumberVector fv2)
Computes the distance between two given DatabaseObjects according to this
distance function.
|
boolean |
equals(java.lang.Object obj) |
SimpleTypeInformation<? super NumberVector> |
getInputTypeRestriction()
Get the input data type of the function.
|
int |
hashCode() |
<T extends NumberVector> |
instantiate(Relation<T> database)
Instantiate with a database to get the actual distance query.
|
boolean |
isMetric()
Is this distance function metric (satisfy the triangle inequality)
|
boolean |
isSymmetric()
Is this function symmetric?
|
double |
minDist(SpatialComparable mbr1,
SpatialComparable mbr2)
Computes the distance between the two given MBRs according to this distance
function.
|
double |
similarity(NumberVector o1,
NumberVector o2)
Computes the similarity between two given DatabaseObjects according to this
similarity function.
|
dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionality, dimensionality
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
isSquared
public static final HellingerDistanceFunction STATIC
private static final java.lang.String NON_NEGATIVE
@Deprecated public HellingerDistanceFunction()
STATIC
!public double distance(NumberVector fv1, NumberVector fv2)
PrimitiveDistanceFunction
distance
in interface NumberVectorDistanceFunction<NumberVector>
distance
in interface PrimitiveDistanceFunction<NumberVector>
fv1
- first DatabaseObjectfv2
- second DatabaseObjectpublic double minDist(SpatialComparable mbr1, SpatialComparable mbr2)
SpatialPrimitiveDistanceFunction
minDist
in interface SpatialPrimitiveDistanceFunction<NumberVector>
mbr1
- the first MBR objectmbr2
- the second MBR objectpublic double similarity(NumberVector o1, NumberVector o2)
PrimitiveSimilarityFunction
similarity
in interface PrimitiveSimilarityFunction<NumberVector>
o1
- first DatabaseObjecto2
- second DatabaseObjectpublic boolean isSymmetric()
DistanceFunction
isSymmetric
in interface DistanceFunction<NumberVector>
isSymmetric
in interface SimilarityFunction<NumberVector>
true
when symmetricpublic boolean isMetric()
DistanceFunction
isMetric
in interface DistanceFunction<NumberVector>
true
when metric.public <T extends NumberVector> SpatialPrimitiveDistanceSimilarityQuery<T> instantiate(Relation<T> database)
DistanceFunction
instantiate
in interface DistanceFunction<NumberVector>
instantiate
in interface PrimitiveDistanceFunction<NumberVector>
instantiate
in interface SpatialPrimitiveDistanceFunction<NumberVector>
instantiate
in interface PrimitiveSimilarityFunction<NumberVector>
instantiate
in interface SimilarityFunction<NumberVector>
database
- The representation to usepublic SimpleTypeInformation<? super NumberVector> getInputTypeRestriction()
DistanceFunction
getInputTypeRestriction
in interface DistanceFunction<NumberVector>
getInputTypeRestriction
in interface PrimitiveDistanceFunction<NumberVector>
getInputTypeRestriction
in interface SimilarityFunction<NumberVector>
getInputTypeRestriction
in class AbstractNumberVectorDistanceFunction
public boolean equals(java.lang.Object obj)
equals
in class java.lang.Object
public int hashCode()
hashCode
in class java.lang.Object
Copyright © 2019 ELKI Development Team. License information.