Package | Description |
---|---|
de.lmu.ifi.dbs.elki.algorithm | Algorithms suitable as a task for the KDDTask main routine. |
de.lmu.ifi.dbs.elki.algorithm.benchmark | Benchmarking pseudo algorithms. |
de.lmu.ifi.dbs.elki.algorithm.classification | Classification algorithms. |
de.lmu.ifi.dbs.elki.algorithm.clustering | Clustering algorithms. Clustering algorithms are supposed to implement the Algorithm interface. |
de.lmu.ifi.dbs.elki.algorithm.clustering.affinitypropagation | Affinity Propagation (AP) clustering. |
de.lmu.ifi.dbs.elki.algorithm.clustering.biclustering | Biclustering algorithms |
de.lmu.ifi.dbs.elki.algorithm.clustering.correlation | Correlation clustering algorithms |
de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.cash | Helper classes for the CASH algorithm. |
de.lmu.ifi.dbs.elki.algorithm.clustering.em | Expectation-Maximization clustering algorithm. |
de.lmu.ifi.dbs.elki.algorithm.clustering.gdbscan | Generalized DBSCAN. Generalized DBSCAN is an abstraction of the original DBSCAN idea that allows the use of arbitrary "neighborhood" and "core point" predicates. |
de.lmu.ifi.dbs.elki.algorithm.clustering.gdbscan.parallel | Parallel versions of Generalized DBSCAN. |
de.lmu.ifi.dbs.elki.algorithm.clustering.gdbscan.util | Utility classes for specialized DBSCAN implementations. |
de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical | Hierarchical agglomerative clustering (HAC). |
de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.birch | BIRCH clustering. |
de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.extraction | Extraction of partitional clusterings from hierarchical results. |
de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.linkage | Linkages for hierarchical clustering. |
de.lmu.ifi.dbs.elki.algorithm.clustering.kmeans | K-means clustering and variations |
de.lmu.ifi.dbs.elki.algorithm.clustering.kmeans.initialization | Initialization strategies for k-means. |
de.lmu.ifi.dbs.elki.algorithm.clustering.kmeans.parallel | Parallelized implementations of k-means. |
de.lmu.ifi.dbs.elki.algorithm.clustering.kmeans.quality | Quality measures for k-means results. |
de.lmu.ifi.dbs.elki.algorithm.clustering.meta | Meta clustering algorithms that get their result from other clusterings or external sources. |
de.lmu.ifi.dbs.elki.algorithm.clustering.onedimensional | Clustering algorithms for one-dimensional data. |
de.lmu.ifi.dbs.elki.algorithm.clustering.optics | OPTICS family of clustering algorithms. |
de.lmu.ifi.dbs.elki.algorithm.clustering.subspace | Axis-parallel subspace clustering algorithms. The clustering algorithms in this package are either projected clustering algorithms or subspace clustering algorithms according to the classical but somewhat obsolete classification schema of clustering algorithms for axis-parallel subspaces. |
de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.clique | Helper classes for the CLIQUE algorithm. |
de.lmu.ifi.dbs.elki.algorithm.clustering.trivial | Trivial clustering algorithms: all in one, no clusters, label clusterings. These methods are mostly useful for providing a reference result in evaluation. |
de.lmu.ifi.dbs.elki.algorithm.clustering.uncertain | Clustering algorithms for uncertain data. |
de.lmu.ifi.dbs.elki.algorithm.itemsetmining | Algorithms for frequent itemset mining such as APRIORI. |
de.lmu.ifi.dbs.elki.algorithm.itemsetmining.associationrules | Association rule mining. |
de.lmu.ifi.dbs.elki.algorithm.itemsetmining.associationrules.interest | Association rule interestingness measures. |
de.lmu.ifi.dbs.elki.algorithm.outlier | Outlier detection algorithms |
de.lmu.ifi.dbs.elki.algorithm.outlier.anglebased | Angle-based outlier detection algorithms. |
de.lmu.ifi.dbs.elki.algorithm.outlier.clustering | Clustering based outlier detection. |
de.lmu.ifi.dbs.elki.algorithm.outlier.distance | Distance-based outlier detection algorithms, such as DBOutlier and kNN. |
de.lmu.ifi.dbs.elki.algorithm.outlier.distance.parallel | Parallel implementations of distance-based outlier detectors. |
de.lmu.ifi.dbs.elki.algorithm.outlier.intrinsic | Outlier detection algorithms based on intrinsic dimensionality. |
de.lmu.ifi.dbs.elki.algorithm.outlier.lof | LOF family of outlier detection algorithms |
de.lmu.ifi.dbs.elki.algorithm.outlier.lof.parallel | Parallelized variants of LOF. |
de.lmu.ifi.dbs.elki.algorithm.outlier.meta | Meta outlier detection algorithms: external scores, score rescaling |
de.lmu.ifi.dbs.elki.algorithm.outlier.spatial | Spatial outlier detection algorithms |
de.lmu.ifi.dbs.elki.algorithm.outlier.spatial.neighborhood | Spatial outlier neighborhood classes |
de.lmu.ifi.dbs.elki.algorithm.outlier.spatial.neighborhood.weighted | Weighted Neighborhood definitions |
de.lmu.ifi.dbs.elki.algorithm.outlier.subspace | Subspace outlier detection methods. Methods that detect outliers in subspaces (projections) of the data set. |
de.lmu.ifi.dbs.elki.algorithm.outlier.svm | Support-Vector-Machines for outlier detection. |
de.lmu.ifi.dbs.elki.algorithm.outlier.trivial | Trivial outlier detection algorithms: no outliers, all outliers, label outliers. |
de.lmu.ifi.dbs.elki.algorithm.projection | Data projections (see also preprocessing filters for basic projections). |
de.lmu.ifi.dbs.elki.algorithm.statistics | Statistical analysis algorithms. |
de.lmu.ifi.dbs.elki.algorithm.timeseries | Algorithms for change point detection in time series. |

Package | Description |
---|---|
de.lmu.ifi.dbs.elki.data | Basic classes for different data types, database object types and label types |
de.lmu.ifi.dbs.elki.data.model | Cluster models classes for various algorithms |
de.lmu.ifi.dbs.elki.data.projection | Data projections |
de.lmu.ifi.dbs.elki.data.projection.random | Random projection families |
de.lmu.ifi.dbs.elki.data.spatial | Spatial data types - interfaces and utilities |
de.lmu.ifi.dbs.elki.data.synthetic.bymodel | Generator using a distribution model specified in an XML configuration file. GeneratorXMLSpec is a standalone application that loads an XML specification file and generates a synthetic data set according to the specifications given. |
de.lmu.ifi.dbs.elki.data.type | Data type information, also used for type restrictions |
de.lmu.ifi.dbs.elki.data.uncertain | Uncertain data objects. |
de.lmu.ifi.dbs.elki.data.uncertain.uncertainifier | Classes to generate uncertain objects from existing certain data. |
de.lmu.ifi.dbs.elki.distance.distancefunction | Distance functions for use within ELKI. |
de.lmu.ifi.dbs.elki.distance.distancefunction.adapter | Distance functions deriving distances from, e.g., similarity measures |
de.lmu.ifi.dbs.elki.distance.distancefunction.colorhistogram | Distance functions for color histograms |
de.lmu.ifi.dbs.elki.distance.distancefunction.correlation | Distance functions using correlations |
de.lmu.ifi.dbs.elki.distance.distancefunction.external | Distance functions using external data sources |
de.lmu.ifi.dbs.elki.distance.distancefunction.geo | Geographic (earth) distance functions |
de.lmu.ifi.dbs.elki.distance.distancefunction.histogram | Distance functions for one-dimensional histograms. |
de.lmu.ifi.dbs.elki.distance.distancefunction.minkowski | Minkowski space Lp norms such as the popular Euclidean and Manhattan distances. |
de.lmu.ifi.dbs.elki.distance.distancefunction.probabilistic | Distances from probability theory, mostly divergences such as K-L-divergence, J-divergence, F-divergence, χ²-divergence, etc. |
de.lmu.ifi.dbs.elki.distance.distancefunction.set | Distance functions for binary and set type data. |
de.lmu.ifi.dbs.elki.distance.distancefunction.strings | Distance functions for strings |
de.lmu.ifi.dbs.elki.distance.distancefunction.subspace | Distance functions based on subspaces |
de.lmu.ifi.dbs.elki.distance.distancefunction.timeseries | Distance functions designed for time series. Note that some regular distance functions (e.g., Euclidean) are also used on time series. |
de.lmu.ifi.dbs.elki.distance.similarityfunction | Similarity functions |
de.lmu.ifi.dbs.elki.distance.similarityfunction.cluster | Similarity measures for comparing clusters. |
de.lmu.ifi.dbs.elki.distance.similarityfunction.kernel | Kernel functions. |

Package | Description |
---|---|
de.lmu.ifi.dbs.elki.evaluation | Functionality for the evaluation of algorithms. |
de.lmu.ifi.dbs.elki.evaluation.classification | Evaluation of classification algorithms. |
de.lmu.ifi.dbs.elki.evaluation.classification.holdout | Holdout and cross-validation strategies for evaluating classifiers. |
de.lmu.ifi.dbs.elki.evaluation.clustering | Evaluation of clustering results |
de.lmu.ifi.dbs.elki.evaluation.clustering.extractor | Classes to extract clusterings from hierarchical clustering. |
de.lmu.ifi.dbs.elki.evaluation.clustering.internal | Internal evaluation measures for clusterings. |
de.lmu.ifi.dbs.elki.evaluation.clustering.pairsegments | Pair-segment analysis of multiple clusterings |
de.lmu.ifi.dbs.elki.evaluation.index | Simple index evaluation methods |
de.lmu.ifi.dbs.elki.evaluation.outlier | Evaluate an outlier score using a misclassification based cost model |
de.lmu.ifi.dbs.elki.evaluation.scores | Evaluation of rankings and scorings |
de.lmu.ifi.dbs.elki.evaluation.scores.adapter | Adapter classes for ranking and scoring measures. |
de.lmu.ifi.dbs.elki.evaluation.similaritymatrix | Render a distance matrix to visualize a clustering-distance-combination. |

Package | Description |
---|---|
de.lmu.ifi.dbs.elki | ELKI framework "Environment for Developing KDD-Applications Supported by Index-Structures". |
de.lmu.ifi.dbs.elki.application | Base classes for standalone applications. |
de.lmu.ifi.dbs.elki.application.cache | Utility applications for the persistence layer such as distance cache builders. |
de.lmu.ifi.dbs.elki.application.experiments | Packaged experiments to make them easy to reproduce. |
de.lmu.ifi.dbs.elki.application.greedyensemble | Greedy ensembles for outlier detection. |
de.lmu.ifi.dbs.elki.application.internal | Internal utilities for development |
de.lmu.ifi.dbs.elki.logging | Logging facility for controlling logging behavior of the complete framework. |
de.lmu.ifi.dbs.elki.logging.progress | Progress status objects (for UI) |
de.lmu.ifi.dbs.elki.logging.statistics | Classes for logging various statistics. |
de.lmu.ifi.dbs.elki.math | Mathematical operations and utilities used throughout the framework |
de.lmu.ifi.dbs.elki.math.geodesy | Functions for computing on the sphere / earth. |
de.lmu.ifi.dbs.elki.math.geometry | Algorithms from computational geometry |
de.lmu.ifi.dbs.elki.math.linearalgebra | The linear algebra package provides classes and computational methods for operations on matrices and vectors. |
de.lmu.ifi.dbs.elki.math.linearalgebra.fitting | Function to numerically fit a function (such as a Gaussian distribution) to given data. |
de.lmu.ifi.dbs.elki.math.linearalgebra.pca | Principal Component Analysis (PCA) and Eigenvector processing |
de.lmu.ifi.dbs.elki.math.linearalgebra.pca.filter | Filter eigenvectors based on their eigenvalues. |
de.lmu.ifi.dbs.elki.math.linearalgebra.pca.weightfunctions | Weight functions used in weighted PCA via WeightedCovarianceMatrixBuilder |
de.lmu.ifi.dbs.elki.math.scales | Scales handling for plotting |
de.lmu.ifi.dbs.elki.math.spacefillingcurves | Space filling curves |
de.lmu.ifi.dbs.elki.math.statistics | Statistical tests and methods |
de.lmu.ifi.dbs.elki.math.statistics.dependence | Statistical measures of dependence, such as correlation |
de.lmu.ifi.dbs.elki.math.statistics.distribution | Standard distributions, with random generation functionalities |
de.lmu.ifi.dbs.elki.math.statistics.distribution.estimator | Estimators for statistical distributions. |
de.lmu.ifi.dbs.elki.math.statistics.distribution.estimator.meta | Meta estimators: estimators that do not actually estimate themselves, but instead use other estimators, e.g. on a trimmed data set, or as an ensemble. |
de.lmu.ifi.dbs.elki.math.statistics.intrinsicdimensionality | Methods for estimating the intrinsic dimensionality. |
de.lmu.ifi.dbs.elki.math.statistics.kernelfunctions | Kernel functions from statistics. |
de.lmu.ifi.dbs.elki.math.statistics.tests | Statistical tests |
de.lmu.ifi.dbs.elki.parallel | Parallel processing core for ELKI. |
de.lmu.ifi.dbs.elki.parallel.processor | Processor API of ELKI, and some essential shared processors. |
de.lmu.ifi.dbs.elki.parallel.variables | Variables are instantiated for each thread, and allow passing values from one processor to another within the same thread. |
de.lmu.ifi.dbs.elki.result | Result types, representation and handling |
de.lmu.ifi.dbs.elki.result.outlier | Outlier result classes |
de.lmu.ifi.dbs.elki.result.textwriter | Text serialization (CSV, Gnuplot, Console, ...) |
de.lmu.ifi.dbs.elki.result.textwriter.naming | Naming schemes for clusters (for output when an algorithm doesn't generate cluster names). |
de.lmu.ifi.dbs.elki.result.textwriter.writers | Serialization handlers for individual data types. |
de.lmu.ifi.dbs.elki.utilities | Utility and helper classes - commonly used data structures, output formatting, exceptions, ... |
de.lmu.ifi.dbs.elki.utilities.datastructures | Basic memory structures such as heaps and object hierarchies |
de.lmu.ifi.dbs.elki.utilities.datastructures.arraylike | Common API for accessing objects that are "array-like", including lists, numerical vectors, database vectors and arrays. |
de.lmu.ifi.dbs.elki.utilities.datastructures.arrays | Utilities for arrays: advanced sorting for primitive arrays |
de.lmu.ifi.dbs.elki.utilities.datastructures.heap | Heap structures and variations such as bounded priority heaps |
de.lmu.ifi.dbs.elki.utilities.datastructures.hierarchy | Delegate implementation of a hierarchy |
de.lmu.ifi.dbs.elki.utilities.datastructures.histogram | Classes for computing histograms. This package contains two families of histograms. |
de.lmu.ifi.dbs.elki.utilities.datastructures.iterator | ELKI Iterator API. ELKI uses a custom iterator API instead of the usual Iterator classes (the "Java Collections API"). |
de.lmu.ifi.dbs.elki.utilities.datastructures.range | Ranges of values. |
de.lmu.ifi.dbs.elki.utilities.datastructures.unionfind | Union-find data structures. |
de.lmu.ifi.dbs.elki.utilities.documentation | Documentation utilities: Annotations for Title, Description, Reference |
de.lmu.ifi.dbs.elki.utilities.ensemble | Utility classes for simple ensembles |
de.lmu.ifi.dbs.elki.utilities.exceptions | Exception classes and common exception messages. |
de.lmu.ifi.dbs.elki.utilities.io | Utility classes for input/output. |
de.lmu.ifi.dbs.elki.utilities.optionhandling | Parameter handling and option descriptions. |
de.lmu.ifi.dbs.elki.utilities.optionhandling.constraints | Constraints allow restricting the possible values of parameters |
de.lmu.ifi.dbs.elki.utilities.optionhandling.parameterization | Configuration managers. See the de.lmu.ifi.dbs.elki.utilities.optionhandling package for documentation! |
de.lmu.ifi.dbs.elki.utilities.optionhandling.parameters | Classes for various typed parameters. See the de.lmu.ifi.dbs.elki.utilities.optionhandling package for documentation! |
de.lmu.ifi.dbs.elki.utilities.pairs | Pairs utility classes. Commonly needed primitive pairs include IntIntPair, storing two int values, and DoubleIntPair, storing one double and one int value. |
de.lmu.ifi.dbs.elki.utilities.random | Random number generation. |
de.lmu.ifi.dbs.elki.utilities.referencepoints | Package containing strategies to obtain reference points. Shared code for various algorithms that use reference points |
de.lmu.ifi.dbs.elki.utilities.scaling | Scaling functions: linear, logarithmic, gamma, clipping, ... |
de.lmu.ifi.dbs.elki.utilities.scaling.outlier | Scaling of outlier scores that requires a statistical analysis of the occurring values |
de.lmu.ifi.dbs.elki.utilities.xml | XML and XHTML utilities |
de.lmu.ifi.dbs.elki.workflow | Work flow packages, e.g., following the usual KDD model. |

Package | Description |
---|---|
tutorial.clustering | Classes from the tutorial on implementing a custom k-means variation |
tutorial.distancefunction | Classes from the tutorial on implementing distance functions |
tutorial.javaapi | Examples of how to invoke ELKI from Java. |
tutorial.outlier | Tutorials on implementing outlier detection methods in ELKI. |

ELKI: Environment for DeveLoping KDD-Applications Supported by Index-Structures.
ELKI is a generic framework for a broad range of KDD-applications and their development. For background, contact-information, and contributors see https://elki-project.github.io/.
This is the documentation for version 0.7.5, published as:
Erich Schubert and Arthur Zimek:
ELKI: A large open-source library for data analysis
CoRR arXiv 1902.03616
The ELKI website contains additional documentation. An exported tutorial is included with this documentation and is a good place to start.
To use the KDD-Framework we recommend an executable .jar-file: elki.jar. Since release 0.3, calling java -jar elki.jar invokes a minimalistic GUI called MiniGUI by default. For command line use (for example for batch processing and scripted operation), calling java -jar elki.jar KDDCLIApplication -h prints a description of usage.
The MiniGUI can also serve as a utility for building command lines, as it will print the full command line to the log window.
For more information on using files and available formats as data input, see de.lmu.ifi.dbs.elki.datasource.parser. ELKI uses a whitespace-separated vector format by default, but it also includes a parser that can read most ARFF files (mixing sparse and dense vectors is currently not allowed).
An extensive list of parameters can be browsed sorted by class or sorted by option ID.
Some examples of completely parameterized calls for different algorithms are described at example calls.
A list of related publications, giving details on many implemented algorithms, can be found in the class article references list.
The database connection manages reading of input files or databases and provides a Database object, including index structures, as a virtual database to the KDDTask.
The KDDTask applies a specified algorithm to this database and collects the result from the algorithm.
Finally, KDDTask hands the obtained result to a ResultHandler. The default handler is ResultWriter, which writes the result to STDOUT or, if specified, into a file.
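The three stages described above (load data, run an algorithm, hand the result to a handler) can be sketched with plain Java functional interfaces. This is only a conceptual model: WorkflowSketch, run and mean are hypothetical names, and the real KDDTask, Database and ResultHandler classes have richer interfaces than shown here.

```java
import java.util.Arrays;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-alone sketch of a load / analyze / output pipeline; not ELKI code. */
final class WorkflowSketch {
  /** Runs the three stages in order and also returns the result for inspection. */
  static <D, R> R run(Supplier<D> source, Function<D, R> algorithm, Consumer<R> handler) {
    D database = source.get();            // stage 1: the "database connection" provides the data
    R result = algorithm.apply(database); // stage 2: the algorithm is applied to the data
    handler.accept(result);               // stage 3: the result is handed to a handler
    return result;
  }

  /** Toy "algorithm": the per-dimension mean of all rows. */
  static double[] mean(double[][] rows) {
    double[] m = new double[rows[0].length];
    for (double[] row : rows) {
      for (int d = 0; d < m.length; d++) {
        m[d] += row[d];
      }
    }
    for (int d = 0; d < m.length; d++) {
      m[d] /= rows.length;
    }
    return m;
  }

  public static void main(String[] args) {
    // The handler here simply prints to standard output.
    run(() -> new double[][] { { 1, 2 }, { 3, 6 } }, WorkflowSketch::mean,
        r -> System.out.println(Arrays.toString(r)));
  }
}
```

Keeping the handler separate from the algorithm is what allows the same result to be written to standard output, to a file, or to a visualization without changing the algorithm.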
The database and indexing layer is a key component of ELKI.
It is not just storage for double[], as in many other frameworks.
It can store various types of objects, and the integrated index structures provide access to fast distance, similarity, kNN, RkNN and range query methods for a variety of distance functions.
The standard flow for initializing a database is as follows.
The standard stream-based data sources such as FileBasedDatabaseConnection open the stream and feed the contents through a Parser to obtain an initial MultipleObjectsBundle. This is a temporary container for the data, which can then be modified by arbitrary ObjectFilters.
In the end, the MultipleObjectsBundle is bulk-inserted into a Database, which then invokes its IndexFactory instances to add Index instances to the appropriate relations.
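The parse, filter and bulk-insert flow can be pictured as a chain of transformations on a temporary container. The sketch below models the bundle as a list of double[] rows and each filter as a function on that list; LoadPipelineSketch, dropNaN and unitLength are hypothetical names chosen for this example, and index construction is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical stand-alone sketch of a filter chain; not ELKI code. */
final class LoadPipelineSketch {
  /** Copies the parsed rows into a temporary bundle and applies each filter in order. */
  static List<double[]> load(List<double[]> parsed, List<UnaryOperator<List<double[]>>> filters) {
    List<double[]> bundle = new ArrayList<>(parsed);
    for (UnaryOperator<List<double[]>> filter : filters) {
      bundle = filter.apply(bundle);
    }
    return bundle; // at this point the rows would be bulk-inserted into the database
  }

  /** Example filter: drops every row that contains a NaN value. */
  static List<double[]> dropNaN(List<double[]> rows) {
    List<double[]> kept = new ArrayList<>();
    for (double[] row : rows) {
      boolean ok = true;
      for (double v : row) {
        ok &= !Double.isNaN(v);
      }
      if (ok) {
        kept.add(row);
      }
    }
    return kept;
  }

  /** Example filter: rescales each row to unit Euclidean length (all-zero rows are copied unchanged). */
  static List<double[]> unitLength(List<double[]> rows) {
    List<double[]> out = new ArrayList<>();
    for (double[] row : rows) {
      double squared = 0;
      for (double v : row) {
        squared += v * v;
      }
      double norm = Math.sqrt(squared);
      double[] copy = row.clone();
      if (norm > 0) {
        for (int d = 0; d < copy.length; d++) {
          copy[d] /= norm;
        }
      }
      out.add(copy);
    }
    return out;
  }

  public static void main(String[] args) {
    List<double[]> parsed = new ArrayList<>();
    parsed.add(new double[] { 3, 4 });
    parsed.add(new double[] { Double.NaN, 1 });
    List<UnaryOperator<List<double[]>>> filters = new ArrayList<>();
    filters.add(LoadPipelineSketch::dropNaN);
    filters.add(LoadPipelineSketch::unitLength);
    for (double[] row : load(parsed, filters)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

Because filters run on the temporary container before insertion, the database and its indexes only ever see the cleaned data.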
When the database receives a request for a distance, similarity, kNN, RkNN or range query, it asks all indexes whether they support this query. If one does, an optimized query is returned; otherwise a linear scan query can be returned, unless DatabaseQuery.HINT_OPTIMIZED_ONLY was given.
For this optimization to work, you should use the proper APIs of the Database interface or the QueryUtil helper where possible, instead of initializing low-level classes such as an explicit linear scan query.
For efficiency, try to instantiate the query only once per algorithm run, and avoid running the optimization step for every object.
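The selection logic and the "instantiate once" advice can be sketched as follows. The types RangeQuery and Index and the method getRangeQuery are hypothetical stand-ins written for this example, not ELKI's interfaces; in particular, returning null when only optimized queries are acceptable and none is available is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-alone sketch of query selection with linear-scan fallback; not ELKI code. */
final class QuerySelectionSketch {
  /** A range query over 1-d values: returns the positions of all values within radius of q. */
  interface RangeQuery {
    List<Integer> range(double q, double radius);
  }

  /** An index that may or may not be able to provide an accelerated query. */
  interface Index {
    Optional<RangeQuery> tryRangeQuery();
  }

  /** Fallback: compares the query value against every stored value. */
  static RangeQuery linearScan(double[] data) {
    return (q, radius) -> {
      List<Integer> hits = new ArrayList<>();
      for (int i = 0; i < data.length; i++) {
        if (Math.abs(data[i] - q) <= radius) {
          hits.add(i);
        }
      }
      return hits;
    };
  }

  /** Asks each index in turn; falls back to a linear scan unless optimizedOnly is set. */
  static RangeQuery getRangeQuery(double[] data, List<Index> indexes, boolean optimizedOnly) {
    for (Index index : indexes) {
      Optional<RangeQuery> query = index.tryRangeQuery();
      if (query.isPresent()) {
        return query.get();
      }
    }
    return optimizedOnly ? null : linearScan(data); // null: no optimized query available
  }

  public static void main(String[] args) {
    double[] data = { 1.0, 2.0, 5.0, 9.0 };
    // The query is selected once, outside the loop, and then reused for every object.
    RangeQuery query = getRangeQuery(data, new ArrayList<>(), false);
    for (double q : data) {
      System.out.println(q + " -> " + query.range(q, 1.5));
    }
  }
}
```

Selecting the query inside the loop would repeat the index lookup for every object, which is exactly the per-object optimization step the text advises against.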
A good place to get started is to have a look at some of the existing algorithms and see how they are implemented.
For example, the DummyAlgorithm, while it does not produce any result, will teach you how to perform k-nearest-neighbor queries properly. It does, however, have a hard dependency on the Euclidean distance and the data types supported by it. To support arbitrary distance functions, extend the class AbstractDistanceBasedAlgorithm instead. This is another simple example, this time for obtaining a class parameter.
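To make the difference between a hard-wired distance and a pluggable one concrete, here is a minimal linear-scan k-nearest-neighbor sketch in which the distance function is a parameter. It is a stand-alone illustration using only the Java standard library; KnnSketch and its members are hypothetical names and do not mirror the DummyAlgorithm or AbstractDistanceBasedAlgorithm code.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical stand-alone sketch of kNN with a pluggable distance; not ELKI code. */
final class KnnSketch {
  static final ToDoubleBiFunction<double[], double[]> EUCLIDEAN = (a, b) -> {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      sum += (a[d] - b[d]) * (a[d] - b[d]);
    }
    return Math.sqrt(sum);
  };

  static final ToDoubleBiFunction<double[], double[]> MANHATTAN = (a, b) -> {
    double sum = 0;
    for (int d = 0; d < a.length; d++) {
      sum += Math.abs(a[d] - b[d]);
    }
    return sum;
  };

  /** Returns the positions of the k rows nearest to query; ties keep their original order. */
  static int[] knn(List<double[]> data, double[] query, int k,
      ToDoubleBiFunction<double[], double[]> distance) {
    final double[] dist = new double[data.size()];
    Integer[] order = new Integer[data.size()];
    for (int i = 0; i < dist.length; i++) {
      dist[i] = distance.applyAsDouble(query, data.get(i));
      order[i] = i;
    }
    // Arrays.sort on objects is stable, so equal distances stay in index order.
    Arrays.sort(order, Comparator.comparingDouble((Integer i) -> dist[i]));
    int[] result = new int[Math.min(k, order.length)];
    for (int i = 0; i < result.length; i++) {
      result[i] = order[i];
    }
    return result;
  }

  public static void main(String[] args) {
    List<double[]> data = Arrays.asList(new double[] { 3, 3 }, new double[] { 0, 5 });
    double[] query = { 0, 0 };
    System.out.println(Arrays.toString(knn(data, query, 1, EUCLIDEAN))); // prints [0]
    System.out.println(Arrays.toString(knn(data, query, 1, MANHATTAN))); // prints [1]
  }
}
```

The two calls return different neighbors for the same data, which is why an algorithm that should work with arbitrary distance functions must not assume Euclidean geometry internally.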
Visit the ELKI Wiki, which has a growing amount of documentation. You are also welcome to contribute, of course!
ELKI is designed for command-line, GUI and Java operation. For command-line and GUI, an extensive help functionality is provided along with input assistance. Therefore, you should also support the parameterizable API. The requirements are quite different from regular Java constructors, and cannot be expressed in terms of a Java API.
For useful error reporting and input assistance in the GUI, we need more extensive typing than Java uses (for example, we might need numerical constraints), and we also want to be able to report more than one error at a time. In ELKI 0.4, much of the parameterization was refactored to static helper classes, usually found as a public static class Parameterizer and subclasses of AbstractParameterizer.
Keep the complexity of Parameterizer classes and constructors invoked by these classes low, since these may be heavily used during the parameterization step. Postpone any extensive initialization to the main algorithm invocation step!
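The two requirements named above, constraints on values and reporting several errors at once, are hard to meet with a constructor that throws on the first problem. The sketch below shows one way to collect all problems before failing. It is a conceptual illustration only: ParameterSketch, Config and grabInt are hypothetical names and do not reproduce the AbstractParameterizer API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

/** Hypothetical stand-alone sketch of constraint checking with error collection; not ELKI code. */
final class ParameterSketch {
  /** Holds the raw option strings and collects every problem instead of throwing on the first. */
  static final class Config {
    private final Map<String, String> raw;
    final List<String> errors = new ArrayList<>();

    Config(Map<String, String> raw) {
      this.raw = raw;
    }

    /** Reads an int option; records an error and returns fallback if missing, malformed or rejected. */
    int grabInt(String key, IntPredicate constraint, String constraintText, int fallback) {
      String text = raw.get(key);
      if (text == null) {
        errors.add("Missing parameter: " + key);
        return fallback;
      }
      try {
        int value = Integer.parseInt(text.trim());
        if (constraint.test(value)) {
          return value;
        }
        errors.add("Parameter " + key + " must be " + constraintText + ", got " + value);
      } catch (NumberFormatException e) {
        errors.add("Parameter " + key + " is not an integer: " + text);
      }
      return fallback;
    }
  }

  public static void main(String[] args) {
    Map<String, String> raw = new LinkedHashMap<>();
    raw.put("k", "0");
    raw.put("maxiter", "ten");
    Config config = new Config(raw);
    // Both options are read cheaply; no expensive initialization happens here.
    config.grabInt("k", v -> v > 0, "> 0", 1);
    config.grabInt("maxiter", v -> v >= 0, ">= 0", 0);
    // Both problems are reported together, not only the first one.
    config.errors.forEach(System.out::println);
  }
}
```

Because reading options only parses and validates values, it stays cheap enough to be called repeatedly while a user edits the configuration in a GUI.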
Copyright © 2019 ELKI Development Team. License information.