ELKI: Environment for Developing KDD-Applications Supported by Index-Structures

Background

Data mining research produces many algorithms for similar tasks. A fair and useful comparison of these algorithms is difficult for several reasons:

  • Implementations of competing algorithms are often not at hand.
  • When implementations by different authors are available, an efficiency evaluation tends to measure how much effort each author put into efficient programming rather than the merits of the algorithms themselves.

On the other hand, efficient data management tools such as index structures can have a considerable impact on data mining tasks and are therefore useful for a broad variety of algorithms.

In ELKI, data mining algorithms and data management tasks are separated, so that each can be evaluated independently. This separation makes ELKI unique among data mining frameworks like Weka or YALE and frameworks for index structures like GiST. At the same time, ELKI is open to arbitrary data types, distance or similarity measures, and file formats. The fundamental approach is the independence of file parsers or database connections, data types, distances, distance functions, and data mining algorithms. Helper classes, e.g., for algebraic or analytic computations, are available to all algorithms on equal terms.
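The separation described above can be illustrated with a minimal Java sketch. All names here (SeparationSketch, Distance, RangeSearch, LinearScan, countDense) are hypothetical and chosen for illustration only; they are not ELKI's actual classes or API. The sketch assumes a toy "algorithm" that counts objects with at least minPts neighbors within a radius, and shows that it depends neither on the data type, nor on the distance function, nor on how range queries are answered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SeparationSketch {
    /** Hypothetical distance abstraction (not ELKI's real interface). */
    public interface Distance<T> {
        double between(T a, T b);
    }

    /** Hypothetical data-access abstraction: a linear scan or an index could implement it. */
    public interface RangeSearch<T> {
        List<T> within(T query, double radius);
    }

    /** Simplest possible data management: scan every object for each query. */
    public static final class LinearScan<T> implements RangeSearch<T> {
        private final List<T> data;
        private final Distance<T> distance;

        public LinearScan(List<T> data, Distance<T> distance) {
            this.data = data;
            this.distance = distance;
        }

        @Override
        public List<T> within(T query, double radius) {
            List<T> result = new ArrayList<>();
            for (T candidate : data) {
                if (distance.between(query, candidate) <= radius) {
                    result.add(candidate);
                }
            }
            return result;
        }
    }

    /**
     * Toy algorithm: counts objects that have at least minPts neighbors
     * (including themselves) within the given radius. It only sees the
     * RangeSearch abstraction, so swapping in an index changes nothing here.
     */
    public static <T> int countDense(List<T> data, RangeSearch<T> search,
                                     double radius, int minPts) {
        int dense = 0;
        for (T object : data) {
            if (search.within(object, radius).size() >= minPts) {
                dense++;
            }
        }
        return dense;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {0, 0}, new double[] {0, 1},
            new double[] {1, 0}, new double[] {10, 10});
        // The distance function is chosen independently of the algorithm.
        Distance<double[]> euclidean =
            (a, b) -> Math.hypot(a[0] - b[0], a[1] - b[1]);
        RangeSearch<double[]> scan = new LinearScan<>(points, euclidean);
        System.out.println(countDense(points, scan, 1.5, 3)); // prints 3
    }
}
```

Because the algorithm only talks to RangeSearch, two implementations of that interface (for example, a linear scan and an index structure) could be timed against each other with the algorithm code held constant, which is the kind of independent evaluation the paragraph above describes.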

With the development and publication of ELKI, we hope to be of service to the data mining and database research community. The framework is free for scientific usage ("free" as in "open source", see License for details). If you use ELKI in scientific publications, we would appreciate a citation of the most recent publication (see Publications).

The people behind ELKI are documented on the Team page.

Please note: Algorithms in ELKI are not individually tuned for efficiency; they are implemented for fair comparability within ELKI. Runtime comparisons between ELKI implementations and implementations not based on the ELKI framework are likely to produce misleading results. See also Benchmarking.

The ELKI wiki

This website is intended to serve as a community development hub and task tracker, covering bug reports, tutorials, the FAQ, general issues, and development tasks.

Here are some pages to start with: Tutorial, FAQ, Algorithms, RelatedPublications, InputFormat, DataTypes, DistanceFunctions, DataSets, Development, Parameterization, JavaDoc, Visualization, Benchmarking, GeoMining

Please note that editing and some content are only available to logged-in users. Logins are available to active contributors only. However, since we appreciate all feedback, you are of course welcome to contribute content by email and other means!

Getting ELKI

You can download ELKI, including source code, from the Releases page. Release 0.4.0 now also has an explicit License statement: it is licensed under AGPLv3, a well-known open source license.

There is a list of Publications that accompany the ELKI releases. When using ELKI in your scientific work, it is appreciated if you cite one of these publications to give credit.

Bug Reports

You can browse the open tickets or create a new ticket.

We also appreciate any comments and suggestions. You can contact the core developers by e-mail: elki () dbs ifi lmu de

You can also subscribe to the mailing list for users of ELKI via https://tools.rz.ifi.lmu.de/mailman/listinfo/elki-user, to exchange questions and ideas with other users or to receive announcements (e.g., new releases, major changes) from the ELKI team.