Interface | Description |
---|---|
DatabaseQuery | General interface for database queries. |
DistanceSimilarityQuery<O,D extends Distance<D>> | Interface that is a combination of distance and a similarity function. |
LinearScanQuery | Marker interface for linear scan (slow, non-accelerated) queries. |

Class | Description |
---|---|
AbstractDataBasedQuery<O> | Abstract query bound to a certain representation. |

Database queries - computing distances, neighbors, similarities - API and general documentation.
The database query API is designed around the concept of prepared statements.
When working with index structures, preprocessors, caches and external data, computing a distance or a neighborhood is not as simple as running a constant-time function. Some functions may only be defined on a subset of the data; others can be computed much more efficiently as a batch operation. When plenty of memory is available, caching can be faster than recomputing distances every time. And often there is more than one way of computing the same result (for example, by using an index or by doing a linear scan).
Usually, these operations are invoked very often. Even deciding which method to use at every iteration can prove quite costly when the number of iterations becomes large. Therefore, the goal is to "optimize" once, then invoke the same handler cheaply. This can be achieved by using "prepared statements", as they are called in a traditional RDBMS context.
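The "optimize once, invoke cheaply" idea can be illustrated with a minimal, self-contained sketch. The class `PreparedDistanceSketch`, the interface `PreparedDistance` and the `cacheResults` flag are hypothetical names invented for this illustration; they are not part of ELKI. The strategy decision (direct computation or cached computation) is made a single time in `prepare`, and the returned handler carries no further decision logic.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; these types are hypothetical and NOT part of ELKI. */
final class PreparedDistanceSketch {
  /** A prepared handler: cheap to invoke repeatedly. */
  interface PreparedDistance {
    double distance(int i, int j);
  }

  /** Decides once which strategy to use, then returns a handler bound to it. */
  static PreparedDistance prepare(double[][] data, boolean cacheResults) {
    PreparedDistance direct = (i, j) -> euclidean(data[i], data[j]);
    if (!cacheResults) {
      return direct;
    }
    // Symmetric cache keyed on the ordered index pair (smaller index in the high bits).
    Map<Long, Double> cache = new HashMap<>();
    return (i, j) -> {
      long key = ((long) Math.min(i, j) << 32) | Math.max(i, j);
      return cache.computeIfAbsent(key, k -> direct.distance(i, j));
    };
  }

  /** Plain Euclidean distance between two vectors of equal length. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }
}
```

A caller would invoke `prepare` once outside its main loop and then call `distance` inside the loop, so the choice between strategies is never re-evaluated per iteration.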
Prepared statements in ELKI are currently available for:
DistanceQuery
SimilarityQuery
KNNQuery
RangeQuery
RKNNQuery
The general process of obtaining a Query is to retrieve it from the database using:
Database.getDistanceQuery(distance)
Database.getSimilarityQuery(similarity)
Database.getKNNQuery(distance)
Database.getRangeQuery(distance)
Database.getRKNNQuery(distance)
The query can then be evaluated on objects as needed.
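The retrieve-then-evaluate pattern can be sketched with a toy stand-in for the database layer. Everything here (`ToyDatabase`, `ToyKNNQuery`, the index-based point identifiers) is hypothetical and only mimics the shape of the calls listed above; it is not ELKI's `Database` or `KNNQuery`. The sketch always returns a linear-scan handler, whereas a real database layer may pick an index-backed one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a database layer; NOT ELKI's Database class. */
final class ToyDatabase {
  /** Prepared kNN handler (hypothetical interface). */
  interface ToyKNNQuery {
    /** Returns the indices of the k nearest points, including the query point itself. */
    List<Integer> knn(int queryIndex, int k);
  }

  private final double[][] points;

  ToyDatabase(double[][] points) {
    this.points = points;
  }

  /** Validates maxk once and returns a linear-scan handler bound to this data. */
  ToyKNNQuery getKNNQuery(int maxk) {
    if (maxk < 1 || maxk > points.length) {
      throw new IllegalArgumentException("maxk out of range: " + maxk);
    }
    return (q, k) -> {
      if (k < 1 || k > maxk) {
        throw new IllegalArgumentException("k must be between 1 and maxk");
      }
      List<Integer> ids = new ArrayList<>();
      for (int i = 0; i < points.length; i++) {
        ids.add(i);
      }
      // Linear scan: order all points by distance to the query point.
      ids.sort(Comparator.comparingDouble(i -> squaredDistance(points[q], points[i])));
      return new ArrayList<>(ids.subList(0, k));
    };
  }

  /** Squared Euclidean distance; sufficient for ranking neighbors. */
  private static double squaredDistance(double[] a, double[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return sum;
  }
}
```

In this sketch the handler is obtained once via `getKNNQuery` and then reused for every query point, which mirrors the intended usage of the prepared queries described above.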
To help the database layer choose the most suitable implementation, one should also provide so-called "hints" where available. In general, any object could be a "hint" to the database layer (for extensibility), but the following are commonly used:
DatabaseQuery.HINT_BULK
to request support for bulk operations
DatabaseQuery.HINT_EXACT
to exclude approximate answers
DatabaseQuery.HINT_HEAVY_USE
to suggest the use of a cache or preprocessor
DatabaseQuery.HINT_OPTIMIZED_ONLY
to disallow linear scans
DatabaseQuery.HINT_SINGLE
to disallow expensive optimizations, since the query will only be used once
DatabaseQuery.HINT_NO_CACHE
to disallow retrieving a cache class

Please set these hints appropriately, since they can affect your algorithm's performance!
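How a database layer might act on such hints can be sketched as follows. This is a guess at the general shape of the decision, not ELKI's actual selection logic: the class `HintSelectionSketch`, its `Strategy` enum, its constants and their string values, and the two boolean parameters are all assumptions made for illustration. Hints are passed as plain objects, matching the remark above that any object could serve as a hint.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; hypothetical names, NOT ELKI's selection logic. */
final class HintSelectionSketch {
  // Hypothetical stand-ins for the hint constants; the values are arbitrary.
  static final String HINT_EXACT = "exact";
  static final String HINT_HEAVY_USE = "heavy";
  static final String HINT_OPTIMIZED_ONLY = "optimized";
  static final String HINT_SINGLE = "single";
  static final String HINT_NO_CACHE = "no-cache";

  enum Strategy { INDEX, CACHED_LINEAR_SCAN, LINEAR_SCAN, NONE }

  /** Picks a strategy once, based on what is available and which hints were given. */
  static Strategy choose(boolean indexAvailable, boolean indexIsApproximate, Object... hints) {
    List<Object> given = Arrays.asList(hints);
    // An approximate index is ruled out when exact answers are requested.
    boolean indexUsable = indexAvailable && !(indexIsApproximate && given.contains(HINT_EXACT));
    if (indexUsable) {
      return Strategy.INDEX;
    }
    // Without a usable index, only linear scans remain; these may be disallowed.
    if (given.contains(HINT_OPTIMIZED_ONLY)) {
      return Strategy.NONE;
    }
    // A cache pays off for heavy use, unless the caller opted out of it.
    boolean wantsCache = given.contains(HINT_HEAVY_USE)
        && !given.contains(HINT_SINGLE) && !given.contains(HINT_NO_CACHE);
    return wantsCache ? Strategy.CACHED_LINEAR_SCAN : Strategy.LINEAR_SCAN;
  }
}
```

In this sketch unknown hint objects are simply ignored, which keeps the mechanism extensible: new hints can be introduced without breaking existing callers.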
// Get a kNN query with maxk = 10
KNNQuery<V, DoubleDistance> knnQuery = database.getKNNQuery(EuclideanDistanceFunction.STATIC, 10);
// run a 10NN query for each point, discarding the results
for(DBID id : database) {
  knnQuery.getKNNForDBID(id, 10);
}