Interface | Description |
---|---|
Parser | A Parser shall provide a ParsingResult by parsing an InputStream. |
StreamingParser | Interface for streaming parsers, which may be much more efficient in combination with filters. |

Class | Description |
---|---|
AbstractStreamingParser | Base class for streaming parsers. |
AbstractStreamingParser.Parameterizer | Parameterization class. |
ArffParser | Parser to load WEKA .arff files into ELKI. |
ArffParser.Parameterizer | Parameterization class. |
BitVectorLabelParser | Parser for parsing one BitVector per line, bits separated by whitespace. |
BitVectorLabelParser.Parameterizer | Parameterization class. |
CategorialDataAsNumberVectorParser<V extends NumberVector> | A very simple parser for categorial data, which will then be encoded as numbers. |
CategorialDataAsNumberVectorParser.Parameterizer<V extends NumberVector> | Parameterization class. |
ClusteringVectorParser | Parser for simple clustering results in vector form, as written by ClusteringVectorDumper. |
ClusteringVectorParser.Parameterizer | Parameterization class. |
CSVReaderFormat | Basic format factory for parsing CSV-like formats. |
CSVReaderFormat.Parameterizer | Parameterization class. |
LibSVMFormatParser<V extends SparseNumberVector> | Parser to read libSVM format files. |
LibSVMFormatParser.Parameterizer<V extends SparseNumberVector> | Parameterization class. |
NumberVectorLabelParser<V extends NumberVector> | Parser for a simple CSV type of format, with columns separated by the given pattern (default: whitespace). |
NumberVectorLabelParser.Parameterizer<V extends NumberVector> | Parameterization class. |
SimplePolygonParser | Parser to load polygon data (2D and 3D only) from a simple format. |
SimplePolygonParser.Parameterizer | Parameterization class. |
SimpleTransactionParser | Simple parser for transactional data, such as market baskets. |
SimpleTransactionParser.Parameterizer | Parameterization class. |
SparseNumberVectorLabelParser<V extends SparseNumberVector> | Parser for parsing one point per line, attributes separated by whitespace. |
SparseNumberVectorLabelParser.Parameterizer<V extends SparseNumberVector> | Parameterization class. |
StringParser | Parser that loads a text file for use with string similarity measures. |
StringParser.Parameterizer | Parameterization class. |
TermFrequencyParser<V extends SparseNumberVector> | A parser to load term frequency data, which essentially are sparse vectors with text keys. |
TermFrequencyParser.Parameterizer<V extends SparseNumberVector> | Parameterization class. |

Parsers for different file formats and data types.

The general use case for any parser is to create objects out of an InputStream (e.g. by reading a data file). The objects are packed in a MultipleObjectsBundle which, in turn, is used by a DatabaseConnection object to fill a Database containing the corresponding objects.

By default (i.e., if the user does not request anything else), any KDDTask will use the StaticArrayDatabase which, in turn, will use a FileBasedDatabaseConnection and a NumberVectorLabelParser to parse the specified data file, creating a StaticArrayDatabase containing DoubleVector objects.

Thus, the standard procedure for using a data set from a real-valued vector space is to prepare it as a file in the format expected by NumberVectorLabelParser: a simple CSV-like format with columns separated by a configurable pattern (whitespace by default).

For an example file that follows these requirements, see exampledata.txt.
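
To make the idea of a whitespace-separated vector file more concrete, the following is a minimal stand-alone sketch in plain Java. It does not use or reproduce ELKI's NumberVectorLabelParser. The class name `WhitespaceVectorSketch`, the `Row` type, and the `parse` method are hypothetical names introduced for illustration. It also assumes that any token which cannot be read as a double should be kept as a label, and that blank lines are skipped; the real parser's rules may differ.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-alone sketch; it does not use or reproduce ELKI classes. */
public class WhitespaceVectorSketch {
  /** One parsed line: numeric attributes plus any non-numeric tokens kept as labels. */
  static final class Row {
    final double[] values;
    final List<String> labels;

    Row(double[] values, List<String> labels) {
      this.values = values;
      this.labels = labels;
    }
  }

  /** Reads the stream line by line and splits each non-blank line on whitespace. */
  static List<Row> parse(InputStream in) throws IOException {
    List<Row> rows = new ArrayList<>();
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        line = line.trim();
        if (line.isEmpty()) {
          continue; // assumption: blank lines carry no data
        }
        List<Double> values = new ArrayList<>();
        List<String> labels = new ArrayList<>();
        for (String token : line.split("\\s+")) {
          try {
            values.add(Double.parseDouble(token));
          } catch (NumberFormatException e) {
            labels.add(token); // assumption: non-numeric tokens are labels
          }
        }
        double[] vector = new double[values.size()];
        for (int i = 0; i < vector.length; i++) {
          vector[i] = values.get(i);
        }
        rows.add(new Row(vector, labels));
      }
    }
    return rows;
  }

  public static void main(String[] args) throws IOException {
    // Two 2-dimensional points, each followed by one label.
    String data = "1.0 2.5 classA\n3.0 4.5 classB\n";
    List<Row> rows = parse(new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8)));
    for (Row row : rows) {
      System.out.println(Arrays.toString(row.values) + " " + row.labels);
    }
  }
}
```

The sketch mirrors the flow described in this package only loosely: an InputStream goes in and a list of parsed objects comes out, standing in for the bundle that a database connection would consume.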
Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München. License information.