Type Parameters:
V - vector type

@Title(value="Sparse Vector Label Parser")
@Description(value="Parser for the following line format:\nA single line provides a single point. Entries are separated by whitespace. The values will be parsed as floats (resulting in a set of SparseFloatVectors).\nA line is expected in the following format:\nThe first entry of each line is the number of attributes with coordinate value not zero. Subsequent entries are of the form (index, value), where index is the number of the corresponding dimension, and value is the value of the corresponding attribute. Any pair of two subsequent substrings not containing whitespace is tried to be read as int and float. If this fails for the first of the pair (interpreted as index), it will be appended to a label. (Thus, any label must not be parseable as Integer.) If the float component is not parseable, an exception will be thrown. Empty lines and lines beginning with \"#\" will be ignored.")
public class SparseNumberVectorLabelParser<V extends SparseNumberVector>
extends NumberVectorLabelParser<V>
Several labels may be given per point. A label must not be parseable as double. Lines starting with "#" will be ignored.
A line is expected in the following format: the first entry of each line is the number of attributes with a non-zero coordinate value. Each subsequent entry is of the form "index value", where index is the number of the corresponding dimension and value is the value of the corresponding attribute. A complete line could then look like this:

3 7 12.34 8 56.78 11 1.234 objectlabel

Here, 3 indicates that three attributes are set, 7, 8 and 11 are the attribute indexes, and objectlabel is a non-numerical object label.
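The line format above can be illustrated with a small, dependency-free Java sketch. SparseLineSketch and parse are hypothetical names, not ELKI's API; ELKI's own parser works on its tokenizer and a fastutil map. The sketch assumes whitespace-separated entries, reads index/value pairs only until the declared count is reached, and keeps every other entry as a label.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the sparse line format; not ELKI's implementation.
final class SparseLineSketch {
  final Map<Integer, Double> values = new TreeMap<>(); // index -> non-zero value
  final List<String> labels = new ArrayList<>();       // non-numeric entries

  // Returns null for empty lines and "#" comment lines, which are ignored.
  static SparseLineSketch parse(String line) {
    String trimmed = line.trim();
    if (trimmed.isEmpty() || trimmed.startsWith("#")) {
      return null;
    }
    String[] tokens = trimmed.split("\\s+");
    // First entry: number of attributes with a non-zero value.
    int declared = Integer.parseInt(tokens[0]);
    SparseLineSketch row = new SparseLineSketch();
    int i = 1;
    while (i < tokens.length) {
      Integer index = tryParseInt(tokens[i]);
      if (index != null && row.values.size() < declared && i + 1 < tokens.length) {
        // The value must be numeric; a NumberFormatException propagates otherwise.
        row.values.put(index, Double.parseDouble(tokens[i + 1]));
        i += 2;
      } else {
        // Not an index, or no further pair expected: keep the entry as a label.
        row.labels.add(tokens[i]);
        i += 1;
      }
    }
    return row;
  }

  private static Integer tryParseInt(String s) {
    try {
      return Integer.valueOf(s);
    } catch (NumberFormatException e) {
      return null;
    }
  }
}
```

A complete parser would also handle quoted entries and the label index option described next; both are left out of this sketch.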
An index can be specified to identify an entry to be treated as a class label. This index counts all entries (numeric entries and labels alike), starting with 0.
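To make the position counting concrete: in the example line above, position 0 is the count 3, position 1 is the index 7, and position 7 is objectlabel. The hypothetical helper below (it is not ELKI's isLabelColumn) selects the entries at the given positions, taking a long[] like the labelIndices constructor argument.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: positions count every whitespace-separated entry from 0.
final class LabelPositions {
  static List<String> entriesAt(String line, long[] labelIndices) {
    String[] tokens = line.trim().split("\\s+");
    List<String> selected = new ArrayList<>();
    for (int pos = 0; pos < tokens.length; pos++) {
      if (contains(labelIndices, pos)) {
        selected.add(tokens[pos]);
      }
    }
    return selected;
  }

  // Linear scan is fine for the handful of label indices usually given.
  private static boolean contains(long[] indices, int pos) {
    for (long idx : indices) {
      if (idx == pos) {
        return true;
      }
    }
    return false;
  }
}
```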
Modifier and Type | Class and Description |
---|---|
static class | SparseNumberVectorLabelParser.Parameterizer<V extends SparseNumberVector>: Parameterization class. |

Nested classes/interfaces inherited from interface BundleStreamSource: BundleStreamSource.Event
Modifier and Type | Field and Description |
---|---|
(package private) java.util.ArrayList<java.lang.String> | labels: (Reused) label buffer. |
private static Logging | LOG: Class logger. |
protected SparseNumberVector.Factory<V> | sparsefactory: Same as NumberVectorLabelParser.factory, but with the sparse subtype. |
(package private) it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap | values: (Reused) set of values for the number vector. |

Fields inherited from class NumberVectorLabelParser: attributes, columnnames, curlbl, curvec, factory, haslabels, maxdim, meta, mindim, nextevent, unique, warnedPrecision

Fields inherited from the parser base classes: reader, tokenizer
Constructor and Description |
---|
SparseNumberVectorLabelParser(CSVReaderFormat format, long[] labelIndices, SparseNumberVector.Factory<V> factory): Constructor. |
SparseNumberVectorLabelParser(java.util.regex.Pattern colSep, java.lang.String quoteChars, java.util.regex.Pattern comment, long[] labelIndices, SparseNumberVector.Factory<V> factory): Constructor. |
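The second constructor takes the column separator and the comment marker as java.util.regex.Pattern objects. As a rough sketch of how such patterns could drive line splitting, the hypothetical class below skips empty and comment lines and splits the rest on the separator. The concrete patterns are illustrative assumptions, not necessarily ELKI's defaults, and quote handling (quoteChars) is omitted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: applying a column-separator and a comment pattern to one line.
final class PatternSplitSketch {
  // Illustrative patterns: comma, semicolon or whitespace separators; "#" or "//" comments.
  static final Pattern COL_SEP = Pattern.compile("\\s*[,;\\s]\\s*");
  static final Pattern COMMENT = Pattern.compile("^\\s*(#|//).*$");

  // Returns an empty list for ignored lines, otherwise the separated entries.
  static List<String> split(String line, Pattern colSep, Pattern comment) {
    String trimmed = line.trim();
    if (trimmed.isEmpty() || comment.matcher(line).matches()) {
      return Collections.emptyList();
    }
    return Arrays.asList(colSep.split(trimmed));
  }
}
```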
Modifier and Type | Method and Description |
---|---|
protected Logging | getLogger(): Get the logger for this class. |
protected SimpleTypeInformation<V> | getTypeInformation(int mindim, int maxdim): Get a prototype object for the given dimensionality. |
protected boolean | parseLineInternal(): Internal method for parsing a single line. |

Methods inherited from class NumberVectorLabelParser: buildMeta, cleanup, createVector, data, getMeta, initStream, isLabelColumn, nextEvent

Methods inherited from other superclasses and interfaces: asMultipleObjectsBundle, assignDBID, hasDBIDs, parse
private static final Logging LOG

protected SparseNumberVector.Factory<V extends SparseNumberVector> sparsefactory
Same as NumberVectorLabelParser.factory, but with the sparse subtype.

it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap values

java.util.ArrayList<java.lang.String> labels
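The values and labels fields are marked as reused: rather than allocating new containers for every line, a parser can clear and refill the same buffers. A minimal sketch of that pattern follows; java.util.HashMap stands in for fastutil's Int2DoubleOpenHashMap, and LineBuffers, Row and finishLine are hypothetical names. Because a hash map has no key order, the sketch sorts the indexes when copying a line out, which is an assumption about what a sparse vector representation would need.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-line buffers that are reused across lines.
final class LineBuffers {
  // Immutable copy of one line, safe to keep after the buffers are cleared.
  static final class Row {
    final int[] indices;
    final double[] values;
    final String label;

    Row(int[] indices, double[] values, String label) {
      this.indices = indices;
      this.values = values;
      this.label = label;
    }
  }

  private final Map<Integer, Double> values = new HashMap<>(); // stands in for Int2DoubleOpenHashMap
  private final List<String> labels = new ArrayList<>();

  void addPair(int index, double value) {
    values.put(index, value);
  }

  void addLabel(String label) {
    labels.add(label);
  }

  // Copies the buffered line into sorted arrays, then clears the buffers for reuse.
  Row finishLine() {
    int[] indices = values.keySet().stream().mapToInt(Integer::intValue).sorted().toArray();
    double[] vals = new double[indices.length];
    for (int k = 0; k < indices.length; k++) {
      vals[k] = values.get(indices[k]);
    }
    Row row = new Row(indices, vals, String.join(" ", labels));
    values.clear();
    labels.clear();
    return row;
  }
}
```

Copying into fresh arrays is what makes the reuse safe: each Row owns its data, so clearing the buffers cannot affect rows already returned.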
public SparseNumberVectorLabelParser(CSVReaderFormat format, long[] labelIndices, SparseNumberVector.Factory<V> factory)
Constructor.
Parameters:
format - Input format
labelIndices - Indices to use as labels
factory - Vector factory

public SparseNumberVectorLabelParser(java.util.regex.Pattern colSep, java.lang.String quoteChars, java.util.regex.Pattern comment, long[] labelIndices, SparseNumberVector.Factory<V> factory)
Constructor.
Parameters:
colSep - Column separator
quoteChars - Quotation character
comment - Comment pattern
labelIndices - Indices to use as labels
factory - Vector factory

protected boolean parseLineInternal()
Description copied from class: NumberVectorLabelParser
Overrides:
parseLineInternal in class NumberVectorLabelParser<V extends SparseNumberVector>
Returns:
true when a valid line was read, false on a label row.

protected SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim)
Description copied from class: NumberVectorLabelParser
Overrides:
getTypeInformation in class NumberVectorLabelParser<V extends SparseNumberVector>
Parameters:
mindim - Minimum dimensionality
maxdim - Maximum dimensionality

protected Logging getLogger()
Description copied from class: AbstractStreamingParser
Overrides:
getLogger in class NumberVectorLabelParser<V extends SparseNumberVector>
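getTypeInformation(mindim, maxdim) receives the smallest and largest dimensionality seen while parsing. How those bounds are accumulated for sparse data is not spelled out here; the hypothetical sketch below assumes that a row's dimensionality is its largest attribute index plus one, and keeps a running minimum and maximum over all rows.

```java
// Hypothetical sketch: running dimensionality bounds over parsed sparse rows.
final class DimensionalityBounds {
  private int mindim = Integer.MAX_VALUE;
  private int maxdim = 0;

  // Assumption: a row's dimensionality is its largest attribute index plus one.
  void observe(int[] indices) {
    int dim = 0;
    for (int index : indices) {
      dim = Math.max(dim, index + 1);
    }
    mindim = Math.min(mindim, dim);
    maxdim = Math.max(maxdim, dim);
  }

  int minDim() {
    return mindim == Integer.MAX_VALUE ? 0 : mindim; // 0 if no rows were observed
  }

  int maxDim() {
    return maxdim;
  }
}
```

Tracking both bounds in a single pass avoids storing all rows just to determine their dimensionality afterwards.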
Copyright © 2019 ELKI Development Team. License information.