Type Parameters:
V - the type of NumberVector used

public class NumberVectorLabelParser<V extends NumberVector<?>> extends AbstractStreamingParser
A parser that reads one point per line, with attributes separated by whitespace.
Several labels may be given per point; a label must not be parseable as a double. Lines starting with "#" are ignored.
An index can be specified to identify an entry to be treated as the class label. This index counts all entries (both numeric values and labels), starting at 0.
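As a rough illustration of these rules, the following Java sketch splits one line into numeric attributes and labels. It is a simplified stand-in, not ELKI's implementation: the class LineSplitSketch, its parse method and the ParsedLine holder are hypothetical, tokens are assumed to be separated by plain whitespace with no quoting, and comment lines are detected with a simple startsWith("#") check.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical, simplified illustration of the line format described above;
// this is NOT ELKI's parser. Assumes plain whitespace separation, no quoting.
final class LineSplitSketch {

    /** Numeric attributes and labels extracted from one line. */
    static final class ParsedLine {
        final List<Double> attributes = new ArrayList<>();
        final List<String> labels = new ArrayList<>();
    }

    /**
     * Splits one line into attributes and labels.
     *
     * @param line         raw input line
     * @param labelIndices columns forced to be labels; the index counts every
     *                     token (numeric or label), starting at 0
     * @return the parsed line, or null for comment lines and blank lines
     */
    static ParsedLine parse(String line, BitSet labelIndices) {
        if (line.startsWith("#") || line.trim().isEmpty()) {
            return null; // comment or empty line: nothing to report
        }
        ParsedLine out = new ParsedLine();
        String[] tokens = line.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String tok = tokens[i];
            if (labelIndices.get(i)) {
                out.labels.add(tok); // explicitly configured label column
                continue;
            }
            try {
                out.attributes.add(Double.parseDouble(tok));
            } catch (NumberFormatException e) {
                out.labels.add(tok); // not parseable as a double, so it is a label
            }
        }
        return out;
    }
}
```

For example, parsing "1.0 2.5 setosa" with an empty BitSet yields the attributes 1.0 and 2.5 plus the label "setosa"; setting bit 0 would keep the first token as a label even if it looks numeric.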
Modifier and Type | Class and Description |
---|---|
static class | NumberVectorLabelParser.Parameterizer<V extends NumberVector<?>>: Parameterization class. |
Nested classes/interfaces inherited from interface BundleStreamSource: BundleStreamSource.Event
Modifier and Type | Field and Description |
---|---|
(package private) gnu.trove.list.array.TDoubleArrayList | attributes: (Reused) store for numerical attributes. |
protected List<String> | columnnames: Column names. |
protected LabelList | curlbl: Current labels. |
protected V | curvec: Current vector. |
protected NumberVector.Factory<V,?> | factory: Vector factory class. |
protected boolean | haslabels: Whether or not the data set has labels. |
protected BitSet | labelcolumns: Bitset indicating which columns are not numeric. |
protected BitSet | labelIndices: Indices of the attributes to be treated as string labels. |
(package private) ArrayList<String> | labels: (Reused) store for labels. |
protected int | lineNumber: Current line number. |
private static Logging | LOG: Logging class. |
protected int | maxdim: Maximum dimensionality reported. |
protected BundleMeta | meta: Metadata. |
protected int | mindim: Minimum dimensionality reported. |
(package private) BundleStreamSource.Event | nextevent: Event to report next. |
private BufferedReader | reader: Buffered reader. |
(package private) HashMap<String,String> | unique: For string unification. |
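The unique map is documented only as being for string unification. One plausible reading, assumed here, is that equal label strings are mapped to a single shared instance so that many points with the same class label do not each keep their own copy. The hypothetical StringUnifier below sketches that idea with a plain HashMap.

```java
import java.util.HashMap;

// Hypothetical sketch of per-parser string unification; the behavior is an
// assumption based only on the field description "For String unification".
final class StringUnifier {
    private final HashMap<String, String> unique = new HashMap<>();

    /** Returns the first instance seen that equals s, so equal labels share one object. */
    String unify(String s) {
        String previous = unique.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }
}
```

Unlike String.intern(), such a map is local to one parser instance and can be discarded together with it.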
Fields inherited from class AbstractParser: ATTRIBUTE_CONCATENATION, comment, COMMENT_PATTERN, DEFAULT_SEPARATOR, NUMBER_PATTERN, QUOTE_CHARS, tokenizer
Constructor and Description |
---|
NumberVectorLabelParser(NumberVector.Factory<V,?> factory): Constructor with defaults. |
NumberVectorLabelParser(Pattern colSep, String quoteChars, Pattern comment, BitSet labelIndices, NumberVector.Factory<V,?> factory): Constructor with an explicit column separator, quote characters, comment pattern and label columns. |
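The five-argument constructor makes separator, quoting, comments and label columns configurable. The sketch below shows how such arguments could be prepared with the Java standard library; the concrete pattern and quote values are assumptions chosen for illustration, not the defaults documented for this class, and ParserArgsSketch is a hypothetical helper.

```java
import java.util.BitSet;
import java.util.regex.Pattern;

// Hypothetical example arguments for the five-argument constructor.
// The pattern and quote values are ASSUMPTIONS for illustration only,
// not the documented defaults of NumberVectorLabelParser.
final class ParserArgsSketch {
    // Assumed column separator: a comma, semicolon or whitespace,
    // with optional surrounding whitespace.
    static final Pattern COL_SEP = Pattern.compile("\\s*[,;\\s]\\s*");

    // Assumed quote characters: double and single quotes.
    static final String QUOTE_CHARS = "\"'";

    // Assumed comment pattern: lines whose first non-blank character is '#'.
    static final Pattern COMMENT = Pattern.compile("^\\s*#.*$");

    /** Marks the given zero-based columns as label columns. */
    static BitSet labelIndices(int... columns) {
        BitSet bits = new BitSet();
        for (int c : columns) {
            bits.set(c);
        }
        return bits;
    }
}
```

These values would then be passed as colSep, quoteChars, comment and labelIndices, together with a vector factory of your choice.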
Modifier and Type | Method and Description |
---|---|
protected void | buildMeta(): Update the meta element. |
protected <A> V | createDBObject(A attributes, NumberArrayAdapter<?,A> adapter): Creates a database object of type V. |
Object | data(int rnum): Access a particular object and representation. |
protected Logging | getLogger(): Get the logger for this class. |
BundleMeta | getMeta(): Get the current meta data. |
(package private) SimpleTypeInformation<V> | getTypeInformation(int mindim, int maxdim): Get a prototype object for the given dimensionality. |
void | initStream(InputStream in): Initialize the streaming parser for the given input stream. |
BundleStreamSource.Event | nextEvent(): Get the next event. |
protected void | parseLineInternal(String line): Internal method for parsing a single line. |
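createDBObject receives the attributes together with a NumberArrayAdapter, so a single method can build vectors from different array-like containers. The sketch below illustrates that adapter pattern with a hypothetical two-method interface (DoubleArrayAdapter) and helper (AdapterSketch.toVector); it does not reproduce the signature of ELKI's NumberArrayAdapter and returns a plain double[] instead of a vector of type V.

```java
import java.util.List;

// Hypothetical illustration of the adapter idea behind createDBObject.
// DoubleArrayAdapter is NOT ELKI's NumberArrayAdapter; it only shows the pattern.
final class AdapterSketch {

    /** Read-only view of an array-like container of numbers. */
    interface DoubleArrayAdapter<A> {
        int size(A array);
        double getDouble(A array, int off);
    }

    /** Adapter for primitive double arrays. */
    static final DoubleArrayAdapter<double[]> PRIMITIVE = new DoubleArrayAdapter<double[]>() {
        @Override public int size(double[] a) { return a.length; }
        @Override public double getDouble(double[] a, int off) { return a[off]; }
    };

    /** Adapter for boxed lists, e.g. a reused attribute buffer. */
    static final DoubleArrayAdapter<List<Double>> BOXED_LIST = new DoubleArrayAdapter<List<Double>>() {
        @Override public int size(List<Double> a) { return a.size(); }
        @Override public double getDouble(List<Double> a, int off) { return a.get(off); }
    };

    /** Copies any adapted container into a fresh double[] (stand-in for a vector of type V). */
    static <A> double[] toVector(A attributes, DoubleArrayAdapter<A> adapter) {
        int n = adapter.size(attributes);
        double[] v = new double[n];
        for (int i = 0; i < n; i++) {
            v[i] = adapter.getDouble(attributes, i);
        }
        return v;
    }
}
```

The design choice is that the conversion loop is written once, while each container type only supplies size and element access.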
Methods inherited from class AbstractStreamingParser: parse
Methods inherited from class AbstractParser: lengthWithoutLinefeed, toString
private static final Logging LOG
protected BitSet labelIndices
protected NumberVector.Factory<V extends NumberVector<?>,?> factory
private BufferedReader reader
protected int lineNumber
protected int mindim
protected int maxdim
protected BundleMeta meta
protected BitSet labelcolumns
protected boolean haslabels
protected V extends NumberVector<?> curvec
protected LabelList curlbl
final gnu.trove.list.array.TDoubleArrayList attributes
BundleStreamSource.Event nextevent
public NumberVectorLabelParser(NumberVector.Factory<V,?> factory)
Constructor with defaults.
Parameters:
factory - Vector factory

public NumberVectorLabelParser(Pattern colSep, String quoteChars, Pattern comment, BitSet labelIndices, NumberVector.Factory<V,?> factory)
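Constructor with an explicit column separator, quote characters, comment pattern and label columns.
Parameters:
colSep - Column separator
quoteChars - Quote characters
comment - Comment pattern
labelIndices - Column indexes to be treated as labels rather than numeric values (counted from 0 over all columns)
factory - Vector factory

public void initStream(InputStream in)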
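Initialize the streaming parser for the given input stream.
Specified by: initStream in interface StreamingParser
Parameters:
in - the stream to parse objects from

public BundleMeta getMeta()
Get the current meta data.
Specified by: getMeta in interface BundleStreamSource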

public BundleStreamSource.Event nextEvent()
Get the next event.
Specified by: nextEvent in interface BundleStreamSource
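A consumer of this streaming source repeatedly calls nextEvent() and reacts to the returned event, fetching the current object's representations through data(int rnum). The sketch below mirrors that loop with stand-in types: the Event constants (META_CHANGED, NEXT_OBJECT, END_OF_STREAM) are assumed to correspond to BundleStreamSource.Event, the Source interface and EventLoopSketch class are hypothetical rather than ELKI's API, and representation 0 is assumed to hold the vector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer loop. Event mirrors the assumed constants of
// BundleStreamSource.Event; Source is a stand-in, not ELKI's interface.
final class EventLoopSketch {
    enum Event { META_CHANGED, NEXT_OBJECT, END_OF_STREAM }

    /** Minimal stand-in for a streaming source offering nextEvent() and data(int). */
    interface Source {
        Event nextEvent();
        Object data(int rnum);
    }

    /** Collects representation 0 (assumed: the vector) of every object until the stream ends. */
    static List<Object> drain(Source src) {
        List<Object> out = new ArrayList<>();
        while (true) {
            switch (src.nextEvent()) {
                case META_CHANGED:
                    // A real consumer would re-read the metadata here (cf. getMeta()).
                    break;
                case NEXT_OBJECT:
                    out.add(src.data(0));
                    break;
                case END_OF_STREAM:
                    return out;
            }
        }
    }
}
```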

protected void buildMeta()
Update the meta element.
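The mindim and maxdim fields, together with getTypeInformation(int mindim, int maxdim), suggest that the parser tracks the smallest and largest dimensionality seen so far when building its metadata. The hypothetical DimensionalitySketch below shows such tracking; the rule that equal bounds mean fixed-length vectors and unequal bounds mean variable-length vectors is an assumption, not taken from this documentation.

```java
// Hypothetical sketch of dimensionality tracking; the fixed vs. variable
// rule in describe() is an ASSUMPTION, not documented behavior.
final class DimensionalitySketch {
    int mindim = Integer.MAX_VALUE; // smallest dimensionality seen so far
    int maxdim = 0;                 // largest dimensionality seen so far

    /** Records the dimensionality of one parsed vector. */
    void observe(int dim) {
        mindim = Math.min(mindim, dim);
        maxdim = Math.max(maxdim, dim);
    }

    /** Describes what kind of vector type a consumer could expect. */
    String describe() {
        if (mindim > maxdim) {
            return "no vectors yet";
        }
        return mindim == maxdim
                ? "fixed dimensionality " + maxdim
                : "variable dimensionality " + mindim + ".." + maxdim;
    }
}
```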

public Object data(int rnum)
Access a particular object and representation.
Specified by: data in interface BundleStreamSource
Parameters:
rnum - Representation number

protected void parseLineInternal(String line)
Internal method for parsing a single line.
Parameters:
line - Line to process

protected <A> V createDBObject(A attributes, NumberArrayAdapter<?,A> adapter)
Creates a database object of type V.
Type Parameters:
A - attribute type
Parameters:
attributes - the attributes of the vector to create
adapter - Array adapter

SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim)
Get a prototype object for the given dimensionality.
Parameters:
mindim - Minimum dimensionality
maxdim - Maximum dimensionality

protected Logging getLogger()
Get the logger for this class.
Specified by: getLogger in class AbstractParser