@Title(value="Term frequency parser")
@Description(value="Parse a file containing term frequencies. The expected format is 'label term1 <freq> term2 <freq> ...'. Terms must not contain the separator character!")
public class TermFrequencyParser<V extends SparseNumberVector> extends NumberVectorLabelParser<V>

SimpleTransactionParser instead.

Modifier and Type | Class and Description |
---|---|
static class | TermFrequencyParser.Parameterizer<V extends SparseNumberVector>: Parameterization class. |
Inherited nested classes: BundleStreamSource.Event
Modifier and Type | Field and Description |
---|---|
(package private) TObjectIntMap<String> | keymap: Map. |
(package private) ArrayList<String> | labels: (Reused) label buffer. |
private static Logging | LOG: Class logger. |
(package private) boolean | normalize: Normalize. |
(package private) int | numterms: Number of different terms observed. |
private SparseNumberVector.Factory<V> | sparsefactory: Same as NumberVectorLabelParser.factory, but subtype. |
(package private) TIntDoubleHashMap | values: (Reused) set of values for the number vector. |
Inherited fields:
attributes, columnnames, curlbl, curvec, factory, haslabels, maxdim, meta, mindim, nextevent, unique
reader, tokenizer
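The keymap and numterms fields suggest that each distinct term is assigned its own dimension in the sparse vector. The following is a minimal sketch of that idea using only the Java standard library, not the ELKI implementation. The class TermIndexSketch, its method names, and the choice to start indices at 0 are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the keymap/numterms pair; not ELKI code. */
final class TermIndexSketch {
  /** Maps each term to its dimension index (plays the role of keymap). */
  private final Map<String, Integer> keymap = new HashMap<>();

  /** Number of different terms observed so far (plays the role of numterms). */
  private int numterms = 0;

  /** Returns the dimension of a term, assigning the next free index on first sight. */
  int dimensionOf(String term) {
    Integer dim = keymap.get(term);
    if (dim == null) {
      dim = numterms++; // assumption: indices start at 0
      keymap.put(term, dim);
    }
    return dim;
  }

  /** Returns the number of different terms observed. */
  int numTerms() {
    return numterms;
  }

  public static void main(String[] args) {
    TermIndexSketch index = new TermIndexSketch();
    for (String term : new String[] { "apple", "pear", "apple" }) {
      System.out.println(term + " -> dimension " + index.dimensionOf(term));
    }
    System.out.println("distinct terms: " + index.numTerms());
  }
}
```

A term seen a second time gets the same dimension back, so the count of distinct terms only grows when a new term appears.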
Constructor and Description |
---|
TermFrequencyParser(boolean normalize, CSVReaderFormat format, long[] labelIndices, SparseNumberVector.Factory<V> factory): Constructor. |
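This page describes the normalize flag only as "Normalize." One plausible reading, assumed here and not confirmed by this page, is that the frequencies of a line are divided by their sum so that they add up to 1. The sketch below shows that interpretation with plain Java collections; NormalizeSketch and normalizeToUnitSum are hypothetical names, not part of ELKI.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of one possible meaning of the normalize flag. */
final class NormalizeSketch {
  /**
   * Scales the values in place so that they sum to 1.
   * Assumption: normalization means division by the sum of the values.
   * A map whose values sum to zero is left unchanged.
   */
  static void normalizeToUnitSum(Map<Integer, Double> values) {
    double sum = 0.0;
    for (double v : values.values()) {
      sum += v;
    }
    if (sum == 0.0) {
      return; // nothing to scale, and division by zero is avoided
    }
    final double total = sum;
    values.replaceAll((dim, v) -> v / total);
  }

  public static void main(String[] args) {
    Map<Integer, Double> values = new HashMap<>();
    values.put(0, 3.0);
    values.put(1, 1.0);
    normalizeToUnitSum(values);
    System.out.println(values);
  }
}
```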
Modifier and Type | Method and Description |
---|---|
protected Logging | getLogger(): Get the logger for this class. |
protected SimpleTypeInformation<V> | getTypeInformation(int mindim, int maxdim): Get a prototype object for the given dimensionality. |
protected boolean | parseLineInternal(): Internal method for parsing a single line. |
Inherited methods:
buildMeta, cleanup, createVector, data, getMeta, initStream, isLabelColumn, nextEvent
asMultipleObjectsBundle, assignDBID, hasDBIDs, parse
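To illustrate the kind of input this parser targets, here is a minimal standalone sketch that reads one line of labels and term/frequency pairs. It is not the ELKI implementation of parseLineInternal(). It assumes that every term is directly followed by its numeric frequency, that tokens are separated by whitespace (the real parser takes its separator from the CSVReaderFormat argument), and that any token not followed by a number is a label. TermLineSketch and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-line reader for "label term1 freq1 term2 freq2 ..." input. */
final class TermLineSketch {
  /** Tokens that were not followed by a number. */
  final List<String> labels = new ArrayList<>();

  /** Term to frequency, in order of first appearance. */
  final Map<String, Double> frequencies = new LinkedHashMap<>();

  /** Parses one line; assumes whitespace-separated tokens. */
  static TermLineSketch parse(String line) {
    TermLineSketch result = new TermLineSketch();
    String pending = null; // token seen but not yet followed by a number
    for (String token : line.trim().split("\\s+")) {
      if (token.isEmpty()) {
        continue; // happens only for an empty line
      }
      Double value = tryParse(token);
      if (value != null && pending != null) {
        // A number directly after a token: the token is a term.
        result.frequencies.merge(pending, value, Double::sum);
        pending = null;
      } else {
        // The previous token had no number after it, so treat it as a label.
        if (pending != null) {
          result.labels.add(pending);
        }
        pending = token;
      }
    }
    if (pending != null) {
      result.labels.add(pending);
    }
    return result;
  }

  /** Returns the numeric value of a token, or null if it is not a number. */
  private static Double tryParse(String token) {
    try {
      return Double.parseDouble(token);
    } catch (NumberFormatException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    TermLineSketch parsed = parse("doc1 apple 2 pear 1");
    System.out.println("labels: " + parsed.labels);
    System.out.println("frequencies: " + parsed.frequencies);
  }
}
```

Because a term is only recognized by the number that follows it, a term containing the separator character would be split into a label and a shorter term, which is consistent with the restriction stated in the class description.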
private static final Logging LOG
int numterms
TObjectIntMap<String> keymap
boolean normalize
private SparseNumberVector.Factory<V extends SparseNumberVector> sparsefactory
Same as NumberVectorLabelParser.factory, but subtype.
TIntDoubleHashMap values
public TermFrequencyParser(boolean normalize, CSVReaderFormat format, long[] labelIndices, SparseNumberVector.Factory<V> factory)
Parameters:
normalize - Normalize
format - Input format
labelIndices - Indices to use as labels
protected boolean parseLineInternal()
Description copied from class: NumberVectorLabelParser
Overrides: parseLineInternal in class NumberVectorLabelParser<V extends SparseNumberVector>
Returns: true when a valid line was read, false on a label row.
protected SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim)
Description copied from class: NumberVectorLabelParser
Overrides: getTypeInformation in class NumberVectorLabelParser<V extends SparseNumberVector>
Parameters:
mindim - Minimum dimensionality
maxdim - Maximum dimensionality
protected Logging getLogger()
Description copied from class: AbstractStreamingParser
Overrides: getLogger in class NumberVectorLabelParser<V extends SparseNumberVector>
Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München. License information.