
@Title(value="Term frequency parser")
@Description(value="Parse a file containing term frequencies. The expected format is 'label term1 <freq> term2 <freq> ...'. Terms must not contain the separator character!")
public class TermFrequencyParser<V extends SparseNumberVector<?>>
extends NumberVectorLabelParser<V>

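The line format described above can be illustrated with a small, self-contained sketch. It is not the parser's implementation: the class `TermLineSketch`, its `Row` type, the whitespace separator, and the rule that every term is immediately followed by its numeric frequency are all assumptions made for illustration. Repeated terms are summed here, which is a choice of this sketch only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the documented API. */
final class TermLineSketch {
  /** One parsed line: a label plus the term frequencies that followed it. */
  static final class Row {
    final String label;
    final Map<String, Double> frequencies = new LinkedHashMap<>();

    Row(String label) {
      this.label = label;
    }
  }

  /** Splits on whitespace (assumed separator); the first token is the label. */
  static Row parse(String line) {
    String[] tok = line.trim().split("\\s+");
    // A label followed by term/frequency pairs always yields an odd token count.
    if (tok.length % 2 == 0) {
      throw new IllegalArgumentException(
          "Expected a label followed by term/frequency pairs: " + line);
    }
    Row row = new Row(tok[0]);
    for (int i = 1; i < tok.length; i += 2) {
      double freq = Double.parseDouble(tok[i + 1]);
      // Sum repeated terms (an assumption of this sketch).
      row.frequencies.merge(tok[i], freq, Double::sum);
    }
    return row;
  }

  public static void main(String[] args) {
    Row row = parse("doc1 apple 3 pear 1 apple 2");
    System.out.println(row.label + " " + row.frequencies);
  }
}
```

Because terms are split on the separator, a term that itself contains the separator would be broken into two tokens, which is why the description forbids it.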
| Modifier and Type | Class and Description |
|---|---|
| static class | TermFrequencyParser.Parameterizer<V extends SparseNumberVector<?>>: Parameterization class. |

Inherited nested class: BundleStreamSource.Event

| Modifier and Type | Field and Description |
|---|---|
| (package private) gnu.trove.map.TObjectIntMap<String> | keymap: Map. |
| (package private) ArrayList<String> | labels: (Reused) label buffer. |
| private static Logging | LOG: Class logger. |
| (package private) boolean | normalize: Normalize. |
| (package private) int | numterms: Number of different terms observed. |
| private SparseNumberVector.Factory<V,?> | sparsefactory: Same as NumberVectorLabelParser.factory, but subtype. |
| (package private) gnu.trove.map.hash.TIntDoubleHashMap | values: (Reused) set of values for the number vector. |
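
The fields above suggest how text terms could become sparse vector dimensions. The following is a minimal sketch under stated assumptions, not the library code: it assumes `keymap` assigns each previously unseen term the next free dimension index (counted by `numterms`), and that `normalize` rescales a row so its frequencies sum to 1. The class `TermIndexSketch` and its methods are hypothetical, and plain `java.util` maps stand in for the Trove collections.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration; field names mirror the table, behavior is assumed. */
final class TermIndexSketch {
  /** Term to dimension index (the documented field is a Trove TObjectIntMap). */
  private final Map<String, Integer> keymap = new HashMap<>();
  /** Number of different terms observed so far. */
  private int numterms = 0;
  private final boolean normalize;

  TermIndexSketch(boolean normalize) {
    this.normalize = normalize;
  }

  /** Returns the dimension of a term, assigning the next free one if unseen. */
  int dimensionOf(String term) {
    Integer dim = keymap.get(term);
    if (dim == null) {
      dim = numterms++;
      keymap.put(term, dim);
    }
    return dim;
  }

  /** Converts term frequencies to dimension/value pairs, optionally normalized. */
  Map<Integer, Double> toSparse(Map<String, Double> freqs) {
    Map<Integer, Double> values = new TreeMap<>();
    double sum = 0.0;
    for (Map.Entry<String, Double> e : freqs.entrySet()) {
      values.put(dimensionOf(e.getKey()), e.getValue());
      sum += e.getValue();
    }
    if (normalize && sum > 0.0) {
      final double total = sum;
      // Assumed meaning of "normalize": frequencies of a row sum to 1.
      values.replaceAll((dim, v) -> v / total);
    }
    return values;
  }

  int numterms() {
    return numterms;
  }

  public static void main(String[] args) {
    TermIndexSketch index = new TermIndexSketch(true);
    Map<String, Double> row = new LinkedHashMap<>();
    row.put("apple", 3.0);
    row.put("pear", 1.0);
    System.out.println(index.toSparse(row) + " terms=" + index.numterms());
  }
}
```

Keeping the term index across rows means the same term always lands in the same dimension, so vectors from different lines stay comparable.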

Fields inherited from NumberVectorLabelParser: attributes, columnnames, curlbl, curvec, factory, haslabels, labelcolumns, labelIndices, lineNumber, maxdim, meta, mindim, nextevent, unique

Fields inherited from further superclasses: ATTRIBUTE_CONCATENATION, comment, COMMENT_PATTERN, DEFAULT_SEPARATOR, NUMBER_PATTERN, QUOTE_CHARS, tokenizer

| Constructor and Description |
|---|
| TermFrequencyParser(boolean normalize, Pattern colSep, String quoteChars, Pattern comment, BitSet labelIndices, SparseNumberVector.Factory<V,?> factory): Constructor. |
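
Several constructor arguments are standard Java types (`Pattern`, `String`, `BitSet`). The sketch below shows how such values might be prepared and how they would behave on a sample line. The class `ParserArgumentsSketch` and the two regular expressions are hypothetical; they are not the values of the inherited `DEFAULT_SEPARATOR` or `COMMENT_PATTERN` constants, which this page does not list.

```java
import java.util.BitSet;
import java.util.regex.Pattern;

/** Hypothetical illustration of preparing constructor arguments. */
final class ParserArgumentsSketch {
  // Assumed patterns for this sketch only, not the library defaults.
  static final Pattern COL_SEP = Pattern.compile("\\s+");
  static final Pattern COMMENT = Pattern.compile("^\\s*#.*$");

  /** True if the whole line matches the (assumed) comment pattern. */
  static boolean isComment(String line) {
    return COMMENT.matcher(line).matches();
  }

  /** Splits a line into columns using the (assumed) column separator. */
  static String[] columns(String line) {
    return COL_SEP.split(line.trim());
  }

  /** Builds a BitSet marking which column indices hold labels. */
  static BitSet labelColumns(int... indices) {
    BitSet bits = new BitSet();
    for (int i : indices) {
      bits.set(i);
    }
    return bits;
  }

  public static void main(String[] args) {
    System.out.println(isComment("# header line"));
    System.out.println(columns("doc1 apple 3").length);
    System.out.println(labelColumns(0));
  }
}
```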

| Modifier and Type | Method and Description |
|---|---|
| protected Logging | getLogger(): Get the logger for this class. |
| protected SimpleTypeInformation<V> | getTypeInformation(int mindim, int maxdim): Get a prototype object for the given dimensionality. |
| protected void | parseLineInternal(String line): Internal method for parsing a single line. |

Methods inherited from superclasses, grouped by declaring class:
buildMeta, createDBObject, data, getMeta, initStream, nextEvent
parse
lengthWithoutLinefeed, toString

private static final Logging LOG

int numterms

gnu.trove.map.TObjectIntMap<String> keymap

boolean normalize

private SparseNumberVector.Factory<V extends SparseNumberVector<?>,?> sparsefactory
Same as NumberVectorLabelParser.factory, but subtype.

gnu.trove.map.hash.TIntDoubleHashMap values

public TermFrequencyParser(boolean normalize,
Pattern colSep,
String quoteChars,
Pattern comment,
BitSet labelIndices,
SparseNumberVector.Factory<V,?> factory)
Parameters:
normalize - Normalize
colSep - Column separator
quoteChars - Quotation character
comment - Comment pattern
labelIndices - Indices to use as labels

protected void parseLineInternal(String line)
Description copied from class: NumberVectorLabelParser
Overrides: parseLineInternal in class NumberVectorLabelParser<V extends SparseNumberVector<?>>
Parameters:
line - Line to process

protected SimpleTypeInformation<V> getTypeInformation(int mindim, int maxdim)
Description copied from class: NumberVectorLabelParser
Overrides: getTypeInformation in class NumberVectorLabelParser<V extends SparseNumberVector<?>>
Parameters:
mindim - Minimum dimensionality
maxdim - Maximum dimensionality

protected Logging getLogger()
Description copied from class: AbstractParser
Overrides: getLogger in class NumberVectorLabelParser<V extends SparseNumberVector<?>>