java.lang.Object
  de.lmu.ifi.dbs.elki.datasource.parser.ArffParser

public class ArffParser extends Object implements Parser
Parser to load WEKA .arff files into ELKI. This parser is quite hackish and contains a lot of magic that is not yet configurable.
TODO: Sparse vectors are not yet supported.
Nested Class Summary | |
---|---|
static class | ArffParser.Parameterizer - Parameterization class. |
Field Summary | |
---|---|
static Pattern | ARFF_COMMENT - Comment pattern. |
static Pattern | ARFF_HEADER_ATTRIBUTE - ARFF attribute declaration marker. |
static Pattern | ARFF_HEADER_DATA - ARFF data marker. |
static Pattern | ARFF_HEADER_RELATION - ARFF file marker. |
static Pattern | ARFF_NUMERIC - Pattern for numeric columns. |
static String | DEFAULT_ARFF_MAGIC_CLASS - Pattern to auto-convert columns to class labels. |
static String | DEFAULT_ARFF_MAGIC_EID - Pattern to auto-convert columns to external ids. |
static Pattern | EMPTY - Empty line pattern. |
private static Logging | logger - Logger. |
(package private) Pattern | magic_class - Pattern to recognize class label columns. |
(package private) Pattern | magic_eid - Pattern to recognize external ids. |
Constructor Summary | |
---|---|
ArffParser(Pattern magic_eid, Pattern magic_class) - Constructor. | |
ArffParser(String magic_eid, String magic_class) - Constructor. | |
Method Summary | |
---|---|
private Object[] | loadDenseInstance(StreamTokenizer tokenizer, int[] dimsize, TypeInformation[] etyp, int outdim) |
private Object[] | loadSparseInstance(StreamTokenizer tokenizer, int[] targ, int[] dimsize, TypeInformation[] elkitypes, int metaLength) |
private StreamTokenizer | makeArffTokenizer(BufferedReader br) - Make a StreamTokenizer for the ARFF format. |
private void | nextToken(StreamTokenizer tokenizer) - Helper function for token handling. |
MultipleObjectsBundle | parse(InputStream instream) - Returns a list of the objects parsed from the specified input stream. |
private void | parseAttributeStatements(BufferedReader br, ArrayList<String> names, ArrayList<String> types) - Parse the "@attribute" section of the ARFF file. |
private void | processColumnTypes(ArrayList<String> names, ArrayList<String> types, int[] targ, TypeInformation[] etyp, int[] dims) - Process the column types (and names!) |
private void | readHeader(BufferedReader br) - Read the dataset header part of the ARFF file, to ensure consistency. |
private void | setupBundleHeaders(ArrayList<String> names, int[] targ, TypeInformation[] etyp, int[] dimsize, MultipleObjectsBundle bundle, boolean sparse) - Setup the headers for the object bundle. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
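private static final Logging logger
Logger

public static final Pattern ARFF_HEADER_RELATION
ARFF file marker

public static final Pattern ARFF_HEADER_ATTRIBUTE
ARFF attribute declaration marker

public static final Pattern ARFF_HEADER_DATA
ARFF data marker

public static final Pattern ARFF_COMMENT
Comment pattern.

public static final String DEFAULT_ARFF_MAGIC_EID
Pattern to auto-convert columns to external ids.

public static final String DEFAULT_ARFF_MAGIC_CLASS
Pattern to auto-convert columns to class labels.

public static final Pattern ARFF_NUMERIC
Pattern for numeric columns

public static final Pattern EMPTY
Empty line pattern.

Pattern magic_eid
Pattern to recognize external ids

Pattern magic_class
Pattern to recognize class label columns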
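
The values of the marker patterns are not shown on this page. As a rough illustration of how such patterns can separate the parts of an ARFF file, the following sketch classifies single lines. The class name ArffPatternSketch, the method classify, and every regular expression in it are assumptions made for this example; they are not ELKI's actual field values. The sketch relies only on the ARFF conventions that header keywords start with "@" and are case-insensitive, and that comments start with "%".

```java
import java.util.regex.Pattern;

/** Hypothetical stand-ins for the ARFF marker patterns; NOT ELKI's actual values. */
final class ArffPatternSketch {
  // "@relation <name>" opens an ARFF file; ARFF keywords are case-insensitive.
  static final Pattern RELATION = Pattern.compile("@relation\\s+.*", Pattern.CASE_INSENSITIVE);
  // "@attribute <name> <type>" declares one column.
  static final Pattern ATTRIBUTE = Pattern.compile("@attribute\\s+.*", Pattern.CASE_INSENSITIVE);
  // "@data" ends the header and starts the data section.
  static final Pattern DATA = Pattern.compile("@data\\s*", Pattern.CASE_INSENSITIVE);
  // Comment lines start with '%'.
  static final Pattern COMMENT = Pattern.compile("^\\s*%.*");
  // Lines that contain only whitespace.
  static final Pattern EMPTY = Pattern.compile("^\\s*$");

  /** Returns a coarse label for one line of an ARFF file. */
  static String classify(String line) {
    if (EMPTY.matcher(line).matches()) {
      return "empty";
    }
    if (COMMENT.matcher(line).matches()) {
      return "comment";
    }
    String trimmed = line.trim();
    if (RELATION.matcher(trimmed).matches()) {
      return "relation";
    }
    if (ATTRIBUTE.matcher(trimmed).matches()) {
      return "attribute";
    }
    if (DATA.matcher(trimmed).matches()) {
      return "data";
    }
    return "other";
  }
}
```

For example, ArffPatternSketch.classify("@attribute sepallength numeric") yields "attribute", while a data row such as "5.1,3.5,Iris-setosa" falls through to "other".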
Constructor Detail |
---|
public ArffParser(Pattern magic_eid, Pattern magic_class)
Parameters:
magic_eid - Magic to recognize external IDs
magic_class - Magic to recognize class labels

public ArffParser(String magic_eid, String magic_class)
Parameters:
magic_eid - Magic to recognize external IDs
magic_class - Magic to recognize class labels

Method Detail |
---|
public MultipleObjectsBundle parse(InputStream instream)
Returns a list of the objects parsed from the specified input stream.
Specified by:
parse in interface Parser
Parameters:
instream - the stream to parse objects from

private Object[] loadSparseInstance(StreamTokenizer tokenizer, int[] targ, int[] dimsize, TypeInformation[] elkitypes, int metaLength) throws IOException
Throws:
IOException

private Object[] loadDenseInstance(StreamTokenizer tokenizer, int[] dimsize, TypeInformation[] etyp, int outdim) throws IOException
Throws:
IOException
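
For readers unfamiliar with the distinction behind loadSparseInstance and loadDenseInstance: a dense ARFF row lists every value, while a sparse row is written as "{index value, index value, ...}" with 0-based indices and all omitted entries equal to zero. The sketch below expands a purely numeric sparse row into a dense array. It is an illustration of the file format only; the class SparseRowSketch and its method toDense are hypothetical, and it does not reproduce how ELKI reads instances (ELKI works on a StreamTokenizer and also handles non-numeric columns).

```java
/** Hypothetical illustration of ARFF sparse rows; not ELKI's implementation. */
final class SparseRowSketch {
  /**
   * Expands a numeric sparse ARFF row such as "{1 2.5, 3 4}" into a dense array.
   *
   * @param row sparse row, including the surrounding braces
   * @param dim total number of attributes declared in the header
   */
  static double[] toDense(String row, int dim) {
    String body = row.trim();
    if (!body.startsWith("{") || !body.endsWith("}")) {
      throw new IllegalArgumentException("Not a sparse row: " + row);
    }
    body = body.substring(1, body.length() - 1).trim();
    double[] out = new double[dim]; // omitted entries stay 0.0
    if (body.isEmpty()) {
      return out;
    }
    for (String pair : body.split(",")) {
      // Each entry is "<0-based index> <value>".
      String[] kv = pair.trim().split("\\s+", 2);
      if (kv.length != 2) {
        throw new IllegalArgumentException("Malformed entry: " + pair);
      }
      int index = Integer.parseInt(kv[0]);
      if (index < 0 || index >= dim) {
        throw new IllegalArgumentException("Index out of range: " + index);
      }
      out[index] = Double.parseDouble(kv[1].trim());
    }
    return out;
  }
}
```

With five declared attributes, SparseRowSketch.toDense("{1 2.5, 3 4}", 5) fills positions 1 and 3 and leaves the rest at zero.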

private StreamTokenizer makeArffTokenizer(BufferedReader br)
Make a StreamTokenizer for the ARFF format.
Parameters:
br - Buffered reader
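
The page does not show how makeArffTokenizer configures its tokenizer. A minimal sketch of one plausible setup follows, using only java.io.StreamTokenizer calls: commas act as separators, "%" starts a comment, both quote characters delimit strings, braces are single-character tokens (for sparse rows), and line ends are reported. The class ArffTokenizerSketch, its methods, and the specific syntax table are assumptions for illustration, not ELKI's actual configuration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer setup for ARFF data; not ELKI's actual configuration. */
final class ArffTokenizerSketch {
  static StreamTokenizer makeTokenizer(BufferedReader br) {
    StreamTokenizer t = new StreamTokenizer(br);
    t.resetSyntax();                 // start from an empty syntax table
    t.whitespaceChars(0, ' ');       // control characters and space separate tokens
    t.wordChars(' ' + 1, '\u00FF');  // everything else is part of a word by default
    t.whitespaceChars(',', ',');     // commas separate values
    t.commentChar('%');              // '%' starts a comment that runs to end of line
    t.quoteChar('"');                // quoted strings may contain spaces and commas
    t.quoteChar('\'');
    t.ordinaryChar('{');             // braces delimit sparse rows and nominal value lists
    t.ordinaryChar('}');
    t.eolIsSignificant(true);        // report line ends, since each row is one instance
    return t;
  }

  /** Collects all tokens of the given text; line ends are rendered as "<EOL>". */
  static List<String> tokens(String text) {
    StreamTokenizer t = makeTokenizer(new BufferedReader(new StringReader(text)));
    List<String> out = new ArrayList<>();
    try {
      while (t.nextToken() != StreamTokenizer.TT_EOF) {
        if (t.ttype == StreamTokenizer.TT_EOL) {
          out.add("<EOL>");
        } else if (t.ttype == StreamTokenizer.TT_WORD || t.ttype == '"' || t.ttype == '\'') {
          out.add(t.sval); // words and quoted strings carry their text in sval
        } else {
          out.add(String.valueOf((char) t.ttype)); // ordinary single characters
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }
}
```

Because numbers are not parsed by the tokenizer in this setup, a value such as 5.1 arrives as the word "5.1" and has to be converted by the caller, which keeps numeric and nominal columns on the same code path.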

private void setupBundleHeaders(ArrayList<String> names, int[] targ, TypeInformation[] etyp, int[] dimsize, MultipleObjectsBundle bundle, boolean sparse)
Setup the headers for the object bundle.
Parameters:
names - Attribute names
targ - Target columns
etyp - ELKI type information
dimsize - Number of dimensions in the individual types
bundle - Output bundle
sparse - Flag to create sparse vectors

private void readHeader(BufferedReader br) throws IOException
Read the dataset header part of the ARFF file, to ensure consistency.
Parameters:
br - Buffered Reader
Throws:
IOException

private void parseAttributeStatements(BufferedReader br, ArrayList<String> names, ArrayList<String> types) throws IOException
Parse the "@attribute" section of the ARFF file.
Parameters:
br - Input
names - List (to fill) of attribute names
types - List (to fill) of attribute types
Throws:
IOException

private void processColumnTypes(ArrayList<String> names, ArrayList<String> types, int[] targ, TypeInformation[] etyp, int[] dims)
Process the column types (and names!)
Parameters:
names - Attribute names
types - Attribute types
targ - Target dimension mapping (ARFF to ELKI), return value
etyp - ELKI type information, return value
dims - Number of successive dimensions, return value

private void nextToken(StreamTokenizer tokenizer) throws IOException
Helper function for token handling.
Parameters:
tokenizer - Tokenizer
Throws:
IOException
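
The parameter descriptions of processColumnTypes suggest that each ARFF column is assigned a role from its name and type, and that successive numeric columns are mapped to one shared target column. The sketch below illustrates that idea under stated assumptions: the three regular expressions, the precedence of the checks, and the grouping rule are guesses made for this example, and ColumnTypeSketch with its methods is hypothetical. It is not ELKI's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical column typing for ARFF attributes; not ELKI's implementation. */
final class ColumnTypeSketch {
  enum Kind { NUMERIC, CLASS_LABEL, EXTERNAL_ID, LABEL }

  // Assumed patterns; the real defaults are not shown on this page.
  static final Pattern NUMERIC = Pattern.compile("(numeric|real|integer)", Pattern.CASE_INSENSITIVE);
  static final Pattern MAGIC_EID = Pattern.compile("(External-?ID)", Pattern.CASE_INSENSITIVE);
  static final Pattern MAGIC_CLASS = Pattern.compile("(Class|Class-?Label)", Pattern.CASE_INSENSITIVE);

  /** Assigns a role to each column: magic names win over the declared type. */
  static List<Kind> classify(List<String> names, List<String> types) {
    List<Kind> kinds = new ArrayList<>();
    for (int i = 0; i < names.size(); i++) {
      if (MAGIC_EID.matcher(names.get(i)).matches()) {
        kinds.add(Kind.EXTERNAL_ID);
      } else if (MAGIC_CLASS.matcher(names.get(i)).matches()) {
        kinds.add(Kind.CLASS_LABEL);
      } else if (NUMERIC.matcher(types.get(i)).matches()) {
        kinds.add(Kind.NUMERIC);
      } else {
        kinds.add(Kind.LABEL); // anything else is kept as a plain label
      }
    }
    return kinds;
  }

  /** Maps each input column to an output column; runs of numeric columns share one. */
  static int[] targets(List<Kind> kinds) {
    int[] targ = new int[kinds.size()];
    int next = 0;
    for (int i = 0; i < kinds.size(); i++) {
      boolean joinsPrevious = i > 0
          && kinds.get(i) == Kind.NUMERIC
          && kinds.get(i - 1) == Kind.NUMERIC;
      if (i > 0 && !joinsPrevious) {
        next++; // start a new output column
      }
      targ[i] = next;
    }
    return targ;
  }
}
```

Under these assumptions, two numeric attributes followed by a column named "class" would end up as one two-dimensional vector column plus one class label column.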