
public class ArffParser extends Object implements Parser
| Modifier and Type | Class and Description | 
|---|---|
| static class | ArffParser.Parameterizer - Parameterization class. |
| Modifier and Type | Field and Description | 
|---|---|
| static Pattern | ARFF_COMMENT - Comment pattern. |
| static Pattern | ARFF_HEADER_ATTRIBUTE - Arff attribute declaration marker. |
| static Pattern | ARFF_HEADER_DATA - Arff data marker. |
| static Pattern | ARFF_HEADER_RELATION - Arff file marker. |
| static Pattern | ARFF_NUMERIC - Pattern for numeric columns. |
| static String | DEFAULT_ARFF_MAGIC_CLASS - Pattern to auto-convert columns to class labels. |
| static String | DEFAULT_ARFF_MAGIC_EID - Pattern to auto-convert columns to external ids. |
| static Pattern | EMPTY - Empty line pattern. |
| private static Logging | LOG - Logger. |
| (package private) Pattern | magic_class - Pattern to recognize class label columns. |
| (package private) Pattern | magic_eid - Pattern to recognize external ids. |
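
The pattern fields above suggest that the parser classifies each header line by matching it against a regular expression. The following is a minimal sketch of that idea using only the JDK. The class name `ArffHeaderSketch`, the method `classify`, and every regular expression in it are assumptions made for illustration; the actual pattern values used by ArffParser are not listed on this page.

```java
import java.util.regex.Pattern;

public class ArffHeaderSketch {
  // Hypothetical regular expressions; the values used by ArffParser are not shown on this page.
  static final Pattern RELATION = Pattern.compile("\\s*@relation\\s+.*", Pattern.CASE_INSENSITIVE);
  static final Pattern ATTRIBUTE = Pattern.compile("\\s*@attribute\\s+.*", Pattern.CASE_INSENSITIVE);
  static final Pattern DATA = Pattern.compile("\\s*@data\\s*", Pattern.CASE_INSENSITIVE);
  static final Pattern COMMENT = Pattern.compile("\\s*%.*");
  static final Pattern EMPTY = Pattern.compile("\\s*");

  /** Classify one line of an ARFF file by trying each pattern in turn. */
  static String classify(String line) {
    if (EMPTY.matcher(line).matches()) return "empty";
    if (COMMENT.matcher(line).matches()) return "comment";
    if (RELATION.matcher(line).matches()) return "relation";
    if (ATTRIBUTE.matcher(line).matches()) return "attribute";
    if (DATA.matcher(line).matches()) return "data";
    return "other";
  }

  public static void main(String[] args) {
    String[] lines = {
      "% Iris data", "", "@RELATION iris",
      "@attribute sepallength numeric", "@data", "5.1,3.5,1.4,0.2"
    };
    for (String line : lines) {
      System.out.println(classify(line) + " <- \"" + line + "\"");
    }
  }
}
```

The sketch matches keywords case-insensitively because ARFF header keywords are commonly written in either case; whether ArffParser does the same is not stated here.
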
| Constructor and Description | 
|---|
| ArffParser(Pattern magic_eid, Pattern magic_class) - Constructor. |
| ArffParser(String magic_eid, String magic_class) - Constructor. |
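
The two constructors accept the "magic" patterns either as compiled `Pattern` objects or as strings. A minimal sketch of how such a pair of constructors and the column-name matching might fit together is shown below. The class `MagicColumnSketch`, the method `role`, the two default strings, and the choice of case-insensitive compilation are all assumptions for illustration; the real values of DEFAULT_ARFF_MAGIC_EID and DEFAULT_ARFF_MAGIC_CLASS are not given on this page.

```java
import java.util.regex.Pattern;

public class MagicColumnSketch {
  // Hypothetical defaults, chosen only for illustration.
  static final String MAGIC_EID = "(external-?id|id)";
  static final String MAGIC_CLASS = "(class|class-?label)";

  final Pattern magicEid;
  final Pattern magicClass;

  /** Use already compiled patterns as given. */
  MagicColumnSketch(Pattern magicEid, Pattern magicClass) {
    this.magicEid = magicEid;
    this.magicClass = magicClass;
  }

  /** Compile the strings (case-insensitively, an assumption) and delegate. */
  MagicColumnSketch(String magicEid, String magicClass) {
    this(Pattern.compile(magicEid, Pattern.CASE_INSENSITIVE),
         Pattern.compile(magicClass, Pattern.CASE_INSENSITIVE));
  }

  /** Decide how a column should be treated, based on its attribute name. */
  String role(String columnName) {
    if (magicEid.matcher(columnName).matches()) return "external-id";
    if (magicClass.matcher(columnName).matches()) return "class-label";
    return "attribute";
  }

  public static void main(String[] args) {
    MagicColumnSketch sketch = new MagicColumnSketch(MAGIC_EID, MAGIC_CLASS);
    for (String name : new String[] { "ID", "sepalwidth", "Class" }) {
      System.out.println(name + " -> " + sketch.role(name));
    }
  }
}
```
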
| Modifier and Type | Method and Description | 
|---|---|
| private Object[] | loadDenseInstance(StreamTokenizer tokenizer, int[] dimsize, TypeInformation[] etyp, int outdim) |
| private Object[] | loadSparseInstance(StreamTokenizer tokenizer, int[] targ, int[] dimsize, TypeInformation[] elkitypes, int metaLength) |
| private StreamTokenizer | makeArffTokenizer(BufferedReader br) - Make a StreamTokenizer for the ARFF format. |
| private void | nextToken(StreamTokenizer tokenizer) - Helper function for token handling. |
| MultipleObjectsBundle | parse(InputStream instream) - Returns a list of the objects parsed from the specified input stream. |
| private void | parseAttributeStatements(BufferedReader br, ArrayList<String> names, ArrayList<String> types) - Parse the "@attribute" section of the ARFF file. |
| private void | processColumnTypes(ArrayList<String> names, ArrayList<String> types, int[] targ, TypeInformation[] etyp, int[] dims) - Process the column types (and names!). |
| private void | readHeader(BufferedReader br) - Read the dataset header part of the ARFF file, to ensure consistency. |
| private void | setupBundleHeaders(ArrayList<String> names, int[] targ, TypeInformation[] etyp, int[] dimsize, MultipleObjectsBundle bundle, boolean sparse) - Set up the headers for the object bundle. |
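
To illustrate what filling the `names` and `types` lists from the "@attribute" section could look like, here is a minimal JDK-only sketch. It is not the ELKI implementation: the class `AttributeSectionSketch`, its regular expressions, and its error messages are assumptions, and only the parameter shape (a reader plus two lists to fill) follows the parseAttributeStatements signature listed above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeSectionSketch {
  // Hypothetical pattern: "@attribute <name> <type>", where the name may be quoted.
  static final Pattern ATTRIBUTE = Pattern.compile(
      "@attribute\\s+('[^']*'|\"[^\"]*\"|\\S+)\\s+(.+?)\\s*", Pattern.CASE_INSENSITIVE);
  static final Pattern DATA = Pattern.compile("@data\\s*", Pattern.CASE_INSENSITIVE);

  /** Read attribute declarations until the "@data" marker, filling both lists. */
  static void parseAttributes(BufferedReader br, ArrayList<String> names,
      ArrayList<String> types) throws IOException {
    String line;
    while ((line = br.readLine()) != null) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("%")) {
        continue; // skip blank lines and comments
      }
      if (DATA.matcher(trimmed).matches()) {
        return; // the reader is now positioned at the first data row
      }
      Matcher m = ATTRIBUTE.matcher(trimmed);
      if (!m.matches()) {
        throw new IOException("Unexpected header line: " + line);
      }
      names.add(m.group(1)); // quotes, if any, are kept as written
      types.add(m.group(2));
    }
    throw new IOException("No @data section found.");
  }

  public static void main(String[] args) throws IOException {
    String header = "% example\n@attribute x numeric\n@attribute class {a, b}\n@data\n1.0,a\n";
    ArrayList<String> names = new ArrayList<>();
    ArrayList<String> types = new ArrayList<>();
    parseAttributes(new BufferedReader(new StringReader(header)), names, types);
    System.out.println(names + " " + types);
  }
}
```

Stopping at the "@data" marker leaves the reader at the first data row, so a later step can map the collected type strings to column types and then read instances from the same reader.
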
private static final Logging LOG
public static final Pattern ARFF_HEADER_RELATION
public static final Pattern ARFF_HEADER_ATTRIBUTE
public static final Pattern ARFF_HEADER_DATA
public static final Pattern ARFF_COMMENT
public static final String DEFAULT_ARFF_MAGIC_EID
public static final String DEFAULT_ARFF_MAGIC_CLASS
public static final Pattern ARFF_NUMERIC
public static final Pattern EMPTY
Pattern magic_eid
Pattern magic_class
public ArffParser(Pattern magic_eid, Pattern magic_class)
magic_eid - Magic to recognize external IDs
magic_class - Magic to recognize class labels

public MultipleObjectsBundle parse(InputStream instream)
Specified by: parse in interface Parser

private Object[] loadSparseInstance(StreamTokenizer tokenizer, int[] targ, int[] dimsize, TypeInformation[] elkitypes, int metaLength) throws IOException
Throws: IOException

private Object[] loadDenseInstance(StreamTokenizer tokenizer, int[] dimsize, TypeInformation[] etyp, int outdim) throws IOException
Throws: IOException

private StreamTokenizer makeArffTokenizer(BufferedReader br)
br - Buffered reader

private void setupBundleHeaders(ArrayList<String> names, int[] targ, TypeInformation[] etyp, int[] dimsize, MultipleObjectsBundle bundle, boolean sparse)
names - Attribute names
targ - Target columns
etyp - ELKI type information
dimsize - Number of dimensions in the individual types
bundle - Output bundle
sparse - Flag to create sparse vectors

private void readHeader(BufferedReader br) throws IOException
br - Buffered Reader
Throws: IOException

private void parseAttributeStatements(BufferedReader br, ArrayList<String> names, ArrayList<String> types) throws IOException
br - Input
names - List (to fill) of attribute names
types - List (to fill) of attribute types
Throws: IOException

private void processColumnTypes(ArrayList<String> names, ArrayList<String> types, int[] targ, TypeInformation[] etyp, int[] dims)
names - Attribute names
types - Attribute types
targ - Target dimension mapping (ARFF to ELKI), return value
etyp - ELKI type information, return value
dims - Number of successive dimensions, return value

private void nextToken(StreamTokenizer tokenizer) throws IOException
tokenizer - Tokenizer
Throws: IOException
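
The tokenizer and the dense and sparse instance loaders work on the data section of the file. The sketch below shows, under stated assumptions, how a `StreamTokenizer` can be configured for ARFF-style rows and how a dense row (`1.0,2.5,3`) differs from a sparse one (`{1 4.5, 2 7}`, index-value pairs with 0-based indices). The class `ArffRowSketch`, its methods, and the tokenizer settings are hypothetical and handle numeric columns only; the settings used by makeArffTokenizer and the type handling in loadDenseInstance and loadSparseInstance are not documented on this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Arrays;

public class ArffRowSketch {
  /** Assumed tokenizer setup: commas separate values, '%' starts a comment. */
  static StreamTokenizer makeTokenizer(Reader reader) {
    StreamTokenizer t = new StreamTokenizer(reader);
    t.resetSyntax();                 // also disables built-in number parsing
    t.whitespaceChars(0, ' ');
    t.wordChars(' ' + 1, '\u00FF');
    t.whitespaceChars(',', ',');
    t.commentChar('%');
    t.quoteChar('"');
    t.quoteChar('\'');
    t.ordinaryChar('{');
    t.ordinaryChar('}');
    t.eolIsSignificant(true);
    return t;
  }

  /** Read one numeric row of the given dimensionality; returns null at end of input. */
  static double[] readRow(StreamTokenizer t, int dim) throws IOException {
    int tok = t.nextToken();
    while (tok == StreamTokenizer.TT_EOL) {
      tok = t.nextToken(); // skip line ends left over from the previous row
    }
    if (tok == StreamTokenizer.TT_EOF) {
      return null;
    }
    double[] row = new double[dim]; // unset sparse entries stay 0.0
    if (tok == '{') {
      // Sparse row: "index value" pairs until the closing brace.
      while ((tok = t.nextToken()) != '}') {
        if (tok != StreamTokenizer.TT_WORD) throw new IOException("Expected an index.");
        int index = Integer.parseInt(t.sval);
        if (t.nextToken() != StreamTokenizer.TT_WORD) throw new IOException("Expected a value.");
        row[index] = Double.parseDouble(t.sval);
      }
    } else {
      // Dense row: exactly dim values in column order.
      for (int i = 0; i < dim; i++) {
        if (tok != StreamTokenizer.TT_WORD) throw new IOException("Expected a value.");
        row[i] = Double.parseDouble(t.sval);
        if (i + 1 < dim) tok = t.nextToken();
      }
    }
    return row;
  }

  public static void main(String[] args) throws IOException {
    StreamTokenizer t = makeTokenizer(new StringReader("1.0,2.5,3\n{1 4.5, 2 7}\n"));
    for (double[] row = readRow(t, 3); row != null; row = readRow(t, 3)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

Calling `resetSyntax()` first means numbers arrive as plain word tokens, so the sketch converts them itself with `Double.parseDouble`; this keeps values such as `1e-3` intact, which the tokenizer's built-in number parsing would split.
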