@Reference(authors="J. Han, J. Pei, Y. Yin", title="Mining frequent patterns without candidate generation", booktitle="Proceedings of the 2000 ACM SIGMOD international conference on Management of data ", url="http://dx.doi.org/10.1145/342009.335372") public class FPGrowth extends AbstractFrequentItemsetAlgorithm
FPGrowth.FPTree
.
FP-Growth first sorts items by the overall frequency, since having high
frequent items appear first in the tree leads to a much smaller tree since
frequent subsets will likely share the same path in the tree. FP-Growth is
beneficial when you have a lot of (near-) duplicate transactions, and are
using a not too high support threshold, as it only prunes single items, not
item combinations.
This implementation is in-memory only, and has not yet been carefully
optimized.
The worst case memory use probably is O(min(n*l,i^l)) where i is the number
of items, l the average itemset length, and n the number of items. The worst
case scenario is when every item is frequent, and every transaction is
unique. The resulting tree will then be larger than the original data.
Reference:
J. Han, J. Pei, Y. Yin
Mining frequent patterns without candidate generation
In proceedings of the 2000 ACM SIGMOD international conference on Management
of data.
Modifier and Type | Class and Description |
---|---|
static class |
FPGrowth.FPNode
A single node of the FP tree.
|
static class |
FPGrowth.FPTree
FP-Tree data structure
|
static class |
FPGrowth.Parameterizer
Parameterization class.
|
Modifier and Type | Field and Description |
---|---|
private static Logging |
LOG
Class logger.
|
private static String |
STAT
Prefix for statistics.
|
maxlength, minlength
Constructor and Description |
---|
FPGrowth(double minsupp,
int minlength,
int maxlength)
Constructor.
|
Modifier and Type | Method and Description |
---|---|
private FPGrowth.FPTree |
buildFPTree(Relation<BitVector> relation,
int[] iidx,
int items)
Build the actual FP-tree structure.
|
private int[] |
buildIndex(int[] counts,
int[] positions,
int minsupp)
Build a forward map, item id (dimension) to frequency position
|
private int[] |
countItemSupport(Relation<BitVector> relation,
int dim)
Count the support of each 1-item.
|
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
|
protected Logging |
getLogger()
Get the (STATIC) logger for this class.
|
FrequentItemsetsResult |
run(Database db,
Relation<BitVector> relation)
Run the FP-Growth algorithm
|
getMinimumSupport
makeParameterDistanceFunction, run
private static final Logging LOG
private static final String STAT
public FPGrowth(double minsupp, int minlength, int maxlength)
minsupp
- Minimum support (relative or absolute)minlength
- Minimum lengthmaxlength
- Maximum lengthpublic FrequentItemsetsResult run(Database db, Relation<BitVector> relation)
db
- Database to processrelation
- Bit vector relationprivate int[] countItemSupport(Relation<BitVector> relation, int dim)
relation
- Datadim
- Maximum dimensionalityprivate FPGrowth.FPTree buildFPTree(Relation<BitVector> relation, int[] iidx, int items)
relation
- Dataiidx
- Inverse index (dimension to item rank)items
- Number of itemsprivate int[] buildIndex(int[] counts, int[] positions, int minsupp)
counts
- Item countspositions
- Position index (output)minsupp
- Minimum supportpublic TypeInformation[] getInputTypeRestriction()
AbstractAlgorithm
getInputTypeRestriction
in interface Algorithm
getInputTypeRestriction
in class AbstractAlgorithm<FrequentItemsetsResult>
protected Logging getLogger()
AbstractAlgorithm
getLogger
in class AbstractAlgorithm<FrequentItemsetsResult>
Copyright © 2015 ELKI Development Team, Lehr- und Forschungseinheit für Datenbanksysteme, Ludwig-Maximilians-Universität München. License information.