@Reference(authors="J. Han, J. Pei, Y. Yin", title="Mining frequent patterns without candidate generation", booktitle="Proc. ACM SIGMOD Int. Conf. Management of Data (SIGMOD 2000)", url="https://doi.org/10.1145/342009.335372", bibkey="DBLP:conf/sigmod/HanPY00") @Priority(value=200) public class FPGrowth extends AbstractFrequentItemsetAlgorithm
FPGrowth.FPTree
.
FP-Growth first sorts items by the overall frequency, since having high frequent items appear first in the tree leads to a much smaller tree since frequent subsets will likely share the same path in the tree. FP-Growth is beneficial when you have a lot of (near-) duplicate transactions, and are using a not too high support threshold, as it only prunes single items, not item combinations.
This implementation is in-memory only, and has not yet been carefully optimized.
The worst case memory use probably is \(O(\min(n\cdot l,i^l))\) where i is the number of items, l the average itemset length, and n the number of items. The worst case scenario is when every item is frequent, and every transaction is unique. The resulting tree will then be larger than the original data.
Reference:
J. Han, J. Pei, Y. Yin
Mining frequent patterns without candidate generation
In Proc. ACM SIGMOD Int. Conf. Management of Data (SIGMOD 2000)
Modifier and Type | Class and Description |
---|---|
static class |
FPGrowth.FPNode
A single node of the FP tree.
|
static class |
FPGrowth.FPTree
FP-Tree data structure
|
static class |
FPGrowth.Parameterizer
Parameterization class.
|
Modifier and Type | Field and Description |
---|---|
private static Logging |
LOG
Class logger.
|
private static java.lang.String |
STAT
Prefix for statistics.
|
maxlength, minlength
ALGORITHM_ID
Constructor and Description |
---|
FPGrowth(double minsupp,
int minlength,
int maxlength)
Constructor.
|
Modifier and Type | Method and Description |
---|---|
private FPGrowth.FPTree |
buildFPTree(Relation<BitVector> relation,
int[] iidx,
int items)
Build the actual FP-tree structure.
|
private int[] |
buildIndex(int[] counts,
int[] positions,
int minsupp)
Build a forward map, item id (dimension) to frequency position
|
private int[] |
countItemSupport(Relation<BitVector> relation,
int dim)
Count the support of each 1-item.
|
TypeInformation[] |
getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
|
protected Logging |
getLogger()
Get the (STATIC) logger for this class.
|
FrequentItemsetsResult |
run(Database db,
Relation<BitVector> relation)
Run the FP-Growth algorithm
|
getMinimumSupport
run
private static final Logging LOG
private static final java.lang.String STAT
public FPGrowth(double minsupp, int minlength, int maxlength)
minsupp
- Minimum support (relative or absolute)minlength
- Minimum lengthmaxlength
- Maximum lengthpublic FrequentItemsetsResult run(Database db, Relation<BitVector> relation)
db
- Database to processrelation
- Bit vector relationprivate int[] countItemSupport(Relation<BitVector> relation, int dim)
relation
- Datadim
- Maximum dimensionalityprivate FPGrowth.FPTree buildFPTree(Relation<BitVector> relation, int[] iidx, int items)
relation
- Dataiidx
- Inverse index (dimension to item rank)items
- Number of itemsprivate int[] buildIndex(int[] counts, int[] positions, int minsupp)
counts
- Item countspositions
- Position index (output)minsupp
- Minimum supportpublic TypeInformation[] getInputTypeRestriction()
AbstractAlgorithm
getInputTypeRestriction
in interface Algorithm
getInputTypeRestriction
in class AbstractAlgorithm<FrequentItemsetsResult>
protected Logging getLogger()
AbstractAlgorithm
getLogger
in class AbstractAlgorithm<FrequentItemsetsResult>
Copyright © 2019 ELKI Development Team. License information.