Association Rules

1. Objectives

2. Definitions

3. Type of Association Rules

4. Frequent Itemset generation

5. Apriori Algorithm: Mining Single-Dimension Boolean AR

5.1. Join Step

5.2. Prune step

5.3. Example

5.4. Pseudo-code

5.5. Challenges

5.6. Improving the Efficiency of Apriori

6. Mining Frequent Itemsets without Candidate Generation

6.1. Mining Frequent patterns using FP-Tree

6.2. Major steps to mine FP-trees

7. Multiple-Level Association Rules

7.1. Approach

7.2. Redundancy Filtering


1. Objectives

  • Increase sales and reduce costs

  • What products were often purchased together?

          • Beer and diapers?!

  • What are the subsequent purchases after buying a PC?

  • What kinds of DNA are sensitive to this new drug?

  • Can we automatically classify web documents?

  • Broad applications:

          • Basket data analysis, cross-marketing, catalog design, sale campaign analysis

          • Web log (click stream) analysis, DNA sequence analysis, etc.

      • Example: Items frequently purchased together:

      • Bread → PeanutButter

      • Why associations:

          • Placement

          • Advertising

          • Sales

          • Coupons


2. Definitions

  • Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.

  • Frequent pattern: pattern (set of items, sequence, etc.) that occurs frequently in a database.

  • Basic Concepts:

    • A set of items: I = {x1, …, xk}

    • Transactions: D = {t1, t2, …, tn}, tj ⊆ I

  • A k-itemset: {Ii1, Ii2, …, Iik} ⊆ I

  • Support of an itemset: Percentage of transactions that contain that itemset.

    • Large (Frequent) itemset: Itemset whose number of occurrences is above a threshold.

    • Example:

I = { Beer, Bread, Jelly, Milk, PeanutButter}

Support of {Bread,PeanutButter} = 3/5 = 60%

  • Association Rules

    • Implication: X → Y, where X, Y ⊂ I and X ∩ Y = ∅

    • Support of AR (s) X → Y:

      • Percentage of transactions that contain X ∪ Y

      • Probability that a transaction contains X ∪ Y.

    • Confidence of AR (α) X → Y:

      • Ratio of the number of transactions that contain X ∪ Y to the number that contain X

      • Conditional probability that a transaction having X also contains Y.

    • Example:

  • For rule A → C:

Support(A → C) = P(A ∪ C) = support({A} ∪ {C}) = 50%

confidence(A → C) = P(C|A) = support({A} ∪ {C}) / support({A}) = 66.6%

  • Another Example:

Rule                      Support (s)       Confidence (α)
Bread → PeanutButter      3/5 = 60%         (3/5)/(4/5) = 75%
PeanutButter → Bread      3/5 = 60%         (3/5)/(3/5) = 100%
Jelly → Milk              –                 –
Jelly → PeanutButter      1/5 = 20%         (1/5)/(1/5) = 100%

  • Association Rule Problem:

    • Given a set of items I = {I1, I2, …, Im} and a database of transactions D = {t1, t2, …, tn} where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I, the Association Rule Problem is to identify all association rules X → Y with a minimum support and confidence.

    • NOTE: Support of X → Y is the same as the support of X ∪ Y.

  • Association Rules techniques:

      • Find all frequent itemsets.

      • Generate strong association rules from the frequent itemsets: those rules must satisfy minimum support and minimum confidence.

3. Type of Association Rules

  • Boolean AR:

          • It is a rule that checks whether an item is present or absent.

          • All the examples we have seen so far are Boolean AR.

  • Quantitative AR:

          • It describes associations between quantitative items or attributes.

          • Generally, quantitative values are partitioned into intervals.

          • Example:

Age(X, ”30..39”) ∧ income(X, ”80K..100K”) → buys(X, ”High Resolution TV”)

  • Single-Dimension AR:

          • It is a rule that references only one dimension.

          • Example:

buys(X, ”computer”) → buys(X, ”financial_software”)

The single dimension is “buys”

          • The following rule is a multi-dimensional AR:

Age(X, ”30..39”) ∧ income(X, ”80K..100K”) → buys(X, ”High Resolution TV”)

  • Multi-level AR

          • It is a set of rules that reference different levels of abstraction.

          • Example:

Age(X, ”30..39”) → buys(X, “desktop”)

Age(X, ”20..29”) → buys(X, “laptop”)

where laptop and desktop are lower-level items of computer in the concept hierarchy.

4. Frequent Itemset generation

  • Given d items, there are 2^d possible candidate itemsets

  • Brute-force approach:

    • Each itemset in the lattice is a candidate frequent itemset

    • Count the support of each candidate by scanning the database

    • Match each transaction against every candidate

    • Complexity ~ O(NMw), where N is the number of transactions, M the number of candidates, and w the maximum transaction width => expensive since M = 2^d !!!

  • Complexity:

    • Given d unique items:

    • Total number of itemsets = 2^d

    • Total number of possible association rules: R = 3^d − 2^(d+1) + 1

    • If d = 6, R = 602 rules
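The rule count follows from choosing a nonempty antecedent and a nonempty, disjoint consequent from the d items; a quick check of the formula:

```python
# Number of possible association rules over d items:
# R = 3^d - 2^(d+1) + 1 (each item goes to the antecedent, the
# consequent, or neither; subtract the cases with an empty side).
def rule_count(d):
    return 3**d - 2**(d + 1) + 1

print(rule_count(6))  # 602
```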

  • Frequent Itemset Generation Strategies

    • Reduce the number of candidates (M)

    • Reduce the number of transactions (N)

      • Reduce size of N as the size of itemset increases

    • Reduce the number of comparisons (NM)

      • Use efficient data structures to store the candidates or transactions

      • No need to match every candidate against every transaction

5. Apriori Algorithm: Mining Single-Dimension Boolean AR

  • It is used to mine Boolean, single-level, and single-dimension ARs.

  • Apriori Principle

  • Apriori algorithm:

          • Uses prior knowledge of frequent itemset properties.

          • It is an iterative algorithm known as level-wise search.

          • The search proceeds level-by-level as follows:

            • First, determine the set of frequent 1-itemsets: L1

            • Second, determine the set of frequent 2-itemsets using L1: L2

            • Etc.

          • The complexity of computing Li is O(n) where n is the number of transactions in the transaction database.

          • Reduction of search space:

            • In the worst case what is the number of itemsets in a level Li?

            • Apriori uses “Apriori Property”:

          • Apriori Property:

            • It is an anti-monotone property: if a set cannot pass a test, all of its supersets will fail the same test as well.

            • It is called anti-monotone because the property is monotonic in the context of failing a test.

            • All nonempty subsets of a frequent itemset must also be frequent.

            • An itemset I is not frequent if it does not satisfy the minimum support threshold:

P(I) < min_sup

            • If an item A is added to the itemset I, then the resulting itemset I ∪ {A} cannot occur more frequently than I:

P(I ∪ {A}) ≤ P(I) < min_sup
Therefore, I ∪ {A} is not frequent.

        • How Apriori algorithm uses “Apriori property”?

          • In the computation of the itemsets in Lk using Lk-1

          • It is done in two steps:

            • Join

            • Prune

5.1. Join Step

        • The set of candidate k-itemsets, Ck, is generated by joining Lk-1 with itself:

Ck = Lk-1 ⋈ Lk-1

        • Let l1 and l2 be itemsets of Lk-1, with the items within each itemset sorted (li[j] denotes the jth item of li).

        • l1 and l2 are joined only if they differ in just their last item (this prevents duplicate generation):

l1[1] = l2[1], l1[2] = l2[2], …, l1[k-2] = l2[k-2], and l1[k-1] < l2[k-1]

  • The resulting candidate itemset is:

l1[1], l1[2], …, l1[k-1], l2[k-1]
  • Example of Candidate-generation:

L3={abc, abd, acd, ace, bcd}

Self-joining: L3 ⋈ L3

abcd from abc and abd

acde from acd and ace

5.2. Prune step

  • Ck is a superset of Lk → some itemsets in Ck may or may not be frequent.

  • Lk: Test each generated itemset against the database:

      • Scan the database to determine the count of each generated itemset and include those that have a count no less than the minimum support count.

      • This may require intensive computation.

  • Use Apriori property to reduce the search space:

      • Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.

      • Remove from Ck any k-itemset that has a (k-1)-subset not in Lk-1 (itemsets that are not frequent)

      • Efficiently implemented: maintain a hash table of all frequent itemsets.

5.3. Example

  • Example of Candidate-generation and Pruning:

L3={abc, abd, acd, ace, bcd}

Self-joining: L3 ⋈ L3

abcd from abc and abd

acde from acd and ace

acde is removed because ade is not in L3
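The join and prune steps above can be sketched as follows, with itemsets represented as sorted tuples:

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets.

    Join: merge two (k-1)-itemsets that agree on their first k-2 items;
    prune: drop any candidate that has an infrequent (k-1)-subset
    (the Apriori property).
    """
    L_prev = sorted(L_prev)
    prev_set = set(L_prev)
    candidates = []
    for i, l1 in enumerate(L_prev):
        for l2 in L_prev[i + 1:]:
            if l1[:k - 2] != l2[:k - 2]:
                break  # L_prev is sorted, so no later l2 can match either
            c = l1 + (l2[-1],)
            # Prune step: every (k-1)-subset of c must be frequent.
            if all(s in prev_set for s in combinations(c, k - 1)):
                candidates.append(c)
    return candidates

L3 = [tuple("abc"), tuple("abd"), tuple("acd"), tuple("ace"), tuple("bcd")]
print(apriori_gen(L3, 4))  # [('a', 'b', 'c', 'd')] -- acde pruned (ade not in L3)
```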




5.4. Pseudo-code

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent 1-itemsets};
for (k = 1; Lk ≠ ∅; k++) do
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with support ≥ min_support;
end
return ∪k Lk;
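The pseudo-code above can be turned into a runnable sketch; the toy transaction database and the support count of 3 are assumptions for illustration:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets.

    Level-wise search: find L1 with a first scan, then repeatedly
    join/prune Lk into Ck+1 and scan the database to count supports.
    """
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    L = {c: s for c, s in counts.items() if s >= min_count}
    frequent = dict(L)
    k = 2
    while L:
        prev = set(L)
        items = sorted(prev)
        cands = set()
        # Join (k-1)-itemsets sharing the first k-2 items, then prune
        # candidates with an infrequent (k-1)-subset.
        for i, l1 in enumerate(items):
            for l2 in items[i + 1:]:
                if l1[:k - 2] != l2[:k - 2]:
                    break
                c = l1 + (l2[-1],)
                if all(s in prev for s in combinations(c, k - 1)):
                    cands.add(c)
        # Scan the database once to count candidate supports.
        counts = {c: 0 for c in cands}
        for t in transactions:
            for c in cands:
                if t.issuperset(c):
                    counts[c] += 1
        L = {c: s for c, s in counts.items() if s >= min_count}
        frequent.update(L)
        k += 1
    return frequent

# Toy database (an assumption), minimum support count = 3
db = [{"Bread", "Jelly", "PeanutButter"},
      {"Bread", "PeanutButter"},
      {"Bread", "Milk", "PeanutButter"},
      {"Beer", "Bread"},
      {"Beer", "Milk"}]
print(sorted(apriori(db, 3).items()))
# [(('Bread',), 4), (('Bread', 'PeanutButter'), 3), (('PeanutButter',), 3)]
```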


5.5. Challenges

  • Multiple scans of transaction database

  • Huge number of candidates

  • Tedious workload of support counting for candidates

  • Improving Apriori:

          • general ideas

          • Reduce passes of transaction database scans

          • Shrink number of candidates

          • Facilitate support counting of candidates

  • Easily parallelized

5.6. Improving the Efficiency of Apriori

  • Several attempts have been introduced to improve the efficiency of Apriori:

    • Hash-based technique

      • Hashing itemset counts

      • Example:

        • Transaction DB:


[List of Transactions table — omitted]

    • Create a hash table for candidate 2-itemsets:

      • Generate all 2-itemsets for each transaction in the transaction DB

      • H(x,y) = ((order of x) * 10 + (order of y)) mod 7

      • A 2-itemset whose corresponding bucket count is below the support threshold cannot be frequent.

[Hash-table figure — bucket address, bucket count, and bucket contents omitted]
  • Remember: support(xy) = percentage number of transactions that contain x and y. Therefore, if the minimum support is 3, then the itemsets in buckets 0, 1, 3, and 4 cannot be frequent and so they should not be included in C2.

    • Transaction reduction

      • Reduce the number of transactions scanned in future iterations.

      • A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets: Do not include such transaction in subsequent scans.

    • Other techniques include:

      • Partitioning (partition the data to find candidate itemsets)

      • Sampling (Mining on a subset of the given data)

      • Dynamic itemset counting (Adding candidate itemsets at different points during a scan)

6. Mining Frequent Itemsets without Candidate Generation

  • Objectives:

    • The bottleneck of Apriori: candidate generation

    • Huge candidate sets:

      • For 10^4 frequent 1-itemsets, Apriori will generate more than 10^7 candidate 2-itemsets.

      • To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates.

    • Multiple scans of the database:

      • Needs (n + 1) scans, where n is the length of the longest pattern.

  • Principle:

    • Compress a large database into a compact, Frequent-Pattern tree (FP-tree) structure

      • Highly condensed, but complete for frequent pattern mining

      • Avoid costly database scans

    • Develop an efficient, FP-tree-based frequent pattern mining method

      • A divide-and-conquer methodology: decompose mining tasks into smaller ones

      • Avoid candidate generation: sub-database test only!

  • Algorithm:

  1. Scan DB once, find frequent 1-itemset (single item pattern)

  2. Order frequent items in descending order of frequency, called the L order (in the example below: f(4), c(4), a(3), etc.)

  3. Scan DB again and construct FP-tree

    1. Create the root of the tree and label it null or {}

    2. The items in each transaction are processed in the L order (sorted according to descending support count).

    3. Create a branch for each transaction

    4. Branches share common prefixes

  • Example: min_support = 0.5


Items bought                    (Ordered) frequent items
{f, a, c, d, g, i, m, p}        {f, c, a, m, p}
{a, b, c, f, l, m, o}           {f, c, a, b, m}
{b, f, h, j, o}                 {f, b}
{b, c, k, s, p}                 {c, b, p}
{a, f, c, e, l, p, m, n}        {f, c, a, m, p}

  • Node Structure: each FP-tree node keeps its item and count, a node-link pointer (to the next node carrying the same item), child pointers, and a parent pointer.
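The construction steps above can be sketched as follows. Note one assumption: frequency ties are broken alphabetically here, so c precedes f in the L order, whereas the lecture's example breaks the f/c tie the other way:

```python
class FPNode:
    """FP-tree node: item, count, parent pointer, child pointers.

    (Node-links to same-item nodes are kept in a separate header table.)
    """
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Pass 1: find frequent items (these define the L order).
    freq = {}
    for t in transactions:
        for item in t:
            freq[item] = freq.get(item, 0) + 1
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = FPNode(None, None)
    header = {}  # item -> list of its nodes (stand-in for node-links)
    # Pass 2: insert each transaction's frequent items in L order;
    # transactions sharing a prefix share a branch, so counts accumulate.
    for t in transactions:
        items = sorted((i for i in t if i in freq),
                       key=lambda i: (-freq[i], i))  # ties broken alphabetically
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header.setdefault(item, []).append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header

db = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]
root, header = build_fp_tree(db, 3)
print(root.children["c"].count)  # 4 -- four transactions begin with c here
```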

6.1. Mining Frequent patterns using FP-Tree

  • General idea (divide-and-conquer)

    • Recursively grow frequent pattern path using the FP-tree

  • Method

    • For each item, construct its conditional pattern-base, and then its conditional FP-tree

    • Recursion: Repeat the process on each newly created conditional FP-tree

    • Until the resulting FP-tree is empty, or it contains only one path (single path will generate all the combinations of its sub-paths, each of which is a frequent pattern)

6.2. Major steps to mine FP-trees

  • Main Steps:

  1. Construct conditional pattern base for each node in the FP-tree

  2. Construct conditional FP-tree from each conditional pattern-base

  3. Recursively mine conditional FP-trees and grow the frequent patterns obtained so far. If the conditional FP-tree contains a single path, simply enumerate all the patterns.

  • Step 1: From FP-tree to Conditional Pattern Base

      • Starting at the frequent header table in the FP-tree

      • Traverse the FP-tree by following the link of each frequent item, starting by the item with the highest frequency.

      • Accumulate all of the transformed prefix paths of that item to form its conditional pattern base

  • Example:

Conditional pattern bases:

Item    Conditional pattern base
b       fca:1, f:1, c:1
m       fca:2, fcab:1
p       fcam:2, cb:1

  • Properties of FP-tree for Conditional Pattern Base Construction:

    • Node-link property

      • For any frequent item ai, all the possible frequent patterns that contain ai can be obtained by following ai's node-links, starting from ai's head in the FP-tree header.

    • Prefix path property

      • To calculate the frequent patterns for a node ai in a path P, only the prefix sub-path of ai in P needs to be accumulated, and its frequency count should carry the same count as node ai.

  • Step 2: Construct Conditional FP-tree

    • For each pattern-base

      • Accumulate the count for each item in the base

      • Construct the FP-tree for the frequent items of the pattern base

    • Example:

m-conditional pattern base: fca:2, fcab:1

All frequent patterns concerning m:

m,
fm, cm, am,
fcm, fam, cam,
fcam
  • Mining Frequent Patterns by Creating Conditional Pattern-Bases:

Item    Conditional pattern-base        Conditional FP-tree
p       {(fcam:2), (cb:1)}              {(c:3)}|p
m       {(fca:2), (fcab:1)}             {(f:3, c:3, a:3)}|m
b       {(fca:1), (f:1), (c:1)}         empty
a       {(fc:3)}                        {(f:3, c:3)}|a
c       {(f:3)}                         {(f:3)}|c
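Step 2 above — accumulating counts over a pattern base and keeping the items that reach the minimum support count — can be sketched as:

```python
def conditional_frequent_items(pattern_base, min_count):
    """Accumulate item counts over a conditional pattern base and keep
    the items that reach min_count (they form the conditional FP-tree)."""
    counts = {}
    for prefix, n in pattern_base:
        for item in prefix:
            counts[item] = counts.get(item, 0) + n
    return {i: c for i, c in counts.items() if c >= min_count}

# m's conditional pattern base from the example above (min support count 3)
m_base = [("fca", 2), ("fcab", 1)]
print(conditional_frequent_items(m_base, 3))  # {'f': 3, 'c': 3, 'a': 3}

p_base = [("fcam", 2), ("cb", 1)]
print(conditional_frequent_items(p_base, 3))  # {'c': 3}
```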

  • Why is FP-Tree mining fast?

    • The performance study shows FP-growth is an order of magnitude faster than Apriori

    • Reasoning:

      • No candidate generation, no candidate test

      • Use compact data structure

      • Eliminate repeated database scan

      • Basic operation is counting and FP-tree building

  • FP-Growth vs. Apriori: Scalability with the support Threshold [Jiawei Han and Micheline Kamber]

7. Multiple-Level Association Rules

  • Items often form a hierarchy.

  • Items at the lower level are expected to have lower support.

  • Rules regarding itemsets at appropriate levels could be quite useful.

  • Transaction database can be encoded based on dimensions and levels

  • We can explore shared multi-level mining


7.1. Approach

  • A top-down, progressive deepening approach:

      • First find high-level strong rules:

milk → bread [20%, 60%].

      • Then find their lower-level “weaker” rules:

2% milk → wheat bread [6%, 50%].

  • Variations of mining multiple-level association rules:

      • Level-crossed association rules:

2% milk → Wonder wheat bread

  • Association rules with multiple, alternative hierarchies:

2% milk → Wonder bread

  • Two multiple-level association mining strategies:

      • Uniform Support

      • Reduced support

  • Uniform Support: the same minimum support for all levels

      • One minimum support threshold.

      • No need to examine itemsets containing any item whose ancestors do not have minimum support.

      • Drawback:

          • Lower-level items do not occur as frequently. If the support threshold is:

too high → miss low-level associations

too low → generate too many high-level associations

  • Reduced Support: reduced minimum support at lower levels

      • There are 4 search strategies:

          • Level-by-level independent

          • Level-cross filtering by k-itemset

          • Level-cross filtering by single item

          • Controlled level-cross filtering by single item

      • Level-by-Level independent:

          • Full-breadth search

          • No background knowledge is used.

          • Each node is examined regardless of the frequency of its parent.

      • Level-cross filtering by single item:

          • An item at the ith level is examined if and only if its parent node at the (i-1)th level is frequent.

      • Level-cross filtering by k-itemset:

          • A k-itemset at the ith level is examined if and only if its corresponding parent k-itemset at the (i-1)th level is frequent.

          • This restriction is stronger than the one in level-cross filtering by single item

          • There are usually not many k-itemsets that, when combined, are also frequent:

→ Many valuable patterns may be filtered out

      • Controlled level-cross filtering by single item:

          • A variation of the level-cross filtering by single item: Relax the constraint in this approach

          • Allow the children of items that do not satisfy the minimum support threshold to be examined if these items satisfy a separate, lower "level passage" threshold (level_passage_sup).

          • level_passage_sup value: it is typically set between the min_sup value of the given level and the min_sup value of the next lower level.
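A minimal sketch of the passage-threshold decision; the threshold values below are assumptions for illustration:

```python
# Controlled level-cross filtering by single item: a node's children are
# examined if the node meets min_sup at its level, or at least the lower
# level-passage threshold.
def examine_children(support, min_sup, level_passage_sup):
    """True if the children of an item with this support should be
    examined at the next level down."""
    return support >= min_sup or support >= level_passage_sup

min_sup_level1 = 0.05  # min_sup at the higher level (assumed)
level_passage = 0.03   # set between this level's and the next level's min_sup

print(examine_children(0.04, min_sup_level1, level_passage))  # True
print(examine_children(0.02, min_sup_level1, level_passage))  # False
```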


7.2. Redundancy Filtering

  • Some rules may be redundant due to “ancestor” relationships between items.

  • Definition: A rule R1 is an ancestor of a rule, R2, if R1 can be obtained by replacing the items in R2 by their ancestors in a concept hierarchy.

  • Example

R1: milk → wheat bread [support = 8%, confidence = 70%]

R2: 2% milk → wheat bread [support = 2%, confidence = 72%]

“milk” in R1 is an ancestor of “2% milk” in R2.

  • We say the first rule is an ancestor of the second rule.

  • A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor:

      • R2 is redundant since its confidence is close to the confidence of R1 (as expected) and its support is around the expected value 2% = 8% × ¼ (assuming about one quarter of the milk sold is 2% milk).

      • R2 does not add any additional information.
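The “close to expected” check can be sketched as follows; the 20% tolerance and the assumption that 2% milk makes up a quarter of milk transactions are illustrative, not from the source:

```python
def is_redundant(rule, ancestor, item_fraction, tolerance=0.2):
    """A rule is redundant if its support is close to the value expected
    from its ancestor: ancestor support * fraction of the ancestor item's
    transactions that the specialized item accounts for."""
    expected = ancestor["support"] * item_fraction
    return abs(rule["support"] - expected) <= tolerance * expected

R1 = {"support": 0.08, "confidence": 0.70}  # milk -> wheat bread
R2 = {"support": 0.02, "confidence": 0.72}  # 2% milk -> wheat bread

# Assumption: 2% milk is about 1/4 of milk transactions.
print(is_redundant(R2, R1, 0.25))  # True -- R2 adds no new information
```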

A. Bellaachia