Association Rules

1. Objectives
2. Definitions
3. Types of Association Rules
4. Frequent Itemset Generation
5. Apriori Algorithm: Mining Single-Dimension Boolean AR
   5.1. Join Step
   5.2. Prune Step
   5.3. Example
   5.4. Pseudocode
   5.5. Challenges
   5.6. Improving the Efficiency of Apriori
6. Mining Frequent Itemsets without Candidate Generation
   6.1. Mining Frequent Patterns Using the FP-Tree
   6.2. Major Steps to Mine FP-Trees
7. Multiple-Level Association Rules
   7.1. Approach
   7.2. Redundancy Filtering
1. Objectives

Increase sales and reduce costs

What products were often purchased together?

What are the subsequent purchases after buying a PC?

What kinds of DNA are sensitive to this new drug?

Can we automatically classify web documents?

Broad applications:

Basket data analysis, cross-marketing, catalog design, sales campaign analysis

Web log (click stream) analysis, DNA sequence analysis, etc.

Example: Items frequently purchased together:

Bread ⇒ PeanutButter


Why associations:

Placement

Advertising

Sales

Coupons
2. Definitions

Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.



Frequent pattern: pattern (set of items, sequence, etc.) that occurs frequently in a database.

Basic Concepts:

A set of items: I = {x_{1}, …, x_{k}}

Transactions: D = {t_{1}, t_{2}, …, t_{n}}, where each t_{j} ⊆ I

A k-itemset: {I_{i1}, I_{i2}, …, I_{ik}} ⊆ I

Support of an itemset: Percentage of transactions that contain that itemset.

Large (Frequent) itemset: Itemset whose number of occurrences is above a threshold.
I = { Beer, Bread, Jelly, Milk, PeanutButter}
Support of {Bread,PeanutButter} = 3/5 = 60%
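The support computation can be sketched in a few lines of Python. The five transactions below are hypothetical, chosen to be consistent with the support values quoted in this section (the original transaction table is not reproduced here):

```python
# Hypothetical transactions consistent with the support values above.
D = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    s = set(itemset)
    return sum(s <= t for t in transactions) / len(transactions)

print(support({"Bread", "PeanutButter"}, D))  # 0.6 (= 3/5 = 60%)
```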

Association Rules

Implication: X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅

Support of AR (s) X ⇒ Y:

Percentage of transactions that contain X ∪ Y

Probability that a transaction contains X ∪ Y: P(X ∪ Y).

Confidence of AR (α) X ⇒ Y:

Ratio of the number of transactions that contain X ∪ Y to the number that contain X

Conditional probability that a transaction having X also contains Y: P(Y | X).

Support(A ⇒ C) = P(A ∪ C) = support({A} ∪ {C}) = 50%
Confidence(A ⇒ C) = P(C | A) = support({A} ∪ {C}) / support({A}) = 66.6%
X ⇒ Y                      Support        Confidence
Bread ⇒ PeanutButter       3/5 = 60%      (3/5)/(4/5) = 75%
PeanutButter ⇒ Bread       3/5 = 60%      (3/5)/(3/5) = 100%
Jelly ⇒ Milk               0%             0%
Jelly ⇒ PeanutButter       1/5 = 20%      (1/5)/(1/5) = 100%
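The confidence values can be sketched the same way; the transactions below are hypothetical, chosen to reproduce the quoted numbers, and the counts are divided directly so the ratios come out exact:

```python
# Hypothetical transactions consistent with the support values above.
D = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

def confidence(X, Y, transactions):
    """conf(X => Y): transactions containing X and Y over those containing X."""
    X, Y = set(X), set(Y)
    n_X = sum(X <= t for t in transactions)
    n_XY = sum((X | Y) <= t for t in transactions)
    return n_XY / n_X

print(confidence({"Bread"}, {"PeanutButter"}, D))   # 0.75 -> 75%
print(confidence({"PeanutButter"}, {"Bread"}, D))   # 1.0  -> 100%
```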


Association Rule Problem:

Given a set of items I = {I_{1}, I_{2}, …, I_{m}} and a database of transactions D = {t_{1}, t_{2}, …, t_{n}}, where t_{i} = {I_{i1}, I_{i2}, …, I_{ik}} and I_{ij} ∈ I, the Association Rule Problem is to identify all association rules X ⇒ Y with a minimum support and confidence.

NOTE: The support of X ⇒ Y is the same as the support of X ∪ Y.

Association Rules techniques:

Find all frequent itemsets.

Generate strong association rules from the frequent itemsets: those rules must satisfy minimum support and minimum confidence.
3. Types of Association Rules

Boolean AR:

It is a rule that checks whether an item is present or absent.

All the examples we have seen so far are Boolean AR.

Quantitative AR:

It describes associations between quantitative items or attributes.

Generally, quantitative values are partitioned into intervals.

Example:
age(X, ”30..39”) ∧ income(X, ”80K..100K”)
⇒ buys(X, ”high-resolution TV”)

SingleDimension AR:

It is a rule that references only one dimension.

Example:
buys(X, ”computer”) ⇒
buys(X, ”financial_software”)
The single dimension is “buys”.

The following rule is a multidimensional AR:
age(X, ”30..39”) ∧ income(X, ”80K..100K”)
⇒ buys(X, ”high-resolution TV”)

Multilevel AR

It is a set of rules that reference different levels of abstraction.

Example:
age(X, ”30..39”) ⇒ buys(X, “desktop”)
age(X, ”20..29”) ⇒ buys(X, “laptop”)
(laptop and desktop sit at a lower level of the concept hierarchy than computer)
4. Frequent Itemset Generation

Given d items, there are 2^{d} possible candidate itemsets

Brute-force approach:

Each itemset in the lattice is a candidate frequent itemset

Count the support of each candidate by scanning the database

Match each transaction against every candidate

Complexity ~ O(NMw), where N is the number of transactions, M the number of candidates, and w the maximum transaction width ⇒ expensive, since M = 2^{d}!

Complexity:

Given d unique items:

Total number of itemsets = 2^{d}

Total number of possible association rules: R = 3^{d} − 2^{d+1} + 1
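For d items, the number of possible rules X ⇒ Y with X and Y non-empty and disjoint is given by the closed form R = 3^{d} − 2^{d+1} + 1. This can be checked by brute-force enumeration for small d:

```python
from itertools import combinations

def rule_count_brute(d):
    """Enumerate all rules X => Y (X, Y non-empty, disjoint) over d items."""
    count = 0
    for k in range(2, d + 1):                 # size of X ∪ Y (needs >= 2 items)
        for xy in combinations(range(d), k):
            for j in range(1, k):             # size of X; Y takes the rest
                count += sum(1 for _ in combinations(xy, j))
    return count

for d in range(1, 7):                         # agrees with the closed form
    assert rule_count_brute(d) == 3**d - 2**(d + 1) + 1

print(rule_count_brute(6))  # 602
```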

Frequent Itemset Generation Strategies

Reduce the number of candidates (M)

Reduce the number of transactions (N)

Reduce size of N as the size of itemset increases

Reduce the number of comparisons (NM)

Use efficient data structures to store the candidates or transactions

No need to match every candidate against every transaction
5. Apriori Algorithm: Mining Single-Dimension Boolean AR

It is used to mine Boolean, single-level, and single-dimension ARs.

Apriori Principle

Uses prior knowledge of frequent itemset properties.

It is an iterative algorithm known as level-wise search.

The search proceeds level by level as follows:

First determine the set of frequent 1-itemsets: L_{1}

Second determine the set of frequent 2-itemsets using L_{1}: L_{2}

Etc.

The complexity of computing L_{i} is O(n), where n is the number of transactions in the transaction database.

Reduction of search space:

In the worst case, what is the number of itemsets at a level L_{i}?

Apriori uses “Apriori Property”:

Apriori Property:

It is an anti-monotone property: if a set cannot pass a test, all of its supersets will fail the same test as well.

It is called anti-monotone because the property is monotonic in the context of failing a test.

All nonempty subsets of a frequent itemset must also be frequent.

An itemset I is not frequent if it does not satisfy the minimum support threshold:
P(I) < min_sup

If an item A is added to the itemset I, then the resulting itemset I ∪ A cannot occur more frequently than I; hence, if I is not frequent, then I ∪ A is not frequent either:
P(I ∪ A) < min_sup

How does the Apriori algorithm use the “Apriori property”?

In the computation of the itemsets in L_{k} using L_{k-1}.

It is done in two steps:
5.1. Join Step
The set of candidate k-itemsets, C_{k}, is generated by joining L_{k-1} with itself:
L_{k-1} ⋈ L_{k-1}

Given two itemsets l_{i} and l_{j} of L_{k-1}:
l_{i} = l_{i1}, l_{i2}, l_{i3}, …, l_{i(k-2)}, l_{i(k-1)}
l_{j} = l_{j1}, l_{j2}, l_{j3}, …, l_{j(k-2)}, l_{j(k-1)}
where l_{i} and l_{j} are sorted.

l_{i} and l_{j} are joined if they differ in only their last item (no duplicate generation). Assume the following:
l_{i1} = l_{j1}, l_{i2} = l_{j2}, …, l_{i(k-2)} = l_{j(k-2)} and l_{i(k-1)} < l_{j(k-1)}

The resulting itemset is:
l_{i1}, l_{i2}, l_{i3}, …, l_{i(k-1)}, l_{j(k-1)}

Example of candidate generation:
L_{3} = {abc, abd, acd, ace, bcd}
Self-joining: L_{3} ⋈ L_{3}
abcd from abc and abd
acde from acd and ace
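The join step can be sketched in a few lines of Python (a minimal sketch; itemsets are represented as sorted tuples):

```python
def apriori_join(L_prev):
    """Join L_{k-1} with itself: two sorted (k-1)-itemsets join when they
    agree on their first k-2 items and the first's last item is smaller
    (the ordering test avoids generating duplicates)."""
    return [a + b[-1:]
            for a in L_prev for b in L_prev
            if a[:-1] == b[:-1] and a[-1] < b[-1]]

L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")]
print(apriori_join(L3))  # [('a', 'b', 'c', 'd'), ('a', 'c', 'd', 'e')]
```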
5.2. Prune Step

C_{k} is a superset of L_{k}: an itemset in C_{k} may or may not be frequent.

L_{k}: Test each generated itemset against the database:

Scan the database to determine the count of each generated itemset and include those that have a count no less than the minimum support count.

This may require intensive computation.

Use Apriori property to reduce the search space:

Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.

Remove from C_{k} any k-itemset that has a (k-1)-subset not in L_{k-1} (i.e., a subset that is not frequent).

Efficiently implemented: maintain a hash table of all frequent itemsets.

Example of candidate generation and pruning:
L_{3} = {abc, abd, acd, ace, bcd}
Self-joining: L_{3} ⋈ L_{3}
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because ade is not in L_{3}
C_{4} = {abcd}
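The prune step can be sketched similarly (same sorted-tuple representation as in the join example):

```python
from itertools import combinations

def apriori_prune(candidates, L_prev):
    """Keep only the candidates whose every (k-1)-subset is frequent."""
    L_prev = set(L_prev)
    return [c for c in candidates
            if all(tuple(s) in L_prev for s in combinations(c, len(c) - 1))]

L3 = {("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")}
C4 = [("a", "b", "c", "d"), ("a", "c", "d", "e")]
print(apriori_prune(C4, L3))  # [('a', 'b', 'c', 'd')] -- acde dropped: ade not in L3
```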
5.3. Example
5.4. Pseudocode
C_{k}: candidate itemsets of size k
L_{k}: frequent itemsets of size k

L_{1} = {frequent items};
for (k = 1; L_{k} != ∅; k++) do
    C_{k+1} = candidates generated from L_{k};
    for each transaction t in database do
        increment the count of all candidates in C_{k+1} that are contained in t;
    endfor;
    L_{k+1} = candidates in C_{k+1} with min_support;
endfor;
return ∪_{k} L_{k};
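The pseudocode can be turned into a runnable sketch (a minimal, unoptimized version; the transactions are the hypothetical market-basket data used in the earlier support example):

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise search following the pseudocode above; itemsets are
    sorted tuples, min_count is an absolute minimum support count."""
    items = sorted({i for t in transactions for i in t})
    Lk = {(i,) for i in items
          if sum(i in t for t in transactions) >= min_count}   # L1
    frequent = set(Lk)
    while Lk:
        # Join step, then prune step.
        Ck = {a + b[-1:] for a in Lk for b in Lk
              if a[:-1] == b[:-1] and a[-1] < b[-1]}
        Ck = {c for c in Ck
              if all(tuple(s) in Lk for s in combinations(c, len(c) - 1))}
        # One database pass to count supports of the surviving candidates.
        Lk = {c for c in Ck
              if sum(set(c) <= t for t in transactions) >= min_count}
        frequent |= Lk
    return frequent

# Hypothetical market-basket data (same as the earlier examples).
D = [{"Bread", "Jelly", "PeanutButter"}, {"Bread", "PeanutButter"},
     {"Bread", "Milk", "PeanutButter"}, {"Beer", "Bread"}, {"Beer", "Milk"}]
print(sorted(apriori(D, min_count=3)))
# [('Bread',), ('Bread', 'PeanutButter'), ('PeanutButter',)]
```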
5.5. Challenges

Multiple scans of transaction database

Huge number of candidates

Tedious workload of support counting for candidates

Improving Apriori:

general ideas

Reduce passes of transaction database scans

Shrink number of candidates

Facilitate support counting of candidates
5.6. Improving the Efficiency of Apriori

Several attempts have been introduced to improve the efficiency of Apriori:

Hash-based technique

Hashing itemset counts

Example:

TID    List of items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3


Create a hash table for candidate 2-itemsets:

Generate all 2-itemsets for each transaction in the transaction DB

H(x, y) = ((order of x) * 10 + (order of y)) mod 7

A 2-itemset whose corresponding bucket count is below the support threshold cannot be frequent.

Bucket address   0        1        2        3        4        5        6
Bucket count     2        2        4        2        2        4        4
Contents         {I1,I4}  {I1,I5}  {I2,I3}  {I2,I4}  {I2,I5}  {I1,I2}  {I1,I3}
                 {I3,I5}  {I1,I5}  {I2,I3}  {I2,I4}  {I2,I5}  {I1,I2}  {I1,I3}
                                   {I2,I3}                    {I1,I2}  {I1,I3}
                                   {I2,I3}                    {I1,I2}  {I1,I3}
Remember: support(x, y) = the percentage of transactions that contain both x and y. Therefore, if the minimum support count is 3, then the itemsets in buckets 0, 1, 3, and 4 cannot be frequent, and so they should not be included in C_{2}.
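The bucket counts in the table can be reproduced with a short sketch; the hash function is the one given above, with item Ik having order k:

```python
from itertools import combinations

# Transactions from the table above.
D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
     {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
     {"I1", "I2", "I3"}]

def h(x, y):
    """H(x, y) = ((order of x) * 10 + (order of y)) mod 7."""
    return (int(x[1:]) * 10 + int(y[1:])) % 7

buckets = [0] * 7
for t in D:
    for x, y in combinations(sorted(t), 2):   # all 2-itemsets of t
        buckets[h(x, y)] += 1

print(buckets)  # [2, 2, 4, 2, 2, 4, 4] -- buckets 0, 1, 3, 4 fall below count 3
```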

Reduce the number of transactions scanned in future iterations:

A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets: do not include such transactions in subsequent scans.

Other techniques include:

Partitioning (partition the data to find candidate itemsets)

Sampling (Mining on a subset of the given data)

Dynamic itemset counting (Adding candidate itemsets at different points during a scan)
6. Mining Frequent Itemsets without Candidate Generation

The bottleneck of Apriori: candidate generation

Huge candidate sets:

For 10^{4} frequent 1-itemsets, Apriori will generate more than 10^{7} candidate 2-itemsets.

To discover a frequent pattern of size 100, e.g., {a_{1}, a_{2}, …, a_{100}}, one needs to generate 2^{100} ≈ 10^{30} candidates.

Multiple scans of database:

Needs (n + 1) scans, where n is the length of the longest pattern.

Principle

Compress a large database into a compact frequent-pattern tree (FP-tree) structure

Highly condensed, but complete for frequent pattern mining

Avoid costly database scans

Develop an efficient, FP-tree-based frequent pattern mining method

A divide-and-conquer methodology: decompose mining tasks into smaller ones

Avoid candidate generation: sub-database tests only!

Algorithm:

Scan DB once, find frequent 1-itemsets (single-item patterns)

Order the frequent items in descending order of frequency, called the L order (in the example below: f(4), c(4), a(3), etc.)

Scan DB again and construct the FP-tree

Create the root of the tree and label it null or {}

The items in each transaction are processed in the L order (sorted by descending support count).

Create a branch for each transaction

Branches share common prefixes

Example: min_support = 0.5


TID   Items bought                  (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}      {f, c, a, m, p}
200   {a, b, c, f, l, m, o}         {f, c, a, b, m}
300   {b, f, h, j, o}               {f, b}
400   {b, c, k, s, p}               {c, b, p}
500   {a, f, c, e, l, p, m, n}      {f, c, a, m, p}

Each FP-tree node stores: the item, a count, a node pointer (the link to the next node carrying the same item), child pointers, and a parent pointer.
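The two-pass FP-tree construction can be sketched as follows (a minimal version; note that ties in frequency are broken alphabetically here, so c precedes f, whereas the example above lists f first — either choice is a valid L order):

```python
from collections import Counter, defaultdict

class Node:
    """FP-tree node: item, count, child pointers, parent pointer;
    node-links are kept in a separate header table."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fptree(transactions, min_count):
    # Pass 1: find frequent items and fix the descending-frequency (L) order.
    freq = Counter(i for t in transactions for i in t)
    L = sorted((i for i in freq if freq[i] >= min_count),
               key=lambda i: (-freq[i], i))
    rank = {i: r for r, i in enumerate(L)}
    root, header = Node(None, None), defaultdict(list)
    # Pass 2: insert each transaction with its frequent items in L order.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
    return root, header

D = [{"f", "a", "c", "d", "g", "i", "m", "p"},
     {"a", "b", "c", "f", "l", "m", "o"},
     {"b", "f", "h", "j", "o"},
     {"b", "c", "k", "s", "p"},
     {"a", "f", "c", "e", "l", "p", "m", "n"}]

root, header = build_fptree(D, min_count=3)
print(len(header["p"]), sum(n.count for n in header["p"]))  # p: 2 nodes, total count 3
```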

6.1. Mining Frequent Patterns Using the FP-Tree
General idea (divide-and-conquer):

Recursively grow frequent pattern paths using the FP-tree

For each item, construct its conditional pattern base, and then its conditional FP-tree

Recursion: repeat the process on each newly created conditional FP-tree

Until the resulting FP-tree is empty, or it contains only one path (a single path generates all the combinations of its sub-paths, each of which is a frequent pattern)
6.2. Major Steps to Mine FP-Trees

Construct the conditional pattern base for each node in the FP-tree

Construct the conditional FP-tree from each conditional pattern base

Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far. If a conditional FP-tree contains a single path, simply enumerate all the patterns.

Step 1: From FP-tree to Conditional Pattern Base

Start at the frequent-item header table of the FP-tree

Traverse the FP-tree by following the node-links of each frequent item, starting with the most frequent item.

Accumulate all transformed prefix paths of that item to form its conditional pattern base

Example:
Conditional pattern bases:

Item   Conditional pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1


Properties of the FP-tree for conditional pattern base construction:

Node-link property

For any frequent item a_{i}, all the possible frequent patterns that contain a_{i} can be obtained by following a_{i}'s node-links, starting from a_{i}'s head in the FP-tree header.

Prefix path property

To calculate the frequent patterns for a node a_{i} in a path P, only the prefix sub-path of a_{i} in P needs to be accumulated, and its frequency count should carry the same count as node a_{i}.

Step 2: Construct the Conditional FP-tree

For each pattern base:

Accumulate the count for each item in the base

Construct the FP-tree for the frequent items of the pattern base

Example:
m's conditional pattern base: fca:2, fcab:1
All frequent patterns containing m:
m,
fm, cm, am,
fcm, fam, cam,
fcam
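Since m's conditional FP-tree is a single path (f:3, c:3, a:3), the eight patterns above are exactly the subsets of {f, c, a} each combined with m, which is easy to check:

```python
from itertools import combinations

path = ["f", "c", "a"]   # m's conditional FP-tree: the single path f:3, c:3, a:3
patterns = [tuple(sub) + ("m",)
            for r in range(len(path) + 1)
            for sub in combinations(path, r)]
print(len(patterns))  # 8 patterns: m, fm, cm, am, fcm, fam, cam, fcam
```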

Mining frequent patterns by creating conditional pattern bases:

Item   Conditional pattern base    Conditional FP-tree
p      {(fcam:2), (cb:1)}          {(c:3)}|p
m      {(fca:2), (fcab:1)}         {(f:3, c:3, a:3)}|m
b      {(fca:1), (f:1), (c:1)}     Empty
a      {(fc:3)}                    {(f:3, c:3)}|a
c      {(f:3)}                     {(f:3)}|c
f      Empty                       Empty


Why is FP-tree mining fast?

The performance study shows that FP-growth is an order of magnitude faster than Apriori.

Reasons:

No candidate generation, no candidate tests

Compact data structure

No repeated database scans

The basic operations are counting and FP-tree building

FP-growth vs. Apriori: scalability with the support threshold [Jiawei Han and Micheline Kamber]
7. Multiple-Level Association Rules

Items often form a hierarchy.

Items at the lower level are expected to have lower support.

Rules regarding itemsets at appropriate levels could be quite useful.

Transaction database can be encoded based on dimensions and levels

We can explore shared multilevel mining
7.1. Approach

A top-down, progressive deepening approach:

First find high-level strong rules:
milk ⇒ bread [20%, 60%].

Then find their lower-level “weaker” rules:
2% milk ⇒ wheat bread [6%, 50%].

Variations on mining multiple-level association rules:

Level-crossed association rules:
2% milk ⇒ Wonder wheat bread

Association rules with multiple, alternative hierarchies:
2% milk ⇒ Wonder bread

Two multiple-level association mining strategies:

Uniform Support

Reduced support

Uniform Support: the same minimum support for all levels

One minimum support threshold.

No need to examine itemsets containing any item whose ancestors do not have minimum support.

Drawback:

Lower-level items do not occur as frequently. If the support threshold is:
too high ⇒ low-level associations are missed
too low ⇒ too many high-level associations are generated

Reduced support: reduced minimum support at lower levels

There are 4 search strategies:

Level-by-level independent

Level-cross filtering by k-itemset

Level-cross filtering by single item

Controlled level-cross filtering by single item

Level-by-level independent:

Full-breadth search

No background knowledge is used.

Each node is examined regardless of the frequency of its parent.

Level-cross filtering by single item:

An item at the i-th level is examined if and only if its parent node at the (i-1)-th level is frequent.

Level-cross filtering by k-itemset:

A k-itemset at the i-th level is examined if and only if its corresponding parent k-itemset at the (i-1)-th level is frequent.

This restriction is stronger than the one in level-cross filtering by single item:

There are usually not many k-itemsets that, when combined, are also frequent, so many valuable patterns may be filtered out.

Controlled level-cross filtering by single item:

A variation of level-cross filtering by single item that relaxes its constraint:

Allow the children of items that do not satisfy the minimum support threshold to be examined if those items satisfy the level passage threshold, level_passage_sup.

level_passage_sup value: it is typically set between the min_sup value of the given level and the min_sup value of the next lower level.

Example:
7.2. Redundancy Filtering

Some rules may be redundant due to “ancestor” relationships between items.

Definition: A rule R1 is an ancestor of a rule R2 if R1 can be obtained by replacing the items in R2 by their ancestors in a concept hierarchy.
R1: milk ⇒ wheat bread [support = 8%, confidence = 70%]
R2: 2% milk ⇒ wheat bread [support = 2%, confidence = 72%]
“milk” in R1 is an ancestor of “2% milk” in R2, so R1 is an ancestor of R2.

A rule is redundant if its support is close to the “expected” value derived from the rule's ancestor:

R2 is redundant since its confidence is close to the confidence of R1 (as expected) and its support is roughly the expected 2% = 8% × 1/4 (if 2% milk accounts for about a quarter of milk sales).

R2 does not add any additional information.
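The expected-support check can be written out explicitly; the 1/4 factor is the assumed share of milk sales represented by 2% milk (a hypothetical figure, as in the example above), and the closeness tolerance is a design choice:

```python
ancestor_support = 0.08   # R1: milk => wheat bread
child_share = 0.25        # assumed fraction of milk sales that are 2% milk
actual_support = 0.02     # R2: 2% milk => wheat bread

expected = ancestor_support * child_share
# R2 is flagged redundant when its support is close to the expected value;
# a 0.5% tolerance is used here purely for illustration.
redundant = abs(actual_support - expected) < 0.005
print(round(expected, 4), redundant)  # 0.02 True
```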
A. Bellaachia
