Overview of Association Rules


Table of Contents

1. Association Rule Learning
2. Market Basket Analysis Examples

Association Rule Learning


  1. Association Rule Learning: A Comprehensive Explanation

    1. Association Rule Learning is a data mining technique used to identify relationships or patterns between items in transactional data. It is widely used in Market Basket Analysis to gain insights into customer buying behaviors.

  2. Market Basket Analysis

    1. Market Basket Analysis examines customer transactions to identify purchasing patterns. For example:
      1. Mothers with babies often buy milk and diapers.
      2. Bachelors may purchase shaving cream and chips.
      3. Young women may frequently buy makeup products.
    2. Understanding these associations allows retailers to:
      1. Design targeted promotions (e.g., discounts on diapers if milk is purchased).
      2. Optimize shelf placements by placing frequently purchased items together.

  3. Components of Association Rule Learning

    1. Frequent Itemsets
      1. A frequent itemset refers to a group of items frequently bought together in transactions. For example:
      2. Milk and Bread are frequently purchased together.
      3. Bread and Butter are common combinations.
    2. Association Rules
      1. Association Rules are if-then relationships between items:
      2. Example: If Bread, Then Butter.
        1. Antecedent (If part): Bread.
        2. Consequent (Then part): Butter.
    3. Metrics to Evaluate Association Rules
      1. Association rules are evaluated based on Support, Confidence, and Lift (a short Python sketch computing all three follows this list):
      2. Support tells how frequently a rule applies across all transactions.
      3. Confidence tells how often the rule is correct when the antecedent occurs.
      4. Lift measures the strength of the rule relative to random co-occurrence.
      5. Support (S)
        1. Measures how frequently an itemset occurs in the dataset.

        2. It is the probability that a transaction contains {X ∪ Y}.

        3. i.e., it is the percentage of transactions in which items X and Y occur together.

        4. Formula:

          $$ \text{Support (S)} = \frac{\text{Number of transactions containing X ∪ Y}}{\text{Total number of transactions}} $$

        5. Example: For a rule {Milk, Bread} → Butter, Support indicates the percentage of transactions where Milk, Bread, and Butter were bought together.

      6. Confidence (C)
        1. Measures the likelihood of the Consequent being purchased when the Antecedent is purchased.

        2. In other words, it is the conditional probability that a transaction containing X also contains Y.

        3. i.e., it is the probability that if the L.H.S. (antecedent) appears in a transaction, the R.H.S. (consequent) will also appear.

        4. Formula:

          $$ \text{Confidence (C)} = \frac{\text{Number of transactions containing X ∪ Y}}{\text{Number of transactions containing X}} $$

        5. Example: For the rule {Milk, Bread} → Butter, Confidence tells us the percentage of transactions with Milk and Bread that also include Butter.

      7. Lift (L)
        1. Measures the strength of a rule by comparing its Confidence to the overall probability of the Consequent.

        2. Formula:

          $$ \text{Lift (L)} = \frac{\text{Confidence (X → Y)}}{\text{Support (Y)}} $$

        3. Lift > 1: X and Y occur together more often than expected by chance (positive association).

        4. Lift = 1: X and Y are statistically independent; the rule adds no information.

        5. Lift < 1: X and Y occur together less often than expected by chance (negative association).
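
To make these metrics concrete, here is a minimal Python sketch that computes Support, Confidence, and Lift for the rule {Milk, Bread} → Butter. The transaction list and function names are illustrative assumptions, not part of the original notes.

```python
# Hypothetical transaction data, for illustration only.
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Diapers"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Butter", "Diapers"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Support(X ∪ Y) / Support(X): how often the rule holds when X appears."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence(X → Y) / Support(Y): 1.0 means X and Y are independent."""
    return confidence(antecedent, consequent) / support(consequent)

A, C = {"Milk", "Bread"}, {"Butter"}
print(f"Support    = {support(A | C):.2f}")    # 2/5 = 0.40
print(f"Confidence = {confidence(A, C):.2f}")  # 0.40/0.60 ≈ 0.67
print(f"Lift       = {lift(A, C):.2f}")        # 0.67/0.60 ≈ 1.11
```

On this toy data, the rule {Milk, Bread} → Butter has a Lift slightly above 1, indicating a mildly positive association.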


  4. Apriori Algorithm Overview

    1. The Apriori algorithm is divided into two parts (a minimal Python sketch follows this list):
    2. Finding Frequent Itemsets:
      1. Iteratively identify frequent itemsets by filtering out those that do not meet the minimum support threshold.
      2. Use previously identified frequent itemsets to generate candidate itemsets for the next level.
    3. Finding Association Rules:
      1. For each frequent itemset, generate all possible rules.
      2. Calculate confidence for each rule.
      3. Retain the rules that meet the minimum confidence threshold.
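
The two phases above can be expressed in a short, self-contained Python sketch. This is an illustrative, unoptimized implementation; the transaction data and the thresholds (min_support = 0.4, min_confidence = 0.6) are assumptions for the example, not values from the original notes.

```python
from itertools import combinations

# Hypothetical transaction data and thresholds, for illustration only.
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Diapers"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Butter", "Diapers"},
]
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.6

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Phase 1: find frequent itemsets level by level, filtering by support.
frequent = {}
level = {frozenset([i]) for t in transactions for i in t}
level = {s for s in level if support(s) >= MIN_SUPPORT}
while level:
    frequent.update({s: support(s) for s in level})
    # Join step: combine frequent k-itemsets into (k+1)-candidates and
    # keep only those that still meet the minimum support threshold.
    level = {a | b for a in level for b in level
             if len(a | b) == len(a) + 1 and support(a | b) >= MIN_SUPPORT}

# Phase 2: generate rules from each frequent itemset and retain those
# that meet the minimum confidence threshold.
for itemset, s in frequent.items():
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = s / frequent[antecedent]  # subsets of frequent sets are frequent
            if conf >= MIN_CONFIDENCE:
                print(f"{set(antecedent)} -> {set(itemset - antecedent)} "
                      f"(support={s:.2f}, confidence={conf:.2f})")
```

In practice, library implementations such as mlxtend's apriori and association_rules follow this same two-phase structure, with additional pruning and vectorization for speed.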