We begin with the unlabeled data, representing customer purchases of different products. A value of 1 indicates a purchase, while 0 indicates no purchase.
Unlabeled customer purchase data

| Customer | Product 1 | Product 2 | Product 3 | Product 4 | Product 5 | Product 6 |
|---|---|---|---|---|---|---|
| Customer 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| Customer 2 | 1 | 0 | 0 | 0 | 0 | 1 |
| Customer 3 | 1 | 0 | 0 | 0 | 1 | 1 |
| Customer 4 | 0 | 0 | 1 | 1 | 0 | 0 |
| Customer 5 | 0 | 0 | 0 | 1 | 1 | 1 |
| Customer 6 | 1 | 1 | 1 | 0 | 0 | 0 |
| Customer 7 | 0 | 1 | 1 | 1 | 0 | 0 |
| Customer 8 | 1 | 1 | 0 | 0 | 0 | 0 |

First, we apply clustering to group customers with similar product purchasing behavior. Customers with similar rows (purchase patterns) are grouped together.
Customers 1, 6, 7, and 8 tend to buy products 1–3, often in combination.
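Cluster A: customers 1, 6, 7, 8

| Customer | Product 1 | Product 2 | Product 3 | Product 4 | Product 5 | Product 6 |
|---|---|---|---|---|---|---|
| Customer 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| Customer 6 | 1 | 1 | 1 | 0 | 0 | 0 |
| Customer 7 | 0 | 1 | 1 | 1 | 0 | 0 |
| Customer 8 | 1 | 1 | 0 | 0 | 0 | 0 |
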
Customers 2, 3, and 5 all buy product 6, and two of them (customers 3 and 5) also buy product 5.
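Cluster B: customers 2, 3, 5

| Customer | Product 1 | Product 2 | Product 3 | Product 4 | Product 5 | Product 6 |
|---|---|---|---|---|---|---|
| Customer 2 | 1 | 0 | 0 | 0 | 0 | 1 |
| Customer 3 | 1 | 0 | 0 | 0 | 1 | 1 |
| Customer 5 | 0 | 0 | 0 | 1 | 1 | 1 |
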
Customer 4 has a unique pattern and does not fit well into either cluster.
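
One way to check these groupings is to measure how much each pair of customers overlaps. The sketch below uses Jaccard similarity (shared products divided by products bought by either customer). This is an assumption made for illustration: the text does not say which similarity measure or clustering algorithm produced clusters A and B, and the helper names `jaccard`, `mean_within`, and `mean_to_cluster` are hypothetical.

```python
from itertools import combinations

# Purchase matrix from the unlabeled data table:
# keys are customer numbers, values are purchases of products 1-6.
PURCHASES = {
    1: [0, 1, 1, 0, 0, 0],
    2: [1, 0, 0, 0, 0, 1],
    3: [1, 0, 0, 0, 1, 1],
    4: [0, 0, 1, 1, 0, 0],
    5: [0, 0, 0, 1, 1, 1],
    6: [1, 1, 1, 0, 0, 0],
    7: [0, 1, 1, 1, 0, 0],
    8: [1, 1, 0, 0, 0, 0],
}

CLUSTER_A = [1, 6, 7, 8]
CLUSTER_B = [2, 3, 5]


def jaccard(row_a, row_b):
    """Products bought by both customers divided by products bought by either."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


def mean_within(cluster):
    """Average Jaccard similarity over all customer pairs inside one cluster."""
    pairs = list(combinations(cluster, 2))
    total = sum(jaccard(PURCHASES[i], PURCHASES[j]) for i, j in pairs)
    return total / len(pairs)


def mean_to_cluster(customer, cluster):
    """Average Jaccard similarity between one customer and a cluster's members."""
    total = sum(jaccard(PURCHASES[customer], PURCHASES[m]) for m in cluster)
    return total / len(cluster)


if __name__ == "__main__":
    print("within A:", round(mean_within(CLUSTER_A), 3))
    print("within B:", round(mean_within(CLUSTER_B), 3))
    print("customer 4 vs A:", round(mean_to_cluster(4, CLUSTER_A), 3))
    print("customer 4 vs B:", round(mean_to_cluster(4, CLUSTER_B), 3))
```

Under this particular measure, members of each cluster overlap with one another by roughly one half on average, while customer 4 overlaps with Cluster A by roughly 0.3 and with Cluster B by less than 0.1. Customer 4 is closer to Cluster A (it shares products 3 and 4 with customer 7) but is still less similar to it than the cluster's own members are to each other. A different similarity measure could give a different picture.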
Second, we perform association rule mining to find relationships between product purchases, i.e., which products are frequently bought together.

| Metric | Meaning |
|---|---|
| Support | % of transactions that contain both items (how common is the combo?) |
| Confidence | % of times Product B is bought when Product A is bought (A → B) |
| Lift | How much more likely A and B are bought together vs. by chance |

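
Written as formulas, the three metrics look as follows. The symbols are notation introduced here for illustration, not taken from the text: $N$ is the number of customers, $n_A$ and $n_B$ are the numbers of customers who bought A and B, and $n_{AB}$ is the number who bought both.

```latex
\[
\operatorname{support}(A \rightarrow B) = \frac{n_{AB}}{N},
\qquad
\operatorname{confidence}(A \rightarrow B) = \frac{n_{AB}}{n_A},
\qquad
\operatorname{lift}(A \rightarrow B)
  = \frac{\operatorname{confidence}(A \rightarrow B)}{n_B / N}
  = \frac{n_{AB}\, N}{n_A\, n_B}
\]

% Worked example for P5 -> P6 with N = 8, n_A = 2, n_B = 3, n_{AB} = 2:
\[
\operatorname{confidence} = \frac{2}{2} = 1.00,
\qquad
\frac{n_B}{N} = \frac{3}{8} = 0.375,
\qquad
\operatorname{lift} = \frac{1.00}{0.375} \approx 2.67
\]
```

Note that the percentage listed next to confidence for each rule in the table of association rules (for example 37.5% for P5 → P6) equals $n_B / N$, the share of all customers who bought B. That is the denominator used for lift, not the share who bought both items.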
Table of association rules

| Rule (A → B) | Count (A and B) | Count (A) | Confidence | Support (B) | Lift |
|---|---|---|---|---|---|
| P1 → P2 | 2 | 4 | 50.0% | 50.0% | 1.00 |
| P1 → P3 | 1 | 4 | 25.0% | 50.0% | 0.50 |
| P1 → P4 | 0 | 4 | 0.0% | 37.5% | 0.00 |
| P1 → P5 | 1 | 4 | 25.0% | 25.0% | 1.00 |
| P1 → P6 | 2 | 4 | 50.0% | 37.5% | 1.33 |
| P2 → P1 | 2 | 4 | 50.0% | 50.0% | 1.00 |
| P2 → P3 | 3 | 4 | 75.0% | 50.0% | 1.50 |
| P2 → P4 | 1 | 4 | 25.0% | 37.5% | 0.67 |
| P2 → P5 | 0 | 4 | 0.0% | 25.0% | 0.00 |
| P2 → P6 | 0 | 4 | 0.0% | 37.5% | 0.00 |
| P3 → P1 | 1 | 4 | 25.0% | 50.0% | 0.50 |
| P3 → P2 | 3 | 4 | 75.0% | 50.0% | 1.50 |
| P3 → P4 | 2 | 4 | 50.0% | 37.5% | 1.33 |
| P3 → P5 | 0 | 4 | 0.0% | 25.0% | 0.00 |
| P3 → P6 | 0 | 4 | 0.0% | 37.5% | 0.00 |
| P4 → P1 | 0 | 3 | 0.0% | 50.0% | 0.00 |
| P4 → P2 | 1 | 3 | 33.3% | 50.0% | 0.67 |
| P4 → P3 | 2 | 3 | 66.7% | 50.0% | 1.33 |
| P4 → P5 | 1 | 3 | 33.3% | 25.0% | 1.33 |
| P4 → P6 | 1 | 3 | 33.3% | 37.5% | 0.89 |
| P5 → P1 | 1 | 2 | 50.0% | 50.0% | 1.00 |
| P5 → P2 | 0 | 2 | 0.0% | 50.0% | 0.00 |
| P5 → P3 | 0 | 2 | 0.0% | 50.0% | 0.00 |
| P5 → P4 | 1 | 2 | 50.0% | 37.5% | 1.33 |
| P5 → P6 | 2 | 2 | 100.0% | 37.5% | 2.67 |
| P6 → P1 | 2 | 3 | 66.7% | 50.0% | 1.33 |
| P6 → P2 | 0 | 3 | 0.0% | 50.0% | 0.00 |
| P6 → P3 | 0 | 3 | 0.0% | 50.0% | 0.00 |
| P6 → P4 | 1 | 3 | 33.3% | 37.5% | 0.89 |
| P6 → P5 | 2 | 3 | 66.7% | 25.0% | 2.67 |

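
The rule table can be recomputed directly from the purchase matrix. The following is a minimal sketch in plain Python, not necessarily how the table was originally produced; the function name `association_rules` and the dictionary keys are hypothetical, chosen for this illustration.

```python
from itertools import permutations

# Rows are customers 1-8, columns are products 1-6 (from the purchase table).
ROWS = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
]


def association_rules(rows):
    """Return pairwise rule metrics keyed by (A, B), using 1-based product numbers."""
    n_customers = len(rows)
    n_products = len(rows[0])
    counts = [sum(row[p] for row in rows) for p in range(n_products)]
    rules = {}
    for a, b in permutations(range(n_products), 2):
        if counts[a] == 0 or counts[b] == 0:
            continue  # metrics are undefined for a product nobody bought
        both = sum(1 for row in rows if row[a] and row[b])
        confidence = both / counts[a]
        support_b = counts[b] / n_customers
        rules[(a + 1, b + 1)] = {
            "both": both,
            "count_a": counts[a],
            "confidence": confidence,
            "support_b": support_b,
            "lift": confidence / support_b,
        }
    return rules


if __name__ == "__main__":
    rules = association_rules(ROWS)
    # List the rules from highest to lowest lift.
    ranked = sorted(rules.items(), key=lambda item: -item[1]["lift"])
    for (a, b), m in ranked:
        print(f"P{a} -> P{b}: conf={m['confidence']:.1%} lift={m['lift']:.2f}")
```

For larger datasets, a dedicated library implementation of Apriori or FP-Growth would be the more usual choice; the exhaustive pairwise loop here is only practical because there are six products.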
Insights
Based on the association rules analysis, we can derive the following insights:
- Strong product pairings:
  - Products 5 and 6 have the strongest association, with the highest lift value (2.67) in both directions. Customers who buy either one are much more likely to buy the other, suggesting these products strongly complement each other.
  - Products 2 and 3 also show a strong positive association (lift = 1.50), with 75% of product 2 buyers also purchasing product 3.
- One-way associations:
  - The rule P4 → P3 shows that 66.7% of customers who buy product 4 also buy product 3 (lift = 1.33), but the reverse isn’t as strong (P3 → P4 has only 50% confidence). Lift is the same in both directions; only the confidence differs.
  - This suggests that most product 4 buyers (two of three) also buy product 3, whereas only half of product 3 buyers buy product 4.
- Product clustering:
  - Group 1: Products 5 and 6 (strongest association)
  - Group 2: Products 2 and 3 (strong association)
  - Group 3: Products 1 and 6 (moderate association)
- Product independence and negative associations:
  - Several product pairs never appear together (lift = 0), such as P1 and P4, P2 and P5, P3 and P5.
  - This suggests potential product incompatibility or different customer segments.
  - Pairs with lift = 1.00, such as P1 and P2 or P1 and P5, are bought together about as often as chance would predict, which is what independence looks like.
- Product popularity:
  - Products 1, 2, and 3 have the highest support (each purchased by 50% of customers).
  - Product 5 has the lowest support (only 25%).
- Business applications:
  - Bundle marketing: The P5 → P6 rule with 100% confidence and 2.67 lift suggests these products could be effectively bundled.
  - Recommendation systems: When a customer buys product 4, recommending product 3 would be logical (66.7% confidence).
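
The recommendation idea can be sketched as a lookup that ranks the other products by confidence, given one product already in the basket. This is a hypothetical helper (`recommend` is not from any library), and ranking by confidence alone is an assumption; a production system might also weigh lift or support.

```python
# Rows are customers 1-8, columns are products 1-6 (from the purchase table).
ROWS = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
]


def recommend(rows, bought, top_n=1):
    """Rank other products by confidence(bought -> product).

    `bought` is a 1-based product number. Returns up to `top_n`
    (product, confidence) pairs, highest confidence first; ties are
    broken by the lower product number.
    """
    buyers = [row for row in rows if row[bought - 1]]
    if not buyers:
        return []  # nobody bought this product, so there is nothing to rank
    scored = []
    for product in range(1, len(rows[0]) + 1):
        if product == bought:
            continue
        confidence = sum(row[product - 1] for row in buyers) / len(buyers)
        scored.append((product, confidence))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    for product in (4, 5):
        print(f"bought P{product}:", recommend(ROWS, product))
```

With this data, the top suggestion for a product 4 buyer is product 3, and for a product 5 buyer it is product 6, matching the two rules highlighted above. With only eight customers, these rankings are illustrative rather than statistically reliable.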