Market Basket Analysis Using Association Rules in R
Source: Internet | Editor: 程序博客网 | Date: 2024/06/05 22:38
Notes on learning to use association rules in R, following the book Machine Learning with R.
- Basic idea:
Apriori property: all subsets of a frequent itemset must also be frequent. The Apriori algorithm uses this property to reduce the association rule search space.
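A minimal sketch of why this property prunes the search, in base R. The baskets, the helper itemset_frequency() and the 0.4 threshold are all hypothetical and chosen only for illustration; this is not how arules implements the search.

```r
# Toy transactions (hypothetical data, not taken from Groceries)
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("milk"),
  c("bread", "butter"),
  c("caviar", "milk")
)

# Fraction of baskets that contain every item in `itemset`
itemset_frequency <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

min_frequency <- 0.4  # assumed threshold for calling an itemset "frequent"

freq_caviar      <- itemset_frequency("caviar", baskets)
freq_caviar_milk <- itemset_frequency(c("caviar", "milk"), baskets)

# A superset can never occur more often than any of its subsets ...
stopifnot(freq_caviar_milk <= freq_caviar)

# ... so once {caviar} falls below the threshold, every itemset that
# contains caviar can be skipped without counting it.
caviar_is_frequent <- freq_caviar >= min_frequency
print(caviar_is_frequent)
```

Because {caviar} appears in only one of the five baskets, {caviar, milk} and every larger itemset containing caviar can be discarded up front.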
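Whether or not an association rule is deemed interesting is determined by two statistical measures: support and confidence.
Measuring rule interest – support and confidence:
The support of an itemset or rule measures how frequently it occurs in the data: support(X) = count(X) / N, where count(X) is the number of transactions containing itemset X and N is the total number of transactions.
A rule's confidence is a measurement of its predictive power or accuracy: confidence(X → Y) = support(X, Y) / support(X).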
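The two measures can be worked through by hand on a toy example. The sketch below uses base R only; the baskets and the helper functions support() and confidence() are hypothetical names introduced here, not part of arules.

```r
# Toy transactions (hypothetical data)
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("milk"),
  c("bread", "butter"),
  c("milk", "yogurt")
)

# support(X): share of transactions that contain every item of X
support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# confidence(X -> Y) = support(X, Y) / support(X)
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

support_bread_milk <- support(c("bread", "milk"), baskets)  # 2 of 5 baskets
conf_bread_milk    <- confidence("bread", "milk", baskets)  # 2 of the 3 bread baskets
conf_milk_bread    <- confidence("milk", "bread", baskets)  # 2 of the 4 milk baskets

print(c(support = support_bread_milk,
        bread_to_milk = conf_bread_milk,
        milk_to_bread = conf_milk_bread))
```

Note that confidence is not symmetric: {bread} → {milk} and {milk} → {bread} share the same support but have different confidence.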
- Practice:
Step 1 – collecting data
Install the "arules" R package and use the "Groceries" data set that ships with it.
Step 2 – exploring and preparing the data
a) Summarize Groceries:
library(arules)
library(Matrix)
data(Groceries)  # load the data set included in arules
summary(Groceries)
outcome:
transactions as itemMatrix in sparse format with
9835 rows (elements/itemsets/transactions) and
169 columns (items) and a density of 0.02609146
most frequent items:
      whole milk other vegetables       rolls/buns             soda           yogurt          (Other)
            2513             1903             1809             1715             1372            34055
element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19
2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46   29   14   14
  20   21   22   23   24   26   27   28   29   32
   9   11    4    6    1    1    1    1    3    1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   3.000   4.409   6.000  32.000
includes extended item information - examples:
       labels  level2           level1
1 frankfurter sausage meat and sausage
2     sausage sausage meat and sausage
3  liver loaf sausage meat and sausage
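The figures in this summary are consistent with each other, which can be checked with a little arithmetic. The sketch below only reuses numbers printed above (rows, columns, item counts); the variable names are made up for illustration.

```r
# Numbers copied from the summary output above
n_transactions <- 9835
n_items        <- 169
item_counts    <- c(2513, 1903, 1809, 1715, 1372, 34055)  # top five items + (Other)

# Total number of items purchased across all transactions
total_items <- sum(item_counts)

# Density: share of non-empty cells in the 9835 x 169 sparse matrix
density <- total_items / (n_transactions * n_items)

# Mean transaction size: items purchased per transaction
mean_size <- total_items / n_transactions

print(total_items)
print(density)
print(round(mean_size, 3))
```

The density and mean computed this way agree with the reported values of 0.02609146 and 4.409.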
b) Inspect the first five transactions:
inspect(Groceries[1:5])
outcome:
items
[1] {citrus fruit,semi-finished bread,margarine,ready soups}
[2] {tropical fruit,yogurt,coffee}
[3] {whole milk}
[4] {pip fruit,yogurt,cream cheese ,meat spreads}
[5] {other vegetables,whole milk,condensed milk,long life bakery product}
c) Examine the support of particular items (here, the first three):
itemFrequency(Groceries[, 1:3])
d) Plot functions
1) Plot the items with support >= 0.10
itemFrequencyPlot(Groceries, support = 0.1)
2) Plot the top 20 items by frequency
itemFrequencyPlot(Groceries, topN = 20)
e) Plot the sparse matrix
image(Groceries[1:5])
image(sample(Groceries, 100))
Step 3 – training a model on the data
a) Train the model
Groceriesrules <- apriori(Groceries, parameter = list(support = 0.01, confidence = 0.25, minlen = 2))
inspect(Groceriesrules[1:3])
b) Lower the minimum support to 0.006, keeping confidence at 0.25
Groceriesrules <- apriori(Groceries, parameter = list(support = 0.006, confidence = 0.25, minlen = 2))
See the rules:
Groceriesrules
Outcome:
set of 463 rules
Step 4 – evaluating model performance
a) See the summary of the rules:
summary(Groceriesrules)
The summary includes lift, which measures how much more likely one item or itemset is to be purchased relative to its typical rate of purchase: lift(X → Y) = confidence(X → Y) / support(Y).
e.g. if lift(milk → bread) is greater than one, it implies that the two items are found together more often than one would expect by chance.
b) Rule mining
inspect(Groceriesrules[1:3])
The first rule can be read in plain language as, "if a customer buys potted plants, they will also buy whole milk." With support of 0.007 and confidence of 0.400, we can determine that this rule covers 0.7 percent of the transactions and is correct in 40 percent of purchases involving potted plants. The lift value tells us how much more likely a customer is to buy whole milk relative to the average customer, given that he or she bought a potted plant. Since we know that about 25.6 percent of the customers bought whole milk (support), while 40 percent of the customers buying a potted plant bought whole milk (confidence), we can compute the lift value as 0.40 / 0.256 = 1.56, which matches the value shown.
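The lift arithmetic can be reproduced directly. The sketch below uses the rounded figures quoted in the paragraph and, as a cross-check, the whole milk count (2513 of 9835 transactions) from the earlier summary output; the variable names are illustrative only.

```r
# Rounded figures quoted in the text
confidence_rule    <- 0.40   # confidence of {pot plants} -> {whole milk}
support_whole_milk <- 0.256  # share of transactions containing whole milk

# lift = confidence(X -> Y) / support(Y)
lift_rounded <- confidence_rule / support_whole_milk  # about 1.5625

# Cross-check using the unrounded whole milk frequency from summary(Groceries)
support_from_counts <- 2513 / 9835
lift_from_counts    <- confidence_rule / support_from_counts

print(round(lift_rounded, 2))
print(round(lift_from_counts, 3))
```

Both results are greater than one, so potted plants and whole milk appear together more often than would be expected if they were purchased independently.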
Step 5 – improving model performance
Sort the rules by different criteria:
a) Find the rules with high support, confidence, or lift
inspect(sort(Groceriesrules, by = "lift")[1:5])
The first rule, with a lift of about 3.96, implies that people who buy herbs are nearly four times more likely to buy root vegetables than the typical customer.
By default the sort order is descending (largest values first); to reverse it, add the parameter decreasing = FALSE.
b) Taking subsets of association rules
Investigate whether berries are often purchased with other items:
berryrules <- subset(Groceriesrules, items %in% "berries")
inspect(berryrules)
- To match "berries" only on the left-hand or right-hand side of a rule, replace "items" with "lhs" or "rhs".
- To select rules containing either "berries" or "yogurt", use items %in% c("berries", "yogurt").
- Additional operators are available for partial matching (%pin%) and complete matching (%ain%).
- Rules can also be filtered by support, confidence, or lift; for example, confidence > 0.50 limits the result to rules with confidence greater than 50 percent.
- Matching criteria can be combined with the standard R logical operators such as and (&), or (|), and not (!).
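The variants listed above might look as follows in practice. This is a sketch that assumes the arules package is installed and re-creates the rule set with the same thresholds as in Step 3; the result names (berry_lhs, fruit_rules, and so on) are made up for illustration.

```r
library(arules)
data("Groceries")

# Same thresholds as in Step 3; verbose = FALSE only silences progress output
Groceriesrules <- apriori(Groceries,
                          parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
                          control = list(verbose = FALSE))

# "berries" on the left-hand side only
berry_lhs <- subset(Groceriesrules, lhs %in% "berries")

# rules containing "berries" or "yogurt"
berry_or_yogurt <- subset(Groceriesrules, items %in% c("berries", "yogurt"))

# partial matching: any item whose name contains "fruit"
fruit_rules <- subset(Groceriesrules, items %pin% "fruit")

# complete matching: rules containing both "berries" and "yogurt"
berry_and_yogurt <- subset(Groceriesrules, items %ain% c("berries", "yogurt"))

# filter on a quality measure
confident_rules <- subset(Groceriesrules, confidence > 0.50)

# combine criteria with logical operators
berry_not_milk <- subset(Groceriesrules, lhs %in% "berries" & !(rhs %in% "whole milk"))

inspect(berry_lhs)
```

Each call returns a rules object, so the results can be passed to inspect(), sort(), or summary() like the full rule set.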
Saving association rules to a file or data frame
a) Save the rules in CSV format
setwd("C:\\myprojectR\\Groceries")
write(Groceriesrules, file = "Groceriesrules.csv", sep = ",", quote = TRUE, row.names = FALSE)
b) Save the rules as an R data frame
groceryrules_df <- as(Groceriesrules, "data.frame")
See the results:
str(groceryrules_df)
You might choose to do this if you want to perform additional processing on the rules or need to export them to another database.
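As one example of such additional processing, the data frame can be filtered and sorted with ordinary base R. This sketch assumes arules is installed and rebuilds the rules with the thresholds from Step 3; the lift cutoff of 3 and the name high_lift are arbitrary choices for illustration.

```r
library(arules)
data("Groceries")

# Rebuild the rule set with the thresholds used in Step 3
Groceriesrules <- apriori(Groceries,
                          parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
                          control = list(verbose = FALSE))

# One row per rule, with the rule text and its quality measures as columns
groceryrules_df <- as(Groceriesrules, "data.frame")

# Keep rules with lift above 3 (arbitrary cutoff) and sort them by lift
high_lift <- groceryrules_df[groceryrules_df$lift > 3, ]
high_lift <- high_lift[order(high_lift$lift, decreasing = TRUE), ]

head(high_lift, 5)
```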