Python3.4 Apriori算法
来源:互联网 发布:知乎什么话题最火 编辑:程序博客网 时间:2024/06/06 09:39
'''思路: 1、生成L1; 2、Ck生成Lk; 3、Lk生成Ck+1; 4、得到频繁项集;(前面只用到支持度); 5、通过频繁项集生成关联规则。(用到置信度).'''import collectionsdatasets = [ ["I1","I2","I5"], ["I2","I4"], ["I2","I3"], ["I1","I2","I4"], ["I1","I3"], ["I2","I3"], ["I1","I3"], ["I1","I2","I3","I5"], ["I1","I2","I3"] ]#使用绝对最小支持度,即为频数,相对支持度表示概率,用前面的频数除以记录条数nmin_support = 2min_confidence = 0.5Lk_all = {}#将每条记录内容进行连接,最终以字符串形式连接每条记录里的各个元素,它们之间使用逗号隔开def getdatasetsString(): result = [] for transation in datasets: tmp = "" for item in transation: tmp+=item if item != transation[-1]: tmp += "," result.append(tmp) return result#print(getdatasetsString())#得到L1def getL1(): tmp = [] L1 = {} for transaction in datasets: for item in transaction: tmp.append(item) rs = collections.Counter(tmp) #使用库函数直接统计出现的次数,避免手动计数。 for key in sorted(rs): if(rs[key] >= min_support): #通过最小支持度进行剪枝操作,得到L1 L1[key] = rs[key] return L1#L1 = getL1()#print("L1:",L1)#Ck生成Lkdef getLk(Ck): Lk = {} result = getdatasetsString() #print(result) for items in Ck: tmpitems = items.split(",") count = 0 for r in result: flag = True for item in tmpitems: if item in r: continue else: flag = False break if flag: count = count + 1 if (count >= min_support): Lk[items] = count return Lk#由Lk生成Ck+1def getCk1(Lk): Ck= [] keys = list(Lk.keys()) keys1 = list(Lk.keys()) for key in keys: keys1.remove(key) #自连接时去掉前面出现过的元素 for key1 in keys1: key_last = key.split(",")[-1] #获取最后一个元素 key1_last = key1.split(",")[-1] key_pre = key.replace(key_last,"") key1_pre = key1.replace(key1_last,"") if( key_pre == key1_pre): if (key_last>key1_last): s = key_pre+key1_last+","+key_last #还需要进一步判断s所有少一个元素的子串是否是频繁项集 else: s = key_pre+key_last+","+key1_last #还需要进一步判断s所有少一个元素的子串是否是频繁项集 subc = getSubString(s) #获取所有少一个元素的子串,以list形式存起来,与候选项集相同形式存储 subl = getLk(subc) #通过Ck生成Lk if(len(subc) == len(subl)): #如果Ck与Lk元素个数相同,则说明ck里所有元素都是频繁的 Ck.append(s) return Ck; #得到最终频繁项集def getFiallyLk(L1): Lk = L1 while(True): Ck1 = getCk1(Lk) Lk1 = getLk(Ck1) #print("Lk:",Lk) for key in Lk: Lk_all[key] = Lk[key] if (len(Lk1) == 0): break Lk = Lk1 return Lk#获取所有少一个元素的子串,以list形式存起来,与候选项集相同形式存储,比如'abc',则得到'ab'、'ac'和'bc'def getSubString(s): substring = [] items = s.split(",") for item in items: if item == items[-1]: substring.append(s.replace(","+item,"")) else: substring.append(s.replace(item+",","")) return substring#Lk = getFiallyLk(L1)#print("最终频繁项集:",Lk)#下面将根据频繁项集获取强关联规则from itertools import combinationstry: from functools import reduceexcept ImportError: pass#通过函数f就可以获取列表的所有非空子集(不包括本身)f = lambda s: reduce(list.__add__, ( list(map(list, combinations(s, l))) for l in range(1, len(s))), [])#获取强关联规则def getRule(Lk): rules = {} for Lk_item in Lk: items = Lk_item.split(",") #n = len(datasets) #由于概率过程中都除以n,所以可以约掉而不用计算 N = Lk[Lk_item] #P(X->Y) = p(XY)/P(X) = N(XY)/N(X),这里的N就是计算出了N(XY) pre_item_list = f(items) for pre_item in pre_item_list: back_item = sorted(list(set(items)^set(pre_item)) ) #得到2个list的差集 #print(pre_item,"=>",back_item) pre_s = "" for item in pre_item: if len(pre_item) == 1: pre_s += item else: if item == pre_item[-1]: pre_s += item else: pre_s += item + "," back_s = "" for item in back_item: if len(back_item) == 1: back_s += item else: if item == back_item[-1]: back_s += item else: back_s += item + "," #下面计算N(X) N1 = Lk_all[pre_s] p = N/N1 if(p >= min_confidence): rule = pre_s+" => "+back_s #print(rule) rules[rule] = p return rules #print(Lk_all) #存放所有频繁项集里的元素出现的次数if __name__ == "__main__": L1 = getL1() Lk = getFiallyLk(L1) print("最终频繁项集:",Lk) rules = getRule(Lk) for rule in rules: print(rule,":",rules[rule]) ''' C2 = getCk1(L1) print("C2:",C2) L2 = getLk(C2) print("L2:",L2) C3 = getCk1(L2) print("C3:",C3) L3 = getLk(C3) print("L3:",L3) '''
最终输出结果截图:
0 0
- Python3.4 Apriori算法
- Python3.6-Apriori算法进行关联分析
- 【机器学习实战-python3】使用Apriori算法进行关联 分析
- Apriori算法
- Apriori算法
- Apriori算法
- apriori 算法
- Apriori算法
- Apriori算法
- Apriori算法
- Apriori算法
- Apriori算法
- Apriori算法
- Apriori算法
- Apriori算法
- Apriori算法
- Apriori算法
- Apriori算法
- LinkedHashMap源码探究
- 正确区分标识(zhi)符、关键字与保留字
- NULL的陷阱:Merge
- 报错记录
- 显示一个计时器
- Python3.4 Apriori算法
- Centos 6.x部署FreeSWITCH
- angular JS 基于ionic框架 开发移动端项目 实现进入前台 判断用户权限 控制项目UI布局和tab的部门显示和隐藏
- 添加css的方式:link与@import区别
- overflow溢出处理
- 连接服务器上的mysql
- 唯快不破:提升Web 应用的 13 个优化
- Android Studio 2.2导入eclipse版Android工程
- Freesclae i.MX6 Linux PCIE驱动源码分析