mahout itemCF: simple usage
Source: Internet  Editor: 程序博客网  Time: 2024/06/07 09:20
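I. itemCF test
mahout version: 0.10.0
mahout provides many algorithms, and itemCF (item-based collaborative filtering) is one of the most commonly used. This post records how to use it.
1. Prepare the data. I use some behavior data that I collected myself; there is not much of it, but it is enough to produce a result.
The three columns below are user_id, item_id, preference.
Store the following data on HDFS; the path I use is /mahout/itemcf/data1/itemdata.data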
0162381440670851711,4,7.0
0162381440670851711,11,4.0
0162381440670851711,32,1.0
0162381440670851711,176,27.0
0162381440670851711,183,11.0
0162381440670851711,184,5.0
0162381440670851711,207,9.0
0162381440670851711,256,3.0
0162381440670851711,258,4.0
0162381440670851711,259,16.0
0162381440670851711,260,8.0
0162381440670851711,261,18.0
0162381440670851711,301,1.0
0162381440670851711,307,1.0
0162381440670851711,477,1.0
0162381440670851711,518,1.0
0162381440670851711,549,3.0
0162381440670851711,570,1.0
0162381440670851711,826,2.0
0357211441096952115,207,1.0
0617721441096186493,184,1.0
0617721441096186493,207,1.0
1205421441071459451,5,1.0
1214361441096861254,207,1.0
1401731441095483081,258,1.0
1401731441095483081,814,4.0
1401731441095483081,826,1.0
1917281441163686119,259,10.0
1917281441163686119,260,1.0
1917281441163686119,261,3.0
1966141441163860798,176,1.0
2294491441095342047,176,1.0
2441031440670827430,4,13.0
2441031440670827430,259,29.0
2441031440670827430,261,14.0
2441031440670827430,460,2.0
2441031440670827430,477,6.0
2441031440670827430,570,1.0
2441031440670827430,577,6.0
2441031440670827430,702,1.0
2441031440670827430,758,2.0
2441031440670827430,809,1.0
2475791441161318569,176,1.0
2987091441068878630,261,1.0
3114261440726814722,549,1.0
3445831441096810087,207,1.0
3846061441096937902,207,1.0
4266911441160164599,176,1.0
4698311441097046150,176,2.0
4698311441097046150,183,2.0
4698311441097046150,184,4.0
4698311441097046150,207,6.0
4946291441097563245,183,1.0
4956331440750398178,159,1.0
4956331440750398178,160,1.0
5307571441160362208,4,1.0
5307571441160362208,176,1.0
5719691441098504387,176,5.0
5719691441098504387,184,1.0
5719691441098504387,207,1.0
5813281441095425044,184,2.0
5813281441095425044,258,1.0
5894601441095265604,184,1.0
5981521441096106535,207,1.0
6292291441096870187,207,1.0
6533651441161410910,176,1.0
6810691441096902907,207,1.0
6836071440729632252,4,3.0
6836071440729632252,49,1.0
6836071440729632252,259,2.0
6836071440729632252,570,1.0
6836071440729632252,577,2.0
6964141441160527746,176,1.0
7495291441096796843,207,1.0
7616681441095305067,183,1.0
7616681441095305067,184,2.0
7616681441095305067,258,2.0
7616681441095305067,261,1.0
7732211441095211112,183,1.0
7732211441095211112,259,2.0
7732211441095211112,260,9.0
7732211441095211112,261,1.0
7732211441095211112,632,6.0
8211761441096060717,176,1.0
8211761441096060717,183,1.0
8305691441168039389,259,3.0
8305691441168039389,260,2.0
8305691441168039389,261,1.0
8375281440837772178,527,1.0
8432311440724457499,290,1.0
8641451441097297246,183,1.0
8641451441097297246,184,1.0
8641451441097297246,207,1.0
8641451441097297246,259,1.0
8641451441097297246,263,1.0
8641451441097297246,838,1.0
8641451441097297246,839,1.0
8641451441097297246,840,1.0
8651081441095283643,176,2.0
8651081441095283643,183,7.0
8753221441095342356,176,1.0
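Before uploading, it can help to sanity-check the file locally. The sketch below assumes each line is `user_id,item_id,preference` as described above, and that both IDs have to fit a signed 64-bit integer (that is my assumption about how the job reads IDs; the function name `parse_pref_line` is made up for illustration). It also shows that a leading zero in a user ID disappears once the ID is read as a number, which matches the user IDs that appear in the result in step 3.

```python
# Sketch: validate one "user_id,item_id,preference" line before uploading to HDFS.
# Assumption (not confirmed by this post): IDs are read as signed 64-bit integers.

INT64_MAX = 2**63 - 1


def parse_pref_line(line):
    """Return (user_id, item_id, preference) or raise ValueError."""
    fields = line.strip().split(",")
    if len(fields) != 3:
        raise ValueError("expected 3 comma-separated fields: %r" % line)
    # int() drops leading zeros, so "0162..." and "162..." become the same ID.
    user_id, item_id = int(fields[0]), int(fields[1])
    preference = float(fields[2])
    for value in (user_id, item_id):
        if not 0 <= value <= INT64_MAX:
            raise ValueError("ID outside the 64-bit range: %d" % value)
    return user_id, item_id, preference


# First line of the test data above.
print(parse_pref_line("0162381440670851711,4,7.0"))
```

Running it over every line of itemdata.data and stopping at the first ValueError is usually enough to catch a stray header row or a missing column.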
2. Run collaborative filtering with the algorithm that ships with mahout.
The command is as follows:
bin/hadoop jar /home/lin/hadoop/mahout-distribution-0.10.0/mahout-examples-0.10.0-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -i /mahout/itemcf/data1 -o /mahout/itemcf/result1 -s SIMILARITY_LOGLIKELIHOOD --tempDir /mahout/itemcf/temp1
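-i is followed by the input data path, i.e. the directory holding the test data above;
-o is followed by the result output path. Do not create this directory beforehand: mahout creates it automatically and reports an error if it already exists;
--tempDir is where mahout stores its own intermediate output. mahout also creates this path automatically and reports an error if it already exists;
-s specifies the similarity measure to use; choose one according to your needs.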
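Because the job fails when the output or temp directory already exists, I find it convenient to build the cleanup command and the job command together. The following is only a sketch: the helper names are hypothetical, the paths are the ones used in this post, and the `hadoop fs -rm -r -f` cleanup is my own addition rather than something the job requires. The set of similarity names is copied from the help output; the real -s option also accepts a class name, which this sketch deliberately does not handle.

```python
# Sketch: assemble the step-2 command as an argument list (helper names are hypothetical).

PREDEFINED_SIMILARITIES = {
    "SIMILARITY_COOCCURRENCE",
    "SIMILARITY_LOGLIKELIHOOD",
    "SIMILARITY_TANIMOTO_COEFFICIENT",
    "SIMILARITY_CITY_BLOCK",
    "SIMILARITY_COSINE",
    "SIMILARITY_PEARSON_CORRELATION",
    "SIMILARITY_EUCLIDEAN_DISTANCE",
}

JOB_CLASS = "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob"


def cleanup_cmd(*hdfs_paths):
    """Command that removes leftover output/temp directories (my addition)."""
    return ["bin/hadoop", "fs", "-rm", "-r", "-f"] + list(hdfs_paths)


def recommender_cmd(jar, input_dir, output_dir, temp_dir, similarity):
    """Command line for RecommenderJob using only the flags shown in this post."""
    if similarity not in PREDEFINED_SIMILARITIES:
        raise ValueError("not a predefined similarity name: %s" % similarity)
    return [
        "bin/hadoop", "jar", jar, JOB_CLASS,
        "-i", input_dir,
        "-o", output_dir,
        "-s", similarity,
        "--tempDir", temp_dir,
    ]


jar = "/home/lin/hadoop/mahout-distribution-0.10.0/mahout-examples-0.10.0-job.jar"
print(" ".join(cleanup_cmd("/mahout/itemcf/result1", "/mahout/itemcf/temp1")))
print(" ".join(recommender_cmd(jar, "/mahout/itemcf/data1", "/mahout/itemcf/result1",
                               "/mahout/itemcf/temp1", "SIMILARITY_LOGLIKELIHOOD")))
```

The two lists can be passed to `subprocess.run` one after the other from the Hadoop installation directory; printing them first makes it easy to compare against the command typed by hand.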
The full help output is as follows:
Job-Specific Options:
  --input (-i) input
      Path to job input directory.
  --output (-o) output
      The directory pathname for output.
  --similarityClassname (-s) similarityClassname
      Name of distributed similarity measures class to instantiate,
      alternatively use one of the predefined similarities
      ([SIMILARITY_COOCCURRENCE, SIMILARITY_LOGLIKELIHOOD,
      SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK,
      SIMILARITY_COSINE, SIMILARITY_PEARSON_CORRELATION,
      SIMILARITY_EUCLIDEAN_DISTANCE])
  --maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem
      try to cap the number of similar items per item to this number
      (default: 100)
  --maxPrefs (-mppu) maxPrefs
      max number of preferences to consider per user or item, users or
      items with more preferences will be sampled down (default: 500)
  --minPrefsPerUser (-mp) minPrefsPerUser
      ignore users with less preferences than this (default: 1)
  --booleanData (-b) booleanData
      Treat input as without pref values
  --threshold (-tr) threshold
      discard item pairs with a similarity value below this
  --randomSeed randomSeed
      use this seed for sampling
  --help (-h)
      Print out help
  --tempDir tempDir
      Intermediate output directory
  --startPhase startPhase
      First phase to run
  --endPhase endPhase
      Last phase to run
3. After running the command above and waiting for it to finish, the following data appears under /mahout/itemcf/result1:
162381440670851711[809:13.535571,702:13.535571,460:13.535571,758:13.535571,632:13.182321,577:12.929438,49:11.368558,307:10.562227,32:10.562227,518:10.562227]
617721441096186493[839:1.0,259:1.0,518:1.0,826:1.0,11:1.0,260:1.0,4:1.0,32:1.0,176:1.0,840:1.0]
1401731441095483081[11:1.0,570:1.0,518:1.0,307:1.0,260:1.0,259:1.0,549:1.0,32:1.0,207:1.0,184:1.0]
1917281441163686119[577:7.365086,702:6.5,809:6.5,758:6.5,460:6.5,184:5.9840446,176:5.981493,4:5.577299,570:5.3220325,477:4.9567957]
2441031440670827430[632:21.5,176:18.084661,183:15.684914,260:14.2175,207:13.510652,11:12.28147,307:12.28147,32:12.28147,518:12.28147,256:12.28147]
4698311441097046150[263:3.9337947,839:3.9337947,840:3.9337947,838:3.9337947,11:3.4747553,307:3.4747553,32:3.4747553,518:3.4747553,256:3.4747553,301:3.4747553]
5307571441160362208[826:1.0,259:1.0,518:1.0,307:1.0,11:1.0,260:1.0,549:1.0,32:1.0,207:1.0,184:1.0]
5719691441098504387[4:3.6454906,259:3.6147578,260:2.67091,261:2.6694102,183:2.517088,307:2.2876854,11:2.2876854,32:2.2876854,518:2.2876854,256:2.2876854]
5813281441095425044[207:1.8607497,259:1.6642486,183:1.5539461,301:1.4806436,11:1.4806436,307:1.4806436,32:1.4806436,518:1.4806436,256:1.4806436,549:1.4099455]
6836071440729632252[207:2.6088793,176:2.3617313,477:1.9966183,460:1.9945599,758:1.9945599,809:1.9945599,702:1.9945599,11:1.9926376,307:1.9926376,32:1.9926376]
7616681441095305067[826:1.5790755,207:1.5721571,549:1.535743,301:1.50748,307:1.50748,11:1.50748,32:1.50748,518:1.50748,256:1.50748,839:1.5]
7732211441095211112[826:3.7059078,549:3.7059078,307:3.3461132,256:3.3461132,518:3.3461132,11:3.3461132,301:3.3461132,32:3.3461132,570:3.1800203,477:3.1795032]
8211761441096060717[826:1.0,259:1.0,518:1.0,307:1.0,11:1.0,260:1.0,549:1.0,32:1.0,207:1.0,184:1.0]
8305691441168039389[577:2.2471673,4:2.083036,570:2.0549815,809:2.0,460:2.0,11:2.0,826:2.0,32:2.0,307:2.0,549:2.0]
8641451441097297246[11:1.0,632:1.0,518:1.0,826:1.0,260:1.0,570:1.0,549:1.0,32:1.0,307:1.0,477:1.0]
8651081441095283643[184:6.597979,258:6.1955295,260:6.1955295,826:5.5266876,549:5.5266876,477:5.5266876,259:4.662548,261:4.662548,11:4.626224,307:4.626224]
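To use these results in another program, each line has to be split into a user ID and a list of (item, score) pairs. The parser below is a minimal sketch that assumes the layout shown above, `<user_id>[item:score,item:score,...]`, and tolerates optional whitespace (for example a tab) between the user ID and the opening bracket; the function name `parse_recommendation_line` is hypothetical.

```python
# Sketch: parse one line of the recommendation output shown above.
# Assumed layout: "<user_id>[item:score,item:score,...]".


def parse_recommendation_line(line):
    """Return (user_id, [(item_id, score), ...]) for one output line."""
    user_part, _, rest = line.strip().partition("[")
    user_id = int(user_part.strip())
    items = []
    for pair in rest.rstrip("]").split(","):
        if not pair:
            continue  # empty recommendation list
        item_id, score = pair.split(":")
        items.append((int(item_id), float(score)))
    return user_id, items


# Second line of the result above, copied verbatim.
line = ("617721441096186493[839:1.0,259:1.0,518:1.0,826:1.0,11:1.0,"
        "260:1.0,4:1.0,32:1.0,176:1.0,840:1.0]")
user_id, items = parse_recommendation_line(line)
print(user_id, len(items), items[0])
```

The items keep the order they have in the file, so the first pair is the top recommendation for that user.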
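This gives the recommended items for each user.
mahout has another frequently used algorithm, item similarity, whose result is the similarity between items. The command below reuses the output and temp paths from step 2; as noted there, these directories must not already exist, so delete them or choose different paths first: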
mahout itemsimilarity -i /mahout/itemcf/data1 -o /mahout/itemcf/result1 -s SIMILARITY_LOGLIKELIHOOD --tempDir /mahout/itemcf/temp1
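To give a feel for what SIMILARITY_LOGLIKELIHOOD measures, here is a small sketch of the log-likelihood ratio on a 2x2 co-occurrence table for two items: k11 users have both items, k12 only the first, k21 only the second, and k22 neither. Mapping the ratio to a similarity with 1 - 1/(1 + LLR) is how I recall mahout doing it, so treat that step as an assumption rather than the confirmed implementation. The counts in the example are made up for illustration and are not taken from the test data.

```python
# Sketch: log-likelihood ratio similarity for two items from co-occurrence counts.
# The final 1 - 1/(1 + LLR) mapping is an assumption about mahout's behavior.
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR of the 2x2 table; clamped at 0 to absorb rounding error."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(k11, k12, k21, k22):
    """Map the ratio into [0, 1): larger means a stronger association."""
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


# Made-up counts: the items always appear together vs. independently.
print(llr_similarity(10, 0, 0, 10))
print(llr_similarity(5, 5, 5, 5))
```

Note that the measure only looks at which users touched which items, not at the preference values themselves.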