Parsing the Output of Mahout K-Means
Source: Internet  Editor: 程序博客网  Date: 2024/05/18 02:15
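I'll write a separate post on how to run clustering with Mahout when I have time; this post is mainly about the output that Mahout produces.
The Mahout version is 0.9. The data was neither normalized nor standardized; it is for testing only.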
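The output directory contains several folders: clusteredPoints, clusters-x, clusters-(x+1)-final, and so on, where x denotes the x-th iteration. The result of each iteration is saved to clusters-x, and the result of the last iteration, (x+1), is saved to clusters-(x+1)-final. clusteredPoints also holds the final clustering result, but the two store different things: one stores the clusters, the other stores the points. See below for details.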
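The directory layout described above can be turned into a small helper that picks out the final-iteration folder. This is only a sketch: it assumes the folder names follow the `clusters-N-final` form that appears in the commands in this post, and the directory listing in the usage example (`clusters-0`, `clusters-1`) is hypothetical.

```python
import re

# Matches names such as "clusters-2-final" (the form used in this post's commands).
FINAL_DIR_RE = re.compile(r"^clusters-(\d+)-final$")


def find_final_clusters_dir(names):
    """Return the name of the final-iteration directory, or None if there is none."""
    finals = []
    for name in names:
        match = FINAL_DIR_RE.match(name)
        if match:
            finals.append((int(match.group(1)), name))
    if not finals:
        return None
    # If several "-final" folders are present, prefer the highest iteration number.
    return max(finals)[1]


if __name__ == "__main__":
    # Hypothetical listing of the output directory.
    listing = ["clusteredPoints", "clusters-0", "clusters-1", "clusters-2-final"]
    print(find_final_clusters_dir(listing))
```

The returned name is what you would pass to `clusterdump -i`, while `clusteredPoints` is what you would pass to `-p` or to `seqdumper`.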
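P.S. mahout clusterdump parses the ClusterWritable records and converts them into a readable file. The output format is chosen with -of (TEXT, CSV, etc.); the full option list is pasted at the end.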
# Final clustering result (cluster name VL-x, center c, radius r, number of points in the cluster n)
[root@drguo home]# mahout clusterdump -i file:///home/guo/Desktop/output/clusters-2-final -o /home/guo/Desktop/result
VL-0{n=7 c=[1.714, 2.286, 4.429, 0.857, 7.571] r=[2.185, 2.711, 6.884, 2.100, 5.233]}
VL-1{n=3 c=[0.667, 8.667, 11.333, 5.333, 0.667, 4.333, 1.667, 3.333, 21.667] r=[0.943, 5.437, 5.185, 7.542, 0.943, 6.128, 2.357, 4.714, 9.428]}

# Final clustering result (key: the cluster the point belongs to; value: weight wt, distance, and the vector. These are NamedVectors, not plain vectors; I'll write a separate post on how to generate them.)
[root@drguo clusteredPoints]# mahout seqdumper -i file:///home/guo/Desktop/output/clusteredPoints -o /home/guo/Desktop/points
Input Path: file:/home/guo/Desktop/output/clusteredPoints/part-m-0
Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable
Key: 0: Value: wt: 0.7140480784137244 distance: 6.885358615591935 vec: 001461E4-86C64780-A0B495C4-D19BA86F__201601 = [5.000, 6.000, 6.000]
Key: 1: Value: wt: 0.6106543697821432 distance: 11.445523142259598 vec: 001461E4-86C64780-A0B495C4-D19BA86F__201602 = [12.000, 15.000, 15.000]
Key: 1: Value: wt: 0.6113140078611051 distance: 11.775681155103799 vec: 001461E4-86C64780-A0B495C4-D19BA86F__201603 = [13.000, 15.000, 15.000]
Key: 0: Value: wt: 0.7140480784137244 distance: 6.885358615591935 vec: 001461E4-86C64780-A0B495C4-D19BA86F__201604 = [5.000, 6.000, 6.000]
Key: 0: Value: wt: 0.7643111018595771 distance: 6.010195419417895 vec: 001461E4-86C64780-A0B495C4-D19BA86F__201605 = [2.000, 4.000, 4.000]
Key: 0: Value: wt: 0.7408819961153278 distance: 7.529533687488249 vec: 001641C0-75CC4BC2-9E31CF60-C15627D2__201603 = [6.000, 6.000]
Key: 0: Value: wt: 0.7511412095733683 distance: 7.989789402348321 vec: 001641C0-75CC4BC2-9E31CF60-C15627D2__201604 = [1.000, 1.000]
Key: 0: Value: wt: 0.6648742191066574 distance: 9.264811638337692 vec: 001641C0-75CC4BC2-9E31CF60-C15627D2__201605 = [12.000, 12.000]
Key: 0: Value: wt: 0.53656917576395 distance: 17.373449130609547 vec: 001641C0-75CC4BC2-9E31CF60-C15627D2__201606 = [18.000, 18.000]
Key: 1: Value: wt: 0.5948320024451352 distance: 23.202011407059803 vec: 001641C0-75CC4BC2-9E31CF60-C15627D2__201608 = [2.000, 1.000, 4.000, 16.000, 2.000, 13.000, 5.000, 10.000, 35.000]
Count: 10

# Output the clusters together with their points
[root@drguo home]# mahout clusterdump -i file:///home/guo/Desktop/output/clusters-2-final -p file:///home/guo/Desktop/output/clusteredPoints -o /home/guo/Desktop/cluster-point
VL-0{n=7 c=[1.714, 2.286, 4.429, 0.857, 7.571] r=[2.185, 2.711, 6.884, 2.100, 5.233]}
    Weight : [props - optional]: Point:
    0.7140480784137244 : [distance=6.885358615591935]: 001461E4-86C64780-A0B495C4-D19BA86F__201601 = [5.000, 6.000, 6.000]
    0.7140480784137244 : [distance=6.885358615591935]: 001461E4-86C64780-A0B495C4-D19BA86F__201604 = [5.000, 6.000, 6.000]
    0.7643111018595771 : [distance=6.010195419417895]: 001461E4-86C64780-A0B495C4-D19BA86F__201605 = [2.000, 4.000, 4.000]
    0.7408819961153278 : [distance=7.529533687488249]: 001641C0-75CC4BC2-9E31CF60-C15627D2__201603 = [6.000, 6.000]
    0.7511412095733683 : [distance=7.989789402348321]: 001641C0-75CC4BC2-9E31CF60-C15627D2__201604 = [1.000, 1.000]
    0.6648742191066574 : [distance=9.264811638337692]: 001641C0-75CC4BC2-9E31CF60-C15627D2__201605 = [12.000, 12.000]
    0.53656917576395 : [distance=17.373449130609547]: 001641C0-75CC4BC2-9E31CF60-C15627D2__201606 = [18.000, 18.000]
VL-1{n=3 c=[0.667, 8.667, 11.333, 5.333, 0.667, 4.333, 1.667, 3.333, 21.667] r=[0.943, 5.437, 5.185, 7.542, 0.943, 6.128, 2.357, 4.714, 9.428]}
    Weight : [props - optional]: Point:
    0.6106543697821432 : [distance=11.445523142259598]: 001461E4-86C64780-A0B495C4-D19BA86F__201602 = [12.000, 15.000, 15.000]
    0.6113140078611051 : [distance=11.775681155103799]: 001461E4-86C64780-A0B495C4-D19BA86F__201603 = [13.000, 15.000, 15.000]
    0.5948320024451352 : [distance=23.202011407059803]: 001641C0-75CC4BC2-9E31CF60-C15627D2__201608 = [2.000, 1.000, 4.000, 16.000, 2.000, 13.000, 5.000, 10.000, 35.000]
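If you want to post-process these dumps, the two line formats shown above are regular enough to parse with a couple of regular expressions. The following is a minimal sketch, not part of Mahout: the function names are made up for illustration, it assumes each record sits on its own line in the dump file, and it only handles the dense `[a, b, c]` vector form printed in this post.

```python
import re
from collections import Counter

# One cluster line from clusterdump TEXT output, e.g.
# VL-0{n=7 c=[1.714, 2.286] r=[2.185, 2.711]}
CLUSTER_RE = re.compile(
    r"^(?P<name>[A-Z]+-\d+)\{n=(?P<n>\d+) c=\[(?P<c>[^\]]*)\] r=\[(?P<r>[^\]]*)\]\}"
)

# One record from seqdumper run on clusteredPoints, e.g.
# Key: 0: Value: wt: 0.71 distance: 6.88 vec: some-name = [5.000, 6.000]
POINT_RE = re.compile(
    r"^Key: (?P<key>\d+): Value: wt: (?P<wt>\S+) distance: (?P<dist>\S+) "
    r"vec: (?P<name>\S+) = \[(?P<vec>[^\]]*)\]"
)


def _floats(text):
    """Turn '1.0, 2.0' into [1.0, 2.0]; an empty string gives []."""
    return [float(x) for x in text.split(",") if x.strip()]


def parse_cluster(line):
    """Parse a cluster summary line; return None if the line does not match."""
    m = CLUSTER_RE.match(line.strip())
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "n": int(m.group("n")),
        "center": _floats(m.group("c")),
        "radius": _floats(m.group("r")),
    }


def parse_point(line):
    """Parse a clustered-point record; return None if the line does not match."""
    m = POINT_RE.match(line.strip())
    if m is None:
        return None
    return {
        "cluster": int(m.group("key")),
        "weight": float(m.group("wt")),
        "distance": float(m.group("dist")),
        "name": m.group("name"),
        "vector": _floats(m.group("vec")),
    }


def count_points_per_cluster(lines):
    """Count how many point records fall under each cluster key."""
    counts = Counter()
    for line in lines:
        point = parse_point(line)
        if point is not None:
            counts[point["cluster"]] += 1
    return counts


if __name__ == "__main__":
    cluster = parse_cluster(
        "VL-0{n=7 c=[1.714, 2.286, 4.429, 0.857, 7.571] "
        "r=[2.185, 2.711, 6.884, 2.100, 5.233]}"
    )
    print(cluster["name"], cluster["n"], cluster["center"])
```

Running `count_points_per_cluster` over the seqdumper file is a quick sanity check: in the dump above, key 0 appears 7 times and key 1 appears 3 times, which agrees with n=7 and n=3 in the cluster lines.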
Finally, here are the command-line options.
seqdumper
Job-Specific Options:
  --input (-i) input              Path to job input directory.
  --output (-o) output            The directory pathname for output.
  --substring (-b) substring      The number of chars to print out per value
  --count (-c)                    Report the count only
  --numItems (-n) numItems        Output at most <n> key value pairs
  --facets (-fa)                  Output the counts per key. Note, if there are a lot of unique keys, this can take up a fair amount of memory
  --quiet (-q)                    Print only file contents.
  --help (-h)                     Print out help
  --tempDir tempDir               Intermediate output directory
  --startPhase startPhase         First phase to run
  --endPhase endPhase             Last phase to run
clusterdump
Job-Specific Options:
  --input (-i) input                      Path to job input directory.
  --output (-o) output                    The directory pathname for output.
  --outputFormat (-of) outputFormat       The optional output format for the results. Options: TEXT, CSV, JSON or GRAPH_ML
  --substring (-b) substring              The number of chars of the asFormatString() to print
  --numWords (-n) numWords                The number of top terms to print
  --pointsDir (-p) pointsDir              The directory containing points sequence files mapping input vectors to their cluster. If specified, then the program will output the points associated with a cluster
  --samplePoints (-sp) samplePoints       Specifies the maximum number of points to include _per_ cluster. The default is to include all points
  --dictionary (-d) dictionary            The dictionary file
  --dictionaryType (-dt) dictionaryType   The dictionary file type (text|sequencefile)
  --evaluate (-e)                         Run ClusterEvaluator and CDbwEvaluator over the input. The output will be appended to the rest of the output at the end.
  --distanceMeasure (-dm) distanceMeasure The classname of the DistanceMeasure. Default is SquaredEuclidean
  --help (-h)                             Print out help
  --tempDir tempDir                       Intermediate output directory
  --startPhase startPhase                 First phase to run
  --endPhase endPhase                     Last phase to run
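When calling clusterdump from a script, it can help to assemble the argument list in one place. The sketch below only builds the command and uses just the flags listed above (-i, -o, -p, -of, -sp); the function name and its parameters are hypothetical, and it assumes the `mahout` launcher is on the PATH if you actually run the result.

```python
# Output formats as given in the clusterdump option list.
VALID_FORMATS = {"TEXT", "CSV", "JSON", "GRAPH_ML"}


def build_clusterdump_cmd(clusters_dir, output, points_dir=None,
                          output_format=None, sample_points=None):
    """Build the argument list for `mahout clusterdump` (the command is not executed)."""
    cmd = ["mahout", "clusterdump", "-i", clusters_dir, "-o", output]
    if points_dir is not None:
        # -p adds the points that belong to each cluster to the dump.
        cmd += ["-p", points_dir]
    if output_format is not None:
        if output_format not in VALID_FORMATS:
            raise ValueError("unknown output format: %s" % output_format)
        cmd += ["-of", output_format]
    if sample_points is not None:
        # -sp caps the number of points printed per cluster.
        cmd += ["-sp", str(sample_points)]
    return cmd


if __name__ == "__main__":
    cmd = build_clusterdump_cmd(
        "file:///home/guo/Desktop/output/clusters-2-final",
        "/home/guo/Desktop/cluster-point",
        points_dir="file:///home/guo/Desktop/output/clusteredPoints",
        output_format="CSV",
    )
    print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; keeping it as a list rather than a single string avoids shell quoting issues with paths.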