Mahout: customizing ClusterDumper to output only the cluster centers
Source: the Internet | Editor: 程序博客网 | Time: 2024/05/17 23:31
Environment: Hadoop 1.0.4, Mahout 0.5.
Mahout ships a class for reading the results of its clustering algorithms, called ClusterDumper. Its output usually looks like this:
VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}
    Weight: Point:
    1.0: [1.000, 3.000]
    ...
    1.0: [3.000, 2.500]
VL-11{n=7 c=[2.857, 4.714] r=[0.990, 0.364]}
    Weight: Point:
    1.0: [1.000, 5.000]
    ...
    1.0: [4.000, 4.500]
VL-14{n=8 c=[4.750, 3.438] r=[0.433, 0.682]}
    Weight: Point:
    1.0: [4.000, 3.000]
    ...
    1.0: [5.000, 4.000]

However, if all I want is a file containing just the cluster centers, ClusterDumper cannot do that. My first idea was to subclass ClusterDumper, but the class is declared final, so I wrote my own instead.
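Another route, not the one taken in this post, is to post-process ClusterDumper's text output and keep only the cluster header lines. The sketch below uses only the JDK and assumes each header sits on its own line and looks like the samples above (an upper-case prefix such as VL-, a dash, digits, then {...}); point lines such as "1.0: [1.000, 3.000]" start with a digit and are dropped. The class name ClusterHeaderFilter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical post-processing helper: keeps only the cluster header lines
 * (e.g. "VL-2{n=6 c=[...] r=[...]}") from ClusterDumper-style text output.
 */
final class ClusterHeaderFilter {
    // Assumed header shape, based on the sample output: LETTERS-DIGITS{...} on one line
    private static final Pattern HEADER = Pattern.compile("^[A-Z]+-\\d+\\{.*\\}$");

    private ClusterHeaderFilter() {}

    static List<String> headersOnly(List<String> dumpLines) {
        List<String> headers = new ArrayList<>();
        for (String line : dumpLines) {
            String trimmed = line.trim();
            // "Weight: Point:" and "1.0: [...]" lines do not match and are skipped
            if (HEADER.matcher(trimmed).matches()) {
                headers.add(trimmed);
            }
        }
        return headers;
    }
}
```

This only works on text that has already been dumped, so the points are still read and printed first; reading the cluster files directly, as below, avoids that work.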
For reference, here is the relevant part of ClusterDumper's source:
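// Excerpt from ClusterDumper: iterate over every Cluster stored in the part-* files
for (Cluster value : new SequenceFileDirValueIterable<Cluster>(new Path(seqFileDir, "part-*"), PathType.GLOB, conf)) {
    String fmtStr = value.asFormatString(dictionary);
    if (subString > 0 && fmtStr.length() > subString) {
        // optionally truncate each formatted cluster to subString characters
        writer.write(':');
        writer.write(fmtStr, 0, Math.min(subString, fmtStr.length()));
    } else {
        writer.write(fmtStr);
    }
    // ... (excerpt truncated; the rest of the original loop body is omitted)
}

Alternatively, see my earlier post "mahout源码KMeansDriver分析之二中心点文件分析(无语篇)" (Mahout source analysis of KMeansDriver, part 2: the center-point file), which also covers reading the cluster centers.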
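The subString branch in the excerpt only trims each formatted cluster string; it does not separate centers from the rest of the dump. To see exactly what it produces for one string, the branch can be isolated as a pure JDK function. DumpTruncation and render are hypothetical names; the logic mirrors the excerpt line for line.

```java
/**
 * Stand-alone sketch of the truncation branch in the ClusterDumper excerpt:
 * when subString > 0 and the formatted cluster is longer than subString,
 * a ':' is emitted followed by the first subString characters;
 * otherwise the whole string is emitted unchanged.
 */
final class DumpTruncation {
    private DumpTruncation() {}

    static String render(String fmtStr, int subString) {
        StringBuilder out = new StringBuilder();
        if (subString > 0 && fmtStr.length() > subString) {
            out.append(':');
            // Math.min is redundant inside this branch (length > subString), kept to mirror the excerpt
            out.append(fmtStr, 0, Math.min(subString, fmtStr.length()));
        } else {
            out.append(fmtStr);
        }
        return out.toString();
    }
}
```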
Instead, we can write a ClusterCenterDump class:
package com.caic.cloud.util;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.Writer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.common.iterator.sequencefile.PathType;
import org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable;

import com.google.common.base.Charsets;
import com.google.common.io.Files;

/**
 * Writes only the cluster center vectors to a given local file.
 * @author fansy
 */
public class ClusterCenterDump {
    private static final Log log = LogFactory.getLog(ClusterCenterDump.class);
    private Configuration conf;
    private Path centerPathDir;
    private String outputPath;

    /*
    public ClusterCenterDump() {}
    public ClusterCenterDump(Configuration conf) { this.conf = conf; }
    */

    public ClusterCenterDump(Configuration conf, String centerPathDir, String outputPath) {
        this.conf = conf;
        this.centerPathDir = new Path(centerPathDir);
        this.setOutputPath(outputPath);
    }

    /**
     * Writes the cluster centers found under centerPathDir to the local file outputPath.
     * @return true on success, false if a field is unset or an I/O error occurs
     * @throws FileNotFoundException
     */
    public boolean writeCenterToLocal() throws FileNotFoundException {
        if (this.conf == null || this.outputPath == null || this.centerPathDir == null) {
            log.error("conf, outputPath and centerPathDir must all be initialized");
            return false;
        }
        Writer writer = null;
        try {
            File outputFile = new File(outputPath);
            writer = Files.newWriter(outputFile, Charsets.UTF_8);
            // Read every part-* SequenceFile under centerPathDir; the values are Cluster objects
            this.writeTxtCenter(writer, new SequenceFileDirValueIterable<Cluster>(
                    new Path(centerPathDir, "part-*"), PathType.GLOB, conf));
            // Alternative: list the directory with a filter instead of a glob
            // new SequenceFileDirValueIterable<Writable>(new Path(centerPathDir, "part-r-00000"), PathType.LIST,
            //         PathFilters.partFilter(), conf));
            writer.flush();
        } catch (IOException e) {
            log.error("write error: " + e.getMessage());
            return false;
        } finally {
            try {
                if (writer != null) {
                    writer.close();
                }
            } catch (IOException e) {
                log.error("close writer error: " + e.getMessage());
            }
        }
        return true;
    }

    /**
     * Writes each cluster, one per line, in its asFormatString() form.
     * @param writer   destination writer
     * @param clusters clusters read from the center files
     * @return true once all clusters have been written
     * @throws IOException
     */
    private boolean writeTxtCenter(Writer writer, Iterable<Cluster> clusters) throws IOException {
        for (Cluster cluster : clusters) {
            // null bindings: vector elements are printed without dictionary terms
            String fmtStr = cluster.asFormatString(null);
            writer.write(fmtStr);
            writer.write("\n");
        }
        return true;
    }

    public Configuration getConf() {
        return conf;
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    public Path getCenterPathDir() {
        return centerPathDir;
    }

    public void setCenterPathDir(Path centerPathDir) {
        this.centerPathDir = centerPathDir;
    }

    public String getOutputPath() {
        return outputPath;
    }

    public void setOutputPath(String outputPath) {
        this.outputPath = outputPath;
    }
}
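On Java 7 or later (an assumption; the Hadoop 1.0.4 / Mahout 0.5 setup here may well run on Java 6), the open/write/close-in-finally pattern of writeCenterToLocal and writeTxtCenter can be written more compactly with try-with-resources. The sketch below replaces the Cluster objects with strings that are assumed to be already formatted (for example by asFormatString(null)), so it needs no Mahout or Hadoop classes; CenterFileWriter and writeCenters are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical Java 7+ variant of the writeCenterToLocal/writeTxtCenter pattern.
 * Input strings are assumed to be pre-formatted cluster centers.
 */
final class CenterFileWriter {
    private CenterFileWriter() {}

    static boolean writeCenters(Iterable<String> formattedCenters, Path output) {
        // try-with-resources closes the writer even if write() throws,
        // which replaces the explicit finally block of the original class
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (String center : formattedCenters) {
                writer.write(center);
                writer.write("\n"); // same "\n" separator as writeTxtCenter
            }
            return true;
        } catch (IOException e) {
            // keep the original boolean contract instead of rethrowing
            return false;
        }
    }
}
```

Returning false on failure mirrors the original class; propagating the IOException would be the more conventional choice if callers need to know why the write failed.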
Here is a test class:
package fansy;

import java.io.FileNotFoundException;

import junit.framework.TestCase;

import org.apache.hadoop.conf.Configuration;

import com.caic.cloud.util.ClusterCenterDump;
import com.caic.forecast.pub.util.SpringUtil;

public class ClusterCenterDumpTest extends TestCase {
    public void testWrite() throws FileNotFoundException {
        // Project-specific Spring bootstrap from the author's codebase (not part of Hadoop or Mahout)
        SpringUtil.springWithoutWeb();
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "master:9001");
        // fs.default.name normally takes a full URI of the form hdfs://host:port
        conf.set("fs.default.name", "hdfs://master:9000");
        // Cluster directory from the k-means run; relative paths resolve against the user's HDFS home directory
        String centerPath = "output/clusters-2";
        String outputPath = "e:/a.txt";
        ClusterCenterDump cc = new ClusterCenterDump(conf, centerPath, outputPath);
        boolean flag = cc.writeCenterToLocal();
        System.out.println("done:" + flag);
        assertTrue(flag);
    }
}
Running it produces a file like the following at e:/a.txt on the local machine:
VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}
VL-15{n=10 c=[4.600, 3.700] r=[0.490, 0.812]}
VL-5{n=5 c=[2.400, 4.700] r=[0.800, 0.400]}
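If a later step needs the numbers rather than the text, each line of this file can be parsed back into an identifier, a point count, a center vector and a radius vector. The sketch below uses only the JDK and assumes the exact "ID{n=COUNT c=[x, y, ...] r=[x, y, ...]}" shape shown above with dense vectors; if the vectors were sparse and printed in a different (e.g. index:value) form, this parser would reject them. CenterLine is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical parser for one line of the centers file, assuming the
 * "ID{n=COUNT c=[x, y, ...] r=[x, y, ...]}" shape with dense vectors.
 */
final class CenterLine {
    private static final Pattern LINE = Pattern.compile(
            "^([A-Z]+-\\d+)\\{n=(\\d+) c=\\[([^\\]]*)\\] r=\\[([^\\]]*)\\]\\}$");

    final String id;        // e.g. "VL-15"
    final int n;            // number of points in the cluster
    final double[] center;  // the c=[...] vector
    final double[] radius;  // the r=[...] vector

    private CenterLine(String id, int n, double[] center, double[] radius) {
        this.id = id;
        this.n = n;
        this.center = center;
        this.radius = radius;
    }

    static CenterLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a center line: " + line);
        }
        return new CenterLine(m.group(1), Integer.parseInt(m.group(2)),
                toDoubles(m.group(3)), toDoubles(m.group(4)));
    }

    // Splits "4.600, 3.700" into {4.6, 3.7}
    private static double[] toDoubles(String csv) {
        String[] parts = csv.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }
}
```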
Share, grow, be happy.
When reposting, please credit the blog address: http://blog.csdn.net/fansy1990