Mahout Source Code Analysis: DistributedLanczosSolver (6) -- Final Part
Source: Internet  Published by: unix网络编程卷3 pdf  Editor: 程序博客网  Time: 2024/05/22 01:36
Mahout version: 0.7, Hadoop version: 1.0.4, JDK: 1.7.0_25 64-bit.
Continuing from the previous post: with the three Jobs analyzed, only two function calls remain:
    List<Map.Entry<MatrixSlice, EigenStatus>> prunedEigenMeta = pruneEigens(eigenMetaData);
    saveCleanEigens(new Configuration(), prunedEigenMeta);

First, the pruneEigens function:
    private List<Map.Entry<MatrixSlice, EigenStatus>> pruneEigens(Map<MatrixSlice, EigenStatus> eigenMetaData) {
      List<Map.Entry<MatrixSlice, EigenStatus>> prunedEigenMeta = Lists.newArrayList();
      for (Map.Entry<MatrixSlice, EigenStatus> entry : eigenMetaData.entrySet()) {
        if (Math.abs(1 - entry.getValue().getCosAngle()) < maxError
            && entry.getValue().getEigenValue() > minEigenValue) {
          prunedEigenMeta.add(entry);
        }
      }
      // ...
    }

This part is simply a filter. The three jobs produced three EigenStatus objects, each carrying a cosAngle and an eigenValue, and these two values decide whether an entry is kept: it survives only if |1 - cosAngle| is below maxError and eigenValue is above minEigenValue. The three results are summarized below:
First:
    resultantVector: [-285.43017035605783, -61.30237570857193, -68.94124551381431, -520.2302762811703, -3232.201254912267, -32.31785150049481, -37.63572264009423, -12.025276244275622, -28.58260635344015, -6.8801603142200065, -28.491567864130573, -68.13521243410383, 4382.173720122737]
    vector: [0.01671441233225078, 0.0935655369363106, 0.09132650234523473, -0.0680324702834075, -0.9461123439509093, 0.10210271255992123, 0.10042714365337412, 0.11137954332150339, 0.10331974823993555, 0.10621406378767596, 0.10586960137353602, 0.09262650242313884, 0.09059904726143547]
    eigenValue = newNorm / oldNorm = 5479.061620543984 / 1 = 5479.061620543984
    cosAngle = resultantVector.dot(vector) / newNorm * oldNorm = 0.6300724679092792

Second:
    resultantVector:
    vector: [0.01180448947054423, 0.001703710024210367, 0.002100735590662567, 0.014221147454610283, 0.09654151173375553, 0.0025666815984826535, 0.0026147055494762234, 1.753144283209579E-4, 0.0017595900141802873, 0.0049406361794682024, 7.881250692924197E-4, 0.002873479530226361, 0.9951286321096425]
    eigenValue = 6433335.386819993
    cosAngle = 0.9999998030863401

Third:
    vector: [-0.2883450858059115, -0.29170231535763447, -0.29157035465385267, -0.28754185317979386, -0.26018076078737895, -0.2914154866344813, -0.2913995247546756, -0.2922103132689348, -0.2916837423401091, -0.29062644748002026, -0.2920066313645422, -0.2913135151887795, 0.03848561950058266]
    eigenValue = 1442.6143913921014
    cosAngle = 0.3671147029085018
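To make the two quantities concrete, here is a minimal standalone sketch that recomputes eigenValue and cosAngle for the first entry from the resultantVector and vector printed above. It is plain Java with no Mahout dependency; the class name `EigenStatusSketch` and its helper methods are made up for illustration. It uses the textbook cosine, dot / (newNorm * oldNorm), which gives the same number as the expression quoted above whenever oldNorm is 1, as it is here.

```java
/** Standalone sketch (not Mahout code): recompute eigenValue and cosAngle for the first entry. */
final class EigenStatusSketch {

    // resultantVector of the first entry, copied from the listing above.
    static final double[] RESULTANT = {
        -285.43017035605783, -61.30237570857193, -68.94124551381431,
        -520.2302762811703, -3232.201254912267, -32.31785150049481,
        -37.63572264009423, -12.025276244275622, -28.58260635344015,
        -6.8801603142200065, -28.491567864130573, -68.13521243410383,
        4382.173720122737
    };

    // vector (the candidate eigenvector) of the first entry, copied from the listing above.
    static final double[] VECTOR = {
        0.01671441233225078, 0.0935655369363106, 0.09132650234523473,
        -0.0680324702834075, -0.9461123439509093, 0.10210271255992123,
        0.10042714365337412, 0.11137954332150339, 0.10331974823993555,
        0.10621406378767596, 0.10586960137353602, 0.09262650242313884,
        0.09059904726143547
    };

    /** Dot product of two arrays of equal length. */
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Euclidean (L2) norm. */
    static double norm2(double[] a) {
        return Math.sqrt(dot(a, a));
    }

    /** eigenValue = newNorm / oldNorm. */
    static double eigenValue(double[] resultant, double[] vector) {
        return norm2(resultant) / norm2(vector);
    }

    /** Cosine of the angle between the resultant vector and the original vector. */
    static double cosAngle(double[] resultant, double[] vector) {
        return dot(resultant, vector) / (norm2(resultant) * norm2(vector));
    }

    public static void main(String[] args) {
        System.out.println("eigenValue = " + eigenValue(RESULTANT, VECTOR));
        System.out.println("cosAngle   = " + cosAngle(RESULTANT, VECTOR));
    }
}
```

Running it should print values close to the 5479.06 and 0.63 listed for the first entry. A cosAngle this far from 1 means the resultant vector points in a noticeably different direction from the input vector, so the input is a poor eigenvector candidate.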
As you can see, only the second one passes the filter, so prunedEigenMeta ends up containing just that entry.
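The filtering decision can be reproduced with a small standalone sketch that applies the same condition as pruneEigens to the three (cosAngle, eigenValue) pairs listed above. The post does not say which maxError and minEigenValue were used, so the thresholds 0.05 and 0.0 below are assumptions chosen only for illustration; the class `PruneSketch` and its `Status` holder are likewise hypothetical stand-ins for Mahout's types.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone sketch (not Mahout code) of the pruneEigens filter condition. */
final class PruneSketch {

    /** Hypothetical stand-in for EigenStatus: just the two fields the filter reads. */
    static final class Status {
        final double cosAngle;
        final double eigenValue;

        Status(double cosAngle, double eigenValue) {
            this.cosAngle = cosAngle;
            this.eigenValue = eigenValue;
        }
    }

    /** The three results summarized in the post. */
    static List<Status> sample() {
        List<Status> all = new ArrayList<Status>();
        all.add(new Status(0.6300724679092792, 5479.061620543984));
        all.add(new Status(0.9999998030863401, 6433335.386819993));
        all.add(new Status(0.3671147029085018, 1442.6143913921014));
        return all;
    }

    /** Keep entries with |1 - cosAngle| < maxError and eigenValue > minEigenValue. */
    static List<Status> prune(List<Status> all, double maxError, double minEigenValue) {
        List<Status> kept = new ArrayList<Status>();
        for (Status s : all) {
            if (Math.abs(1 - s.cosAngle) < maxError && s.eigenValue > minEigenValue) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed thresholds; the actual run's values are not given in the post.
        double maxError = 0.05;
        double minEigenValue = 0.0;
        for (Status s : prune(sample(), maxError, minEigenValue)) {
            System.out.println("kept: eigenValue=" + s.eigenValue + ", cosAngle=" + s.cosAngle);
        }
    }
}
```

With these assumed thresholds the first and third entries fail on the cosAngle test (errors of roughly 0.37 and 0.63), and only the second entry is kept, which matches the outcome described above.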
Now the next function, saveCleanEigens:
    private void saveCleanEigens(Configuration conf, Collection<Map.Entry<MatrixSlice, EigenStatus>> prunedEigenMeta)
        throws IOException {
      Path path = new Path(outPath, CLEAN_EIGENVECTORS);
      FileSystem fs = FileSystem.get(path.toUri(), conf);
      SequenceFile.Writer seqWriter = new SequenceFile.Writer(fs, conf, path, IntWritable.class, VectorWritable.class);
      try {
        IntWritable iw = new IntWritable();
        int numEigensWritten = 0;
        for (Map.Entry<MatrixSlice, EigenStatus> pruneSlice : prunedEigenMeta) {
          MatrixSlice s = pruneSlice.getKey();
          EigenStatus meta = pruneSlice.getValue();
          EigenVector ev = new EigenVector(s.vector(),
                                           meta.getEigenValue(),
                                           Math.abs(1 - meta.getCosAngle()),
                                           s.index());
          //log.info("appending {} to {}", ev, path);
          Writable vw = new VectorWritable(ev);
          iw.set(s.index());
          seqWriter.append(iw, vw);
          // increment the number of eigenvectors written and see if we've
          // reached our specified limit, or if we wish to write all eigenvectors
          // (latter is built-in, since numEigensWritten will always be > 0
          numEigensWritten++;
          if (numEigensWritten == maxEigensToKeep) {
            log.info("{} of the {} total eigens have been written", maxEigensToKeep, prunedEigenMeta.size());
            break;
          }
        }
      } finally {
        Closeables.closeQuietly(seqWriter);
      }
      cleanedEigensPath = path;
    }

Let's see what the saved ev is:
It is simply the entry that survived the filter, except that the error stored with it is |1 - cosAngle|.
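Two details of saveCleanEigens are easy to miss: the error stored with each eigenvector, and the way the maxEigensToKeep check doubles as a "write everything" switch. The sketch below isolates both in plain Java. It is an illustration only: `WriteLimitSketch` and its methods are hypothetical, the SequenceFile append is replaced by a counter, and the use of -1 as the "no limit" value is an assumption (any limit that the counter never equals behaves the same way).

```java
/** Standalone sketch (not Mahout code) of the limit check and error value in saveCleanEigens. */
final class WriteLimitSketch {

    /**
     * Mimics the write loop: returns how many eigenvectors would be written
     * when 'available' entries passed the filter and the limit is maxEigensToKeep.
     */
    static int countWritten(int available, int maxEigensToKeep) {
        int numEigensWritten = 0;
        for (int i = 0; i < available; i++) {
            // The real method appends one eigenvector to the SequenceFile here.
            numEigensWritten++;
            // The counter is always > 0 at this point, so a non-positive limit
            // never matches and every entry gets written.
            if (numEigensWritten == maxEigensToKeep) {
                break;
            }
        }
        return numEigensWritten;
    }

    /** The error stored in each saved eigenvector: |1 - cosAngle|. */
    static double error(double cosAngle) {
        return Math.abs(1 - cosAngle);
    }

    public static void main(String[] args) {
        System.out.println(countWritten(5, 2));   // stops once the limit is reached
        System.out.println(countWritten(5, -1));  // assumed "no limit" value: writes all
        System.out.println(error(0.9999998030863401));
    }
}
```

For the one entry kept in this walkthrough, the stored error is about 1.97E-7, far smaller than the errors of the two rejected entries.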
Share, grow, enjoy.
When reposting, please credit the blog: http://blog.csdn.net/fansy1990