Running the canopy clustering algorithm with mahout-0.6
Source: Internet | Published by: 网络兼职写手 | Editor: 程序博客网 | Date: 2024/05/20 05:30
1. Convert the text file into vectors
mahout org.apache.mahout.clustering.conversion.InputDriver -i /mahout/input/p04-17.txt -o /mahout/output/vectorfiles -v org.apache.mahout.math.RandomAccessSparseVector
[root@masterclone ~]# hadoop fs -ls /mahout/output/vectorfiles
Warning: $HADOOP_HOME is deprecated.
Found 3 items
-rw-r--r-- 1 root supergroup 0 2014-05-12 06:58 /mahout/output/vectorfiles/_SUCCESS
drwxr-xr-x - root supergroup 0 2014-05-12 06:58 /mahout/output/vectorfiles/_logs
-rw-r--r-- 1 root supergroup 56430 2014-05-12 06:58 /mahout/output/vectorfiles/part-m-00000
Detailed steps: http://blog.csdn.net/panguoyuan/article/details/25655763
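To make the vectorization step more concrete, here is a minimal Python sketch of the idea: each text line becomes one sparse vector. It assumes every line of the input holds whitespace-separated numeric values; the actual layout of p04-17.txt is not shown in this post, and `line_to_sparse_vector` and the sample lines are hypothetical names and data used only for illustration. This is not Mahout's code, only an approximation of what a sparse vector stores.

```python
# Hypothetical illustration (not Mahout code): turn one text line into a
# sparse vector, stored as {index: value} plus the vector's cardinality.
def line_to_sparse_vector(line):
    # Assumption: values on a line are separated by whitespace.
    values = [float(token) for token in line.split()]
    # A sparse vector only keeps the non-zero entries.
    sparse = {i: v for i, v in enumerate(values) if v != 0.0}
    return sparse, len(values)


if __name__ == "__main__":
    # Made-up sample lines, not taken from p04-17.txt.
    sample_lines = ["1.0 0 3.5", "0 2 0"]
    for text in sample_lines:
        print(line_to_sparse_vector(text))
```

In the real job, the vectors are written to a Hadoop sequence file (the part-m-00000 file listed above), not kept in memory.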
2. Run the canopy clustering algorithm
mahout canopy -i /mahout/output/vectorfiles -o /mahout/output/canopy-result -t1 1 -t2 2 -ow
[root@masterclone ~]# mahout canopy -i /mahout/output/vectorfiles -o /mahout/output/canopy-result -t1 1 -t2 2 -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
HADOOP_CONF_DIR=/usr/lib/hadoop/conf
MAHOUT-JOB: /root/mahout/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
14/05/12 16:23:17 INFO common.AbstractJob: Command line arguments: {--distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=/mahout/output/vectorfiles, --method=mapreduce, --output=/mahout/output/canopy-result, --overwrite=null, --startPhase=0, --t1=1, --t2=2, --tempDir=temp}
14/05/12 16:23:17 INFO canopy.CanopyDriver: Build Clusters Input: /mahout/output/vectorfiles Out: /mahout/output/canopy-result Measure: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure@6d79953c t1: 1.0 t2: 2.0
14/05/12 16:23:19 INFO input.FileInputFormat: Total input paths to process : 1
14/05/12 16:23:19 INFO mapred.JobClient: Running job: job_201405121559_0005
14/05/12 16:23:20 INFO mapred.JobClient: map 0% reduce 0%
14/05/12 16:23:31 INFO mapred.JobClient: map 100% reduce 0%
14/05/12 16:23:39 INFO mapred.JobClient: map 100% reduce 33%
14/05/12 16:23:41 INFO mapred.JobClient: map 100% reduce 100%
14/05/12 16:23:43 INFO mapred.JobClient: Job complete: job_201405121559_0005
14/05/12 16:23:43 INFO mapred.JobClient: Counters: 29
14/05/12 16:23:43 INFO mapred.JobClient:   Job Counters
14/05/12 16:23:43 INFO mapred.JobClient:     Launched reduce tasks=1
14/05/12 16:23:43 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=10071
14/05/12 16:23:43 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/05/12 16:23:43 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/05/12 16:23:43 INFO mapred.JobClient:     Launched map tasks=1
14/05/12 16:23:43 INFO mapred.JobClient:     Data-local map tasks=1
14/05/12 16:23:43 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10145
14/05/12 16:23:43 INFO mapred.JobClient:   File Output Format Counters
14/05/12 16:23:43 INFO mapred.JobClient:     Bytes Written=210
14/05/12 16:23:43 INFO mapred.JobClient:   FileSystemCounters
14/05/12 16:23:43 INFO mapred.JobClient:     FILE_BYTES_READ=38
14/05/12 16:23:43 INFO mapred.JobClient:     HDFS_BYTES_READ=56557
14/05/12 16:23:43 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=108662
14/05/12 16:23:43 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=210
14/05/12 16:23:43 INFO mapred.JobClient:   File Input Format Counters
14/05/12 16:23:43 INFO mapred.JobClient:     Bytes Read=56430
14/05/12 16:23:43 INFO mapred.JobClient:   Map-Reduce Framework
14/05/12 16:23:43 INFO mapred.JobClient:     Map output materialized bytes=38
14/05/12 16:23:43 INFO mapred.JobClient:     Map input records=1800
14/05/12 16:23:43 INFO mapred.JobClient:     Reduce shuffle bytes=38
14/05/12 16:23:43 INFO mapred.JobClient:     Spilled Records=2
14/05/12 16:23:43 INFO mapred.JobClient:     Map output bytes=30
14/05/12 16:23:43 INFO mapred.JobClient:     CPU time spent (ms)=1400
14/05/12 16:23:43 INFO mapred.JobClient:     Total committed heap usage (bytes)=176033792
14/05/12 16:23:43 INFO mapred.JobClient:     Combine input records=0
14/05/12 16:23:43 INFO mapred.JobClient:     SPLIT_RAW_BYTES=127
14/05/12 16:23:43 INFO mapred.JobClient:     Reduce input records=1
14/05/12 16:23:43 INFO mapred.JobClient:     Reduce input groups=1
14/05/12 16:23:43 INFO mapred.JobClient:     Combine output records=0
14/05/12 16:23:43 INFO mapred.JobClient:     Physical memory (bytes) snapshot=257114112
14/05/12 16:23:43 INFO mapred.JobClient:     Reduce output records=1
14/05/12 16:23:43 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2100338688
14/05/12 16:23:43 INFO mapred.JobClient:     Map output records=1
14/05/12 16:23:43 INFO driver.MahoutDriver: Program took 26551 ms (Minutes: 0.44251666666666667)
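For readers unfamiliar with the role of the two thresholds, the following is a minimal single-machine sketch of the textbook canopy procedure, using a squared Euclidean distance like the measure named in the log. It is an illustration under stated assumptions, not Mahout's MapReduce implementation: the function names, the sample points, and the threshold values are all hypothetical. Note that canopy clustering is usually described with a loose threshold T1 larger than the tight threshold T2, whereas the command above passes t1=1 and t2=2; the sketch uses the conventional ordering.

```python
# Hypothetical sketch of textbook canopy clustering (not Mahout's code).
def squared_euclidean(a, b):
    # Sum of squared coordinate differences between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def canopy(points, t1, t2, distance=squared_euclidean):
    """Return a list of (center, members) pairs.

    t1 is the loose threshold: points closer than t1 join the canopy.
    t2 is the tight threshold: points closer than t2 are removed from
    the candidate list and can no longer start a canopy of their own.
    """
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)       # next unclaimed point becomes a center
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                members.append(p)       # loosely bound to this canopy
            if d >= t2:
                still_remaining.append(p)  # may still seed or join others
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    # Made-up 2-D points forming two well-separated groups.
    sample = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.0)]
    for center, members in canopy(sample, t1=4.0, t2=1.0):
        print(center, members)
```

Because the distance here is squared, the thresholds are compared against squared distances, so they should be chosen on that scale.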
3. Inspect the output directory
[root@masterclone ~]# hadoop fs -ls /mahout/output/canopy-result
Warning: $HADOOP_HOME is deprecated.
Found 1 items
drwxr-xr-x - root supergroup 0 2014-05-12 16:23 /mahout/output/canopy-result/clusters-0-final
[root@masterclone ~]#