Mahout: Installation, Configuration, and Testing
Source: Internet  Editor: 程序博客网  Date: 2024/04/29 16:57
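Mahout is a library of machine learning algorithms built on Map/Reduce; it runs on a Hadoop cluster.
1. Download the CDH build of Mahout from the Cloudera website and extract it to a directory of your choice.
2. Install and start a Hadoop cluster.
3. Run bin/mahout --help and check that it lists a large number of commands; this confirms that Mahout is installed correctly.
4. Test
The test code is as follows: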
package mahout;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.clustering.kmeans.RandomSeedGenerator;
import org.apache.mahout.common.distance.DistanceMeasure;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;
public class SimpleKMeansClustering {

    // Sample 2-D points: one group near the origin, one group near (8,8)-(9,9)
    public static final double[][] points = { { 1, 1 }, { 2, 1 }, { 1, 2 },
            { 2, 2 }, { 3, 3 }, { 8, 8 }, { 9, 8 }, { 8, 9 }, { 9, 9 } };

    // Write the vectors to a SequenceFile (key: record number, value: vector)
    public static void writePointsToFile(List<Vector> points, Path path,
            FileSystem fs, Configuration conf) throws IOException {
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
                LongWritable.class, VectorWritable.class);
        try {
            long recNum = 0;
            VectorWritable vec = new VectorWritable();
            for (Vector point : points) {
                vec.set(point);
                writer.append(new LongWritable(recNum++), vec);
            }
        } finally {
            // Close the writer even if an append fails
            writer.close();
        }
    }

    // Convert the raw coordinate arrays to Mahout vectors
    public static List<Vector> getPoints(double[][] raw) {
        List<Vector> points = new ArrayList<Vector>();
        for (double[] fr : raw) {
            Vector vec = new RandomAccessSparseVector(fr.length);
            vec.assign(fr);
            points.add(vec);
        }
        return points;
    }

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {
        Path inPath = new Path("/test/testdata/points/part-000000");
        Path inClusPath = new Path("/test/testdata/clusters/part-000000");
        Path outPath = new Path("/test/testdata/output/");
        DistanceMeasure measure = new EuclideanDistanceMeasure();
        int k = 2;

        // Write the sample points to the input path
        List<Vector> vectors = getPoints(points);
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        writePointsToFile(vectors, inPath, fs, conf);

        // Pick k random input points as the initial cluster centers
        Path clusters = RandomSeedGenerator.buildRandom(conf, inPath,
                inClusPath, k, measure);

        // Run k-means with convergence delta 0.001 and at most 10 iterations
        KMeansDriver.run(inPath, clusters, outPath,
                measure, 0.001, 10, true, false);

        // Read the clustered points back and print each key/value record
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(
                "/test/testdata/output/clusteredPoints/part-m-00000"), conf);
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
        while (reader.next(key, value)) {
            System.out.println(key.toString() + " belongs to cluster "
                    + value.toString());
        }
        reader.close();
        System.out.println("end end");
    }
}
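To get a feel for what the job should produce before running it on the cluster, the same nine points can be clustered in plain Java. The following is only a minimal sketch of the standard k-means loop (assign each point to its nearest center, then move each center to the mean of its points); it is not Mahout's implementation. The class name PlainKMeansSketch and its methods are made up for illustration, and the initial centers are fixed to the first and last point instead of being chosen at random as RandomSeedGenerator does. The delta and maxIter arguments play the same role as the 0.001 and 10 passed to KMeansDriver.run above.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of Mahout: plain-Java k-means for comparison.
public class PlainKMeansSketch {

    // Same nine points as SimpleKMeansClustering.points
    static final double[][] POINTS = { { 1, 1 }, { 2, 1 }, { 1, 2 },
            { 2, 2 }, { 3, 3 }, { 8, 8 }, { 9, 8 }, { 8, 9 }, { 9, 9 } };

    // Euclidean distance between two points of equal dimension
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Index of the center closest to p
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (distance(p, centers[c]) < distance(p, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    // Runs k-means; centers is updated in place, the return value holds
    // the cluster index assigned to each point.
    static int[] cluster(double[][] points, double[][] centers,
            double delta, int maxIter) {
        int[] assignment = new int[points.length];
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: attach every point to its nearest center
            for (int i = 0; i < points.length; i++) {
                assignment[i] = nearest(points[i], centers);
            }
            // Update step: move every center to the mean of its points
            double moved = 0;
            for (int c = 0; c < centers.length; c++) {
                double[] mean = new double[centers[c].length];
                int count = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] == c) {
                        for (int d = 0; d < mean.length; d++) {
                            mean[d] += points[i][d];
                        }
                        count++;
                    }
                }
                if (count == 0) {
                    continue; // an empty cluster keeps its old center
                }
                for (int d = 0; d < mean.length; d++) {
                    mean[d] /= count;
                }
                moved = Math.max(moved, distance(mean, centers[c]));
                centers[c] = mean;
            }
            // Stop once no center moved by more than delta
            if (moved <= delta) {
                break;
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Assumption: fixed seeds (first and last point) instead of random ones
        double[][] centers = { POINTS[0].clone(), POINTS[8].clone() };
        int[] assignment = cluster(POINTS, centers, 0.001, 10);
        for (int i = 0; i < POINTS.length; i++) {
            System.out.println(Arrays.toString(POINTS[i]) + " -> cluster " + assignment[i]);
        }
        System.out.println("centers: " + Arrays.deepToString(centers));
    }
}
```

With these seeds the first five points end up in one cluster and the last four in the other, which is the same split the Mahout run is expected to show. With random seeds, the cluster numbering (and occasionally the split) can differ between runs.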
Package the class into mahout.jar, then run: hadoop jar mahout.jar mahout.SimpleKMeansClustering
Output:
1 belongs to cluster 1.0: [1.000, 1.000]
1 belongs to cluster 1.0: [2.000, 1.000]
1 belongs to cluster 1.0: [1.000, 2.000]
1 belongs to cluster 1.0: [2.000, 2.000]
1 belongs to cluster 1.0: [3.000, 3.000]
5 belongs to cluster 1.0: [8.000, 8.000]
5 belongs to cluster 1.0: [9.000, 8.000]
5 belongs to cluster 1.0: [8.000, 9.000]
5 belongs to cluster 1.0: [9.000, 9.000]
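5. If you see output like the above, Mahout is working correctly. Note that the printed wording is slightly misleading: the number before "belongs to cluster" is the record key, which appears to be the cluster ID, and the text after it is the point itself (a weight followed by its coordinates). Read that way, the first five points fall into cluster 1 and the last four into cluster 5.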
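When the output gets longer than nine lines, it is easier to check if the points are grouped by cluster. The following sketch does that for lines in the format printed by the test program. It assumes, as the output above suggests, that the text before " belongs to cluster " is the cluster ID and the text after it is the point; the class name ClusterOutputGrouping and the method groupByKey are made up for illustration and are not part of Mahout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Mahout: groups the printed lines by their key.
public class ClusterOutputGrouping {

    // Separator written by the System.out.println call in the test program
    static final String SEPARATOR = " belongs to cluster ";

    // Maps each key (assumed to be the cluster ID) to the values printed with it
    static Map<String, List<String>> groupByKey(List<String> lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            int idx = line.indexOf(SEPARATOR);
            if (idx < 0) {
                continue; // skip lines such as "end end"
            }
            String key = line.substring(0, idx).trim();
            String value = line.substring(idx + SEPARATOR.length()).trim();
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return groups;
    }

    public static void main(String[] args) {
        // A few lines copied from the output above
        List<String> lines = Arrays.asList(
                "1 belongs to cluster 1.0: [1.000, 1.000]",
                "1 belongs to cluster 1.0: [3.000, 3.000]",
                "5 belongs to cluster 1.0: [8.000, 8.000]",
                "5 belongs to cluster 1.0: [9.000, 9.000]",
                "end end");
        for (Map.Entry<String, List<String>> e : groupByKey(lines).entrySet()) {
            System.out.println("cluster " + e.getKey() + ": " + e.getValue());
        }
    }
}
```

The sketch uses a lambda, so it needs Java 8 or later, unlike the test program itself.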