Implementing ALS with IDEA + MR
Source: Internet · Editor: 程序博客网 · Date: 2024/04/20 18:57
1. Environment
Add the spark-assembly-1.4.1-hadoop2.6.0 jar from the lib directory of the spark-1.4.1-bin-hadoop2.6 package to the project dependencies.
2. IDEA code
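If the project is managed with sbt instead of importing the assembly jar by hand, an equivalent dependency declaration might look like the fragment below (a sketch: the Scala version is assumed, and the Spark versions are taken from the package named above):

```scala
// Hypothetical build.sbt equivalent of manually importing spark-assembly-1.4.1-hadoop2.6.0.
// "provided" keeps Spark out of the application jar, since the cluster supplies it.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.4.1" % "provided"
)
```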
package demo

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation._
import org.apache.spark.rdd.RDD

/**
 * Created by tipdm101 on 2016/12/7.
 */
object ALSTrainer {
  def main(args: Array[String]): Unit = {
    if (args.length != 5) {
      println("Usage: demo.ALSTrainer <input> <output> <rank> <iteration> <lambda>")
      System.exit(1)
    }
    val input = args(0)
    val output = args(1)
    val rank = args(2).toInt
    val iteration = args(3).toInt
    val lambda = args(4).toDouble

    // Initialize the SparkContext
    val sc = new SparkContext(new SparkConf().setAppName("ALS Model Trainer"))

    // Load the data (UserID::MovieID::Rating::Timestamp) and sort by timestamp
    val original = sc.textFile(input).map { x =>
      val f = x.split("::")
      (f(3).toInt, (f(0), f(1), f(2)))
    }.sortByKey()

    // Split: the earliest 5% of ratings form the test set, the rest the training set
    val splitNum = (original.count * 0.05).toInt
    val splitTimeStamp = original.take(splitNum).toList.last._1
    val train = original.filter(x => x._1 > splitTimeStamp)
      .map(x => Rating(x._2._1.toInt, x._2._2.toInt, x._2._3.toDouble))
    val test = original.filter(x => x._1 <= splitTimeStamp)
      .map(x => (x._2._1.toInt, x._2._2.toInt, x._2._3.toDouble))

    // Train the ALS model
    val model = ALS.train(train, rank, iteration, lambda)

    // RMSE: join predictions with the held-out ratings on (user, product)
    def computeRMSE(model: MatrixFactorizationModel, test: RDD[(Int, Int, Double)]): Double =
      math.sqrt(model.predict(test.map(x => (x._1, x._2)))
        .map(x => ((x.user, x.product), x.rating))
        .join(test.map(x => ((x._1, x._2), x._3)))
        .map(x => (x._2._1 - x._2._2) * (x._2._1 - x._2._2))
        .sum / test.count)

    val rmse = computeRMSE(model, test)

    // Persist the model and the evaluation result
    model.save(sc, output + "/model")
    sc.parallelize(List(rmse), 1).saveAsTextFile(output + "/rmse")
    sc.stop()
  }
}
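The RMSE logic in computeRMSE can be verified without a Spark cluster. The sketch below (names are hypothetical, not from the original) mirrors the same steps on plain Scala collections: join predictions with held-out ratings on the (user, product) key, average the squared differences over the test set, and take the square root.

```scala
// Spark-free sketch of the RMSE computation used by computeRMSE above.
object RmseCheck {
  def rmse(pred: Map[(Int, Int), Double], actual: Map[(Int, Int), Double]): Double = {
    // Keep only (user, product) keys present in both maps, like the RDD join
    val sqDiffs = for ((key, p) <- pred.toSeq; a <- actual.get(key))
      yield (p - a) * (p - a)
    // Divide by the full test-set size, matching "sum / test.count" above
    math.sqrt(sqDiffs.sum / actual.size)
  }

  def main(args: Array[String]): Unit = {
    val pred   = Map((1, 10) -> 4.0, (2, 20) -> 3.0)
    val actual = Map((1, 10) -> 5.0, (2, 20) -> 3.0)
    println(rmse(pred, actual)) // sqrt((1.0 + 0.0) / 2) ≈ 0.7071
  }
}
```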
3. Build the jar: click the project structure icon in the top-right corner of IDEA, open Artifacts, and create als.jar containing only the 'als' compile output; click OK to exit.
From the main menu choose Build > Build Artifacts, then select als > Build.
Find als.jar in the out folder and right-click Show in Explorer.
4. Drag als.jar from the directory opened by Show in Explorer into /opt on the shell machine.
5. Start only the Hadoop cluster and submit the ALS job to YARN (the standalone Spark cluster is not needed):
./spark-submit --master yarn --class demo.ALSTrainer /opt/als.jar /root/ratings.dat /root/als_output 10 10 0.01
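After the job finishes, the output path should contain the two results the code writes. A sketch of the expected layout (part-file names depend on HDFS and partitioning, so treat this as an assumption, not guaranteed paths):

```
/root/als_output/model/             # saved MatrixFactorizationModel from model.save
/root/als_output/rmse/part-00000    # single-partition text file holding the RMSE value
```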