Implementing ALS with IDEA + MR

1. Environment
Import the spark-assembly-1.4.1-hadoop2.6.0 jar from the lib directory of the spark-1.4.1-bin-hadoop2.6 archive into the IDEA project.


2. IDEA code
package demo


import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.mllib.recommendation._
/**
 * Created by tipdm101 on 2016/12/7.
 */
object ALSTrainer {
  def main(args: Array[String]): Unit = {
    // All five arguments are required: args(0) through args(4) are read below
    if (args.length != 5) {
      println("Usage: demo.ALSTrainer <input> <output> <rank> <iteration> <lambda>")
      System.exit(1)
    }
    val input = args(0)
    val output = args(1)
    val rank = args(2).toInt
    val iteration = args(3).toInt
    val lambda = args(4).toDouble


    // Initialize the SparkContext
    val sc = new SparkContext(new SparkConf().setAppName("ALS Model Trainer"))
    // Load the data as (timestamp, (user, product, rating)) and sort by timestamp
    val original = sc.textFile(input).map { x =>
      val f = x.split("::")
      (f(3).toInt, (f(0), f(1), f(2)))
    }.sortByKey()
    // Split: the timestamp of the record at the 5% mark is the cutoff.
    // Records at or before the cutoff form the test set; later records form the training set.
    val splitNum = (original.count * 0.05).toInt
    val splitTimeStamp = original.take(splitNum).toList.last._1
    val train = original.filter(x => x._1 > splitTimeStamp)
      .map(x => Rating(x._2._1.toInt, x._2._2.toInt, x._2._3.toDouble))
    val test = original.filter(x => x._1 <= splitTimeStamp)
      .map(x => (x._2._1.toInt, x._2._2.toInt, x._2._3.toDouble))
    // Build the model
    val model = ALS.train(train, rank, iteration, lambda)
    // Root-mean-square error between predicted and actual ratings on the test set
    def computeRMSE(model: MatrixFactorizationModel, test: org.apache.spark.rdd.RDD[(Int, Int, Double)]): Double = {
      val predicted = model.predict(test.map(x => (x._1, x._2)))
        .map(x => ((x.user, x.product), x.rating))
      val actual = test.map(x => ((x._1, x._2), x._3))
      val squaredErrors = predicted.join(actual)
        .map(x => (x._2._1 - x._2._2) * (x._2._1 - x._2._2))
      Math.sqrt(squaredErrors.sum / test.count)
    }
    val rmse = computeRMSE(model, test)
    // Save the model and the RMSE under the output directory
    model.save(sc, output + "/model")
    sc.parallelize(List(rmse), 1).saveAsTextFile(output + "/rmse")
    sc.stop()
  }
}
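
The split and the RMSE step above can be tried without a cluster. The following is a minimal sketch on plain Scala collections, not the Spark code itself: the object name SplitAndRmseSketch, its helper names, and the sample lines are made up for illustration. It assumes the same "user::product::rating::timestamp" layout as the input file. One difference to be aware of: the sketch divides by the number of (predicted, actual) pairs, while the trainer divides by test.count; the two agree only when every test pair receives a prediction.

```scala
// Illustrative only: names and sample data are assumptions, not part of the original job.
object SplitAndRmseSketch {
  // Parse one "user::product::rating::timestamp" line into (timestamp, (user, product, rating)).
  def parse(line: String): (Int, (Int, Int, Double)) = {
    val f = line.split("::")
    (f(3).toInt, (f(0).toInt, f(1).toInt, f(2).toDouble))
  }

  // Same rule as the trainer: sort by timestamp, take the timestamp of the record
  // at the `fraction` mark as the cutoff, then test = at or before the cutoff,
  // train = after the cutoff. Returns (train, test).
  def split(records: Seq[(Int, (Int, Int, Double))], fraction: Double)
      : (Seq[(Int, Int, Double)], Seq[(Int, Int, Double)]) = {
    val sorted = records.sortBy(_._1)
    val splitNum = (sorted.size * fraction).toInt
    require(splitNum > 0, "too few records for the requested fraction")
    val cutoff = sorted.take(splitNum).last._1
    val train = sorted.filter(_._1 > cutoff).map(_._2)
    val test = sorted.filter(_._1 <= cutoff).map(_._2)
    (train, test)
  }

  // RMSE = sqrt(mean((predicted - actual)^2)) over (predicted, actual) pairs.
  def rmse(pairs: Seq[(Double, Double)]): Double =
    math.sqrt(pairs.map { case (p, a) => (p - a) * (p - a) }.sum / pairs.size)

  def main(args: Array[String]): Unit = {
    // Made-up sample lines in the input layout
    val lines = Seq("1::10::5::100", "1::11::3::200", "2::10::4::300", "2::12::1::400")
    val (train, test) = split(lines.map(parse), 0.5)
    println(s"train = $train")
    println(s"test  = $test")
    // Made-up (predicted, actual) pairs
    println(s"rmse  = ${rmse(Seq((4.0, 5.0), (2.0, 3.0)))}")
  }
}
```

With a fraction of 0.5 the two earliest sample ratings end up in the test set and the two latest in the training set, which mirrors what the 0.05 fraction does in the trainer.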


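3. Build the jar: click the IDEA icon in the upper-right corner, click Artifacts, create a new als.jar that contains only the 'als' compile output, and click OK to exit.
In the menu bar at the top of the main window, choose Build, then Build Artifacts; when als appears, select Build.
Find als.jar in the out folder, right-click it, and choose Show in Explorer.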


4. From the directory opened by Show in Explorer, drag als.jar into the shell session to upload it to the /opt directory.


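5. Start only the Hadoop cluster and run ALS on it (the MR in the title). The job is submitted with --master yarn, so a standalone Spark cluster is not needed. Run the following from the bin directory of the Spark installation: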


./spark-submit --master yarn --class demo.ALSTrainer /opt/als.jar /root/ratings.dat /root/als_output 10 10 0.01
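
Once the job finishes, the model sits under the output directory in the model subdirectory. As a hedged sketch of a possible next step, the program below loads that model and prints recommendations for one user. The object name ALSRecommender, its three arguments, and the modelPath helper are hypothetical and not part of the original post; MatrixFactorizationModel.load and recommendProducts are the MLlib calls it relies on. It assumes the user id was present in the training data, since recommendProducts fails for unknown users.

```scala
package demo

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// Hypothetical companion to ALSTrainer: loads the saved model and prints recommendations.
object ALSRecommender {
  // Mirrors the trainer, which saves the model to output + "/model".
  def modelPath(output: String): String = output + "/model"

  def main(args: Array[String]): Unit = {
    if (args.length != 3) {
      println("Usage: demo.ALSRecommender <output> <userId> <num>")
      System.exit(1)
    }
    val output = args(0)      // same <output> directory that was passed to ALSTrainer
    val userId = args(1).toInt
    val num = args(2).toInt   // how many products to recommend

    val sc = new SparkContext(new SparkConf().setAppName("ALS Recommender"))
    val model = MatrixFactorizationModel.load(sc, modelPath(output))
    // recommendProducts returns an Array[Rating], highest predicted rating first
    model.recommendProducts(userId, num).foreach { r =>
      println(s"${r.user}\t${r.product}\t${r.rating}")
    }
    sc.stop()
  }
}
```

If it were packaged into the same jar, it could be submitted the same way as the trainer, with demo.ALSRecommender as the class and /root/als_output as the first argument.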








