Spark MLlib Machine Learning: KMeans

Source: Internet  Editor: 程序博客网  Date: 2024/05/16 23:53

1. Set up the SBT project environment
mkdir -p ~/kmeans/src/main/scala
2. Write kmeans.sbt

name := "Kmeans Project"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0",
  "org.apache.spark" %% "spark-mllib" % "2.0.0"
)

At first I forgot to add the spark-mllib dependency, and the build failed with the error: "error: object mllib is not a member of package org.apache.spark".

3. Write the Scala source file kmeans_test.scala

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object kmeans_test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Kmeans Test")
    val sc = new SparkContext(conf)

    // Load the sample data: one point per line, coordinates separated by spaces
    val data = sc.textFile("file:///usr/spark2.0/data/mllib/kmeans_data.txt")
    // Parse each line into a dense vector and cache it, since KMeans iterates over the data
    val parsedData = data.map(s => Vectors.dense(s.split(" ").map(_.toDouble))).cache()

    // Cluster the data into 2 clusters with at most 20 iterations
    val numClusters = 2
    val numIterations = 20
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Evaluate the clustering by computing the Within Set Sum of Squared Errors
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors=" + WSSSE)

    sc.stop()
  }
}
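To make the parsing step and the WSSSE value more concrete, here is a minimal sketch in plain Scala with no Spark dependency. It is not how MLlib implements computeCost; it only illustrates the quantity being reported: the squared distance from each point to its nearest center, summed over all points. The object name WssseSketch, the sample lines, and the two centers are hypothetical values chosen for illustration, not the output of KMeans.train.

```scala
object WssseSketch {
  // Parse one space-separated line into an array of doubles,
  // mirroring s.split(" ").map(_.toDouble) in the job above.
  def parseLine(s: String): Array[Double] =
    s.trim.split(" ").map(_.toDouble)

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // WSSSE: for every point, take the squared distance to its nearest
  // center, then sum these values over all points.
  def wssse(points: Seq[Array[Double]], centers: Seq[Array[Double]]): Double =
    points.map(p => centers.map(c => sqDist(p, c)).min).sum

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines in the same format as the input file.
    val lines = Seq("0.0 0.0 0.0", "0.2 0.2 0.2", "9.0 9.0 9.0", "9.2 9.2 9.2")
    val points = lines.map(parseLine)

    // Hand-picked centers (an assumption), one per visible group of points.
    val centers = Seq(Array(0.1, 0.1, 0.1), Array(9.1, 9.1, 9.1))

    println("Within Set Sum of Squared Errors=" + wssse(points, centers))
  }
}
```

A smaller WSSSE means the points sit closer to their assigned centers, which is why this value is commonly compared across different choices of numClusters.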

4. Copy the Scala source file into the ~/kmeans/src/main/scala/ directory
5. The final project layout is as follows:

find .
.
./kmeans.sbt
./src
./src/main
./src/main/scala
./src/main/scala/kmeans_test.scala

6. Enter the kmeans directory and compile the project

cd ~/kmeans
sbt compile

7. After compilation finishes, package the project

sbt package

8. After packaging finishes, submit the job with the spark-submit tool

spark-submit --class kmeans_test target/scala-2.11/kmeans-project_2.11-1.0.jar

9. The output is as follows:
[Image: screenshot of the job output]
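Besides the WSSSE line, it can be useful to see the cluster centers themselves and which cluster a given point falls into. The following is a minimal sketch, assuming the same build file and Spark 2.0.0 setup as above. The object name kmeans_inspect, the in-memory sample points, and the query point are hypothetical; clusterCenters and predict are members of the KMeansModel returned by KMeans.train.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object kmeans_inspect {
  def main(args: Array[String]): Unit = {
    // As in the job above, the master is left to spark-submit.
    val conf = new SparkConf().setAppName("Kmeans Inspect")
    val sc = new SparkContext(conf)

    // Small in-memory sample (assumed values) used instead of reading a file.
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0, 0.0),
      Vectors.dense(0.1, 0.1, 0.1),
      Vectors.dense(0.2, 0.2, 0.2),
      Vectors.dense(9.0, 9.0, 9.0),
      Vectors.dense(9.1, 9.1, 9.1),
      Vectors.dense(9.2, 9.2, 9.2)
    )).cache()

    // Train with 2 clusters and at most 20 iterations, as in the job above.
    val clusters = KMeans.train(points, 2, 20)

    // Print each learned cluster center.
    clusters.clusterCenters.foreach(c => println("Cluster center: " + c))

    // Ask the model which cluster index a new point belongs to.
    val query = Vectors.dense(0.05, 0.05, 0.05)
    println("Point " + query + " is assigned to cluster " + clusters.predict(query))

    sc.stop()
  }
}
```

If this file is placed next to kmeans_test.scala, it would be packaged into the same jar and could be submitted the same way, with --class kmeans_inspect. The cluster indices are arbitrary labels, so which group gets index 0 may differ between runs.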
