Spark vector and matrix types

Source: Internet · Editor: 程序博客网 · Time: 2024/05/14 09:54

Start with an ordinary array:

scala> var arr = Array(1.0, 2, 3, 4)
arr: Array[Double] = Array(1.0, 2.0, 3.0, 4.0)

It can be converted into a Vector:

scala> import org.apache.spark.mllib.linalg._
scala> var vec = Vectors.dense(arr)
vec: org.apache.spark.mllib.linalg.Vector = [1.0,2.0,3.0,4.0]

Next, build an RDD[Vector]:

scala> val rdd = sc.makeRDD(Seq(Vectors.dense(arr), Vectors.dense(arr.map(_ * 10)), Vectors.dense(arr.map(_ * 100))))
rdd: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = ParallelCollectionRDD[6] at makeRDD at <console>:26

From this RDD we can build a distributed matrix:

scala> import org.apache.spark.mllib.linalg.distributed._
scala> val mat: RowMatrix = new RowMatrix(rdd)
mat: org.apache.spark.mllib.linalg.distributed.RowMatrix = org.apache.spark.mllib.linalg.distributed.RowMatrix@3133b850
scala> val m = mat.numRows()
m: Long = 3
scala> val n = mat.numCols()
n: Long = 4
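A RowMatrix also supports distributed arithmetic, such as multiplying by a local `Matrix` on the right. A minimal sketch, assuming the same session as above (the local matrix here is a made-up example, not from the original post):

```scala
// Multiply the 3x4 distributed matrix by a local 4x2 matrix,
// producing a new 3x2 RowMatrix. Matrices.dense takes values
// in column-major order.
val local = Matrices.dense(4, 2,
  Array(1.0, 0.0, 0.0, 0.0,   // first column: picks out column 0 of mat
        0.0, 1.0, 0.0, 0.0))  // second column: picks out column 1 of mat
val product: RowMatrix = mat.multiply(local)
// product.numRows() == 3, product.numCols() == 2
```

`multiply` keeps the row dimension distributed while the local matrix is broadcast to the executors, so the right-hand operand must be small enough to fit in memory on each node.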

Try the statistics utilities and compute the column means:

scala> import org.apache.spark.mllib.stat.Statistics
scala> var sum = Statistics.colStats(rdd)
scala> sum.mean
res7: org.apache.spark.mllib.linalg.Vector = [37.0,74.0,111.0,148.0]
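Each entry is the column-wise average over the three rows, e.g. the first column gives (1 + 10 + 100) / 3 = 37. The `MultivariateStatisticalSummary` returned by `colStats` exposes more than the mean; a short sketch, continuing the same session (the commented values follow from the three rows built above):

```scala
// Other per-column statistics from the same summary object.
sum.max          // [100.0,200.0,300.0,400.0] - column-wise maximum
sum.min          // [1.0,2.0,3.0,4.0]         - column-wise minimum
sum.numNonzeros  // [3.0,3.0,3.0,3.0]         - non-zero count per column
sum.variance     // column-wise sample variance
sum.count        // 3 - number of rows (vectors) in the RDD
```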