Computing SVD in Spark
Source: Internet. Editor: 程序博客网. Date: 2024/06/05 13:13
Reposted from: http://blog.csdn.net/jiapengxmu/article/details/41983341
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg._
import org.apache.spark.{SparkConf, SparkContext}
// To use the latest sparse SVD implementation, please build your spark-assembly after this
// change: https://github.com/apache/spark/pull/1378
// Input TSV with 3 fields: rowIndex (Long), columnIndex (Int), weight (Double); indices start at 0
// Assumes the number of rows is larger than the number of columns, and the number of columns is
// smaller than Int.MaxValue
// sc is a SparkContext defined in the job
val inputData = sc.textFile("hdfs://...").map { line =>
  val parts = line.split("\t")
  (parts(0).toLong, parts(1).toInt, parts(2).toDouble)
}
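// Number of columns: indices start at 0, so use the largest column index plus one
// (counting distinct indices would undercount if some column has no entries)
val nCol = inputData.map(_._2).max() + 1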
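// Construct rows of the RowMatrix
val dataRows = inputData.groupBy(_._1).map[(Long, Vector)]{ row =>
  // SparseVector expects strictly increasing indices, so sort each row by column index
  val (indices, values) = row._2.toSeq.sortBy(_._2).map(e => (e._2, e._3)).unzip
  (row._1, new SparseVector(nCol, indices.toArray, values.toArray))
}.persist() // reused below when zipping row indices with the rows of U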
// Compute 20 largest singular values and corresponding singular vectors
val svd = new RowMatrix(dataRows.map(_._2).persist()).computeSVD(20, computeU = true)
// Write results to HDFS
// svd.V is stored column-major; regroup it into one list of values per matrix row
val V = svd.V.toArray.grouped(svd.V.numRows).toList.transpose
sc.makeRDD(V, 1).zipWithIndex()
  .map(line => line._2 + "\t" + line._1.mkString("\t")) // make tsv line starting with column index
  .saveAsTextFile("hdfs://...output/right_singular_vectors")
svd.U.rows.map(row => row.toArray).zip(dataRows.map(_._1))
  .map(line => line._2 + "\t" + line._1.mkString("\t")) // make tsv line starting with row index
  .saveAsTextFile("hdfs://...output/left_singular_vectors")
sc.makeRDD(svd.s.toArray, 1)
  .saveAsTextFile("hdfs://...output/singular_values")
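
The row-construction step can be tried out without a cluster. Below is a minimal sketch in plain Scala (no Spark dependency) of how (rowIndex, columnIndex, weight) triples are grouped into sparse rows. The names `SparseRowSketch`, `SparseRow` and `buildRows` are hypothetical and are not part of the original post or of MLlib. It assumes indices start at 0 and that each (row, column) pair appears at most once, and it sorts the column indices within each row because MLlib's `SparseVector` expects them in strictly increasing order.

```scala
// Hypothetical helper, not part of the original post or of MLlib.
object SparseRowSketch {
  // One sparse row: matrix width, sorted column indices, matching values.
  final case class SparseRow(size: Int, indices: Array[Int], values: Array[Double])

  def buildRows(entries: Seq[(Long, Int, Double)]): Map[Long, SparseRow] = {
    require(entries.nonEmpty, "need at least one entry")
    // Indices start at 0, so the width is the largest column index plus one.
    val nCol = entries.map(_._2).max + 1
    entries.groupBy(_._1).map { case (rowIndex, cells) =>
      // Sort by column index so the indices come out in increasing order.
      val (indices, values) = cells.map(e => (e._2, e._3)).sortBy(_._1).unzip
      rowIndex -> SparseRow(nCol, indices.toArray, values.toArray)
    }
  }

  def main(args: Array[String]): Unit = {
    // Row 0 is given out of column order on purpose; row 1 has no entries.
    val entries = Seq((0L, 3, 1.5), (0L, 1, 2.0), (2L, 0, 4.0))
    buildRows(entries).toSeq.sortBy(_._1).foreach { case (i, r) =>
      println(s"$i\t${r.size}\t${r.indices.mkString(",")}\t${r.values.mkString(",")}")
    }
  }
}
```

Rows with no entries (row 1 in the sample data) do not appear in the result at all. The same holds for the `groupBy` in the Spark job, which is why the row index is written next to each left singular vector instead of being inferred from its position.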