LocalKMeans

Source: Internet  Editor: 程序博客网  Time: 2024/05/22 04:27
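1. Generate an array of N elements. Each element is a vector of dimension D whose entries are D random doubles, each scaled by R. (Array --> Vector --> Double)

    // DenseVector.fill looks like Breeze (breeze.linalg.DenseVector); that import is an assumption.
    // N, D, R and rand are defined elsewhere in the program.
    def generateData: Array[DenseVector[Double]] = {
      def generatePoint(i: Int): DenseVector[Double] =
        DenseVector.fill(D) { rand.nextDouble * R }
      Array.tabulate(N)(generatePoint)
    }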

2. Given the K centers, find the one closest to point p and return its index.

    def closestPoint(p: Vector[Double], centers: HashMap[Int, Vector[Double]]): Int = {
      var bestIndex = 0
      var closest = Double.PositiveInfinity
      // Centers are keyed 1..K, so scan the keys in that range.
      for (i <- 1 to centers.size) {
        val vCurr = centers(i)
        val tempDist = squaredDistance(p, vCurr)
        // Strict comparison: on a tie the lower index wins.
        if (tempDist < closest) {
          closest = tempDist
          bestIndex = i
        }
      }
      bestIndex
    }
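The squaredDistance call in step 2 is not defined in this post; it presumably comes from a linear algebra library. As a minimal sketch of what it computes, assuming plain Array[Double] points and an immutable Map of centers instead of library vectors (the object name DistanceSketch is hypothetical):

```scala
object DistanceSketch {
  // Squared Euclidean distance: the sum of squared coordinate differences.
  def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "points must have the same dimension")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  // Same scan as closestPoint, over an immutable Map keyed by center index.
  // Sorting by key first means a tie goes to the lowest index.
  def closestPoint(p: Array[Double], centers: Map[Int, Array[Double]]): Int =
    centers.toSeq.sortBy(_._1).minBy { case (_, c) => squaredDistance(p, c) }._1
}
```

Skipping the square root is safe here because it does not change which center is nearest.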
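3. Randomly pick K vectors from the N-element array as the initial centers.

    // points is assumed to be a mutable set (e.g. a HashSet), so duplicate draws
    // are ignored and the loop runs until K distinct vectors are collected.
    while (points.size < K) {
      points.add(data(rand.nextInt(N)))
    }
    // Number the chosen centers 1..K.
    val iter = points.iterator
    for (i <- 1 to points.size) {
      kPoints.put(i, iter.next())
    }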

4. For each vector, find its closest center and return (index, (vector, 1)), where the 1 is a count.

    val closest = data.map(p => (closestPoint(p, kPoints), (p, 1)))
5. Group by center index.

    val mappings = closest.groupBy[Int](x => x._1)
6. For each center, sum the vectors assigned to it and count them. pair._2 is the collection of (index, (vector, 1)) entries that share one index; reduceLeft folds them from the left into (index, (sum(vector), totalCount)).

    val pointStats = mappings.map { pair =>
      pair._2.reduceLeft[(Int, (Vector[Double], Int))] {
        case ((id1, (x1, y1)), (id2, (x2, y2))) => (id1, (x1 + x2, y1 + y2))
      }
    }

7. Compute the K new centers as (index, sum(vector) / totalCount).

    val newPoints = pointStats.map { mapping =>
      (mapping._1, mapping._2._1 * (1.0 / mapping._2._2))
    }
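Steps 5 to 7 can be checked by hand on a tiny input. The following is a minimal sketch, assuming plain Array[Double] points rather than library vectors; UpdateSketch, add and newCenters are hypothetical names used only for illustration.

```scala
object UpdateSketch {
  type Point = Array[Double]

  // Element-wise sum of two points of equal dimension.
  def add(a: Point, b: Point): Point = a.zip(b).map { case (x, y) => x + y }

  // Input: (centerIndex, point) assignments. Output: centerIndex -> mean of its points.
  def newCenters(assigned: Seq[(Int, Point)]): Map[Int, Point] =
    assigned
      .groupBy(_._1) // step 5: group by center index
      .map { case (idx, members) =>
        // step 6: pair each vector with a count of 1, then fold left into (sum, count)
        val (sum, count) = members
          .map { case (_, p) => (p, 1) }
          .reduceLeft[(Point, Int)] { case ((x1, n1), (x2, n2)) => (add(x1, x2), n1 + n2) }
        // step 7: divide the sum by the count to get the mean
        idx -> sum.map(_ / count)
      }
}
```

For example, assigning (0, 0) and (2, 4) to center 1 and (5, 5) to center 2 gives the new centers (1, 2) and (5, 5).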
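8. Sum the squared distances between the K new centers and the previous ones to test for convergence; if the algorithm has not converged yet, repeat the computation.

    tempDist = 0.0
    for (mapping <- newPoints) {
      tempDist += squaredDistance(kPoints(mapping._1), mapping._2)
    }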

9. Update the K centers.

    for (newP <- newPoints) {
      kPoints.put(newP._1, newP._2)
    }
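The post does not show the loop that ties steps 4 to 9 together, nor the convergence threshold. A minimal end-to-end sketch, assuming plain Array[Double] points, an immutable Map of centers, and a threshold parameter here called convergeDist (a hypothetical name), might look like this:

```scala
object KMeansLoopSketch {
  type Point = Array[Double]

  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the nearest center; ties go to the lowest index.
  def closestPoint(p: Point, centers: Map[Int, Point]): Int =
    centers.toSeq.sortBy(_._1).minBy { case (_, c) => squaredDistance(p, c) }._1

  // Repeats steps 4 to 9 until the total squared movement of the centers
  // is at most convergeDist.
  def run(data: Seq[Point], initial: Map[Int, Point], convergeDist: Double): Map[Int, Point] = {
    var centers = initial
    var tempDist = Double.PositiveInfinity
    while (tempDist > convergeDist) {
      // Steps 4 to 7: assign every point, then average the points of each center.
      val newPoints: Map[Int, Point] = data
        .groupBy(p => closestPoint(p, centers))
        .map { case (idx, members) =>
          val sum = members.reduceLeft[Point]((a, b) => a.zip(b).map { case (x, y) => x + y })
          idx -> sum.map(_ / members.size)
        }
      // Step 8: total squared distance between the old and new centers.
      tempDist = newPoints.map { case (idx, c) => squaredDistance(centers(idx), c) }.sum
      // Step 9: overwrite the updated centers; a center with no points keeps its old value.
      centers = centers ++ newPoints
    }
    centers
  }
}
```

With the 1-D points 0, 2, 10, 12 and initial centers 0 and 12, the first pass moves the centers to 1 and 11, the second pass leaves them unchanged, and the loop stops.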

