Machine Learning with Scala + Spark MLlib: Clustering

Source: Internet | Posted by: 淘宝手机卡禁售 | Editor: 程序博客网 | Date: 2024/06/06 11:40

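In the following example, we first load and parse the data, then use the KMeans algorithm to group the data into two clusters. The number of clusters is set in the program and passed to KMeans. We then compute the Within Set Sum of Squared Errors (WSSSE). (Translator's note: WSSSE is a measure of clustering quality. The smaller the value, the closer together the instances within each cluster.)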

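    import org.apache.spark.mllib.clustering.KMeans

    // Load and parse the data: one point per line, coordinates separated by spaces.
    // `sc` is the SparkContext provided by spark-shell.
    val data = sc.textFile("kmeans_data.txt")
    // Note: Spark 1.0 and later expect an RDD[Vector] here; in that case wrap each
    // row with org.apache.spark.mllib.linalg.Vectors.dense(...).
    val parsedData = data.map( _.split(' ').map(_.toDouble))

    // Cluster the data into two classes using KMeans
    val numIterations = 20
    val numClusters = 2
    val clusters = KMeans.train(parsedData, numClusters, numIterations)

    // Evaluate clustering by computing Within Set Sum of Squared Errors
    val WSSSE = clusters.computeCost(parsedData)
    println("Within Set Sum of Squared Errors = " + WSSSE)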
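To make the WSSSE metric more concrete, here is a minimal sketch in plain Scala (no Spark) of what that number measures: for each point, the squared Euclidean distance to its nearest cluster center, summed over all points. The object name WssseSketch, the helper functions, and the toy data are all hypothetical and chosen only for illustration. This is not MLlib's implementation of computeCost, just the same quantity computed by hand.

```scala
// Plain-Scala sketch of WSSSE; the names and data here are illustrative assumptions.
object WssseSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // WSSSE: squared distance from each point to its nearest center, summed over all points.
  def wssse(points: Seq[Point], centers: Seq[Point]): Double =
    points.map { p =>
      centers
        .map(c => squaredDistance(p, c))
        .foldLeft(Double.PositiveInfinity)((best, d) => math.min(best, d))
    }.sum

  def main(args: Array[String]): Unit = {
    // Hypothetical toy data: two tight groups, one near (0, 1) and one near (9, 10).
    val points: Seq[Point] = Seq(
      Array(0.0, 0.0), Array(0.0, 2.0),
      Array(9.0, 9.0), Array(9.0, 11.0)
    )
    val twoCenters: Seq[Point] = Seq(Array(0.0, 1.0), Array(9.0, 10.0))
    val oneCenter: Seq[Point] = Seq(Array(4.5, 5.5))

    println("WSSSE (two centers) = " + wssse(points, twoCenters)) // prints 4.0
    println("WSSSE (one center)  = " + wssse(points, oneCenter))  // prints 166.0
  }
}
```

With two well-placed centers every point lies at distance 1 from its center, so the sum is small. With a single center in the middle the value is much larger, which matches the note above that a smaller WSSSE indicates tighter clusters.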

0 0