Machine Learning with Scala + Spark MLlib: Clustering

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 11:40

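In the following example, we first load and parse the data, then use the KMeans algorithm to group the data into two clusters. The number of clusters can be set in the program and passed to the KMeans algorithm. We then compute the Within Set Sum of Squared Errors (WSSSE). (Translator's note: WSSSE is a measure of clustering quality; the smaller the value, the smaller the distances between instances in the same cluster.)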
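As a rough formalization of the metric described above, WSSSE can be written as the sum, over all points, of the squared distance to the nearest cluster center. The symbols here are assumptions introduced for illustration: $x_i$ are the $n$ data points and $c_j$ are the $k$ cluster centers.

```latex
\mathrm{WSSSE} = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert^2
```

For a fixed number of clusters, a smaller value means points sit closer to their assigned centers. The value generally shrinks as the number of clusters grows, so it is most useful for comparing runs with the same number of clusters.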


import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data (sc is the SparkContext provided by spark-shell)
val data = sc.textFile("kmeans_data.txt")
// KMeans.train in Spark 1.0+ takes an RDD[Vector], so wrap each row with Vectors.dense
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into two classes using KMeans
val numIterations = 20
val numClusters = 2
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Evaluate clustering by computing Within Set Sum of Squared Errors
val WSSSE = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)
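To make the meaning of the reported cost more concrete, here is a minimal sketch in plain Scala (no Spark) that computes the same kind of quantity by hand: the sum of squared distances from each point to its nearest center. The object name `WssseSketch`, its helper functions, and the toy points and centers are all hypothetical and chosen only for illustration; this is not how MLlib implements `computeCost` internally.

```scala
object WssseSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Sum, over all points, of the squared distance to the closest center.
  def wssse(points: Seq[Point], centers: Seq[Point]): Double =
    points.map(p => centers.map(c => squaredDistance(p, c)).min).sum

  def main(args: Array[String]): Unit = {
    // Hypothetical toy data: two small groups, each with its own center.
    val points = Seq(
      Array(0.0, 0.0), Array(0.0, 2.0),
      Array(10.0, 0.0), Array(10.0, 2.0)
    )
    val centers = Seq(Array(0.0, 1.0), Array(10.0, 1.0))

    // Every point lies at distance 1 from its nearest center, so the sum is 4.0.
    println("Within Set Sum of Squared Errors = " + wssse(points, centers))
  }
}
```

In this toy setup, moving a center away from its group increases the result, which matches the intuition that a lower WSSSE indicates tighter clusters.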
