Implementing K-means Clustering in Python

Source: Internet | Published by: 地摊毛巾知乎 | Editor: 程序博客网 | Time: 2024/06/03 20:39

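K-means is one of the simplest clustering algorithms, yet it is very widely used, and I have run into it often at work recently. It is typically applied early in a data analysis: choose a suitable k, partition the data into k clusters, and then study the characteristics of the data within each cluster.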
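One common heuristic for "choosing a suitable k" is to run k-means for several values of k and look for the point where the distortion (mean distance from each point to its nearest centroid) stops dropping sharply, often called the elbow. The sketch below is only an illustration under assumptions: the synthetic two-blob data, the range of k values, and the variable names are mine, not part of the original post.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Assumed toy data: two Gaussian blobs, so the "right" k is 2.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(0.0, 1.0, (100, 2)),
    rng.normal(5.0, 1.0, (100, 2)),
])

# Run k-means for k = 1..5 and record the distortion reported by SciPy.
distortions = {}
for k in range(1, 6):
    _, distortions[k] = kmeans(data, k)

# Print the curve; the large drop from k=1 to k=2 followed by small
# improvements suggests k=2 for this data.
for k, d in distortions.items():
    print(k, round(float(d), 3))
```

The elbow is a rule of thumb rather than a guarantee; on real data the curve is often smoother and the choice of k also depends on what the clusters will be used for.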

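The k-means algorithm proceeds as follows:

1 Randomly select k center points

2 Go through all the data and assign each point to its nearest center

3 Compute the mean of each cluster and use it as the new center

4 Repeat steps 2-3 until the k cluster centers no longer change (the algorithm has converged) or enough iterations have been run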
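The four steps above can be sketched directly in NumPy. This is a minimal illustration, not the SciPy implementation: the function name `kmeans_sketch`, its parameters, the choice of initial centers among the data points, and the handling of empty clusters are all assumptions made for the example.

```python
import numpy as np

def nearest_center(data, centers):
    """Return, for every point, the index of its closest center (Euclidean)."""
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_sketch(data, k, max_iter=100, seed=0):
    """Plain k-means following the four steps (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centers.
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        labels = nearest_center(data, centers)
        # Step 3: move each center to the mean of its points;
        # a cluster that ends up empty keeps its previous center.
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the centers that are returned.
    return centers, nearest_center(data, centers)

# Usage on assumed toy data: two tight blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, labels = kmeans_sketch(data, 2)
print(centers)
```

Because the initial centers are random, different seeds can give different results on harder data; library implementations usually run several restarts and keep the best one.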

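Below is an example of clustering two-dimensional data with k-means. The cluster centers are drawn as large green circles, and the two resulting clusters are drawn as blue stars and red dots.

Implementation: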

    # Cluster two synthetic 2-D point clouds with SciPy's k-means
    from scipy.cluster.vq import kmeans, vq
    from numpy.random import randn
    from numpy import vstack, array, where
    from matplotlib.pyplot import figure, plot, axis, show

    # Two Gaussian blobs: one around the origin, one around (5, 5)
    class1 = 1.5 * randn(100, 2)
    class2 = randn(100, 2) + array([5, 5])
    features = vstack((class1, class2))

    # Fit k=2 centroids, then assign each point to its nearest centroid
    centroids, variance = kmeans(features, 2)
    code, distance = vq(features, centroids)

    figure()
    ndx = where(code == 0)[0]
    plot(features[ndx, 0], features[ndx, 1], '*')   # cluster 0: stars
    ndx = where(code == 1)[0]
    plot(features[ndx, 0], features[ndx, 1], 'r.')  # cluster 1: red dots
    plot(centroids[:, 0], centroids[:, 1], 'go')    # centroids: green circles
    axis('off')
    show()
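To inspect the result of the example above without a plot, one option is to count how many points `vq` assigned to each centroid. The sketch below also rescales the features with `scipy.cluster.vq.whiten` first, which the SciPy documentation recommends before calling `kmeans`; the original code skips that step, so treat it as an optional addition. The seeded generator and the `counts` variable are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Same kind of synthetic data as the example, generated with a seeded
# generator (an assumption made here for reproducible input).
rng = np.random.default_rng(0)
class1 = 1.5 * rng.standard_normal((100, 2))
class2 = rng.standard_normal((100, 2)) + np.array([5, 5])
features = np.vstack((class1, class2))

# Scale every column to unit variance so no single feature dominates
# the Euclidean distance.
scaled = whiten(features)

# kmeans returns the centroids and the distortion of the fit;
# vq returns the centroid index and the distance for every point.
centroids, distortion = kmeans(scaled, 2)
code, distance = vq(scaled, centroids)

# Number of points assigned to each centroid.
counts = np.bincount(code, minlength=len(centroids))
print(counts, distortion)
```

Note that the centroids returned here live in the whitened coordinate system, so they need to be multiplied by the per-column standard deviations before being plotted on top of the original data.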

1 0