Implementing the k-means algorithm in Python
Source: Internet | Editor: 程序博客网 | Time: 2024/05/17 10:28
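This is also Exercise 9.4 from Zhou Zhihua's Machine Learning (《机器学习》).
The dataset is watermelon dataset 4.0, shown below (columns: 编号 = sample ID, 密度 = density, 含糖率 = sugar content):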
编号,密度,含糖率
1,0.697,0.46
2,0.774,0.376
3,0.634,0.264
4,0.608,0.318
5,0.556,0.215
6,0.403,0.237
7,0.481,0.149
8,0.437,0.211
9,0.666,0.091
10,0.243,0.267
11,0.245,0.057
12,0.343,0.099
13,0.639,0.161
14,0.657,0.198
15,0.36,0.37
16,0.593,0.042
17,0.719,0.103
18,0.359,0.188
19,0.339,0.241
20,0.282,0.257
21,0.784,0.232
22,0.714,0.346
23,0.483,0.312
24,0.478,0.437
25,0.525,0.369
26,0.751,0.489
27,0.532,0.472
28,0.473,0.376
29,0.725,0.445
30,0.446,0.459
The algorithm is simple, so I won't explain it, and the code isn't complicated either, so here it is:
# -*- coding: utf-8 -*-
"""Exercise 9.4"""
import random
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv(filepath_or_buffer='../dataset/watermelon4.0.csv', sep=',')[["密度", "含糖率"]].values

########################################
#               K-means                #
########################################
k = int(sys.argv[1])

# Randomly choose k samples from data as the initial mean vectors
mean_vectors = [data[i] for i in random.sample(range(len(data)), k)]


def dist(p1, p2):
    """Euclidean distance between two samples."""
    return np.sqrt(np.sum((p1 - p2) ** 2))


while True:
    print(mean_vectors)
    # Each cluster starts empty; every sample joins its nearest mean vector
    clusters = [[] for _ in range(k)]
    for sample in data:
        distances = [dist(sample, m) for m in mean_vectors]
        min_index = distances.index(min(distances))
        clusters[min_index].append(sample)

    new_mean_vectors = []
    for c, v in zip(clusters, mean_vectors):
        # An empty cluster keeps its old mean vector
        new_mean_vector = np.mean(c, axis=0) if c else v
        # If the relative change between the new and the old mean vector is
        # below 0.0001 in every dimension, do not update the mean vector
        if np.all(np.abs((new_mean_vector - v) / v) < 0.0001):
            new_mean_vectors.append(v)
        else:
            new_mean_vectors.append(new_mean_vector)

    if np.array_equal(mean_vectors, new_mean_vectors):
        break
    mean_vectors = new_mean_vectors

# Show the clustering result, one colour per cluster
total_colors = ['r', 'y', 'g', 'b', 'c', 'm', 'k']
colors = random.sample(total_colors, k)
for cluster, color in zip(clusters, colors):
    density = [arr[0] for arr in cluster]
    sugar_content = [arr[1] for arr in cluster]
    plt.scatter(density, sugar_content, c=color)
plt.show()
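The post skips the theory, so for reference here is the objective and the two alternating steps that the loop above carries out, written in the usual k-means notation (sample set $D = \{\boldsymbol{x}_1, \dots, \boldsymbol{x}_m\}$, clusters $C_1, \dots, C_k$ with mean vectors $\boldsymbol{\mu}_i$). This is a summary sketch, not a quotation from the book. The squared error being reduced is

$$
E = \sum_{i=1}^{k} \sum_{\boldsymbol{x} \in C_i} \lVert \boldsymbol{x} - \boldsymbol{\mu}_i \rVert_2^2,
\qquad
\boldsymbol{\mu}_i = \frac{1}{|C_i|} \sum_{\boldsymbol{x} \in C_i} \boldsymbol{x}.
$$

Assignment step (the `for sample in data` loop):

$$
\lambda_j = \arg\min_{i \in \{1, \dots, k\}} \lVert \boldsymbol{x}_j - \boldsymbol{\mu}_i \rVert_2,
\qquad
C_{\lambda_j} \leftarrow C_{\lambda_j} \cup \{\boldsymbol{x}_j\}.
$$

Update step (the `new_mean_vectors` loop):

$$
\boldsymbol{\mu}_i' = \frac{1}{|C_i|} \sum_{\boldsymbol{x} \in C_i} \boldsymbol{x}.
$$

Each step can only lower $E$ or leave it unchanged, which is why the iteration settles down; the script stops once every mean vector's relative change is below 0.0001.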
To run it, type python k_means.py 4 on the command line, where 4 is k, the number of clusters.
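The script reads k straight from sys.argv[1], so it fails with an unhelpful traceback when the argument is missing, and random.sample(total_colors, k) raises ValueError when k exceeds the seven listed colours. A minimal sketch of an argument check, assuming the 30-sample watermelon data; parse_k is a hypothetical helper, not part of the original script:

```python
# Colours listed in the plotting code; random.sample() needs k <= len(TOTAL_COLORS)
TOTAL_COLORS = ['r', 'y', 'g', 'b', 'c', 'm', 'k']
# Number of samples in watermelon dataset 4.0 (assumed fixed here)
N_SAMPLES = 30


def parse_k(argv):
    """Hypothetical helper: read k from argv[1] and check it fits the script."""
    if len(argv) < 2:
        raise SystemExit("usage: python k_means.py K")
    try:
        k = int(argv[1])
    except ValueError:
        raise SystemExit(f"K must be an integer, got {argv[1]!r}") from None
    # k is limited by both the colour list and the number of samples
    max_k = min(len(TOTAL_COLORS), N_SAMPLES)
    if not 1 <= k <= max_k:
        raise SystemExit(f"K must be between 1 and {max_k}, got {k}")
    return k
```

In k_means.py, k = parse_k(sys.argv) would take the place of k = int(sys.argv[1]); parse_k(['k_means.py', '4']) returns 4, while an argument of '8' exits with a message instead of a traceback.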
Below are the results for k = 3, 4 and 5. Because the initial mean vectors are chosen at random, the result can differ from run to run.
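Because the outcome depends on the random initial mean vectors, one common remedy (not something the original post does) is to fix the random seed and run k-means several times, keeping the run with the smallest squared error E. A minimal sketch, assuming data is an (m, d) NumPy array such as the one the script loads; kmeans and best_of_n are hypothetical names, and for brevity the sketch swaps the script's 0.0001 relative-change test for np.allclose:

```python
import numpy as np


def kmeans(data, k, rng, max_iter=100):
    """One run of Lloyd's k-means, starting from k randomly chosen samples."""
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Squared distance from every sample to every centre, shape (m, k)
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # New centre = mean of its members; an empty cluster keeps its centre
        new_centers = np.array([
            data[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and squared error E for the returned centres
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2.min(axis=1).sum()
    return labels, centers, sse


def best_of_n(data, k, n_runs=10, seed=0):
    """Run k-means n_runs times and keep the run with the smallest E."""
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible result
    return min((kmeans(data, k, rng) for _ in range(n_runs)), key=lambda r: r[2])


# Usage on a toy data set: two well-separated pairs of points
toy = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers, sse = best_of_n(toy, 2)
print(labels, sse)
```

With the script's data array, best_of_n(data, 4) returns the labels, centres and E of the best of ten runs, and the same seed always gives the same clustering.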