Creating normally distributed points and clustering them with the k-means algorithm

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 06:58

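I. To create normally distributed points in Python, you first need to install the relevant packages.

For example, pip, numpy and scipy:

pip is a tool for installing Python packages, including packages distributed as .whl (wheel) files.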

1. Download pip from https://pypi.python.org/pypi/pip#downloads , choose the second entry in the download list, and extract it to a directory such as D:.

2. In cmd, change to the directory you extracted it to and run:

python setup.py install

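3. Add C:/Python27/Scripts to the system PATH environment variable (PATH entries are separated by ;).

4. Restart cmd and type pip. If pip prints its help information, the installation succeeded.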

Download addresses for numpy and scipy:

numpy: http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy

scipy: http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy

II. Creating normally distributed random points

1. Using numpy's random.normal function

For example:

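import numpy

# Generate normally distributed values
x = numpy.round(numpy.random.normal(1.75, 0.5, 20), 2)   # 20 values, mean 1.75, std 0.5
y = numpy.round(numpy.random.normal(100, 10, 20), 2)     # 20 values, mean 100, std 10
z = numpy.column_stack((x, y))                           # pair them into a (20, 2) array
print(z)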
Output:

[[   1.64  117.74]
 [   1.7   103.37]
 [   2.01   78.99]
 [   2.07   96.34]
 [   2.47  102.45]
 [   1.48   98.99]
 [   1.44   88.01]
 [   1.55  105.95]
 [   1.54  105.66]
 [   1.57  113.98]
 [   1.56  118.53]
 [   1.62  102.22]
 [   1.27  106.08]
 [   1.84  111.  ]
 [   1.68   94.13]
 [   2.19   86.87]
 [   1.19  102.93]
 [   1.13  115.38]
 [   2.52  113.05]
 [   2.31   88.71]]
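In random.normal(a, b, c), a is the mean (the loc parameter), b is the standard deviation (scale), and c is the number of random values to generate (size). numpy.round(..., 2) rounds each value to two decimal places, and numpy.column_stack pairs the x and y arrays into one array of points.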
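To check what the three parameters mean, here is a small sketch: draw many samples with a fixed seed and compare the sample mean and standard deviation with loc and scale. The seed value 0 and the sample size 10000 are arbitrary choices made only for this illustration.

```python
import numpy

numpy.random.seed(0)  # fixed seed so repeated runs give the same numbers

# loc = mean (a), scale = standard deviation (b), size = number of values (c)
samples = numpy.random.normal(loc=1.75, scale=0.5, size=10000)

print(samples.shape)              # (10000,)
print(round(samples.mean(), 3))   # should be close to 1.75
print(round(samples.std(), 3))    # should be close to 0.5
```

With more samples, the sample mean and standard deviation move closer to loc and scale; with only 20 values, as in the example above, they can differ noticeably.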

2. Plotting the points on a chart

For reference on the related functions, see: http://blog.csdn.net/ywjun0919/article/details/8692018

For example:

import numpy
import matplotlib.pyplot as plt

# Generate 200 normally distributed points
x = numpy.round(numpy.random.normal(1.75, 0.5, 200), 2)
y = numpy.round(numpy.random.normal(100, 10, 200), 2)

plt.figure(1)
plt.plot(x, y, 'b*')   # 'b*': blue star markers, no connecting line
plt.show()
Output: (a scatter plot of the 200 points; the image is not available)

III. Clustering the points obtained above with the k-means algorithm

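The k-means algorithm itself is introduced in my next blog post; if anything here is unclear, see that post: Introduction to k-means (k均值介绍).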
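For quick reference while reading the code below, here is the iteration it performs, written with notation introduced only for this summary: $x_i$ is the $i$-th point, $\mu_j$ the current center of cluster $j$, and $C_j$ the set of points assigned to it. Ties in the arg min go to the lower-numbered cluster, as in the code's if/elif chain. Because the code's cal_distance returns the squared distance (no square root), the 0.1 stopping threshold applies to the squared shift of each center. When some $C_j$ is empty the update step divides by zero, which is the error discussed in the note at the end.

$$
\begin{aligned}
\text{assign:}\quad & c_i = \operatorname*{arg\,min}_{j \in \{1,2,3\}} \lVert x_i - \mu_j \rVert^2, \qquad C_j = \{\, i : c_i = j \,\} \\
\text{update:}\quad & \mu_j' = \frac{1}{|C_j|} \sum_{i \in C_j} x_i \\
\text{stop when:}\quad & \lVert \mu_j' - \mu_j \rVert^2 < 0.1 \quad \text{for } j = 1, 2, 3
\end{aligned}
$$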

Below I cluster with k = 3, so I created three groups of normally distributed random numbers.
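The full script below merges the three groups into one list of x-coordinates (a) and one of y-coordinates (b) with append loops. As a hedged alternative, the same kind of 600-point data set can be built with numpy alone. The groups list, the seed and the labels array are illustrative additions, not part of the original code; labels records which group generated each point, which can be handy for comparing against the clusters k-means finds.

```python
import numpy

numpy.random.seed(1)  # illustrative seed, only for reproducibility

# (mean_x, std_x, mean_y, std_y) per group, same values as the script below
groups = [(100, 10, 100, 5), (125, 10, 125, 8), (150, 10, 100, 6)]

# Draw 200 points per group and stack them into one (600, 2) array
points = numpy.vstack([
    numpy.column_stack((
        numpy.round(numpy.random.normal(mx, sx, 200), 2),
        numpy.round(numpy.random.normal(my, sy, 200), 2),
    ))
    for mx, sx, my, sy in groups
])

# Group that generated each point: 0, 1 or 2
labels = numpy.repeat(numpy.arange(len(groups)), 200)

print(points.shape, labels.shape)  # (600, 2) (600,)
```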

The code is as follows:

import random

import numpy
import matplotlib.pyplot as plt

# Generate three groups of 200 normally distributed points each
x = numpy.round(numpy.random.normal(100, 10, 200), 2)   # group 1, x-coordinates
y = numpy.round(numpy.random.normal(100, 5, 200), 2)    # group 1, y-coordinates
p = numpy.round(numpy.random.normal(125, 10, 200), 2)   # group 2, x-coordinates
q = numpy.round(numpy.random.normal(125, 8, 200), 2)    # group 2, y-coordinates
j = numpy.round(numpy.random.normal(150, 10, 200), 2)   # group 3, x-coordinates
k = numpy.round(numpy.random.normal(100, 6, 200), 2)    # group 3, y-coordinates

# Merge the groups: a holds all x-coordinates, b holds all y-coordinates
a = list(x) + list(p) + list(j)
b = list(y) + list(q) + list(k)


def cal_distance(a, b):
    """Squared Euclidean distance between two points (no square root is taken)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


# Random initial cluster centers
k1 = [random.randint(100, 150) for _ in range(2)]
k2 = [random.randint(100, 150) for _ in range(2)]
k3 = [random.randint(100, 150) for _ in range(2)]
# Fixed alternative:
# k1 = [100, 100]
# k2 = [125, 125]
# k3 = [150, 100]
# With the random centers above, one cluster easily ends up empty; the fixed
# centers avoid this but do not match the random initialization of k-means.

while True:
    # Indices of the samples assigned to each of the three clusters
    clu_k1 = []
    clu_k2 = []
    clu_k3 = []
    for i in range(len(a)):
        # Distance from the sample to each cluster center
        ab_distance1 = cal_distance(k1, [a[i], b[i]])
        ab_distance2 = cal_distance(k2, [a[i], b[i]])
        ab_distance3 = cal_distance(k3, [a[i], b[i]])
        # Assign the sample to the nearest center
        if ab_distance1 <= ab_distance2 and ab_distance1 <= ab_distance3:
            clu_k1.append(i)
        elif ab_distance2 <= ab_distance1 and ab_distance2 <= ab_distance3:
            clu_k2.append(i)
        else:
            clu_k3.append(i)

    # The centroid of each cluster becomes its new center
    # (an empty cluster raises ZeroDivisionError here; see the note below)
    k1_x = sum([a[i] for i in clu_k1]) / len(clu_k1)
    k1_y = sum([b[i] for i in clu_k1]) / len(clu_k1)
    k2_x = sum([a[i] for i in clu_k2]) / len(clu_k2)
    k2_y = sum([b[i] for i in clu_k2]) / len(clu_k2)
    k3_x = sum([a[i] for i in clu_k3]) / len(clu_k3)
    k3_y = sum([b[i] for i in clu_k3]) / len(clu_k3)

    # Compare the (squared) shift between old and new centers with the threshold 0.1
    if (cal_distance(k1, [k1_x, k1_y]) >= 0.1 or
            cal_distance(k2, [k2_x, k2_y]) >= 0.1 or
            cal_distance(k3, [k3_x, k3_y]) >= 0.1):
        # Shift still too large: the new centroids replace the old centers
        k1 = [k1_x, k1_y]
        k2 = [k2_x, k2_y]
        k3 = [k3_x, k3_y]
    else:
        # Shift below the threshold: stop iterating
        break

# After convergence, store the x and y coordinates of each cluster as lists
kv1_x = [a[i] for i in clu_k1]
kv1_y = [b[i] for i in clu_k1]
kv2_x = [a[i] for i in clu_k2]
kv2_y = [b[i] for i in clu_k2]
kv3_x = [a[i] for i in clu_k3]
kv3_y = [b[i] for i in clu_k3]

r = numpy.column_stack((a, b))
s = numpy.column_stack((kv1_x, kv1_y))
t = numpy.column_stack((kv2_x, kv2_y))
o = numpy.column_stack((kv3_x, kv3_y))
print('These are the points:', r)
print('Cluster 1:', s)
print('Cluster 2:', t)
print('Cluster 3:', o)

plt.figure(1)            # the three generated groups
plt.plot(x, y, 'r+')
plt.plot(p, q, 'gx')
plt.plot(j, k, 'c*')
plt.figure(2)            # all 600 points in one color
plt.plot(a, b, 'b*')
plt.figure(3)            # the clusters found by k-means
plt.plot(kv1_x, kv1_y, 'r+')
plt.plot(kv2_x, kv2_y, 'gx')
plt.plot(kv3_x, kv3_y, 'c*')
plt.show()

Output:

These are the points: [[  85.94   98.42]
 [ 104.67  100.14]
 [  96.07  110.85]
 ...,
 [ 152.34  106.77]
 [ 156.34   97.74]
 [ 160.91   87.55]]
Cluster 1: [[ 117.08  111.11]
 [ 121.47  107.21]
 [ 122.64  106.21]
 [ 113.88  113.04]
 [ 110.61  115.01]
 [ 121.53  109.03]
 [ 140.98  118.19]
 [ 122.92  131.79]
 [ 136.95  120.17]
 [ 128.46  125.79]
 [ 130.69  124.46]
 [ 128.58  126.54]
 [ 134.97  134.14]
 [ 120.22  131.08]
 [ 128.08  122.5 ]
 [ 115.14  127.8 ]
 [ 135.58  111.11]
 [ 127.58  133.76]
 [ 134.96  115.2 ]
 [ 107.74  118.7 ]
 [ 110.78  127.36]
 [ 115.28  129.  ]
 [ 131.63  120.76]
 [ 132.35  119.35]
 [ 134.2   134.82]
 [ 125.85  112.84]
 [ 123.5   113.75]
 [ 112.19  117.7 ]
 [ 116.82  126.86]
 [ 111.38  127.22]
 [ 132.31  118.51]
 [ 135.47  120.15]
 [ 115.99  120.46]
 [ 126.14  121.92]
 [ 136.24  122.32]
 [ 125.02  131.13]
 [ 107.    125.96]
 [ 124.05  116.46]
 [ 126.9   128.11]
 [ 116.47  120.48]
 [ 144.51  132.16]
 [ 127.85  122.97]
 [ 128.22  142.98]
 [ 116.81  125.1 ]
 [ 113.67  136.17]
 [ 133.76  120.26]
 [ 121.84  129.  ]
 [ 132.75  110.94]
 [ 120.41  117.21]
 [ 129.81  133.96]
 [ 120.68  130.02]
 [ 136.23  128.13]
 [ 118.04  113.01]
 [ 138.17  137.38]
 [ 108.74  117.03]
 [ 125.56  131.15]
 [ 135.09  126.98]
 [ 129.13  114.04]
 [ 129.78  125.32]
 [ 115.42  128.88]
 [ 115.83  126.32]
 [ 146.63  137.55]
 [ 146.59  125.28]
 [ 110.07  133.52]
 [ 122.91  137.7 ]
 [ 133.78  127.74]
 [ 115.83  134.04]
 [ 131.22  120.95]
 [ 134.69  129.15]
 [ 119.62  113.19]
 [ 122.44  116.64]
 [ 124.48  116.24]
 [ 132.64  135.51]
 [ 106.91  133.5 ]
 [ 134.41  118.75]
 [ 126.19  122.9 ]
 [ 120.95  124.54]
 [ 140.78  117.69]
 [ 125.24  134.05]
 [ 113.44  124.26]
 [ 116.16  118.94]
 [ 130.39  133.65]
 [ 125.03  120.55]
 [ 126.43  118.81]
 [ 116.41  135.06]
 [ 114.1   136.29]
 [ 125.01  131.88]
 [ 123.86  139.1 ]
 [ 127.59  127.12]
 [ 130.04  127.25]
 [ 117.69  124.44]
 [ 130.93  123.8 ]
 [ 129.73  138.21]
 [ 115.97  112.83]
 [ 125.98  130.39]
 [ 121.69  115.37]
 [ 126.61  136.99]
 [ 118.24  123.29]
 [ 126.46  113.5 ]
 [ 128.99  128.78]
 [ 122.22  112.55]
 [ 122.98  121.08]
 [ 118.27  125.11]
 [ 119.56  127.51]
 [ 130.38  131.22]
 [ 128.63  116.32]
 [ 114.43  115.93]
 [ 114.99  138.32]
 [ 126.03  126.67]
 [ 133.54  128.57]
 [ 125.24  128.95]
 [ 120.54  118.91]
 [ 121.13  127.35]
 [ 112.52  115.22]
 [ 125.38  122.02]
 [ 130.35  117.77]
 [ 122.06  129.59]
 [ 118.78  120.62]
 [ 110.21  124.38]
 [ 107.48  120.17]
 [ 127.55  116.25]
 [ 133.23  125.54]
 [ 141.03  125.  ]
 [ 118.81  122.75]
 [ 110.42  121.52]
 [ 121.58  115.07]
 [ 122.19  121.11]
 [ 131.58  133.09]
 [ 136.7   125.87]
 [ 113.1   131.95]
 [ 122.72  140.4 ]
 [ 142.88  126.34]
 [ 128.15  120.3 ]
 [ 135.99  136.01]
 [ 120.98  131.72]
 [ 114.63  120.99]
 [ 108.25  127.23]
 [ 114.11  130.01]
 [ 121.29  121.7 ]
 [ 113.51  132.91]
 [ 118.63  135.44]
 [ 118.37  127.7 ]
 [ 131.56  121.23]
 [ 130.07  115.02]
 [ 135.67  125.13]
 [ 114.33  133.69]
 [ 125.85  112.15]
 [ 110.54  130.47]
 [ 127.17  123.51]
 [ 121.2   121.05]
 [ 141.71  128.24]
 [ 128.67  116.3 ]
 [ 134.6   133.75]
 [ 125.17  124.42]
 [ 130.29  128.19]
 [ 121.64  128.79]
 [ 132.1   121.2 ]
 [ 132.77  117.34]
 [ 132.82  127.24]
 [ 131.42  129.05]
 [ 136.77  135.73]
 [ 130.28  133.91]
 [ 119.77  132.7 ]
 [ 122.37  106.25]
 [ 123.01  133.01]
 [ 125.45  128.39]
 [ 127.67  131.19]
 [ 123.17  133.13]
 [ 129.82  133.66]
 [ 129.82  124.02]
 [ 120.41  125.03]
 [ 131.81  136.02]
 [ 131.73  123.62]
 [ 132.79  119.01]
 [ 127.09  125.82]
 [ 126.03  127.94]
 [ 141.63  121.94]
 [ 130.32  138.81]
 [ 120.55  126.12]
 [ 120.64  129.58]
 [ 136.87  136.63]
 [ 127.12  113.78]
 [ 128.05  121.69]
 [ 107.79  134.64]
 [  95.35  139.67]
 [ 127.    137.27]
 [ 136.58  136.18]
 [ 132.77  120.15]
 [ 119.21  114.64]
 [ 115.    122.49]
 [ 110.04  120.82]
 [ 134.93  137.31]
 [ 123.08  106.97]]
Cluster 2: [[  85.94   98.42]
 [ 104.67  100.14]
 [  96.07  110.85]
 [  97.14   90.87]
 [ 104.5   103.29]
 [ 107.28  105.57]
 [  92.69  111.78]
 [  82.03   98.47]
 [ 102.64  103.9 ]
 [ 107.01   97.89]
 [ 110.17   94.62]
 [  98.72   91.91]
 [  85.47   99.38]
 [ 100.61   98.24]
 [ 107.11   93.61]
 [  82.44  101.42]
 [ 111.77  108.28]
 [ 109.39   97.56]
 [ 100.27   99.97]
 [ 106.47  108.75]
 [  98.54  107.64]
 [ 103.39   96.33]
 [  92.78   97.27]
 [  99.55   93.84]
 [  90.69  100.79]
 [  96.41  105.07]
 [ 117.98   94.37]
 [  97.1    97.87]
 [  99.01   93.17]
 [ 115.48   97.11]
 [ 110.49   98.66]
 [ 110.7   106.9 ]
 [ 106.26   97.  ]
 [ 107.27  106.58]
 [  70.16   98.68]
 [ 101.42  105.1 ]
 [ 102.81  102.88]
 [  96.86   92.76]
 [ 102.71  114.37]
 [ 113.23   94.97]
 [ 103.45   95.69]
 [ 104.2   104.2 ]
 [  95.34   98.38]
 [ 107.51  106.93]
 [  99.92  101.5 ]
 [  96.51  105.15]
 [ 103.75   99.07]
 [  94.64   94.3 ]
 [  99.17  100.34]
 [ 101.49   97.35]
 [  90.49   99.07]
 [ 106.51  100.84]
 [  84.3    93.33]
 [ 104.59  104.71]
 [ 100.08  103.65]
 [  97.56  105.2 ]
 [  81.7   104.04]
 [ 103.19   99.19]
 [  91.21   93.94]
 [ 107.03   95.11]
 [ 116.     97.57]
 [  84.79   98.99]
 [  98.75   92.25]
 [ 100.46  106.45]
 [ 111.36  107.92]
 [ 105.86   93.73]
 [  92.15  101.89]
 [  84.59  101.17]
 [ 110.73  102.56]
 [  95.    100.32]
 [  74.73  101.95]
 [ 105.67  104.96]
 [  87.6    97.67]
 [  94.49  105.52]
 [ 103.92   97.27]
 [  98.53  104.61]
 [ 102.07  102.04]
 [  78.75   93.58]
 [  97.7    98.76]
 [  89.67   92.57]
 [ 110.06  103.42]
 [ 114.09   98.98]
 [ 113.36   92.11]
 [  92.66   95.43]
 [ 107.16   97.19]
 [ 107.26   88.18]
 [  98.57   98.01]
 [  99.04  100.53]
 [  93.87  105.02]
 [  91.62  102.17]
 [ 110.93   98.81]
 [ 102.56   96.88]
 [ 105.72  100.65]
 [ 108.16  100.89]
 [ 108.58   99.34]
 [ 109.37   99.38]
 [  94.71   99.59]
 [  89.94  106.1 ]
 [ 112.24  101.64]
 [  82.86   94.53]
 [ 103.5   101.06]
 [ 103.06  101.95]
 [ 106.04  107.56]
 [  99.96  108.03]
 [  95.98   90.36]
 [ 118.23   97.22]
 [ 111.92   98.94]
 [  90.16  100.53]
 [ 103.03   99.87]
 [ 100.62   95.92]
 [ 100.79   96.87]
 [ 114.     99.36]
 [ 106.23  105.05]
 [  93.     99.92]
 [  92.27   93.08]
 [  95.31  103.68]
 [ 101.03   96.02]
 [  97.31  104.47]
 [  99.37   91.35]
 [ 100.93  102.39]
 [ 110.32  100.67]
 [  78.26   96.78]
 [ 102.27  106.81]
 [  90.53  102.33]
 [  99.81   99.23]
 [ 104.63  117.48]
 [  86.48  106.39]
 [  90.5    98.98]
 [  96.4   100.16]
 [ 100.51  107.16]
 [  84.44   99.57]
 [ 102.05   94.76]
 [  98.69  107.32]
 [  91.82   90.08]
 [  87.95  105.95]
 [ 108.88   89.58]
 [  81.32  100.93]
 [ 103.97   98.03]
 [ 108.86  100.43]
 [ 108.7    98.99]
 [  94.59  101.57]
 [  96.58  105.37]
 [  93.01   94.2 ]
 [  93.4   101.47]
 [  98.32   90.39]
 [  98.3   100.36]
 [ 108.27   94.91]
 [  97.78   98.28]
 [  93.79  102.6 ]
 [ 101.35  112.03]
 [ 110.4   106.31]
 [  92.39  101.15]
 [  87.44  104.65]
 [ 101.15  102.91]
 [ 103.95   98.  ]
 [ 101.46  108.39]
 [  91.1    96.25]
 [ 101.85  106.58]
 [  88.95   97.83]
 [  91.68  107.33]
 [  89.7   101.55]
 [ 108.26  100.37]
 [ 114.75   92.22]
 [  99.22  111.48]
 [ 111.     94.33]
 [  90.31   98.04]
 [ 108.6    95.  ]
 [  78.58  100.99]
 [ 100.73   98.51]
 [  91.79  104.08]
 [ 100.01   97.01]
 [ 113.51   93.62]
 [ 108.12  100.31]
 [  96.63  101.95]
 [  81.69   95.06]
 [  99.58   95.83]
 [ 101.77  100.68]
 [  87.12   99.15]
 [  80.84   98.46]
 [ 102.06   98.26]
 [  92.38   89.01]
 [  99.54   95.1 ]
 [  95.65   95.09]
 [  95.54  106.35]
 [ 108.57   99.14]
 [ 105.22   97.37]
 [  90.48  107.22]
 [  94.91  106.98]
 [ 103.12   97.47]
 [  89.14   93.38]
 [ 104.15   89.53]
 [  87.44  105.46]
 [ 108.15  103.46]
 [  96.91  102.67]
 [  94.8    94.87]
 [ 114.39  106.93]
 [ 115.33  107.57]
 [ 103.54  120.73]
 [ 112.4   110.67]
 [ 114.57  104.11]
 [ 124.05   95.5 ]
 [ 124.37   97.43]
 [ 120.98  103.96]
 [ 124.71   99.59]]
Cluster 3: [[ 140.22  113.22]
 [ 145.09  111.17]
 [ 140.4   113.12]
 [ 137.95  110.67]
 [ 144.58  119.59]
 [ 138.9   111.16]
 [ 148.78  123.14]
 [ 154.16  127.13]
 [ 161.42  100.84]
 [ 145.19   95.56]
 [ 170.73   97.93]
 [ 160.22  106.44]
 [ 142.26  106.71]
 [ 176.96   92.9 ]
 [ 139.42   91.28]
 [ 149.74   99.03]
 [ 153.46  109.42]
 [ 138.67   97.41]
 [ 155.09   96.9 ]
 [ 152.88   99.27]
 [ 153.83   92.88]
 [ 153.97  102.97]
 [ 134.24   89.21]
 [ 147.48   93.85]
 [ 129.19   99.21]
 [ 141.13   94.51]
 [ 142.05  108.65]
 [ 156.1    93.23]
 [ 154.15   83.91]
 [ 157.36  108.43]
 [ 157.88  107.4 ]
 [ 169.16  101.57]
 [ 150.15   96.11]
 [ 137.05  109.06]
 [ 160.61  104.39]
 [ 143.14   99.95]
 [ 135.94   94.58]
 [ 161.08   95.15]
 [ 154.01  105.74]
 [ 148.56   89.02]
 [ 145.88  101.15]
 [ 164.54  100.36]
 [ 152.5   109.39]
 [ 136.17   98.47]
 [ 153.52  100.93]
 [ 134.34   98.59]
 [ 162.44  109.99]
 [ 144.84  100.99]
 [ 166.12  103.65]
 [ 155.08   95.23]
 [ 165.29   99.48]
 [ 141.86  101.51]
 [ 142.43   99.33]
 [ 143.75  111.87]
 [ 127.13   99.37]
 [ 141.67  106.6 ]
 [ 132.92  101.7 ]
 [ 173.51   85.79]
 [ 140.04   97.97]
 [ 166.98   98.86]
 [ 151.12   94.18]
 [ 170.41   92.78]
 [ 156.52   94.77]
 [ 138.61   99.24]
 [ 142.49  105.61]
 [ 140.71  107.89]
 [ 147.17   93.45]
 [ 164.42  106.34]
 [ 156.61   91.57]
 [ 146.55   98.77]
 [ 137.27   98.88]
 [ 136.17  101.28]
 [ 164.52   88.44]
 [ 159.22   96.07]
 [ 132.19   93.62]
 [ 145.76   94.03]
 [ 159.41   95.08]
 [ 134.48  102.17]
 [ 168.69  100.78]
 [ 153.7   108.77]
 [ 161.62  109.73]
 [ 159.07  108.53]
 [ 154.43  100.74]
 [ 157.27  101.07]
 [ 153.47   96.14]
 [ 176.55  101.37]
 [ 166.57   92.83]
 [ 128.1    99.42]
 [ 125.46   97.47]
 [ 151.29   98.63]
 [ 157.32  103.41]
 [ 141.39  103.78]
 [ 145.94  111.27]
 [ 140.07  110.24]
 [ 129.89   90.19]
 [ 147.32   90.43]
 [ 171.42  108.46]
 [ 161.11  102.88]
 [ 140.86  104.19]
 [ 153.79   90.24]
 [ 144.06  101.68]
 [ 157.89  100.08]
 [ 141.19   94.75]
 [ 148.95  101.38]
 [ 164.12   89.32]
 [ 146.78   90.48]
 [ 147.1    89.21]
 [ 136.71   95.82]
 [ 163.09   92.78]
 [ 165.03   98.33]
 [ 147.3    92.59]
 [ 134.33   98.48]
 [ 160.82  100.99]
 [ 152.7    93.09]
 [ 143.     90.57]
 [ 158.23   95.32]
 [ 155.4   106.84]
 [ 140.83   90.63]
 [ 135.44   98.71]
 [ 136.81  109.46]
 [ 141.36  101.03]
 [ 147.3    90.44]
 [ 142.59  109.69]
 [ 159.99   97.44]
 [ 150.64   97.3 ]
 [ 144.57  106.3 ]
 [ 145.98   92.15]
 [ 146.37  102.35]
 [ 142.57   89.99]
 [ 152.87  106.97]
 [ 154.44   98.33]
 [ 161.81   93.26]
 [ 145.53   94.39]
 [ 152.15  107.16]
 [ 143.96  106.32]
 [ 149.2    98.84]
 [ 153.71  100.52]
 [ 159.73  110.09]
 [ 130.68  101.46]
 [ 163.25   96.68]
 [ 164.91   99.8 ]
 [ 136.45  102.94]
 [ 129.72  103.31]
 [ 150.47  105.76]
 [ 161.78  102.04]
 [ 153.16   97.61]
 [ 149.5   105.62]
 [ 146.55  101.11]
 [ 139.91  102.71]
 [ 140.57   99.44]
 [ 145.02  102.73]
 [ 151.41  101.53]
 [ 141.24   85.42]
 [ 147.58  103.44]
 [ 153.03  102.85]
 [ 136.19  102.22]
 [ 139.8   106.6 ]
 [ 174.63  108.34]
 [ 141.53   92.38]
 [ 138.82   97.27]
 [ 163.37   91.86]
 [ 164.82  107.82]
 [ 148.52   90.85]
 [ 150.08  108.38]
 [ 147.47   99.07]
 [ 155.28   88.86]
 [ 137.69  108.6 ]
 [ 146.37   93.46]
 [ 168.86  104.28]
 [ 170.26  111.08]
 [ 155.75   93.26]
 [ 137.99   95.27]
 [ 148.7    98.92]
 [ 154.64   97.04]
 [ 156.74   97.11]
 [ 137.03   94.26]
 [ 143.12  101.26]
 [ 140.46  105.23]
 [ 142.47  102.32]
 [ 139.57  100.14]
 [ 152.88   98.73]
 [ 159.45   88.95]
 [ 150.17  104.48]
 [ 138.02   97.35]
 [ 143.38   94.66]
 [ 156.91   88.23]
 [ 147.37   92.18]
 [ 146.25   99.9 ]
 [ 146.63  106.08]
 [ 149.78  106.48]
 [ 136.01  102.04]
 [ 133.15   99.86]
 [ 137.34   97.34]
 [ 137.17   93.68]
 [ 151.14  104.75]
 [ 154.13   96.9 ]
 [ 160.01   87.6 ]
 [ 142.98  104.83]
 [ 147.54   94.89]
 [ 157.04   98.87]
 [ 152.34  106.77]
 [ 156.34   97.74]
 [ 160.91   87.55]]


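Note: in the code above, changing a and b in random.normal(a, b, c) can leave one of the clusters empty during the iteration, which raises an error such as:

Traceback (most recent call last):
  File "C:\Python27\Daima\k均值算法\正态分布k=3.py", line 61, in <module>
    k2_x = sum([a[i] for i in clu_k2]) / len(clu_k2)
ZeroDivisionError: integer division or modulo by zero

Don't worry if you see this error. It happens because a randomly chosen initial centroid lies too far from the data, so no point is assigned to it, and computing that cluster's centroid divides by zero. You can simply run the script a few more times or adjust a and b in random.normal(a, b, c). A more effective fix is to choose the initial centroids yourself: with well-chosen initial centroids, this problem is unlikely to occur.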
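One way to follow that advice is sketched below. It is a vectorized numpy rewrite, not the code above: the initial centers are k distinct points drawn from the data itself (often called Forgy initialization), so no center starts far away from the data, and if a cluster still ends up empty its center is re-seeded with a random data point instead of dividing by zero. The function names assign and kmeans, the tol and max_iter parameters, and the re-seeding rule are assumptions made for this illustration; tol is compared with the squared shift of each center, like the 0.1 threshold in the script.

```python
import numpy


def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for every point."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)  # ties go to the lower-numbered center


def kmeans(points, k=3, tol=0.1, max_iter=100, seed=None):
    """Hypothetical helper: k-means on an (n, 2) array, returns (centers, labels)."""
    rng = numpy.random.RandomState(seed)
    n = len(points)
    # Start from k distinct data points, so no initial center lies far from the data
    centers = points[rng.choice(n, size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) == 0:
                # Empty cluster: re-seed its center instead of dividing by zero
                new_centers[j] = points[rng.randint(n)]
            else:
                new_centers[j] = members.mean(axis=0)
        # Squared shift of each center, compared with tol
        shift = ((new_centers - centers) ** 2).sum(axis=1)
        centers = new_centers
        if (shift < tol).all():
            break
    # Final assignment against the returned centers
    return centers, assign(points, centers)


# Usage: three groups with the same means and standard deviations as the script
numpy.random.seed(0)
data = numpy.vstack([
    numpy.random.normal([100, 100], [10, 5], size=(200, 2)),
    numpy.random.normal([125, 125], [10, 8], size=(200, 2)),
    numpy.random.normal([150, 100], [10, 6], size=(200, 2)),
])
centers, labels = kmeans(data, k=3, seed=0)
print(numpy.round(centers, 2))
print(numpy.bincount(labels, minlength=3))  # number of points per cluster
```

Starting from data points removes the "center far from all data" failure described in the note, but k-means can still converge to a poor local optimum depending on the starting points, so running it a few times with different seeds and keeping the result with the smallest total squared distance remains a common practice.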