Comparing different clustering algorithms on toy datasets
Original source: http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html
This example aims at showing characteristics of different clustering algorithms on datasets that are "interesting" but still in 2D. The last dataset is an example of a 'null' situation for clustering: the data is homogeneous, and there is no good clustering.

While these examples give some intuition about the algorithms, this intuition might not apply to very high-dimensional data.

The results could be improved by tweaking the parameters for each clustering strategy, for instance setting the number of clusters for the methods that need this parameter specified. Note that affinity propagation has a tendency to create many clusters. Thus in this example its two parameters (damping and per-point preference) were set to mitigate this behavior.
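Parameter tuning of this kind assumes the features are on a comparable scale, which is why the script below standardizes every dataset with StandardScaler before fitting: distance-based parameters such as DBSCAN's eps or mean shift's bandwidth only make sense on normalized data. As a sanity check, the transform can be reproduced with plain NumPy (the array X here is made-up example data, not one of the toy datasets):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 2))  # synthetic 2D data

# Equivalent of StandardScaler().fit_transform(X):
# subtract the per-feature mean, divide by the per-feature std.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled.mean(axis=0), 0.0))  # features centered at 0
print(np.allclose(X_scaled.std(axis=0), 1.0))   # features have unit variance
```

After this step, an eps of 0.2 for DBSCAN means "0.2 standard deviations", regardless of the original units of the data.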
Python source code: plot_cluster_comparison.py
print(__doc__)

import time

import numpy as np
import matplotlib.pyplot as plt

from sklearn import cluster, datasets
from sklearn.neighbors import kneighbors_graph
from sklearn.preprocessing import StandardScaler

np.random.seed(0)

# Generate datasets. We choose the size big enough to see the scalability
# of the algorithms, but not too big to avoid too long running times
n_samples = 1500
noisy_circles = datasets.make_circles(n_samples=n_samples, factor=.5,
                                      noise=.05)
noisy_moons = datasets.make_moons(n_samples=n_samples, noise=.05)
blobs = datasets.make_blobs(n_samples=n_samples, random_state=8)
no_structure = np.random.rand(n_samples, 2), None

colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
colors = np.hstack([colors] * 20)

clustering_names = [
    'MiniBatchKMeans', 'AffinityPropagation', 'MeanShift',
    'SpectralClustering', 'Ward', 'AgglomerativeClustering',
    'DBSCAN', 'Birch']

plt.figure(figsize=(len(clustering_names) * 2 + 3, 9.5))
plt.subplots_adjust(left=.02, right=.98, bottom=.001, top=.96, wspace=.05,
                    hspace=.01)

plot_num = 1

datasets = [noisy_circles, noisy_moons, blobs, no_structure]
for i_dataset, dataset in enumerate(datasets):
    X, y = dataset
    # normalize dataset for easier parameter selection
    X = StandardScaler().fit_transform(X)

    # estimate bandwidth for mean shift
    bandwidth = cluster.estimate_bandwidth(X, quantile=0.3)

    # connectivity matrix for structured Ward
    connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)
    # make connectivity symmetric
    connectivity = 0.5 * (connectivity + connectivity.T)

    # create clustering estimators
    ms = cluster.MeanShift(bandwidth=bandwidth, bin_seeding=True)
    two_means = cluster.MiniBatchKMeans(n_clusters=2)
    ward = cluster.AgglomerativeClustering(n_clusters=2, linkage='ward',
                                           connectivity=connectivity)
    spectral = cluster.SpectralClustering(n_clusters=2,
                                          eigen_solver='arpack',
                                          affinity="nearest_neighbors")
    dbscan = cluster.DBSCAN(eps=.2)
    affinity_propagation = cluster.AffinityPropagation(damping=.9,
                                                       preference=-200)
    average_linkage = cluster.AgglomerativeClustering(
        linkage="average", affinity="cityblock", n_clusters=2,
        connectivity=connectivity)
    birch = cluster.Birch(n_clusters=2)

    clustering_algorithms = [
        two_means, affinity_propagation, ms, spectral, ward, average_linkage,
        dbscan, birch]

    for name, algorithm in zip(clustering_names, clustering_algorithms):
        # predict cluster memberships
        t0 = time.time()
        algorithm.fit(X)
        t1 = time.time()
        if hasattr(algorithm, 'labels_'):
            # note: the original used np.int, an alias removed in recent NumPy
            y_pred = algorithm.labels_.astype(int)
        else:
            y_pred = algorithm.predict(X)

        # plot
        plt.subplot(4, len(clustering_algorithms), plot_num)
        if i_dataset == 0:
            plt.title(name, size=18)
        plt.scatter(X[:, 0], X[:, 1], color=colors[y_pred].tolist(), s=10)

        if hasattr(algorithm, 'cluster_centers_'):
            centers = algorithm.cluster_centers_
            center_colors = colors[:len(centers)]
            plt.scatter(centers[:, 0], centers[:, 1], s=100, c=center_colors)
        plt.xlim(-2, 2)
        plt.ylim(-2, 2)
        plt.xticks(())
        plt.yticks(())
        plt.text(.99, .01, ('%.2fs' % (t1 - t0)).lstrip('0'),
                 transform=plt.gca().transAxes, size=15,
                 horizontalalignment='right')
        plot_num += 1

plt.show()
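Of the algorithms above, DBSCAN is the one that needs no cluster count: it grows clusters from density alone, which is why it handles the circle and moon datasets that defeat k-means. The sketch below is a minimal pure-Python illustration of that idea (it is not scikit-learn's implementation; only the parameter names eps and min_samples mirror sklearn's):

```python
import math

def dbscan(points, eps=0.2, min_samples=5):
    """Minimal DBSCAN sketch: one label per point, -1 marks noise."""
    n = len(points)
    # Precompute eps-neighborhoods; each includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n          # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1       # noise (may later be claimed as a border point)
            continue
        cluster += 1             # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster          # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])   # expand only through core points
    return labels

# Two tight blobs far apart, plus one isolated outlier:
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
          (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
          (10, 10)]
print(dbscan(points, eps=0.3, min_samples=3))
# → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The quadratic neighborhood search keeps the sketch short; the real implementation uses a spatial index, but the core/border/noise logic is the same.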