A Worked Example of Hierarchical Clustering
Connections 17(2):78-80
Copyright 1994 INSNA
Stephen P. Borgatti
University of South Carolina
Given a set of N items to be clustered, and an NxN distance (or similarity) matrix, the basic process of Johnson's (1967) hierarchical clustering is this:
Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.
Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster.
Compute distances (similarities) between the new cluster and each of the old clusters.
Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
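The four steps above can be sketched directly in plain Python. This is a minimal, unoptimized illustration, not the article's own code; the `agglomerate` function and the 4-item toy matrix are made up for the example:

```python
def agglomerate(labels, dist, linkage=min):
    """One run of Johnson's procedure.

    labels: item names; dist[i][j]: distance between items i and j.
    Returns the merge history as (cluster_a, cluster_b, level) tuples.
    """
    # Step 1: each item starts in its own cluster (stored as index tuples).
    clusters = [(i,) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Step 3's update rule, applied lazily: cluster-to-cluster
                # distance is the linkage (min = single-link) over member pairs.
                d = linkage(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        level, a, b = best
        name = lambda c: "/".join(labels[i] for i in c)
        merges.append((name(clusters[a]), name(clusters[b]), level))
        clusters[a] += clusters[b]   # merge b into a ...
        del clusters[b]              # ... so we now have one less cluster
    return merges                    # step 4: the loop ends at one cluster

# Toy 4-item example (hypothetical distances, not the city data below):
labels = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 5, 9],
        [6, 5, 0, 4],
        [10, 9, 4, 0]]
merges = agglomerate(labels, dist)
print(merges)  # [('A', 'B', 2), ('C', 'D', 4), ('A/B', 'C/D', 5)]
```

Recomputing all pairwise linkages on every pass, as here, is O(N^3) overall; real implementations update the distance matrix incrementally instead.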
Step 3 can be done in different ways, which is what distinguishes single-link from complete-link and average-link clustering. In single-link clustering (also called the connectedness or minimum method), we consider the distance between one cluster and another cluster to be equal to the shortest distance from any member of one cluster to any member of the other cluster. If the data consist of similarities, we consider the similarity between one cluster and another cluster to be equal to the greatest similarity from any member of one cluster to any member of the other cluster. In complete-link clustering (also called the diameter or maximum method), we consider the distance between one cluster and another cluster to be equal to the longest distance from any member of one cluster to any member of the other cluster. In average-link clustering, we consider the distance between one cluster and another cluster to be equal to the average distance from any member of one cluster to any member of the other cluster. A variation on average-link clustering is the UCLUS method of D'Andrade (1978), which uses the median distance.
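The three rules differ only in how they aggregate the member-to-member distances. A quick sketch (the two clusters and the four distances here are made up for illustration):

```python
# Two hypothetical clusters X and Y, with the four member-to-member distances:
X, Y = ["a", "b"], ["c", "d"]
d = {("a", "c"): 3, ("a", "d"): 7, ("b", "c"): 5, ("b", "d"): 9}

pairs = [d[(x, y)] for x in X for y in Y]   # [3, 7, 5, 9]
single   = min(pairs)                # 3   -- shortest member-to-member distance
complete = max(pairs)                # 9   -- longest member-to-member distance
average  = sum(pairs) / len(pairs)   # 6.0 -- mean member-to-member distance
print(single, complete, average)
```

The same data can thus produce different merge orders under different linkage rules, which is why the choice of rule matters.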
Example. The following pages trace a hierarchical clustering of distances, in miles, between U.S. cities. The method of clustering is single-link.
Input distance matrix:
         BOS    NY    DC   MIA   CHI   SEA    SF    LA   DEN
BOS        0   206   429  1504   963  2976  3095  2979  1949
NY       206     0   233  1308   802  2815  2934  2786  1771
DC       429   233     0  1075   671  2684  2799  2631  1616
MIA     1504  1308  1075     0  1329  3273  3053  2687  2037
CHI      963   802   671  1329     0  2013  2142  2054   996
SEA     2976  2815  2684  3273  2013     0   808  1131  1307
SF      3095  2934  2799  3053  2142   808     0   379  1235
LA      2979  2786  2631  2687  2054  1131   379     0  1059
DEN     1949  1771  1616  2037   996  1307  1235  1059     0
The nearest pair of cities is BOS and NY, at distance 206. These are merged into a single cluster called "BOS/NY".
Then we compute the distance from this new compound object to all other objects. In single-link clustering the rule is that the distance from the compound object to another object is equal to the shortest distance from any member of the cluster to the outside object. So the distance from "BOS/NY" to DC is chosen to be 233, which is the distance from NY to DC. Similarly, the distance from "BOS/NY" to DEN is chosen to be 1771.
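Concretely, using the relevant entries of the input matrix (the two dictionaries below just name the BOS and NY rows for the cities involved):

```python
# Distances of BOS and of NY to two outside cities, from the input matrix.
bos = {"DC": 429, "DEN": 1949}
ny  = {"DC": 233, "DEN": 1771}

# Single-link: the compound cluster BOS/NY is exactly as close to an
# outside city as its closest member is.
d_dc  = min(bos["DC"], ny["DC"])    # 233  (the NY-DC distance)
d_den = min(bos["DEN"], ny["DEN"])  # 1771 (the NY-DEN distance)
print(d_dc, d_den)
```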
After merging BOS with NY:
         BOS/NY    DC   MIA   CHI   SEA    SF    LA   DEN
BOS/NY        0   223  1308   802  2815  2934  2786  1771
DC          223     0  1075   671  2684  2799  2631  1616
MIA        1308  1075     0  1329  3273  3053  2687  2037
CHI         802   671  1329     0  2013  2142  2054   996
SEA        2815  2684  3273  2013     0   808  1131  1307
SF         2934  2799  3053  2142   808     0   379  1235
LA         2786  2631  2687  2054  1131   379     0  1059
DEN        1771  1616  2037   996  1307  1235  1059     0
The nearest pair of objects is BOS/NY and DC, at distance 223. These are merged into a single cluster called "BOS/NY/DC". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:
After merging DC with BOS/NY:
            BOS/NY/DC   MIA   CHI   SEA    SF    LA   DEN
BOS/NY/DC           0  1075   671  2684  2799  2631  1616
MIA              1075     0  1329  3273  3053  2687  2037
CHI               671  1329     0  2013  2142  2054   996
SEA              2684  3273  2013     0   808  1131  1307
SF               2799  3053  2142   808     0   379  1235
LA               2631  2687  2054  1131   379     0  1059
DEN              1616  2037   996  1307  1235  1059     0
Now, the nearest pair of objects is SF and LA, at distance 379. These are merged into a single cluster called "SF/LA". Then we compute the distance from this new cluster to all other objects, to get a new distance matrix:
After merging SF with LA:
            BOS/NY/DC   MIA   CHI   SEA  SF/LA   DEN
BOS/NY/DC           0  1075   671  2684   2631  1616
MIA              1075     0  1329  3273   2687  2037
CHI               671  1329     0  2013   2054   996
SEA              2684  3273  2013     0    808  1307
SF/LA            2631  2687  2054   808      0  1059
DEN              1616  2037   996  1307   1059     0
Now, the nearest pair of objects is CHI and BOS/NY/DC, at distance 671. These are merged into a single cluster called "BOS/NY/DC/CHI". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:
After merging CHI with BOS/NY/DC:
                BOS/NY/DC/CHI   MIA   SEA  SF/LA   DEN
BOS/NY/DC/CHI               0  1075  2013   2054   996
MIA                      1075     0  3273   2687  2037
SEA                      2013  3273     0    808  1307
SF/LA                    2054  2687   808      0  1059
DEN                       996  2037  1307   1059     0
Now, the nearest pair of objects is SEA and SF/LA, at distance 808. These are merged into a single cluster called "SF/LA/SEA". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:
After merging SEA with SF/LA:
                BOS/NY/DC/CHI   MIA  SF/LA/SEA   DEN
BOS/NY/DC/CHI               0  1075       2013   996
MIA                      1075     0       2687  2037
SF/LA/SEA                2013  2687          0  1059
DEN                       996  2037       1059     0
Now, the nearest pair of objects is DEN and BOS/NY/DC/CHI, at distance 996. These are merged into a single cluster called "BOS/NY/DC/CHI/DEN". Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix:
After merging DEN with BOS/NY/DC/CHI:
                    BOS/NY/DC/CHI/DEN   MIA  SF/LA/SEA
BOS/NY/DC/CHI/DEN                   0  1075       1059
MIA                              1075     0       2687
SF/LA/SEA                        1059  2687          0
Now, the nearest pair of objects is BOS/NY/DC/CHI/DEN and SF/LA/SEA, at distance 1059. These are merged into a single cluster called "BOS/NY/DC/CHI/DEN/SF/LA/SEA". Then we compute the distance from this new compound object to all other objects, to get a new distance matrix:
After merging SF/LA/SEA with BOS/NY/DC/CHI/DEN:
                             BOS/NY/DC/CHI/DEN/SF/LA/SEA   MIA
BOS/NY/DC/CHI/DEN/SF/LA/SEA                            0  1075
MIA                                                 1075     0
Finally, we merge the last two clusters at level 1075. This process is summarized by the clustering diagram printed by many software packages (the columns are reordered so that merged clusters are adjacent):

Level   BOS NY  DC  CHI DEN SF  LA  SEA MIA
  206   XXXXXX  .   .   .   .   .   .   .
  223   XXXXXXXXXX  .   .   .   .   .   .
  379   XXXXXXXXXX  .   .   XXXXXX  .   .
  671   XXXXXXXXXXXXXX  .   XXXXXX  .   .
  808   XXXXXXXXXXXXXX  .   XXXXXXXXXXX .
  996   XXXXXXXXXXXXXXXXXXX XXXXXXXXXXX .
 1059   XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX .
 1075   XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

In the diagram, the columns are associated with the items and the rows are associated with levels (stages) of clustering. An 'X' is placed between two columns in a given row if the corresponding items are merged at that stage in the clustering.