Section 27 -- Hierarchical Clustering (Part 2)
Source: Internet  Editor: 程序博客网  Time: 2024/05/18 01:47
Hierarchical clustering code:
```python
"""
Code for hierarchical clustering, modified from
Programming Collective Intelligence by Toby Segaran
(O'Reilly Media 2007, page 33).
"""
import numpy as np


class cluster_node:
    def __init__(self, vec, left=None, right=None, distance=0.0, id=None, count=1):
        self.left = left
        self.right = right
        self.vec = vec
        self.id = id
        self.distance = distance
        self.count = count  # only used for weighted average


def L2dist(v1, v2):
    return np.sqrt(np.sum((v1 - v2) ** 2))


def L1dist(v1, v2):
    return np.sum(np.abs(v1 - v2))


# def Chi2dist(v1,v2):
#     return sqrt(sum((v1-v2)**2))


def hcluster(features, distance=L2dist):
    # cluster the rows of the "features" matrix
    distances = {}
    currentclustid = -1

    # clusters are initially just the individual rows
    clust = [cluster_node(np.array(features[i]), id=i) for i in range(len(features))]

    while len(clust) > 1:
        lowestpair = (0, 1)
        closest = distance(clust[0].vec, clust[1].vec)

        # loop through every pair looking for the smallest distance
        for i in range(len(clust)):
            for j in range(i + 1, len(clust)):
                # distances is the cache of distance calculations
                key = (clust[i].id, clust[j].id)
                if key not in distances:
                    distances[key] = distance(clust[i].vec, clust[j].vec)
                d = distances[key]
                if d < closest:
                    closest = d
                    lowestpair = (i, j)

        # calculate the average of the two clusters
        a, b = clust[lowestpair[0]], clust[lowestpair[1]]
        mergevec = [(a.vec[i] + b.vec[i]) / 2.0 for i in range(len(clust[0].vec))]

        # create the new cluster
        newcluster = cluster_node(np.array(mergevec), left=a, right=b,
                                  distance=closest, id=currentclustid)

        # cluster ids that weren't in the original set are negative
        currentclustid -= 1
        # delete the higher index first so the lower index stays valid
        del clust[lowestpair[1]]
        del clust[lowestpair[0]]
        clust.append(newcluster)

    return clust[0]


def extract_clusters(clust, dist):
    # extract list of sub-tree clusters from hcluster tree with distance<dist
    if clust.distance < dist:
        # we have found a cluster subtree
        return [clust]
    # otherwise check the left and right branches
    cl = []
    cr = []
    if clust.left is not None:
        cl = extract_clusters(clust.left, dist=dist)
    if clust.right is not None:
        cr = extract_clusters(clust.right, dist=dist)
    return cl + cr


def get_cluster_elements(clust):
    # return ids for elements in a cluster sub-tree
    if clust.id >= 0:
        # non-negative id means that this is a leaf
        return [clust.id]
    # otherwise check the left and right branches
    cl = []
    cr = []
    if clust.left is not None:
        cl = get_cluster_elements(clust.left)
    if clust.right is not None:
        cr = get_cluster_elements(clust.right)
    return cl + cr


def printclust(clust, labels=None, n=0):
    # indent to make a hierarchy layout
    for i in range(n):
        print(' ', end='')
    if clust.id < 0:
        # negative id means that this is a branch
        print('-')
    else:
        # non-negative id means that this is an endpoint
        if labels is None:
            print(clust.id)
        else:
            print(labels[clust.id])

    # now print the left and right branches
    if clust.left is not None:
        printclust(clust.left, labels=labels, n=n + 1)
    if clust.right is not None:
        printclust(clust.right, labels=labels, n=n + 1)


def getheight(clust):
    # Is this an endpoint? Then the height is just 1
    if clust.left is None and clust.right is None:
        return 1
    # Otherwise the height is the sum of the heights of each branch
    return getheight(clust.left) + getheight(clust.right)


def getdepth(clust):
    # The distance of an endpoint is 0.0
    if clust.left is None and clust.right is None:
        return 0
    # The distance of a branch is the greater of its two sides
    # plus its own distance
    return max(getdepth(clust.left), getdepth(clust.right)) + clust.distance
```
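To make the merge loop and the tree-cutting helpers above easier to follow, here is a minimal, self-contained sketch of the same idea in plain Python (no NumPy). It is not the original code: the names `Node`, `l2`, `agglomerate`, `leaves`, `cut` and the sample points are assumptions made up for this illustration. It repeatedly merges the closest pair by averaging their vectors, then cuts the tree at a distance threshold in the spirit of `extract_clusters` and `get_cluster_elements`. Unlike `hcluster`, it does not cache distances.

```python
import math
from itertools import combinations


class Node:
    """Tree node (hypothetical name): leaves have id >= 0, merges have id < 0."""

    def __init__(self, vec, left=None, right=None, distance=0.0, id=None):
        self.vec = vec
        self.left = left
        self.right = right
        self.distance = distance
        self.id = id


def l2(v1, v2):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def agglomerate(points):
    """Merge the closest pair until one root remains; merged vec is the average."""
    nodes = [Node(list(p), id=i) for i, p in enumerate(points)]
    next_id = -1
    while len(nodes) > 1:
        # pick the pair of current clusters with the smallest distance
        i, j = min(combinations(range(len(nodes)), 2),
                   key=lambda ij: l2(nodes[ij[0]].vec, nodes[ij[1]].vec))
        a, b = nodes[i], nodes[j]
        merged = Node([(x + y) / 2.0 for x, y in zip(a.vec, b.vec)],
                      left=a, right=b, distance=l2(a.vec, b.vec), id=next_id)
        next_id -= 1
        # j > i, so delete j first to keep index i valid
        del nodes[j]
        del nodes[i]
        nodes.append(merged)
    return nodes[0]


def leaves(node):
    """Return the original row ids contained in a subtree."""
    if node.id >= 0:
        return [node.id]
    return leaves(node.left) + leaves(node.right)


def cut(node, threshold):
    """Return subtrees whose merge distance is below the threshold."""
    if node.distance < threshold or node.left is None:
        return [node]
    return cut(node.left, threshold) + cut(node.right, threshold)


# made-up sample data: two tight pairs and one far-away point
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
tree = agglomerate(points)
groups = sorted(sorted(leaves(sub)) for sub in cut(tree, threshold=5.0))
print(groups)
```

With these sample points and a threshold of 5.0, the script prints `[[0, 1], [2, 3], [4]]`: each tight pair merges at distance 1.0, while the later merges happen at distances well above the threshold. Raising the threshold merges more of the tree into a single group.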