Community Detection Evaluation Metric: NMI

Source: Internet  Editor: 程序博客网  Time: 2024/05/23 12:51

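1. Introduction
NMI (Normalized Mutual Information) is commonly used in clustering to measure how closely two clustering results agree. It is an important evaluation metric in community detection, where it gives a fairly objective measure of how accurately a detected community partition matches the ground-truth partition. NMI ranges from 0 to 1; the higher the value, the more closely the partition agrees with the ground truth.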
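For reference, the quantities involved can be written out as below. This is a sketch of one common normalization (dividing by the arithmetic mean of the two entropies), which is the one the Python code in section 2 computes; other normalizations, such as the geometric mean or the maximum of the entropies, are also in use. The symbols are assumptions introduced here for illustration: X and Y are the two partitions, p(x) is the fraction of nodes in community x of X, p(y) likewise for Y, and p(x,y) is the fraction of nodes that fall in both x and y.

```latex
\[
\begin{aligned}
I(X;Y) &= \sum_{x}\sum_{y} p(x,y)\,\log_2\frac{p(x,y)}{p(x)\,p(y)} \\
H(X) &= -\sum_{x} p(x)\,\log_2 p(x) \\
\mathrm{NMI}(X,Y) &= \frac{2\,I(X;Y)}{H(X)+H(Y)}
\end{aligned}
\]
```

Because 0 ≤ I(X;Y) ≤ min(H(X), H(Y)) ≤ (H(X)+H(Y))/2, this ratio stays between 0 and 1, and it equals 1 when the two partitions are identical up to relabeling.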

2. Python code

# coding=utf-8
import numpy as np
import math


def NMI(A, B):
    # len(A) should be equal to len(B)
    total = len(A)
    A_ids = set(A)
    B_ids = set(B)

    # Mutual information
    MI = 0
    eps = 1.4e-45
    for idA in A_ids:
        for idB in B_ids:
            idAOccur = np.where(A == idA)
            idBOccur = np.where(B == idB)
            idABOccur = np.intersect1d(idAOccur, idBOccur)
            px = 1.0 * len(idAOccur[0]) / total
            py = 1.0 * len(idBOccur[0]) / total
            pxy = 1.0 * len(idABOccur) / total
            MI = MI + pxy * math.log(pxy / (px * py) + eps, 2)

    # Normalized Mutual information
    Hx = 0
    for idA in A_ids:
        idAOccurCount = 1.0 * len(np.where(A == idA)[0])
        Hx = Hx - (idAOccurCount / total) * math.log(idAOccurCount / total + eps, 2)
    Hy = 0
    for idB in B_ids:
        idBOccurCount = 1.0 * len(np.where(B == idB)[0])
        Hy = Hy - (idBOccurCount / total) * math.log(idBOccurCount / total + eps, 2)
    MIhat = 2.0 * MI / (Hx + Hy)
    return MIhat


if __name__ == '__main__':
    A = np.array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3])
    B = np.array([1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 1, 1, 3, 3, 3])
    print(NMI(A, B))

Result: 0.36456
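As a cross-check, the same quantity can be computed from a contingency table instead of nested loops. The following is a minimal sketch, not the original author's implementation: the function name `nmi_from_contingency` is hypothetical, and returning 1.0 when both partitions consist of a single community is an assumption made here to avoid a division by zero (the loop version above does not handle that case).

```python
import numpy as np


def nmi_from_contingency(labels_a, labels_b):
    """NMI = 2 * I(A;B) / (H(A) + H(B)), computed from a contingency table.

    Hypothetical helper for illustration; accepts any 1-D label sequences.
    """
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    if labels_a.shape != labels_b.shape:
        raise ValueError("label arrays must have the same length")

    # Map arbitrary labels to 0..k-1 so they can index a table.
    _, a_idx = np.unique(labels_a, return_inverse=True)
    _, b_idx = np.unique(labels_b, return_inverse=True)

    # counts[i, j] = number of nodes in community i of A and community j of B.
    counts = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(counts, (a_idx, b_idx), 1)

    pxy = counts / counts.sum()   # joint distribution
    px = pxy.sum(axis=1)          # marginal of A (all entries > 0)
    py = pxy.sum(axis=0)          # marginal of B (all entries > 0)

    # Skip empty cells: their contribution 0 * log(0) is taken as 0.
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px * np.log2(px))
    hy = -np.sum(py * np.log2(py))

    if hx + hy == 0:
        # Assumption: both partitions are a single community, treat as identical.
        return 1.0
    return 2.0 * mi / (hx + hy)


if __name__ == '__main__':
    A = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
    B = [1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 1, 1, 3, 3, 3]
    print(round(nmi_from_contingency(A, B), 5))
```

On the example arrays this should reproduce the result reported above. Skipping empty cells explicitly removes the need for the `eps` term, and the score does not change when community labels are renamed, since only the co-occurrence counts matter.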