Community Detection: the Label Propagation Algorithm for Directed Graphs

Source: Internet | Editor: 程序博客网 | Time: 2024/05/16 14:39

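The earlier post "Community Detection: Label Propagation" introduced the basic principle of the Label Propagation community detection algorithm. The basic Label Propagation algorithm detects communities in undirected graphs; this post extends it to directed graphs.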

I. The basic Label Propagation algorithm

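At the start, Label Propagation assigns every node in the network a unique label. In each iteration, every node updates its label according to the labels of the nodes connected to it: it adopts the community label held by the largest number of its neighbours. This is what "label propagation" refers to. As the labels keep propagating, densely connected nodes end up sharing a common label.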
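As a minimal sketch of this update rule for a single node, assuming an unweighted undirected graph stored as a dict from node to neighbour list (the helper `new_label` and the example nodes are hypothetical), the label carried by the most neighbours wins. The standard algorithm breaks ties uniformly at random, which the sketch mimics with `random.choice`.

```python
import random
from collections import Counter


def new_label(node, adj, labels, rng=random):
    """Return the label carried by the largest number of node's neighbours.

    adj maps each node to a list of neighbours; labels maps each node to its
    current community label. Ties are broken uniformly at random.
    """
    counts = Counter(labels[y] for y in adj[node])
    if not counts:                      # isolated node: keep its own label
        return labels[node]
    top = max(counts.values())
    tied = sorted(lab for lab, c in counts.items() if c == top)
    return rng.choice(tied)


# hypothetical example: x has three neighbours, two of them labelled "a"
adj = {"x": ["p", "q", "r"], "p": ["x"], "q": ["x"], "r": ["x"]}
labels = {"x": "x", "p": "a", "q": "a", "r": "b"}
print(new_label("x", adj, labels))      # a
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible.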

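Labels are updated asynchronously. When node x is updated in iteration t, its neighbours $x_{i1}, \dots, x_{im}$ that have already been visited in this iteration contribute their new labels, while the remaining neighbours $x_{i(m+1)}, \dots, x_{ik}$ contribute their labels from iteration t − 1; the function f returns the label that occurs most often among its arguments: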

$$C_x(t) = f\left(C_{x_{i1}}(t), \dots, C_{x_{im}}(t), C_{x_{i(m+1)}}(t-1), \dots, C_{x_{ik}}(t-1)\right)$$
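Why mix iteration-t and iteration-(t − 1) labels? The smallest possible example shows it. In the sketch below (hypothetical helper names, unweighted graph), a synchronous update, where every node reads only iteration-(t − 1) labels, makes two connected nodes swap labels forever, while the asynchronous update, which uses labels already changed in the current sweep, settles after one sweep.

```python
from collections import Counter


def majority(neighbours, labels):
    # most frequent label among the neighbours (first seen wins a tie)
    return Counter(labels[y] for y in neighbours).most_common(1)[0][0]


def sync_step(adj, labels):
    # every node reads labels from the previous iteration only
    return {x: majority(adj[x], labels) for x in adj}


def async_step(adj, labels, order):
    new = dict(labels)
    for x in order:
        # neighbours visited earlier in this sweep already carry their new label
        new[x] = majority(adj[x], new)
    return new


adj = {"a": ["b"], "b": ["a"]}
start = {"a": "a", "b": "b"}

once = sync_step(adj, start)
print(once, sync_step(adj, once))          # {'a': 'b', 'b': 'a'} {'a': 'a', 'b': 'b'}
print(async_step(adj, start, ["a", "b"]))  # {'a': 'b', 'b': 'b'}
```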

The Label Propagation algorithm proceeds as follows:

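Initialize the community label of every node in the network; for a node x, set $C_x(0) = x$.
Set the iteration counter t = 1.
Fix a traversal order of the nodes in the network and let X be the resulting node sequence.
For each node x ∈ X, in that order, set $C_x(t) = f\left(C_{x_{i1}}(t), \dots, C_{x_{im}}(t), C_{x_{i(m+1)}}(t-1), \dots, C_{x_{ik}}(t-1)\right)$.
Check whether the iteration can stop; if not, set t = t + 1 and traverse the nodes again.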
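The five steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the author's program: the graph is unweighted and undirected, the traversal order is fixed instead of random so the run is reproducible, and ties are broken deterministically (keep the current label if it is among the best, otherwise take the largest tied label) instead of at random. The stopping test is the usual one: every node already carries a label that the maximum number of its neighbours carry. The function names are hypothetical.

```python
from collections import Counter


def best_labels(x, adj, labels):
    # labels carried by the maximum number of x's neighbours
    counts = Counter(labels[y] for y in adj[x])
    top = max(counts.values())
    return {lab for lab, c in counts.items() if c == top}


def label_propagation(adj, order=None, max_iter=100):
    labels = {x: x for x in adj}                    # step 1: C_x(0) = x
    order = list(adj) if order is None else order   # step 3: traversal order X
    for t in range(1, max_iter + 1):                # step 2: iteration counter t
        for x in order:                             # step 4: asynchronous update
            if not adj[x]:
                continue                            # isolated node keeps its label
            best = best_labels(x, adj, labels)
            if labels[x] not in best:
                labels[x] = max(best)               # deterministic tie-break
        # step 5: stop once every node holds one of its best labels
        if all(not adj[x] or labels[x] in best_labels(x, adj, labels) for x in adj):
            return labels, t
    return labels, max_iter


# two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the single edge 3-4
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
adj = {v: [] for v in range(8)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

labels, sweeps = label_propagation(adj)
print(labels, sweeps)   # {0: 3, 1: 3, 2: 3, 3: 3, 4: 7, 5: 7, 6: 7, 7: 7} 1
```

The two cliques end up with two different labels after a single sweep. With a random order and random tie-breaking, as in the standard algorithm, the labels and the number of sweeps can differ from run to run.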

II. Label Propagation for directed graphs

1. Directed graphs

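A directed graph is a graph whose edges have directions. Between two nodes there can be up to two edges, one in each direction: an outgoing edge and an incoming edge. The number of edges leaving a node is its out-degree, and the number of edges entering it is its in-degree. An example directed graph: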

[Figure: an example directed graph (image from Baidu Baike)]

Node 5 has an out-degree of 2 and an in-degree of 2. For more on directed graphs, see a graph theory textbook.
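Out-degree and in-degree follow directly from an edge list. The sketch below counts them for a small hypothetical graph (not the one in the figure, which is not reproduced here); the helper name `degrees` is illustrative.

```python
from collections import Counter


def degrees(edges):
    """Return (out_degree, in_degree) counters for a list of (source, target) edges."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return out_deg, in_deg


# hypothetical directed graph: 1 -> 2, 2 -> 1, 2 -> 3, 3 -> 1
edges = [(1, 2), (2, 1), (2, 3), (3, 1)]
out_deg, in_deg = degrees(edges)
print(out_deg[2], in_deg[2])   # 2 1
print(out_deg[1], in_deg[1])   # 1 2
```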

2. Adapting Label Propagation

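For Label Propagation to detect communities in a directed graph, the problem becomes how to convert the directed graph into an undirected one, that is, how to define the weight of the edge between two nodes of the directed graph. The following formula is used for this: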

$$w_{i,j} = \alpha \lambda_{i,j} + \beta \lambda_{j,i}$$

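Here $w_{i,j}$ is the weight of node j with respect to node i, $\lambda_{i,j}$ is the weight of the edge from node i to node j, and $\lambda_{j,i}$ is the weight of the edge from node j to node i (0 if that edge does not exist). The parameters α and β adjust how much the outgoing and incoming directions contribute; the program in Section III uses α = 1 and β = 0.5.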
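A minimal sketch of the conversion, assuming the directed weights λ are stored in a dict keyed by (i, j) pairs (the function name and data layout are illustrative, not taken from the author's program). Each directed edge i→j adds α·λ_{i,j} to w_{i,j} and β·λ_{i,j} to w_{j,i}, which together give exactly w_{i,j} = αλ_{i,j} + βλ_{j,i}. The example uses the first four edges of the experiment data with α = 1, β = 0.5.

```python
def to_undirected_weights(lam, alpha=1.0, beta=0.5):
    """lam maps (i, j) to the weight of the directed edge i -> j.

    Returns w with w[i][j] = alpha * lam[(i, j)] + beta * lam[(j, i)],
    where a missing directed edge counts as weight 0.
    """
    w = {}
    for (i, j), weight in lam.items():
        # seen from i, the edge i -> j is outgoing: contributes alpha * weight
        w.setdefault(i, {})
        w[i][j] = w[i].get(j, 0.0) + alpha * weight
        # seen from j, the same edge is incoming: contributes beta * weight
        w.setdefault(j, {})
        w[j][i] = w[j].get(i, 0.0) + beta * weight
    return w


# first four edges of the experiment data: 0->2 (1), 2->0 (2), 0->3 (2), 3->0 (1)
lam = {(0, 2): 1, (2, 0): 2, (0, 3): 2, (3, 0): 1}
print(to_undirected_weights(lam))
# {0: {2: 2.0, 3: 2.5}, 2: {0: 2.5}, 3: {0: 2.0}}
```

Note that w is not symmetric in general (here w[0][2] = 2.0 but w[2][0] = 2.5): each node weighs its neighbours from its own point of view.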

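With this conversion, Label Propagation on a directed graph is reduced to the undirected case, solved on weighted edges: each node adopts the label whose neighbours have the largest total weight $w_{i,j}$.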
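With weights, a node adopts the label with the largest total weight rather than the largest number of neighbours; the author's `get_max_community_label` does this by summing weights per label. A standalone sketch, using a hypothetical helper `weighted_label` and a hypothetical neighbourhood where the two rules disagree:

```python
from collections import defaultdict


def weighted_label(node, w, labels):
    """Label whose neighbours have the largest summed weight w[node][neighbour]."""
    totals = defaultdict(float)
    for neighbour, weight in w[node].items():
        totals[labels[neighbour]] += weight
    if not totals:
        return labels[node]             # no neighbours: keep the current label
    return max(totals, key=totals.get)  # first label reached wins a tie


# hypothetical: two light neighbours labelled "A", one heavy neighbour labelled "B"
w = {"x": {"p": 1.0, "q": 1.0, "r": 5.0}}
labels = {"x": "x", "p": "A", "q": "A", "r": "B"}
print(weighted_label("x", w, labels))   # B (a plain neighbour count would give A)
```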

III. Experiment

Consider the following data, one edge per line (source node, target node, edge weight):

0	2	1
2	0	2
0	3	2
3	0	1
0	4	3
4	0	1
0	5	2
5	0	1
1	2	3
2	1	1
1	4	5
4	1	2
1	7	1
7	1	4
2	4	2
4	2	2
2	5	9
5	2	7
2	6	1
6	2	4
3	7	1
7	3	5
4	10	1
10	4	4
5	7	1
7	5	2
5	11	1
11	5	2
6	7	3
7	6	7
6	11	5
11	6	2
8	9	1
9	8	6
8	10	4
10	8	2
8	11	2
11	8	1
8	14	5
14	8	3
8	15	8
15	8	5
9	12	2
12	9	1
9	14	1
14	9	2
10	11	10
11	10	1
10	12	2
12	10	3
10	13	9
13	10	8
10	14	8
14	10	7
11	13	1
13	11	4
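Each line of the data file holds three whitespace-separated fields. A short parser, an illustrative helper separate from the author's `loadData`, can turn such text into a dict mapping (source, target) to weight, accepting tabs or spaces:

```python
def parse_edges(text):
    """Parse 'source target weight' lines (tabs or spaces) into {(src, dst): weight}."""
    lam = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue                      # skip blank or malformed lines
        src, dst, weight = fields
        lam[(int(src), int(dst))] = float(weight)
    return lam


sample = "0\t2\t1\n2\t0\t2\n10 11 10\n"
lam = parse_edges(sample)
print(lam)   # {(0, 2): 1.0, (2, 0): 2.0, (10, 11): 10.0}
```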

The source code is as follows:

######################################
# Author: zhaozhiyong
# Date: 20160602
# Fun: Label Propagation for directed graphs
######################################
# Ported to Python 3. Each input line is "source target weight",
# separated by tabs or spaces.


def loadData(filePath):
    vector_dict = {}    # node id -> community label, initially the node id itself
    edge_dict_out = {}  # node id -> ["target:weight", ...] (outgoing edges)
    edge_dict_in = {}   # node id -> ["source:weight", ...] (incoming edges)
    with open(filePath) as f:
        for line in f:
            lines = line.strip().split()
            if len(lines) < 2:
                continue  # skip blank or malformed lines
            if lines[0] not in vector_dict:
                vector_dict[lines[0]] = int(lines[0])
            if lines[1] not in vector_dict:
                vector_dict[lines[1]] = int(lines[1])
            out_list = edge_dict_out.setdefault(lines[0], [])
            in_list = edge_dict_in.setdefault(lines[1], [])
            if len(lines) == 3:
                out_list.append(lines[1] + ":" + lines[2])
                in_list.append(lines[0] + ":" + lines[2])
    return vector_dict, edge_dict_out, edge_dict_in


def get_max_community_label(vector_dict, adjacency_node_list):
    # sum the edge weights of the neighbours per community label
    label_dict = {}
    for node in adjacency_node_list:
        node_id, node_weight = node.strip().split(":")
        label = vector_dict[node_id]
        label_dict[label] = label_dict.get(label, 0.0) + float(node_weight)
    # return the label with the largest total weight
    sort_list = sorted(label_dict.items(), key=lambda d: d[1], reverse=True)
    return sort_list[0][0]


def check(vector_dict, edge_dict):
    # return 1 if every node already carries a label of maximal total
    # neighbour weight (ties accepted), otherwise 0
    for node in vector_dict.keys():
        adjacency_node_list = edge_dict[node]
        if not adjacency_node_list:
            continue  # isolated node: nothing to compare against
        node_label = vector_dict[node]
        label_check = {}
        for ad_node in adjacency_node_list:
            node_id, node_weight = ad_node.strip().split(":")
            label = vector_dict[node_id]
            label_check[label] = label_check.get(label, 0.0) + float(node_weight)
        if label_check.get(node_label, 0.0) < max(label_check.values()):
            return 0
    return 1


def label_propagation(vector_dict, edge_dict_out, edge_dict_in,
                      alpha=1.0, beta=0.5, max_iter=100):
    # rebuild edge_dict: w(node, x) = alpha * lambda(node -> x) + beta * lambda(x -> node)
    # (alpha=1, beta=0.5 reproduce the original "out_x + 0.5 * in_x")
    edge_dict = {}
    for node in vector_dict.keys():
        out_dict = {}
        for out_x in edge_dict_out.get(node, []):
            out_xs = out_x.strip().split(":")
            if out_xs[0] not in out_dict:
                out_dict[out_xs[0]] = float(out_xs[1])
        in_dict = {}
        for in_x in edge_dict_in.get(node, []):
            in_xs = in_x.strip().split(":")
            if in_xs[0] not in in_dict:
                in_dict[in_xs[0]] = float(in_xs[1])
        last_list = []
        # neighbours reached by an outgoing edge (possibly also an incoming one)
        for x, out_x in out_dict.items():
            in_x = in_dict.pop(x, 0.0)
            result = alpha * out_x + beta * in_x
            last_list.append(x + ":" + str(result))
        # neighbours linked only by an incoming edge; the original guarded this
        # loop with "if not in_dict", so these neighbours were silently dropped
        for x, in_x in in_dict.items():
            result = beta * in_x
            last_list.append(x + ":" + str(result))
        edge_dict[node] = last_list

    # every node starts in its own community (see loadData); labels are
    # updated in place, so later nodes in a pass already see the new labels
    t = 0
    while check(vector_dict, edge_dict) == 0:
        if t >= max_iter:  # safeguard against endless label oscillation
            print("no convergence after", max_iter, "iterations")
            break
        t = t + 1
        print("----------------------------------------")
        print("iteration: ", t)
        for node in vector_dict.keys():
            adjacency_node_list = edge_dict[node]
            if adjacency_node_list:
                vector_dict[node] = get_max_community_label(vector_dict, adjacency_node_list)
        print(vector_dict)
    return vector_dict


if __name__ == "__main__":
    vector_dict, edge_dict_out, edge_dict_in = loadData("./cd_data.txt")
    print(vector_dict)
    print(edge_dict_out)
    print(edge_dict_in)
    vec_new = label_propagation(vector_dict, edge_dict_out, edge_dict_in)
    print("---------------------------------------------------------")
    print("the final result: ")
    for key in vec_new.keys():
        print(str(key) + " ---> " + str(vec_new[key]))

The final result:

[Figure: final result of the program]

GitHub address of the program and data

0 0