The Chinese Whispers Clustering Algorithm
Source: Internet. Published by: 晋中市教育网络平台. Editor: 程序博客网. Time: 2024/06/01 09:14
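The Chinese Whispers clustering algorithm is useful when you do not know in advance how many classes there are. Its basic steps are:
1. Give every node vi its own initial class: class(vi) = i.
2. Pick a node vt at random, find all of vt's adjacent nodes, and score the classes those neighbors belong to. For example, suppose node 1 has neighbors 2, 3, 4 and 5, which belong to classes a, b, c and b respectively, and the edges 1-2, 1-3, 1-4 and 1-5 all have weight 1. Then class a scores 1, class b scores 2, and class c scores 1.
3. Assign the highest-scoring class to vt.
4. Go back to step 2 and repeat. (The dlib code below repeats this a fixed number of times: the node count multiplied by num_iterations.)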
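The scoring in steps 2 and 3 can be sketched in a few lines of C++. This is only an illustration of the worked example above, not dlib code: the helper name best_class and its parameters are hypothetical, and classes are represented as plain chars so the a/b/c example maps over directly.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

// Hypothetical helper (not part of dlib): given the classes of a node's
// neighbors and the weights of the connecting edges, return the class with
// the highest total weight. Ties go to the smallest class because std::map
// iterates in key order and the comparison below is strict.
inline char best_class(const std::vector<char>& neighbor_classes,
                       const std::vector<double>& edge_weights,
                       char current_class)
{
    assert(neighbor_classes.size() == edge_weights.size());

    // Step 2: add up the edge weights per neighboring class.
    std::map<char, double> scores;
    for (std::size_t i = 0; i < neighbor_classes.size(); ++i)
        scores[neighbor_classes[i]] += edge_weights[i];

    // Step 3: pick the class with the highest score.
    char best = current_class;  // kept unchanged if there are no neighbors
    double best_score = -std::numeric_limits<double>::infinity();
    for (const auto& entry : scores) {
        if (entry.second > best_score) {
            best_score = entry.second;
            best = entry.first;
        }
    }
    return best;
}
```

For the example in step 2, best_class({'a', 'b', 'c', 'b'}, {1.0, 1.0, 1.0, 1.0}, 'x') returns 'b', since class b collects a total weight of 2 while a and c collect 1 each.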
The dlib implementation is shown below, with comments explaining each part:
inline unsigned long chinese_whispers (
    const std::vector<ordered_sample_pair>& edges,
    std::vector<unsigned long>& labels,
    const unsigned long num_iterations,
    dlib::rand& rnd
)
{
    // make sure requires clause is not broken:
    // the edge list passed in must already be sorted
    DLIB_ASSERT(is_ordered_by_index(edges),
        "\t unsigned long chinese_whispers()"
        << "\n\t Invalid inputs were given to this function"
    );

    labels.clear();
    if (edges.size() == 0)
        return 0;

    std::vector<std::pair<unsigned long, unsigned long> > neighbors;
    find_neighbor_ranges(edges, neighbors);

    // Initialize the labels, each node gets a different label.
    labels.resize(neighbors.size());
    for (unsigned long i = 0; i < labels.size(); ++i)
        labels[i] = i;

    for (unsigned long iter = 0; iter < neighbors.size()*num_iterations; ++iter)
    {
        // Pick a random node.
        const unsigned long idx = rnd.get_random_64bit_number()%neighbors.size();

        // Count how many times each label happens amongst our neighbors,
        // i.e. score the classes of the adjacent nodes by edge weight.
        std::map<unsigned long, double> labels_to_counts;
        const unsigned long end = neighbors[idx].second;
        for (unsigned long i = neighbors[idx].first; i != end; ++i)
        {
            labels_to_counts[labels[edges[i].index2()]] += edges[i].distance();
        }

        // find the most common label (the highest-scoring class)
        // and assign it to this node
        std::map<unsigned long, double>::iterator i;
        double best_score = -std::numeric_limits<double>::infinity();
        unsigned long best_label = labels[idx];
        for (i = labels_to_counts.begin(); i != labels_to_counts.end(); ++i)
        {
            if (i->second > best_score)
            {
                best_score = i->second;
                best_label = i->first;
            }
        }

        labels[idx] = best_label;
    }

    // Remap the labels into a contiguous range. First we find the
    // mapping. The labels found above are not necessarily the consecutive
    // values 0,1,2,3..., so they are renumbered consecutively.
    std::map<unsigned long,unsigned long> label_remap;
    for (unsigned long i = 0; i < labels.size(); ++i)
    {
        const unsigned long next_id = label_remap.size();
        if (label_remap.count(labels[i]) == 0)
            label_remap[labels[i]] = next_id;
    }

    // now apply the mapping to all the labels, so every node gets its final class
    for (unsigned long i = 0; i < labels.size(); ++i)
    {
        labels[i] = label_remap[labels[i]];
    }

    return label_remap.size();
}

Related reference paper:
"Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems"
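As a supplementary note on the last part of the dlib function, the label-remapping step can be tried on its own. The sketch below copies that logic into a hypothetical standalone function, remap_labels (the name and signature are assumptions made for illustration, not something dlib exposes), so its effect on a small label vector is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical standalone version of the remapping step: rewrites `labels`
// in place so that the distinct values become 0, 1, 2, ... in order of
// first appearance, and returns the number of distinct labels.
inline unsigned long remap_labels(std::vector<unsigned long>& labels)
{
    std::map<unsigned long, unsigned long> label_remap;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        // Read the size before any insertion so the new id is well defined.
        const unsigned long next_id = label_remap.size();
        if (label_remap.count(labels[i]) == 0)
            label_remap[labels[i]] = next_id;
    }

    // Apply the mapping to every node.
    for (std::size_t i = 0; i < labels.size(); ++i) {
        labels[i] = label_remap[labels[i]];
        assert(labels[i] < label_remap.size());
    }
    return label_remap.size();
}
```

Starting from the labels {5, 5, 2, 9, 2}, the function rewrites them to {0, 0, 1, 2, 1} and returns 3, which is the number of clusters.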