Kaggle [6] - recommending missing links in a social network

Source: Internet  Editor: 程序博客网  Date: 2024/05/29 19:26

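Competition page: http://www.kaggle.com/c/FacebookRecruiting

The dataset is very simple.

Training set (train): two columns (source_node, destination_node), meaning that source follows destination.

Test set (test): one column (source_node). For each source_node, predict 10 destination_node values, i.e. the 10 users it is most likely to follow.

The evaluation metric is Mean Average Precision; see the competition page for the exact definition.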
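To make the metric concrete, here is a minimal sketch of Mean Average Precision at 10 as it is commonly defined for ranking tasks. The function names `apk` and `mapk` are my own, and the handling of duplicate predictions and of the denominator is an assumption; the competition page remains the authoritative definition.

```python
def apk(actual, predicted, k=10):
    """Average precision at k for one source_node.

    actual    -- the true destination nodes (any iterable)
    predicted -- ranked list of predicted destination nodes
    """
    actual = set(actual)
    if not actual:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, node in enumerate(predicted[:k], start=1):
        # A repeated prediction is only credited the first time (assumption).
        if node in actual and node not in seen:
            hits += 1
            score += hits / rank  # precision at this cut-off
        seen.add(node)
    return score / min(len(actual), k)


def mapk(actuals, predictions, k=10):
    """Mean of apk over all source nodes."""
    pairs = list(zip(actuals, predictions))
    if not pairs:
        return 0.0
    return sum(apk(a, p, k) for a, p in pairs) / len(pairs)
```

Because precision is accumulated at each hit, putting a correct node at rank 1 is worth more than putting the same node at rank 10, so the ordering of the 10 predictions matters.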


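First, the possible approaches. Since the data is so simple, the first idea is BFS: predict the nodes closest to source_node. The second is a random walk: estimate transition probabilities from train, compute for each source_node the stationary probability of being at each node, and sort by it. The third is to treat it as a recommendation problem, since the goal in the end is to find the top 10 missing edges. Finally, it can be framed as a classification problem, provided you construct positive and negative samples, and then rank by the predicted probability.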
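The random-walk idea can be sketched as follows. This is only an illustration under my own assumptions: the walk restarts at source_node with a fixed probability (so the result depends on the source), transitions are uniform over the nodes a user follows, and the names `build_out_links`, `random_walk_with_restart`, and `recommend` are hypothetical. It is not the method of any particular competitor.

```python
from collections import defaultdict


def build_out_links(edges):
    """Map each source node to the list of nodes it follows."""
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    return out_links


def random_walk_with_restart(out_links, source, restart=0.15, iterations=30):
    """Approximate visit probabilities by power iteration.

    At every step the walker jumps back to `source` with probability
    `restart`; otherwise it follows a uniformly chosen outgoing edge.
    A node with no outgoing edges sends the walker back to `source`.
    """
    prob = {source: 1.0}
    for _ in range(iterations):
        nxt = defaultdict(float)
        for node, p in prob.items():
            targets = out_links.get(node)
            if not targets:
                nxt[source] += p
                continue
            nxt[source] += restart * p
            share = (1.0 - restart) * p / len(targets)
            for t in targets:
                nxt[t] += share
        prob = nxt
    return dict(prob)


def recommend(out_links, source, k=10, restart=0.15, iterations=30):
    """Top-k nodes by visit probability, skipping nodes already followed."""
    prob = random_walk_with_restart(out_links, source, restart, iterations)
    already = set(out_links.get(source, ()))
    already.add(source)
    ranked = sorted((n for n in prob if n not in already),
                    key=lambda n: (-prob[n], n))
    return ranked[:k]
```

With a restart probability of zero this degenerates into a plain walk whose long-run distribution no longer depends on where it started, which is why some form of restart (or a limited number of steps) is needed to get per-source rankings.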


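Judging from what people tried, the first idea is only a basic benchmark. The second produced both good results (6th place) and poor ones. For the third, the main method people used was edgerank, and for the fourth everyone had their own tricks. Here is what the first-place finisher did: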

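1. Candidate selection. He used several edgerank variants to extract the top 30 destination_node candidates for each source_node (the two need not have any follow relation between them).

2. For each pair (A, B), build a number of features, mainly: whether A follows B, similarity features between A and B, and some other features.

3. Build the training set. Each sample is one pair from step 2, labeled 1 if A really follows B and 0 otherwise (this is one of the features from step 2). He also randomly deletes 4% of the edges here, said to make the training data more robust; I did not understand this part.

4. Finally, feed the training set into the models. The main ones were MatrixNet (reportedly not publicly available), GBM, and RandomForest.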
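Steps 1 to 3 can be sketched roughly as below. Everything here is an assumption on my part rather than the winner's actual code: a simple two-hop path count stands in for edgerank as the candidate selector, and the 4% edge removal is interpreted as hiding some known edges so that candidates drawn from the reduced graph can receive a positive label. The names `split_edges`, `two_hop_candidates`, and `build_training_pairs` are hypothetical.

```python
import random
from collections import defaultdict


def split_edges(edges, holdout_fraction=0.04, seed=0):
    """Randomly hide a fraction of the edges; return (kept, held_out)."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_hold = int(round(len(edges) * holdout_fraction))
    return edges[n_hold:], edges[:n_hold]


def two_hop_candidates(out_links, source, limit=30):
    """Stand-in candidate selector (NOT edgerank): rank nodes two hops
    away from `source` by the number of two-step paths reaching them."""
    direct = out_links.get(source, set())
    counts = defaultdict(int)
    for mid in direct:
        for cand in out_links.get(mid, ()):
            if cand != source and cand not in direct:
                counts[cand] += 1
    return sorted(counts, key=lambda n: (-counts[n], n))[:limit]


def build_training_pairs(edges, holdout_fraction=0.04, limit=30, seed=0):
    """Return (source, candidate, label) rows for a classifier."""
    edges = list(edges)
    all_edges = set(edges)
    kept, held_out = split_edges(edges, holdout_fraction, seed)
    out_links = defaultdict(set)
    for s, d in kept:
        out_links[s].add(d)
    rows = []
    # Only sources that lost an edge can yield positive examples.
    for source in sorted({s for s, _ in held_out}):
        for cand in two_hop_candidates(out_links, source, limit):
            # Label 1 if A really follows B in the full training graph.
            label = 1 if (source, cand) in all_edges else 0
            rows.append((source, cand, label))
    return rows
```

Under this reading, features for each row would be computed on the reduced graph only, so the classifier never sees the edge it is asked to predict. The rows can then be handed to whatever model is available for step 4.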


Finally, here are the features that other people used:

1. Existence of a reverse link between nodes (1=yes/0=no).
2. Count of forward-forward links between nodes.
3. Count of forward-reverse links between nodes.
4. Count of forward-bidirectional links between nodes.
5. Count of reverse-forward links between nodes.
6. Count of reverse-reverse links between nodes.
7. Count of reverse-bidirectional links between nodes.
8. Count of bidirectional-forward links between nodes.
9. Count of bidirectional-reverse links between nodes.
10. Count of bidirectional-bidirectional links between nodes.
11. Count of common neighbors.
12. Number of links ending at the node to be predicted / Number of links starting at the node to be predicted.
13. Number of links starting at the node to be ranked.
14. Number of links ending at the node to be ranked.
15. Number of links ending at the node to be ranked / Number of links starting at the node to be ranked.
16. Count of common neighbors / Count of all neighbors.
17. Count of paths with exactly three links between nodes / Count of paths with exactly three links from node to be predicted to any node.
18. Count of forward-forward-forward links between nodes / Count of all length three paths between nodes.
19. Count of forward-forward-reverse links between nodes / Count of all length three paths between nodes.
20. Count of forward-reverse-forward links between nodes / Count of all length three paths between nodes.
21. Count of forward-reverse-reverse links between nodes / Count of all length three paths between nodes.
22. Count of reverse-forward-forward links between nodes / Count of all length three paths between nodes.
23. Count of reverse-forward-reverse links between nodes / Count of all length three paths between nodes.
24. Count of reverse-reverse-forward links between nodes / Count of all length three paths between nodes.
25. Count of reverse-reverse-reverse links between nodes / Count of all length three paths between nodes.
26. Average length of all unique paths from a node to its immediate successors.
27. Average length of all unique paths from a node to its immediate predecessors.
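A few of these features can be computed from successor and predecessor sets, as in the sketch below. The interpretation is my own assumption: "forward" means an edge traversed in its follow direction and "reverse" means traversed against it, A is the node to be predicted, B is the node to be ranked, and "all neighbors" is the union of both nodes' neighbors. The names `build_graph` and `pair_features` are hypothetical, and self-loops are assumed absent.

```python
from collections import defaultdict


def build_graph(edges):
    """Return (successors, predecessors) as dicts of sets."""
    succ, pred = defaultdict(set), defaultdict(set)
    for s, d in edges:
        succ[s].add(d)
        pred[d].add(s)
    return succ, pred


def pair_features(succ, pred, a, b):
    """A handful of link features for the candidate pair (a, b)."""
    a_out, a_in = succ.get(a, set()), pred.get(a, set())
    b_out, b_in = succ.get(b, set()), pred.get(b, set())
    # Neighbors in either direction, excluding the pair itself.
    a_nb = (a_out | a_in) - {a, b}
    b_nb = (b_out | b_in) - {a, b}
    common = a_nb & b_nb
    union = a_nb | b_nb
    return {
        "reverse_link": 1 if a in b_out else 0,      # b -> a exists
        "forward_forward": len(a_out & b_in),        # a -> x -> b
        "forward_reverse": len(a_out & b_out),       # a -> x <- b
        "reverse_forward": len(a_in & b_in),         # a <- x -> b
        "reverse_reverse": len(a_in & b_out),        # a <- x <- b
        "common_neighbors": len(common),
        "common_over_all": len(common) / len(union) if union else 0.0,
        "ranked_out": len(b_out),
        "ranked_in": len(b_in),
        "ranked_in_over_out": len(b_in) / len(b_out) if b_out else 0.0,
    }
```

The length-three path features follow the same pattern with one more intermediate node, at a noticeably higher cost on a large graph.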

