Mining DBLP Author Collaborations with FP-Growth in Practice (1): Extracting Target Information (Venues, Authors, etc.) from the DBLP Dataset
Source: Internet | Editor: 程序博客网 | Time: 2024/04/29 14:48
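First, download the DBLP dataset from the official site, http://dblp.uni-trier.de/xml/. Only dblp.xml.gz is needed; decompressing it yields dblp.xml, which is over 1 GB, so the file is fairly large.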
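Any gzip tool will do the decompression. If you would rather do it from Python, here is a minimal standard-library sketch; gunzip_file is a hypothetical helper name, and the default file names are simply the ones used in this post. shutil.copyfileobj copies in chunks, so memory use stays small even though the output is over 1 GB.

```python
import gzip
import shutil


def gunzip_file(src="dblp.xml.gz", dst="dblp.xml"):
    """Decompress src into dst (hypothetical helper), streaming chunk by chunk."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # copyfileobj reads and writes fixed-size chunks, never the whole file at once
        shutil.copyfileobj(fin, fout)
```

Calling gunzip_file() once in the directory that holds dblp.xml.gz produces dblp.xml next to it.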
Extract a sample from the raw data:
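r = open("dblp.xml", "r", encoding="iso-8859-1")  # dblp.xml declares ISO-8859-1
w = open("dblpExample.xml", "w", encoding="iso-8859-1")
for i in range(30):  # copy just the first 30 lines
    print("extract line", i)
    c = r.readline()
    w.write(c)
r.close()
w.close()
Result: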
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<article mdate="2011-01-11" key="journals/acta/Saxena96">
  <author>Sanjeev Saxena</author>
  <title>Parallel Integer Sorting and Simulation Amongst CRCW Models.</title>
  <pages>607-619</pages>
  <year>1996</year>
  <volume>33</volume>
  <journal>Acta Inf.</journal>
  <number>7</number>
  <url>db/journals/acta/acta33.html#Saxena96</url>
  <ee>http://dx.doi.org/10.1007/BF03036466</ee>
</article>
<article mdate="2011-01-11" key="journals/acta/Simon83">...</article>
...
</dblp>
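This turned out to be of little use, because the first 30 lines only show one kind of record. So I took a different approach: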
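Only the following venues need to be extracted: SDM, ICDM, ECML-PKDD, PAKDD, WSDM, DMKD, TKDE, KDD Explorations, ACM Trans. on KDD, CVPR, ICML, NIPS, COLT, SIGIR and SIGKDD, with all of their records from 2000 to the present. So I looked at sample records from each of them in turn.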
SDM:
<inproceedings mdate="2014-02-12" key="conf/sdm/HanN08">
  <author>Shuguo Han</author>
  <author>Wee Keong Ng</author>
  <title>Preemptive Measures against Malicious Party in Privacy-Preserving Data Mining.</title>
  <pages>375-386</pages>
  <year>2008</year>
  <booktitle>SDM</booktitle>
  <ee>http://dx.doi.org/10.1137/1.9781611972788.34</ee>
  <crossref>conf/sdm/2008</crossref>
  <url>db/conf/sdm/sdm2008.html#HanN08</url>
</inproceedings>
<inproceedings mdate="2015-12-30" key="conf/sdm/LiGGDZ15">
  <author>Kang Li</author>
  <author>Jing Gao</author>
  <author>Suxin Guo</author>
  <author>Nan Du</author>
  <author>Aidong Zhang</author>
  <title>Functional Node Detection on Linked Data.</title>
  <pages>1-9</pages>
  <year>2015</year>
  <booktitle>SDM</booktitle>
  <ee>http://dx.doi.org/10.1137/1.9781611974010.1</ee>
  <crossref>conf/sdm/2015</crossref>
  <url>db/conf/sdm/sdm2015.html#LiGGDZ15</url>
</inproceedings>
ICDM:
<inproceedings mdate="2014-09-17" key="conf/icdm/LazarevicKKKT03">
  <author>Aleksandar Lazarevic</author>
  <author>Ramdev Kanapady</author>
  <author>Chandrika Kamath</author>
  <author>Vipin Kumar</author>
  <author>Kumar K. Tamma</author>
  <title>Localized Prediction of Continuous Target Variables Using Hierarchical Clustering.</title>
  <pages>139-146</pages>
  <year>2003</year>
  <crossref>conf/icdm/2003</crossref>
  <booktitle>ICDM</booktitle>
  <ee>http://dx.doi.org/10.1109/ICDM.2003.1250913</ee>
  <ee>http://doi.ieeecomputersociety.org/10.1109/ICDM.2003.1250913</ee>
  <url>db/conf/icdm/icdm2003.html#LazarevicKKKT03</url>
</inproceedings>
<inproceedings mdate="2014-09-17" key="conf/icdm/CampagnaP09">
  <author>Andrea Campagna</author>
  <author>Rasmus Pagh</author>
  <title>Finding Associations and Computing Similarity via Biased Pair Sampling.</title>
  <pages>61-70</pages>
  <year>2009</year>
  <booktitle>ICDM</booktitle>
  <ee>http://dx.doi.org/10.1109/ICDM.2009.35</ee>
  <ee>http://doi.ieeecomputersociety.org/10.1109/ICDM.2009.35</ee>
  <crossref>conf/icdm/2009</crossref>
  <url>db/conf/icdm/icdm2009.html#CampagnaP09</url>
</inproceedings>
ECML-PKDD:
<inproceedings mdate="2013-08-30" key="conf/pkdd/TomasevM13a">
  <author>Nenad Tomasev</author>
  <author>Dunja Mladenic</author>
  <title>Image Hub Explorer: Evaluating Representations and Metrics for Content-Based Image Retrieval and Object Recognition.</title>
  <pages>637-640</pages>
  <year>2013</year>
  <booktitle>ECML/PKDD (3)</booktitle>
  <ee>http://dx.doi.org/10.1007/978-3-642-40994-3_44</ee>
  <crossref>conf/pkdd/2013-3</crossref>
  <url>db/conf/pkdd/pkdd2013-3.html#TomasevM13a</url>
</inproceedings>
<inproceedings mdate="2015-08-30" key="conf/pkdd/BudhathokiV15">
  <author>Kailash Budhathoki</author>
  <author>Jilles Vreeken</author>
  <title>The Difference and the Norm - Characterising Similarities and Differences Between Databases.</title>
  <pages>206-223</pages>
  <year>2015</year>
  <booktitle>ECML/PKDD (2)</booktitle>
  <ee>http://dx.doi.org/10.1007/978-3-319-23525-7_13</ee>
  <crossref>conf/pkdd/2015-2</crossref>
  <url>db/conf/pkdd/pkdd2015-2.html#BudhathokiV15</url>
</inproceedings>
PAKDD:
<inproceedings mdate="2008-05-15" key="conf/pakdd/HanN08">
  <author>Shuguo Han</author>
  <author>Wee Keong Ng</author>
  <title>Privacy-Preserving Linear Fisher Discriminant Analysis.</title>
  <pages>136-147</pages>
  <year>2008</year>
  <booktitle>PAKDD</booktitle>
  <ee>http://dx.doi.org/10.1007/978-3-540-68125-0_14</ee>
  <crossref>conf/pakdd/2008</crossref>
  <url>db/conf/pakdd/pakdd2008.html#HanN08</url>
</inproceedings>
<inproceedings mdate="2005-05-18" key="conf/pakdd/BoWJ05">
  <author>Liefeng Bo</author>
  <author>Ling Wang</author>
  <author>Licheng Jiao</author>
  <title>Training Support Vector Machines Using Greedy Stagewise Algorithm.</title>
  <pages>632-638</pages>
  <year>2005</year>
  <crossref>conf/pakdd/2005</crossref>
  <booktitle>PAKDD</booktitle>
  <ee>http://dx.doi.org/10.1007/11430919_73</ee>
  <url>db/conf/pakdd/pakdd2005.html#BoWJ05</url>
</inproceedings>
WSDM:
<inproceedings mdate="2011-01-31" key="conf/wsdm/Kawamae11a">
  <author>Noriaki Kawamae</author>
  <title>Predicting future reviews: sentiment analysis models for collaborative filtering.</title>
  <pages>605-614</pages>
  <year>2011</year>
  <booktitle>WSDM</booktitle>
  <ee>http://doi.acm.org/10.1145/1935826.1935911</ee>
  <crossref>conf/wsdm/2011</crossref>
  <url>db/conf/wsdm/wsdm2011.html#Kawamae11a</url>
</inproceedings>
<inproceedings mdate="2015-01-29" key="conf/wsdm/TangCAL15">
  <author>Jiliang Tang</author>
  <author>Shiyu Chang</author>
  <author>Charu C. Aggarwal</author>
  <author>Huan Liu</author>
  <title>Negative Link Prediction in Social Media.</title>
  <pages>87-96</pages>
  <year>2015</year>
  <booktitle>WSDM</booktitle>
  <ee>http://doi.acm.org/10.1145/2684822.2685295</ee>
  <crossref>conf/wsdm/2015</crossref>
  <url>db/conf/wsdm/wsdm2015.html#TangCAL15</url>
</inproceedings>
DMKD:
<inproceedings mdate="2003-04-04" key="conf/dmkd/KantarciogluC02">
  <author>Murat Kantarcioglu</author>
  <author>Chris Clifton</author>
  <title>Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data.</title>
  <year>2002</year>
  <booktitle>DMKD</booktitle>
  <ee>http://www.bell-labs.com/user/minos/DMKD02/Papers/kantarcioglu.pdf</ee>
  <url>db/conf/dmkd/dmkd2002.html#KantarciogluC02</url>
</inproceedings>
<inproceedings mdate="2003-04-04" key="conf/dmkd/ZhuFAE02">
  <author>Xingquan Zhu</author>
  <author>Jianping Fan</author>
  <author>Walid G. Aref</author>
  <author>Ahmed K. Elmagarmid</author>
  <title>ClassMiner: Mining Medical Video Content Structure and Events Towards Efficient Access and Scalable Skimming.</title>
  <year>2002</year>
  <booktitle>DMKD</booktitle>
  <ee>http://www.bell-labs.com/user/minos/DMKD02/Papers/zhu.pdf</ee>
  <url>db/conf/dmkd/dmkd2002.html#ZhuFAE02</url>
</inproceedings>
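TKDE, KDD Explorations and ACM Trans. on KDD are ignored: they are journals, which DBLP stores as <article> records (like the Acta Inf. sample above), while this extraction only looks at <inproceedings> records.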
CVPR:
<inproceedings mdate="2014-07-31" key="conf/cvpr/BrandOP97">
  <author>Matthew Brand</author>
  <author>Nuria Oliver</author>
  <author>Alex Pentland</author>
  <title>Coupled hidden Markov models for complex action recognition.</title>
  <pages>994-999</pages>
  <year>1997</year>
  <crossref>conf/cvpr/1997</crossref>
  <booktitle>CVPR</booktitle>
  <ee>http://dx.doi.org/10.1109/CVPR.1997.609450</ee>
  <ee>http://doi.ieeecomputersociety.org/10.1109/CVPR.1997.609450</ee>
  <url>db/conf/cvpr/cvpr1997.html#BrandOP97</url>
</inproceedings>
<inproceedings mdate="2014-07-30" key="conf/cvpr/LiGK09">
  <author>Yan Li</author>
  <author>Leon Gu</author>
  <author>Takeo Kanade</author>
  <title>A robust shape model for multi-view car alignment.</title>
  <pages>2466-2473</pages>
  <year>2009</year>
  <booktitle>CVPR</booktitle>
  <ee>http://dx.doi.org/10.1109/CVPRW.2009.5206799</ee>
  <ee>http://doi.ieeecomputersociety.org/10.1109/CVPRW.2009.5206799</ee>
  <crossref>conf/cvpr/2009</crossref>
  <url>db/conf/cvpr/cvpr2009.html#LiGK09</url>
</inproceedings>
ICML:
<inproceedings mdate="2013-11-25" key="journals/jmlr/WilsonFT12">
  <author>Aaron Wilson</author>
  <author>Alan Fern</author>
  <author>Prasad Tadepalli</author>
  <title>Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach.</title>
  <pages>217-227</pages>
  <booktitle>ICML Unsupervised and Transfer Learning</booktitle>
  <crossref>conf/icml/2011utl</crossref>
  <year>2012</year>
  <ee>http://jmlr.csail.mit.edu/proceedings/papers/v27/wilson12a.html</ee>
  <url>db/journals/jmlr/jmlrp27.html#WilsonFT12</url>
</inproceedings>
<inproceedings mdate="2013-12-04" key="journals/jmlr/GlowackaDS12">
  <author>Dorota Glowacka</author>
  <author>Louis Dorard</author>
  <author>John Shawe-Taylor</author>
  <title>Preface.</title>
  <booktitle>ICML On-line Trading of Exploration and Exploitation</booktitle>
  <year>2012</year>
  <crossref>conf/icml/2011otee</crossref>
  <ee>http://jmlr.csail.mit.edu/proceedings/papers/v26/glowacka12a/glowacka12a.pdf</ee>
  <url>db/journals/jmlr/jmlrp26.html#GlowackaDS12</url>
</inproceedings>
NIPS:
<inproceedings mdate="2013-11-25" key="journals/jmlr/ZhengG10">
  <author>Cheng Zheng</author>
  <author>Zhi Geng</author>
  <title>Reverse Engineering of Asynchronous Boolean Networks.</title>
  <pages>237-248</pages>
  <booktitle>NIPS Causality: Objectives and Assessment</booktitle>
  <year>2010</year>
  <crossref>conf/nips/2008coa</crossref>
  <ee>http://www.jmlr.org/proceedings/papers/v6/zheng10a.html</ee>
  <url>db/journals/jmlr/jmlrp6.html#ZhengG10</url>
</inproceedings>
<inproceedings mdate="2013-11-25" key="journals/jmlr/WhiteCL11">
  <author>Halbert White</author>
  <author>Karim Chalak</author>
  <author>Xun Lu</author>
  <title>Linking Granger Causality and the Pearl Causal Model with Settable Systems.</title>
  <pages>1-29</pages>
  <booktitle>NIPS Mini-Symposium on Causality in Time Series</booktitle>
  <year>2011</year>
  <crossref>conf/nips/2009mscts</crossref>
  <ee>http://www.jmlr.org/proceedings/papers/v12/white11.htm</ee>
  <url>db/journals/jmlr/jmlrp12.html#WhiteCL11</url>
</inproceedings>
COLT:
<inproceedings mdate="2013-11-25" key="journals/jmlr/BalcanCIW12">
  <author>Maria-Florina Balcan</author>
  <author>Florin Constantin</author>
  <author>Satoru Iwata</author>
  <author>Lei Wang</author>
  <title>Learning Valuation Functions.</title>
  <pages>4.1-4.24</pages>
  <booktitle>COLT</booktitle>
  <year>2012</year>
  <crossref>conf/colt/2012</crossref>
  <ee>http://www.jmlr.org/proceedings/papers/v23/balcan12b/balcan12b.pdf</ee>
  <url>db/journals/jmlr/jmlrp23.html#BalcanCIW12</url>
</inproceedings>
SIGIR:
<inproceedings mdate="2012-08-15" key="conf/sigir/RaveendranC12">
  <author>Gobaan Raveendran</author>
  <author>Charles L. A. Clarke</author>
  <title>Lightweight contrastive summarization for news comment mining.</title>
  <pages>1103-1104</pages>
  <year>2012</year>
  <booktitle>SIGIR</booktitle>
  <ee>http://doi.acm.org/10.1145/2348283.2348490</ee>
  <crossref>conf/sigir/2012</crossref>
  <url>db/conf/sigir/sigir2012.html#RaveendranC12</url>
</inproceedings>
<inproceedings mdate="2012-09-13" key="conf/sigir/KraftB84">
  <author>Donald H. Kraft</author>
  <author>Duncan A. Buell</author>
  <title>Advances in a Bayesian Decision Model of User Stopping Behaviour for Scanning the Output of an Information Retrieval System.</title>
  <pages>421-433</pages>
  <year>1984</year>
  <booktitle>SIGIR</booktitle>
  <url>db/conf/sigir/sigir84.html#KraftB84</url>
  <ee>http://dl.acm.org/citation.cfm?id=636833</ee>
</inproceedings>
SIGKDD (its booktitle in DBLP is simply KDD):
<inproceedings mdate="2010-08-09" key="conf/kdd/FeiH10">
  <author>Hongliang Fei</author>
  <author>Jun Huan</author>
  <title>Boosting with structure information in the functional space: an application to graph classification.</title>
  <pages>643-652</pages>
  <year>2010</year>
  <booktitle>KDD</booktitle>
  <ee>http://doi.acm.org/10.1145/1835804.1835886</ee>
  <crossref>conf/kdd/2010</crossref>
  <url>db/conf/kdd/kdd2010.html#FeiH10</url>
</inproceedings>
<inproceedings mdate="2015-08-10" key="conf/kdd/OuCWW015">
  <author>Mingdong Ou</author>
  <author>Peng Cui</author>
  <author>Fei Wang</author>
  <author>Jun Wang</author>
  <author>Wenwu Zhu 0001</author>
  <title>Non-transitive Hashing with Latent Similarity Components.</title>
  <pages>895-904</pages>
  <year>2015</year>
  <booktitle>KDD</booktitle>
  <ee>http://doi.acm.org/10.1145/2783258.2783283</ee>
  <crossref>conf/kdd/2015</crossref>
  <url>db/conf/kdd/kdd2015.html#OuCWW015</url>
</inproceedings>
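Phew, going through the venues one by one took a lot of effort. So what was all of this for?
First, to determine which tags are useful for finding these venues: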
DBLPContentHandler.pubList = ["article", "inproceedings", "proceedings", "book", "incollection", "phdthesis", "mastersthesis", "www"]
Second, to determine which tags hold the data we are interested in:
DBLPContentHandler.fieldList = ["author", "editor", "title", "booktitle", "pages", "year", "address", "journal", "volume", "number", "month", "url", "ee", "cdrom", "cite", "publisher", "note", "crossref", "isbn", "series", "school", "chapter"]
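The first list answers the first question (which elements are publication records) and the second list answers the second (which child elements are data fields). Of all these tags, the code below only needs the <inproceedings> records and their author, title, booktitle and year fields.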
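To make the roles of the two lists concrete, here is a minimal sketch that applies them to a DOM tree. DBLPContentHandler is not defined in this post, so the two lists are copied here as plain constants; iter_records and node_text are hypothetical helper names, not part of the code below.

```python
from xml.dom.minidom import parseString

# Copies of DBLPContentHandler.pubList / fieldList quoted above.
PUB_LIST = ["article", "inproceedings", "proceedings", "book", "incollection",
            "phdthesis", "mastersthesis", "www"]
FIELD_LIST = ["author", "editor", "title", "booktitle", "pages", "year", "address",
              "journal", "volume", "number", "month", "url", "ee", "cdrom", "cite",
              "publisher", "note", "crossref", "isbn", "series", "school", "chapter"]


def node_text(node):
    """Concatenate all text below node, including text inside nested tags like <i>."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(node_text(child) for child in node.childNodes)


def iter_records(root):
    """Yield (record_type, {field: [values]}) for each publication record under root."""
    for record in root.childNodes:
        # question 1: is this element a publication record?
        if record.nodeType != record.ELEMENT_NODE or record.tagName not in PUB_LIST:
            continue
        fields = {}
        for child in record.childNodes:
            # question 2: is this child a field we care about? (author may repeat)
            if child.nodeType == child.ELEMENT_NODE and child.tagName in FIELD_LIST:
                fields.setdefault(child.tagName, []).append(node_text(child))
        yield record.tagName, fields
```

Values are kept as lists because fields such as author and ee can occur more than once in a single record.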
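One more thing to watch out for: a venue such as ICML appears under several different <booktitle> values, so the venue cannot be matched with a plain equality test!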
<booktitle>ICML Unsupervised and Transfer Learning</booktitle>
<booktitle>ICML On-line Trading of Exploration and Exploitation</booktitle>
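To show why matching on the first word works where equality fails, here is the matching rule from the code below pulled out into a small function. venue_of is a hypothetical name, and TARGET_VENUES simply mirrors confNameDict; treat it as one possible rule, not the only one.

```python
# Mirrors confNameDict in the extraction code: the first word of <booktitle> per venue.
TARGET_VENUES = {"SDM", "ICDM", "ECML/PKDD", "PAKDD", "WSDM", "DMKD",
                 "CVPR", "ICML", "NIPS", "COLT", "SIGIR", "KDD"}


def venue_of(booktitle):
    """Map a <booktitle> value to a target venue by its first word, or return None."""
    first_word = booktitle.split(" ")[0]
    return first_word if first_word in TARGET_VENUES else None


print(venue_of("ICML Unsupervised and Transfer Learning"))  # ICML
print(venue_of("ECML/PKDD (3)"))                             # ECML/PKDD
print(venue_of("Acta Inf."))                                 # None
```

A side effect worth knowing: any booktitle that merely begins with a target name is counted too, which is exactly how the ICML workshop volumes above end up in the results. Use a stricter rule if such volumes should be excluded.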
Below is the code that builds allDB (the database of records of interest) and authorDB (a separate database of author information):
#!/usr/bin/env python
# -*- coding:utf-8 -*-
from xml.dom.minidom import parse

fileName = "dblp.xml"
# first word of <booktitle> for each target venue (SIGKDD is listed as "KDD" in DBLP)
confNameDict = {"SDM": 1, "ICDM": 1, "ECML/PKDD": 1, "PAKDD": 1, "WSDM": 1, "DMKD": 1,
                "CVPR": 1, "ICML": 1, "NIPS": 1, "COLT": 1, "SIGIR": 1, "KDD": 1}
fromYear = "2000"
allList = []     # "confName \t year \t title \t author1|author2|..|authorn"
authorDict = {}  # author: [frequence, yearStart, yearEnd]


def getText(node):
    # all text under node, including text inside nested tags such as <i> or <sub>
    return "".join(n.data if n.nodeType == n.TEXT_NODE else getText(n) for n in node.childNodes)


if __name__ == "__main__":
    domTree = parse(fileName)
    dblp = domTree.documentElement
    inproceedingsList = dblp.getElementsByTagName("inproceedings")
    for inproceedings in inproceedingsList:
        yearNodes = inproceedings.getElementsByTagName("year")
        booktitleNodes = inproceedings.getElementsByTagName("booktitle")
        if not yearNodes or not booktitleNodes:
            continue  # skip records without a year or booktitle
        yearStr = getText(yearNodes[0])
        if yearStr < fromYear:  # four-digit years compare correctly as strings
            continue
        print("yearStr", yearStr, "==" * 20)
        booktitleStr = getText(booktitleNodes[0])
        # for "<booktitle>ICML Unsupervised and Transfer Learning</booktitle>":
        # keep only the first word ("ICML"; "ECML/PKDD (3)" -> "ECML/PKDD")
        booktitleStr = booktitleStr.split(" ")[0]
        if booktitleStr not in confNameDict:
            continue
        print("booktitleStr", booktitleStr, "^^" * 20)
        allContent = booktitleStr + "\t" + yearStr + "\t"  # confName \t year \t
        titleStr = getText(inproceedings.getElementsByTagName("title")[0])
        allContent += titleStr + "\t"  # title \t
        for author in inproceedings.getElementsByTagName("author"):
            authorStr = getText(author)
            allContent += authorStr + "|"  # author_i|
            if authorStr in authorDict:
                authorDict[authorStr][0] += 1
                if yearStr < authorDict[authorStr][1]:
                    authorDict[authorStr][1] = yearStr
                elif yearStr > authorDict[authorStr][2]:
                    authorDict[authorStr][2] = yearStr
            else:
                authorDict[authorStr] = [1, yearStr, yearStr]
        allList.append(allContent)

    with open("allDB.txt", "w", encoding="utf-8") as wf:
        wf.write("\n".join(allList))

    # sort authors by [frequence, yearStart, yearEnd], largest first
    authorList = sorted(authorDict.items(), key=lambda item: item[1], reverse=True)
    with open("authorDB.txt", "w", encoding="utf-8") as wf:
        wf.write("\n".join(author + "\t" + str(frequence) + "\t" + yearStart + "\t" + yearEnd
                           for author, (frequence, yearStart, yearEnd) in authorList))
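This code parses the file directly with xml.dom, which reads all of the data into memory and builds a DOM tree first. I ran it on a server; on a desktop machine, SAX parsing is the better choice (see http://www.runoob.com/python/python-xml.html).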
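For a machine without enough memory for the DOM tree, the same filtering can be done in a streaming fashion. Below is a minimal SAX sketch, not the author's code: InproceedingsHandler and parse_dblp are hypothetical names, and the handler keeps only what allDB needs. Text is buffered because SAX may deliver one text node through several characters() calls. One assumption worth checking: the real dblp.xml uses character entities declared in dblp.dtd, so parse_dblp enables xml.sax.handler.feature_external_ges and expects dblp.dtd in the same directory; without that, accented characters in names may be dropped or rejected.

```python
import xml.sax

TARGET_FIELDS = ("author", "title", "booktitle", "year")


class InproceedingsHandler(xml.sax.ContentHandler):
    """Stream <inproceedings> records; keep those whose booktitle's first word
    is in conf_names and whose year is >= from_year."""

    def __init__(self, conf_names, from_year="2000"):
        super().__init__()
        self.conf_names = conf_names
        self.from_year = from_year
        self.records = []     # list of (conf, year, title, [authors])
        self._current = None  # fields of the record being read, or None
        self._field = None    # target field currently open, or None
        self._chunks = []     # text chunks of the open field

    def startElement(self, name, attrs):
        if name == "inproceedings":
            self._current = {"author": []}
        elif self._current is not None and name in TARGET_FIELDS:
            self._field, self._chunks = name, []

    def characters(self, content):
        if self._field is not None:  # also collects text inside nested tags such as <i>
            self._chunks.append(content)

    def endElement(self, name):
        if name == self._field:
            text = "".join(self._chunks)
            if name == "author":
                self._current["author"].append(text)
            else:
                self._current[name] = text
            self._field = None
        elif name == "inproceedings":
            rec, self._current = self._current, None
            conf = rec.get("booktitle", "").split(" ")[0]
            year = rec.get("year", "")
            if conf in self.conf_names and year >= self.from_year:
                self.records.append((conf, year, rec.get("title", ""), rec["author"]))


def parse_dblp(path, conf_names, from_year="2000"):
    """Parse a DBLP-style file at path; assumes dblp.dtd (if referenced) sits beside it."""
    handler = InproceedingsHandler(conf_names, from_year)
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_external_ges, True)  # load the external DTD
    parser.setContentHandler(handler)
    parser.parse(path)
    return handler.records
```

On the real dump, something like parse_dblp("dblp.xml", set(confNameDict)) would return one tuple per matching paper, ready to be written out in the allDB format.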
Below is the test dataset (I tried uploading it as a file several times without success, so here it is inline):
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<inproceedings mdate="2014-02-12" key="conf/sdm/HanN08"> <author>Shuguo Han</author> <author>Wee Keong Ng</author> <title>Preemptive Measures against Malicious Party in Privacy-Preserving Data Mining.</title> <pages>375-386</pages> <year>2008</year> <booktitle>SDM</booktitle> <ee>http://dx.doi.org/10.1137/1.9781611972788.34</ee> <crossref>conf/sdm/2008</crossref> <url>db/conf/sdm/sdm2008.html#HanN08</url> </inproceedings>
<inproceedings mdate="2015-12-30" key="conf/sdm/LiGGDZ15"> <author>Kang Li</author> <author>Jing Gao</author> <author>Suxin Guo</author> <author>Nan Du</author> <author>Aidong Zhang</author> <title>Functional Node Detection on Linked Data.</title> <pages>1-9</pages> <year>2015</year> <booktitle>SDM</booktitle> <ee>http://dx.doi.org/10.1137/1.9781611974010.1</ee> <crossref>conf/sdm/2015</crossref> <url>db/conf/sdm/sdm2015.html#LiGGDZ15</url> </inproceedings>
<inproceedings mdate="2014-09-17" key="conf/icdm/LazarevicKKKT03"> <author>Aleksandar Lazarevic</author> <author>Ramdev Kanapady</author> <author>Chandrika Kamath</author> <author>Vipin Kumar</author> <author>Kumar K. Tamma</author> <title>Localized Prediction of Continuous Target Variables Using Hierarchical Clustering.</title> <pages>139-146</pages> <year>2003</year> <crossref>conf/icdm/2003</crossref> <booktitle>ICDM</booktitle> <ee>http://dx.doi.org/10.1109/ICDM.2003.1250913</ee> <ee>http://doi.ieeecomputersociety.org/10.1109/ICDM.2003.1250913</ee> <url>db/conf/icdm/icdm2003.html#LazarevicKKKT03</url> </inproceedings>
<inproceedings mdate="2014-09-17" key="conf/icdm/CampagnaP09"> <author>Andrea Campagna</author> <author>Rasmus Pagh</author> <title>Finding Associations and Computing Similarity via Biased Pair Sampling.</title> <pages>61-70</pages> <year>2009</year> <booktitle>ICDM</booktitle> <ee>http://dx.doi.org/10.1109/ICDM.2009.35</ee> <ee>http://doi.ieeecomputersociety.org/10.1109/ICDM.2009.35</ee> <crossref>conf/icdm/2009</crossref> <url>db/conf/icdm/icdm2009.html#CampagnaP09</url> </inproceedings>
<inproceedings mdate="2013-08-30" key="conf/pkdd/TomasevM13a"> <author>Nenad Tomasev</author> <author>Dunja Mladenic</author> <title>Image Hub Explorer: Evaluating Representations and Metrics for Content-Based Image Retrieval and Object Recognition.</title> <pages>637-640</pages> <year>2013</year> <booktitle>ECML/PKDD (3)</booktitle> <ee>http://dx.doi.org/10.1007/978-3-642-40994-3_44</ee> <crossref>conf/pkdd/2013-3</crossref> <url>db/conf/pkdd/pkdd2013-3.html#TomasevM13a</url> </inproceedings>
<inproceedings mdate="2015-08-30" key="conf/pkdd/BudhathokiV15"> <author>Kailash Budhathoki</author> <author>Jilles Vreeken</author> <title>The Difference and the Norm - Characterising Similarities and Differences Between Databases.</title> <pages>206-223</pages> <year>2015</year> <booktitle>ECML/PKDD (2)</booktitle> <ee>http://dx.doi.org/10.1007/978-3-319-23525-7_13</ee> <crossref>conf/pkdd/2015-2</crossref> <url>db/conf/pkdd/pkdd2015-2.html#BudhathokiV15</url> </inproceedings>
<inproceedings mdate="2008-05-15" key="conf/pakdd/HanN08"> <author>Shuguo Han</author> <author>Wee Keong Ng</author> <title>Privacy-Preserving Linear Fisher Discriminant Analysis.</title> <pages>136-147</pages> <year>2008</year> <booktitle>PAKDD</booktitle> <ee>http://dx.doi.org/10.1007/978-3-540-68125-0_14</ee> <crossref>conf/pakdd/2008</crossref> <url>db/conf/pakdd/pakdd2008.html#HanN08</url> </inproceedings>
<inproceedings mdate="2005-05-18" key="conf/pakdd/BoWJ05"> <author>Liefeng Bo</author> <author>Ling Wang</author> <author>Licheng Jiao</author> <title>Training Support Vector Machines Using Greedy Stagewise Algorithm.</title> <pages>632-638</pages> <year>2005</year> <crossref>conf/pakdd/2005</crossref> <booktitle>PAKDD</booktitle> <ee>http://dx.doi.org/10.1007/11430919_73</ee> <url>db/conf/pakdd/pakdd2005.html#BoWJ05</url> </inproceedings>
<inproceedings mdate="2011-01-31" key="conf/wsdm/Kawamae11a"> <author>Noriaki Kawamae</author> <title>Predicting future reviews: sentiment analysis models for collaborative filtering.</title> <pages>605-614</pages> <year>2011</year> <booktitle>WSDM</booktitle> <ee>http://doi.acm.org/10.1145/1935826.1935911</ee> <crossref>conf/wsdm/2011</crossref> <url>db/conf/wsdm/wsdm2011.html#Kawamae11a</url> </inproceedings>
<inproceedings mdate="2015-01-29" key="conf/wsdm/TangCAL15"> <author>Jiliang Tang</author> <author>Shiyu Chang</author> <author>Charu C. Aggarwal</author> <author>Huan Liu</author> <title>Negative Link Prediction in Social Media.</title> <pages>87-96</pages> <year>2015</year> <booktitle>WSDM</booktitle> <ee>http://doi.acm.org/10.1145/2684822.2685295</ee> <crossref>conf/wsdm/2015</crossref> <url>db/conf/wsdm/wsdm2015.html#TangCAL15</url> </inproceedings>
<inproceedings mdate="2003-04-04" key="conf/dmkd/KantarciogluC02"> <author>Murat Kantarcioglu</author> <author>Chris Clifton</author> <title>Privacy-Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data.</title> <year>2002</year> <booktitle>DMKD</booktitle> <ee>http://www.bell-labs.com/user/minos/DMKD02/Papers/kantarcioglu.pdf</ee> <url>db/conf/dmkd/dmkd2002.html#KantarciogluC02</url> </inproceedings>
<inproceedings mdate="2003-04-04" key="conf/dmkd/ZhuFAE02"> <author>Xingquan Zhu</author> <author>Jianping Fan</author> <author>Walid G. Aref</author> <author>Ahmed K. Elmagarmid</author> <title>ClassMiner: Mining Medical Video Content Structure and Events Towards Efficient Access and Scalable Skimming.</title> <year>2002</year> <booktitle>DMKD</booktitle> <ee>http://www.bell-labs.com/user/minos/DMKD02/Papers/zhu.pdf</ee> <url>db/conf/dmkd/dmkd2002.html#ZhuFAE02</url> </inproceedings>
<inproceedings mdate="2014-07-31" key="conf/cvpr/BrandOP97"> <author>Matthew Brand</author> <author>Nuria Oliver</author> <author>Alex Pentland</author> <title>Coupled hidden Markov models for complex action recognition.</title> <pages>994-999</pages> <year>1997</year> <crossref>conf/cvpr/1997</crossref> <booktitle>CVPR</booktitle> <ee>http://dx.doi.org/10.1109/CVPR.1997.609450</ee> <ee>http://doi.ieeecomputersociety.org/10.1109/CVPR.1997.609450</ee> <url>db/conf/cvpr/cvpr1997.html#BrandOP97</url> </inproceedings>
<inproceedings mdate="2014-07-30" key="conf/cvpr/LiGK09"> <author>Yan Li</author> <author>Leon Gu</author> <author>Takeo Kanade</author> <title>A robust shape model for multi-view car alignment.</title> <pages>2466-2473</pages> <year>2009</year> <booktitle>CVPR</booktitle> <ee>http://dx.doi.org/10.1109/CVPRW.2009.5206799</ee> <ee>http://doi.ieeecomputersociety.org/10.1109/CVPRW.2009.5206799</ee> <crossref>conf/cvpr/2009</crossref> <url>db/conf/cvpr/cvpr2009.html#LiGK09</url> </inproceedings>
<inproceedings mdate="2013-11-25" key="journals/jmlr/WilsonFT12"> <author>Aaron Wilson</author> <author>Alan Fern</author> <author>Prasad Tadepalli</author> <title>Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach.</title> <pages>217-227</pages> <booktitle>ICML Unsupervised and Transfer Learning</booktitle> <crossref>conf/icml/2011utl</crossref> <year>2012</year> <ee>http://jmlr.csail.mit.edu/proceedings/papers/v27/wilson12a.html</ee> <url>db/journals/jmlr/jmlrp27.html#WilsonFT12</url> </inproceedings>
<inproceedings mdate="2013-12-04" key="journals/jmlr/GlowackaDS12"> <author>Dorota Glowacka</author> <author>Louis Dorard</author> <author>John Shawe-Taylor</author> <title>Preface.</title> <booktitle>ICML On-line Trading of Exploration and Exploitation</booktitle> <year>2012</year> <crossref>conf/icml/2011otee</crossref> <ee>http://jmlr.csail.mit.edu/proceedings/papers/v26/glowacka12a/glowacka12a.pdf</ee> <url>db/journals/jmlr/jmlrp26.html#GlowackaDS12</url> </inproceedings>
<inproceedings mdate="2013-11-25" key="journals/jmlr/ZhengG10"> <author>Cheng Zheng</author> <author>Zhi Geng</author> <title>Reverse Engineering of Asynchronous Boolean Networks.</title> <pages>237-248</pages> <booktitle>NIPS Causality: Objectives and Assessment</booktitle> <year>2010</year> <crossref>conf/nips/2008coa</crossref> <ee>http://www.jmlr.org/proceedings/papers/v6/zheng10a.html</ee> <url>db/journals/jmlr/jmlrp6.html#ZhengG10</url> </inproceedings>
<inproceedings mdate="2013-11-25" key="journals/jmlr/WhiteCL11"> <author>Halbert White</author> <author>Karim Chalak</author> <author>Xun Lu</author> <title>Linking Granger Causality and the Pearl Causal Model with Settable Systems.</title> <pages>1-29</pages> <booktitle>NIPS Mini-Symposium on Causality in Time Series</booktitle> <year>2011</year> <crossref>conf/nips/2009mscts</crossref> <ee>http://www.jmlr.org/proceedings/papers/v12/white11.htm</ee> <url>db/journals/jmlr/jmlrp12.html#WhiteCL11</url> </inproceedings>
<inproceedings mdate="2013-11-25" key="journals/jmlr/BalcanCIW12"> <author>Maria-Florina Balcan</author> <author>Florin Constantin</author> <author>Satoru Iwata</author> <author>Lei Wang</author> <title>Learning Valuation Functions.</title> <pages>4.1-4.24</pages> <booktitle>COLT</booktitle> <year>2012</year> <crossref>conf/colt/2012</crossref> <ee>http://www.jmlr.org/proceedings/papers/v23/balcan12b/balcan12b.pdf</ee> <url>db/journals/jmlr/jmlrp23.html#BalcanCIW12</url> </inproceedings>
<inproceedings mdate="2012-08-15" key="conf/sigir/RaveendranC12"> <author>Gobaan Raveendran</author> <author>Charles L. A. Clarke</author> <title>Lightweight contrastive summarization for news comment mining.</title> <pages>1103-1104</pages> <year>2012</year> <booktitle>SIGIR</booktitle> <ee>http://doi.acm.org/10.1145/2348283.2348490</ee> <crossref>conf/sigir/2012</crossref> <url>db/conf/sigir/sigir2012.html#RaveendranC12</url> </inproceedings>
<inproceedings mdate="2012-09-13" key="conf/sigir/KraftB84"> <author>Donald H. Kraft</author> <author>Duncan A. Buell</author> <title>Advances in a Bayesian Decision Model of User Stopping Behaviour for Scanning the Output of an Information Retrieval System.</title> <pages>421-433</pages> <year>1984</year> <booktitle>SIGIR</booktitle> <url>db/conf/sigir/sigir84.html#KraftB84</url> <ee>http://dl.acm.org/citation.cfm?id=636833</ee> </inproceedings>
<inproceedings mdate="2010-08-09" key="conf/kdd/FeiH10"> <author>Hongliang Fei</author> <author>Jun Huan</author> <title>Boosting with structure information in the functional space: an application to graph classification.</title> <pages>643-652</pages> <year>2010</year> <booktitle>KDD</booktitle> <ee>http://doi.acm.org/10.1145/1835804.1835886</ee> <crossref>conf/kdd/2010</crossref> <url>db/conf/kdd/kdd2010.html#FeiH10</url> </inproceedings>
<inproceedings mdate="2015-08-10" key="conf/kdd/OuCWW015"> <author>Mingdong Ou</author> <author>Peng Cui</author> <author>Fei Wang</author> <author>Jun Wang</author> <author>Wenwu Zhu 0001</author> <title>Non-transitive Hashing with Latent Similarity Components.</title> <pages>895-904</pages> <year>2015</year> <booktitle>KDD</booktitle> <ee>http://doi.acm.org/10.1145/2783258.2783283</ee> <crossref>conf/kdd/2015</crossref> <url>db/conf/kdd/kdd2015.html#OuCWW015</url> </inproceedings>
</dblp>
Part of allDB.txt:
SDM	2008	Preemptive Measures against Malicious Party in Privacy-Preserving Data Mining.	Shuguo Han|Wee Keong Ng|
SDM	2015	Functional Node Detection on Linked Data.	Kang Li|Jing Gao|Suxin Guo|Nan Du|Aidong Zhang|
ICDM	2003	Localized Prediction of Continuous Target Variables Using Hierarchical Clustering.	Aleksandar Lazarevic|Ramdev Kanapady|Chandrika Kamath|Vipin Kumar|Kumar K. Tamma|
ICDM	2009	Finding Associations and Computing Similarity via Biased Pair Sampling.	Andrea Campagna|Rasmus Pagh|
ECML/PKDD	2013	Image Hub Explorer: Evaluating Representations and Metrics for Content-Based Image Retrieval and Object Recognition.	Nenad Tomasev|Dunja Mladenic|
ECML/PKDD	2015	The Difference and the Norm - Characterising Similarities and Differences Between Databases.	Kailash Budhathoki|Jilles Vreeken|
PAKDD	2008	Privacy-Preserving Linear Fisher Discriminant Analysis.	Shuguo Han|Wee Keong Ng|
PAKDD	2005	Training Support Vector Machines Using Greedy Stagewise Algorithm.	Liefeng Bo|Ling Wang|Licheng Jiao|
WSDM	2011	Predicting future reviews: sentiment analysis models for collaborative filtering.	Noriaki Kawamae|
WSDM	2015	Negative Link Prediction in Social Media.	Jiliang Tang|Shiyu Chang|Charu C. Aggarwal|Huan Liu|
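Each allDB.txt line is "conf \t year \t title \t author1|...|authorN|", with a trailing "|" because the extraction code appends one after every author. A minimal sketch for reading the file back, assuming (as the series title suggests) that each paper's author list later becomes one FP-Growth transaction; parse_alldb_line and read_alldb are hypothetical names.

```python
def parse_alldb_line(line):
    """Split one allDB.txt line into (conf, year, title, [authors])."""
    conf, year, title, authors = line.rstrip("\n").split("\t")
    # every author is followed by "|", so drop the empty item after the last one
    return conf, year, title, [a for a in authors.split("|") if a]


def read_alldb(path="allDB.txt"):
    """Return one author list per paper, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [parse_alldb_line(line)[3] for line in f if line.strip()]
```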
Part of authorDB.txt (author, frequency, first year, last year):
Shuguo Han	2	2008	2008
Wee Keong Ng	2	2008	2008
Jilles Vreeken	1	2015	2015
Aidong Zhang	1	2015	2015
Jing Gao	1	2015	2015
Suxin Guo	1	2015	2015
Fei Wang	1	2015	2015
Shiyu Chang	1	2015	2015
Nan Du	1	2015	2015
Wenwu Zhu 0001	1	2015	2015
Peng Cui	1	2015	2015
Huan Liu	1	2015	2015
Kang Li	1	2015	2015
Mingdong Ou	1	2015	2015
Charu C. Aggarwal	1	2015	2015
Jun Wang	1	2015	2015
Jiliang Tang	1	2015	2015
Kailash Budhathoki	1	2015	2015
Nenad Tomasev	1	2013	2013
Dunja Mladenic	1	2013	2013
Reference: http://ju.outofmemory.cn/entry/137734