[Share] Wikipedia Keyphraseness
This dataset contains keyphraseness values for phrases extracted from Wikipedia articles. The keyphraseness value Q(s) of a phrase s is the probability that s appears in a Wikipedia article as anchor text. In total, 4,342,732 phrases were extracted from the English Wikipedia dump of January 30, 2010. In this release, we removed the 184,979 phrases that contain non-English characters. Of the remaining 4,157,753 phrases, about 1.9 million have a non-zero keyphraseness value.

The dataset is distributed as a zip archive (about 45 MB) containing one text file and a readme file. Each line of the text file maps a phrase to its keyphraseness value in the form [phrase],[keyphraseness value], for example: jackie_chan, 0.9509918319719953.
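The following is a minimal Python sketch of how a file in this format could be loaded into a lookup table. It also shows one simple count-based way to estimate a probability like Q(s). Several details are assumptions for illustration and do not come from the released dataset or the papers' code:

- the function names;
- the lowercase, underscore-joined key normalization (inferred only from the jackie_chan example);
- treating phrases absent from the file as 0.0;
- estimating Q(s) as "anchor occurrences / all occurrences".

Because the text file's name is not given, the loader accepts any iterable of lines, such as an open file.

```python
from typing import Dict, Iterable, Tuple


def keyphraseness(anchor_count: int, occurrence_count: int) -> float:
    """Assumed count-based estimate of Q(s): the share of occurrences of s
    that are anchor text. The exact counting unit used for the released
    values is described in the papers, not here."""
    if occurrence_count <= 0:
        return 0.0  # phrase never seen: define Q(s) as 0
    return anchor_count / occurrence_count


def parse_line(line: str) -> Tuple[str, float]:
    """Parse one '[phrase],[keyphraseness value]' line.

    Splitting on the LAST comma keeps a phrase that itself contains a comma
    intact; strip() tolerates the optional space after the comma.
    """
    phrase, value = line.rstrip("\n").rsplit(",", 1)
    return phrase.strip(), float(value.strip())


def load_keyphraseness(lines: Iterable[str]) -> Dict[str, float]:
    """Build a phrase -> Q(s) dict, skipping blank lines."""
    table: Dict[str, float] = {}
    for line in lines:
        if line.strip():
            phrase, q = parse_line(line)
            table[phrase] = q
    return table


def lookup(table: Dict[str, float], phrase: str) -> float:
    """Look up a phrase written with spaces, e.g. 'Jackie Chan'.

    Assumptions (hypothetical normalization): keys are lowercase with words
    joined by underscores, and phrases missing from the file count as 0.0.
    """
    key = "_".join(phrase.lower().split())
    return table.get(key, 0.0)


if __name__ == "__main__":
    sample = ["jackie_chan, 0.9509918319719953\n", "\n"]
    table = load_keyphraseness(sample)
    print(lookup(table, "Jackie Chan"))
    print(lookup(table, "no such phrase"))  # 0.0
    print(keyphraseness(3, 4))  # 0.75
```

With the real file, something like `load_keyphraseness(open(path, encoding="utf-8"))` would build the table, where `path` is whatever the extracted text file is called. A plain dict is the simplest choice for exact-phrase lookups.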
This dataset has been used in the following three papers. Please refer to them for more details about the dataset and for how keyphraseness values can be used in various tasks. All three papers can be downloaded free of charge from the ACM Digital Library. This dataset is released solely for research purposes.
Please cite at least one of the following three papers if you use this dataset in your research.
·Chenliang Li, Aixin Sun, Jianshu Weng, Qi He. Exploiting hybrid contexts for Tweet segmentation. SIGIR 2013.
·Chenliang Li, Aixin Sun, Anwitaman Datta. Twevent: segment-based event detection from tweets. CIKM 2012.
·Chenliang Li, Jianshu Weng, Qi He, Yuxia Yao, Anwitaman Datta, Aixin Sun, Bu-Sung Lee. TwiNER: named entity recognition in targeted Twitter stream. SIGIR 2012.
Data download: http://www.datatang.com/data/45421 (Datatang data-sharing service platform)