Python: Computing TF-IDF with NLTK
Source: Internet  Editor: 程序博客网  Time: 2024/05/29 02:39
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
Created on 2015-1-19
@author: beyondzhou
@name: explore_google_tfidf.py
'''

# Querying Google+ data with TF-IDF
import json
import nltk

# Load in human language data from wherever you've saved it
DATA = r'E:\eclipse\Google\dFile\107033731246200681024.json'
with open(DATA) as f:
    data = json.load(f)

# Provide your own query terms here
QUERY_TERMS = ['best']

# Drop activities with empty content first, so that the tokenized
# texts and the posts they came from share the same indexes
posts = [activity for activity in data
         if activity['object']['content'] != ""]
activities = [post['object']['content'].lower().split() for post in posts]

# TextCollection provides tf, idf, and tf_idf abstractions so
# that we don't have to maintain/compute them ourselves
tc = nltk.TextCollection(activities)

relevant_activities = []

for post, tokens in zip(posts, activities):
    # Sum the TF-IDF score of every query term for this activity
    score = 0
    for term in [t.lower() for t in QUERY_TERMS]:
        score += tc.tf_idf(term, tokens)
    if score > 0:
        relevant_activities.append({'score': score,
                                    'title': post['title'],
                                    'url': post['url']})

# Sort by score and display results
relevant_activities = sorted(relevant_activities,
                             key=lambda p: p['score'], reverse=True)
for activity in relevant_activities:
    print(activity['title'])
    print('\tLink: %s' % (activity['url'], ))
    print('\tScore: %s' % (activity['score'], ))
    print()
Now on Medium--the Best of O'Reilly Radar: http://bit.ly/133U4wb Our latest thinking on the big ideas...
    Link: https://plus.google.com/107033731246200681024/posts/LzTHAvJsDZ9
    Score: 0.142631571496

The best definition of Freudian psychoanalysis I've ever seen, from poet W.H. Auden:"...he merely ...
    Link: https://plus.google.com/107033731246200681024/posts/ZE3cDmqLXnN
    Score: 0.0413424844915

Can We Use Data to Make Better Regulations?Evgeny Morozov either misunderstands or misrepresents the...
    Link: https://plus.google.com/107033731246200681024/posts/gboAUahQwuZ
    Score: 0.0156165954192
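
To make the scores above less of a black box, here is a rough, dependency-free sketch of the arithmetic behind a TF-IDF score. It assumes the common definition: term frequency is the term's count divided by the document length, and inverse document frequency is the natural log of the number of documents over the number of documents containing the term. As far as I know this matches what `TextCollection.tf_idf` does, but treat that as an assumption and check the NLTK source for your version. The helper names (`tf`, `idf`, `tf_idf`) and the toy corpus are hypothetical and are not taken from the script above.

```python
import math


def tf(term, doc):
    """Term frequency: fraction of the tokens in `doc` that equal `term`."""
    return doc.count(term) / len(doc) if doc else 0.0


def idf(term, docs):
    """Inverse document frequency: log(N / number of docs containing term).

    Returns 0.0 when no document contains the term (an assumed convention).
    """
    matches = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / matches) if matches else 0.0


def tf_idf(term, doc, docs):
    """TF-IDF of `term` in `doc`, relative to the whole corpus `docs`."""
    return tf(term, doc) * idf(term, docs)


# Hypothetical toy corpus, tokenized the same way as the script: lower().split()
docs = [s.lower().split() for s in [
    "The best of the radar",
    "Better regulations with data",
    "Data is the new oil",
]]

# Score every document against the query term 'best'
scores = [tf_idf("best", doc, docs) for doc in docs]

for doc, score in zip(docs, scores):
    print("%-35s %.4f" % (" ".join(doc), score))
```

Only the first toy document contains "best", so it is the only one with a non-zero score. Note that whitespace tokenization treats "best" and "best," as different tokens, which is one reason real posts can score lower than expected.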