Word Frequency Counting with NLTK
Source: Internet  Editor: 程序博客网  Date: 2024/06/05 23:07
Version info
Python 2.4 or 2.5 (tested with 2.7)
NLTK 2.0 (backward compatible; tested with 3.2.3)
Anaconda2 4.3
Code
from nltk.book import *
text1.concordance("monstrous")
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
text1.similar("monstrous")
imperial subtly impalpable pitiable curious abundant perilous
trustworthy untoward singular lamentable few determined maddens
horrible tyrannical lazy mystifying christian exasperate
text2.similar("monstrous")
very exceedingly so heartily a great good amazingly as sweet
remarkably extremely vast
text2.common_contexts(["monstrous", "very"])
a_pretty is_pretty a_lucky am_glad be_glad
text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
text3.generate()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-e0816ba18b61> in <module>()
----> 1 text3.generate()

TypeError: generate() takes exactly 2 arguments (1 given)
len(text3)
44764
sorted(set(text3))
[u'!', ... u'A', u'Abel', u'Abelmizraim', ... u'coffin', u'cold', ...]
len(set(text3))
2789
# average usage of each word
from __future__ import division
len(text3) / len(set(text3))
16.050197203298673
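The ratio computed above (total tokens divided by distinct tokens) can be wrapped in a small helper. This is a minimal sketch, not part of the original post: the name `average_usage` and the toy token list are assumptions for illustration, and a plain Python list stands in for an NLTK text so the example runs without the corpora.

```python
def average_usage(tokens):
    """Return the average number of times each distinct token occurs.

    Hypothetical helper: float() forces true division, so the result
    is the same on Python 2 and Python 3.
    """
    if not tokens:
        return 0.0
    return float(len(tokens)) / len(set(tokens))


# Toy token list used only for illustration (not from the NLTK corpora).
sample = ["the", "whale", "the", "sea", "the", "whale"]
print(average_usage(sample))  # 6 tokens over 3 distinct tokens
```

Because an NLTK text supports len() and iteration, the same helper should also accept text3 directly.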
text3.count("smote")
5
# usage percentage of a word
100 * text4.count('a') / len(text4)
1.4643016433938312
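The percentage calculation above can likewise be written as a reusable function. This is a sketch under assumptions: `usage_percentage` is a hypothetical name, and it only relies on the token sequence providing count() and len(), which both Python lists and NLTK texts do.

```python
def usage_percentage(tokens, word):
    """Return how much of the token sequence is taken up by `word`, in percent.

    Hypothetical helper: 100.0 keeps the division floating-point on Python 2.
    """
    total = len(tokens)
    if total == 0:
        return 0.0
    return 100.0 * tokens.count(word) / total


# Toy token list used only for illustration.
sample = ["a", "whale", "a", "ship"]
print(usage_percentage(sample, "a"))  # 2 of 4 tokens
```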
fdist1 = FreqDist(text1)
print(fdist1)
<FreqDist with 19317 samples and 260819 outcomes>
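# Note: with NLTK 3, keys() is not sorted by frequency (see the output below);
# fdist1.most_common(50) returns the 50 most frequent samples instead.
vocabulary1 = fdist1.keys()
print(vocabulary1[:50])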
[u'funereal', u'unscientific', u'divinely', u'foul', u'four', u'gag', u'prefix', u'woods', u'clotted', u'Duck', u'hanging', u'plaudits', u'woody', u'Until', u'marching', u'disobeying', u'canes', u'granting', u'advantage', u'Westers', u'insertion', u'DRYDEN', u'formless', u'Untried', u'superficially', u'vesper', u'Western', u'portentous', u'meadows', u'sinking', u'Ding', u'Spurn', u'treasuries', u'churned', u'oceans', u'powders', u'tinkerings', u'tantalizing', u'yellow', u'bolting', u'uncertain', u'stabbed', u'bringing', u'elevations', u'ferreting', u'wooded', u'songster', u'uttering', u'scholar', u'Less']
fdist1['whale']
906
fdist1.plot(50, cumulative=True)
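In NLTK 3, FreqDist is a subclass of collections.Counter, so the counting behaviour can be tried out without the corpora. The sketch below uses a plain Counter on a toy sentence; the helper name `cumulative_counts` is an assumption, and it only approximates the running totals that a cumulative frequency plot displays.

```python
from collections import Counter


def cumulative_counts(counter, n):
    """Return (word, running total) pairs for the n most frequent words."""
    running = 0
    result = []
    for word, count in counter.most_common(n):
        running += count
        result.append((word, running))
    return result


# Toy sentence used only for illustration.
tokens = "the whale and the sea and the ship".split()
counts = Counter(tokens)

print(counts["the"])                  # lookup by word, like fdist1['whale']
print(counts.most_common(2))          # most frequent words first
print(cumulative_counts(counts, 2))   # running totals over the top 2 words
```

With a real FreqDist, most_common(50) gives a frequency-sorted vocabulary, which keys() does not guarantee under NLTK 3.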