Computing with Language: Simple Statistics
Source: Internet  Edited by: 程序博客网  Date: 2024/05/15 23:48
Frequency Distributions
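# text1 and the other sample texts come from the NLTK book module
from nltk.book import *
# Build a frequency distribution over the tokens of text1
fdist1 = FreqDist(text1)
# Display the distribution
fdist1
# The 50 most frequent tokens
fdist1.most_common(50)
# Number of times 'whale' occurs
fdist1['whale']
# Cumulative frequency plot of the 50 most frequent tokens
fdist1.plot(50, cumulative=True)
# Hapaxes: words that occur only once
fdist1.hapaxes()

# Define V, the vocabulary of text1; V is a set, not a list
V = set(text1)
# Words in V that are longer than 15 characters
long_words = [w for w in V if len(w) > 15]
# Sort them
sorted(long_words)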
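For readers without the NLTK corpora at hand, the same operations can be tried on a toy token list. This is only a sketch: it uses `collections.Counter` from the standard library as a stand-in for `FreqDist` (in NLTK 3, `FreqDist` is a subclass of `Counter`), and the token list is made up for illustration.

```python
from collections import Counter

# Made-up token list standing in for text1 (an assumption for illustration)
tokens = ["the", "whale", "and", "the", "sea", "and", "the", "ship"]

# Counter plays the role of FreqDist here
counts = Counter(tokens)

# The two most frequent tokens, comparable to fdist1.most_common(2)
top_two = counts.most_common(2)

# Number of occurrences of 'whale', comparable to fdist1['whale']
whale_count = counts["whale"]

# Tokens that occur exactly once, comparable to fdist1.hapaxes()
hapaxes = sorted(w for w, c in counts.items() if c == 1)

print(top_two)
print(whale_count)
print(hapaxes)
```

On real text the hapaxes are usually a large share of the vocabulary, which is why they say little about what a text is about.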
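Python's notation here is very close to mathematical notation; compared with Java, which I am currently using, it reads much more like the language of mathematics.

# Words longer than 7 characters that occur more than 7 times
# (frequent words that are characteristic of the text's content)
fdist5 = FreqDist(text5)
sorted(w for w in set(text5) if len(w) > 7 and fdist5[w] > 7)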
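The filter reads much like set-builder notation: all w such that w is in V and w has some property P. A minimal sketch on invented tokens, with `collections.Counter` standing in for `FreqDist` and the frequency threshold lowered so that the tiny sample still produces a result:

```python
from collections import Counter

# Invented tokens standing in for text5 (an assumption for illustration)
tokens = ["question", "hello", "question", "everyone",
          "question", "everyone", "hi"]
counts = Counter(tokens)

# Words longer than 7 characters that occur more than 2 times
frequent_long = sorted(
    w for w in set(tokens) if len(w) > 7 and counts[w] > 2
)
print(frequent_long)
```

Iterating over `set(tokens)` rather than `tokens` ensures each word is tested once, so the result contains no duplicates.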
Collocations and Bigrams
Bigrams (word pairs)
bigrams(['more','is','said','than','done'])
Running the code above directly raises an error:
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
bigrams(['more','is','said','than','done'])
NameError: name 'nltk' is not defined
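nltk has to be imported first.
After running from nltk import *, the call still does not display the pairs. It returns the generator object shown below, so the call has to be wrapped in list().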
<generator object bigrams at 0x044A6BD0>
list(bigrams(['more','is','said','than','done']))
The collocations() function finds the collocations (frequently co-occurring bigrams) in a text for us:
text4.collocations()
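The generator behaviour can be reproduced without NLTK. The sketch below defines a hypothetical helper, `pairwise_bigrams` (not NLTK's implementation), that yields adjacent pairs lazily. Nothing is computed until the generator is consumed, which is why `list()` is needed to see the pairs.

```python
def pairwise_bigrams(tokens):
    """Yield each pair of adjacent tokens (hypothetical helper, not NLTK's code)."""
    tokens = list(tokens)
    # zip pairs every token with its successor
    for pair in zip(tokens, tokens[1:]):
        yield pair


gen = pairwise_bigrams(["more", "is", "said", "than", "done"])

# gen is a generator object; list() consumes it and materializes the pairs
pairs = list(gen)
print(pairs)
```

A generator can only be consumed once, so a second `list(gen)` on the same object would return an empty list.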
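Counting Other Things
# Frequency distribution of word lengths
fdist = FreqDist([len(w) for w in text1])
# The distinct word lengths that occur
fdist.keys()
# The (length, count) pairs held by the FreqDist
fdist.items()
# The most frequent word length
fdist.max()
# Number of words of length 3
fdist[3]
# Proportion of all words that have length 3
fdist.freq(3)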
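What `freq()` and `max()` return can be checked by hand on a small sample. This is a sketch under stated assumptions: `collections.Counter` stands in for `FreqDist`, and the tokens are invented. The relative frequency of a sample is its count divided by the total number of samples.

```python
from collections import Counter

# Invented tokens (an assumption for illustration)
tokens = ["the", "old", "man", "and", "the", "sea", "is", "a", "book"]

# Count word lengths rather than the words themselves
length_counts = Counter(len(w) for w in tokens)

# Total number of samples, comparable to fdist.N()
total = sum(length_counts.values())

# Most frequent word length, comparable to fdist.max()
most_common_length = length_counts.most_common(1)[0][0]

# Relative frequency of length 3, comparable to fdist.freq(3)
freq_of_3 = length_counts[3] / total

print(most_common_length, length_counts[3], freq_of_3)
```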
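Functions defined for NLTK's frequency distribution class

Example                        Description
fdist = FreqDist(samples)      Create a frequency distribution containing the given samples
fdist.inc(sample)              Increment the count for this sample (NLTK 2 API; in NLTK 3 use fdist[sample] += 1)
fdist['monstrous']             Count of the number of times the given sample occurred
fdist.freq('monstrous')        Frequency of the given sample
fdist.N()                      Total number of samples
fdist.keys()                   The samples sorted in order of decreasing frequency (NLTK 2 behavior; in NLTK 3, fdist.most_common() gives this ordering)
for sample in fdist:           Iterate over the samples in order of decreasing frequency
fdist.max()                    Sample with the greatest count
fdist.tabulate()               Tabulate the frequency distribution
fdist.plot()                   Graphical plot of the frequency distribution
fdist.plot(cumulative=True)    Cumulative plot of the frequency distribution
fdist1 < fdist2                Test whether the samples in fdist1 occur less frequently than in fdist2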
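A cumulative plot shows, for the samples ranked by decreasing frequency, the running total of their counts. A minimal sketch of that computation without plotting, using `collections.Counter` in place of `FreqDist` and an invented token list:

```python
from collections import Counter
from itertools import accumulate

# Invented tokens (an assumption for illustration)
tokens = ["a", "b", "a", "c", "a", "b"]
counts = Counter(tokens)

# Samples ranked by decreasing frequency
ranked = counts.most_common()

# Running total of the counts, in rank order
cumulative = list(accumulate(c for _, c in ranked))

print([w for w, _ in ranked], cumulative)
```

The last cumulative value always equals the total number of samples.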