A small program a day: counting the words in an English text file with Python


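Given any plain-text English file, use Python to count the words, lines, and characters it contains.

Seen on Sina Weibo:

刘鑫-MarsLiu: Since some friends brought up Python practice, I'll come up with one exercise a day and start a tag for it, #每天一个小程序# (a small program a day). Today's: given any plain-text English file, count the number of words in it.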


Suppose we have a text file, movie.txt, with the following content:

Fight Club: "It's only after we've lost everything that we're free to do anything."We spend all of our lives trying to move up on an imaginary ladder. Even when we do succeed, we live in constant fear of falling off and hitting bottom, so it's still very agonizing - and not very fulfilling. By actually hitting bottom, it gives one unbelievable strength when the realization emerges that you have the ability to endure rock bottom.You understand that you have nothing to lose.This makes one virtually unstoppable from that point forth. You are free to do anything,to go anywhere, to try anything. No matter what you try, it doesn't matter if you fail. You've already lived through rock bottom and are still breathing.This is the only way to fully enjoy the journey for what it is。

Approach:

1. Read the text file

2. Count the words


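Things to watch out for:

    Line terminators (\n): each line read from the file still ends with \n, so len(line) includes it.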
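
To make the line-terminator point concrete, here is a minimal sketch that uses an in-memory string instead of movie.txt. The helper name count_chars and the two-line sample are made up for illustration; they are not part of the original exercise.

```python
import io


def count_chars(lines, include_newlines=True):
    """Sum the lengths of the lines, optionally ignoring line terminators."""
    total = 0
    for line in lines:
        if not include_newlines:
            # Strip only the line terminator, not other trailing whitespace.
            line = line.rstrip("\r\n")
        total += len(line)
    return total


sample = "Fight Club\nrock bottom\n"  # hypothetical two-line sample

# Iterating over a file-like object yields each line with its trailing "\n".
print(count_chars(io.StringIO(sample)))                          # newlines counted
print(count_chars(io.StringIO(sample), include_newlines=False))  # newlines ignored
```

The word count is not affected either way, because str.split() with no arguments splits on any whitespace and discards the trailing newline.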


Possible extensions:

    Count the lines and the characters as well


Here is the implementation:

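#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Count the words, lines, and characters in any plain-text English file.
"""

file_name = "movie.txt"

line_counts = 0
word_counts = 0
character_counts = 0

with open(file_name, 'r') as f:
    for line in f:
        # split() with no arguments splits on whitespace and drops the trailing \n
        words = line.split()

        line_counts += 1
        word_counts += len(words)
        # len(line) includes the line terminator
        character_counts += len(line)

print("line_counts ", line_counts)
print("word_counts ", word_counts)
print("character_counts ", character_counts)

Running the code prints the three counts, one per line.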

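The code above can be improved further:

For example, pass the file name in as a function argument and call that function from an entry point:

if __name__ == '__main__':
    main()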
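
One way this suggestion might look, as a sketch rather than the author's code: the counting moves into a function that takes the file name, and main() reads the name from the command line. The function name count_file, the argv parameter, and the fallback to movie.txt when no argument is given are all assumptions made for illustration.

```python
import sys


def count_file(file_name):
    """Return (line_count, word_count, character_count) for a text file."""
    line_count = word_count = character_count = 0
    with open(file_name, "r") as f:
        for line in f:
            line_count += 1
            word_count += len(line.split())
            character_count += len(line)  # includes the line terminator
    return line_count, word_count, character_count


def main(argv=None):
    # Assumption: the file name is the first command-line argument,
    # falling back to movie.txt from the post when none is given.
    args = sys.argv[1:] if argv is None else argv
    file_name = args[0] if args else "movie.txt"
    try:
        lines, words, chars = count_file(file_name)
    except OSError as exc:
        print("cannot read", file_name, "-", exc)
        return 1
    print("line_counts", lines)
    print("word_counts", words)
    print("character_counts", chars)
    return 0


if __name__ == "__main__":
    main()
```

Returning the counts as a tuple, instead of printing them inside the loop, keeps count_file reusable from other code.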

The next step would be to count how often each word occurs.
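
A minimal sketch of that frequency count, using collections.Counter from the standard library. The function name word_frequencies and the definition of a word (a run of letters, optionally with inner apostrophes, compared case-insensitively) are assumptions; the original post does not specify them.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count case-insensitive word occurrences in a string."""
    # Assumption: a word is a run of letters, optionally with inner
    # apostrophes, so "it's" and "we've" each count as one word.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


# One sentence taken from movie.txt.
sample = "You are free to do anything,to go anywhere, to try anything."

# most_common(n) returns the n most frequent (word, count) pairs.
for word, count in word_frequencies(sample).most_common(3):
    print(word, count)
```

Unlike line.split(), the regular expression also separates words that are joined only by punctuation, such as "anything,to" in the sample text.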

It could also be extended to count lines of code. OMG, time for bed.
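
The lines-of-code idea could be sketched as follows. This assumes a deliberately simple rule for Python source: a line counts as code if it is neither blank nor a pure # comment, so docstrings are counted as code. The function name count_code_lines and the sample lines are hypothetical.

```python
def count_code_lines(lines):
    """Count lines that are neither blank nor pure '#' comments."""
    code_lines = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            code_lines += 1
    return code_lines


# Hypothetical source lines; an open file object could be passed instead.
source = [
    "#!/usr/bin/env python\n",
    "\n",
    "file_name = 'movie.txt'\n",
    "# count the lines\n",
    "line_counts = 0\n",
]
print(count_code_lines(source))
```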


=================================

TODO

Sorted Word frequency count using python


http://stackoverflow.com/questions/4088265/sorted-word-frequency-count-using-python