Python: the NLTK Gutenberg corpus

Source: Internet | Published by: old Taobao shop | Editor: 程序博客网 | Date: 2024/06/06 17:58
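```python
import nltk
from nltk.corpus import gutenberg

# Run once if the data is not installed yet. sents() also needs the Punkt
# sentence models (newer NLTK releases may ask for "punkt_tab" instead).
# nltk.download("gutenberg")
# nltk.download("punkt")

# List the file IDs of the texts bundled with the corpus
file_ids = gutenberg.fileids()
print(file_ids)

# words() returns the tokenized text; print tokens 1030 to 1036
macbeth = gutenberg.words("shakespeare-macbeth.txt")
print(macbeth[1030:1037])

# For each text: number of characters, word tokens and sentences
for fileid in gutenberg.fileids():
    num_chars = len(gutenberg.raw(fileid))
    num_words = len(gutenberg.words(fileid))
    num_sents = len(gutenberg.sents(fileid))
    print(num_chars, num_words, num_sents, fileid)
```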
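The NLTK reader hides how the three counts are produced. The sketch below reproduces them roughly using only the standard library, so it runs without downloading the corpus. It assumes a word pattern of `\w+|[^\w\s]+` (runs of word characters or runs of punctuation), modelled on NLTK's `WordPunctTokenizer`, and a naive "split after `.`, `!` or `?`" sentence rule. The sentence rule is a crude stand-in for the trained Punkt model, which handles abbreviations such as "Mr." far better. The names `corpus_stats`, `WORD_RE` and `SENT_END_RE` are hypothetical and not part of NLTK.

```python
import re

# Hypothetical word pattern modelled on WordPunctTokenizer:
# a run of word characters, or a run of non-space punctuation.
WORD_RE = re.compile(r"\w+|[^\w\s]+")

# Naive sentence boundary (assumption): whitespace that follows . ! or ?
# NLTK's gutenberg.sents() uses the Punkt model instead.
SENT_END_RE = re.compile(r"(?<=[.!?])\s+")


def corpus_stats(raw: str) -> tuple:
    """Return (num_chars, num_words, num_sents) for one raw text string."""
    num_chars = len(raw)                      # like len(gutenberg.raw(fileid))
    num_words = len(WORD_RE.findall(raw))     # like len(gutenberg.words(fileid))
    sents = [s for s in SENT_END_RE.split(raw.strip()) if s]
    num_sents = len(sents)                    # like len(gutenberg.sents(fileid))
    return num_chars, num_words, num_sents


sample = "Fair is foul, and foul is fair. Hover through the fog! Is it done?"
print(corpus_stats(sample))
# Slicing the token list works like slicing gutenberg.words(...)
print(WORD_RE.findall(sample)[0:4])
```

Note that punctuation marks count as separate word tokens here, which is also why `len(gutenberg.words(fileid))` is larger than a plain whitespace word count would be.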
