Graduation Project: A Speech Keyword Spotting System Based on Deep Neural Networks - Word Frequency Counting with a Python Script - Librispeech
Source: Internet | Editor: 程序博客网 | Date: 2024/06/05 19:02
Following TIMIT, this time we analyze the word frequencies of Librispeech. The file organization structure is shown in the figure:
The dev-clean directory under the librispeech folder contains multiple levels of nested subfolders; each leaf folder holds one .txt transcript plus several audio files that are readings of that transcript:
The script's task is to read and count the words in all the .txt transcripts. A sample of the transcript content is shown below.
The transcripts consist entirely of uppercase words. Our procedure has two steps:
1. Use os.walk() to traverse all files, record the paths of all .txt files, and write them into a path.txt file.
2. Read the file paths from path.txt, open each file, record words and their frequencies in a Python dictionary, and write the counts into keyword.txt.
The code is as follows:
import os
import os.path

rootDir = "dev-clean"

# functions used
def keywordCounter(fileDir, keywordContainer):
    """This function takes a file path and a dictionary of keywords and counts keywords."""
    with open(fileDir) as f:
        for row in f:
            # drop the leading utterance ID, strip the trailing newline
            wordPart = row[row.index(" ") + 1:].rstrip("\n")
            for word in wordPart.split(" "):
                if word in keywordContainer:  # dict.has_key() is gone in Python 3
                    keywordContainer[word] += 1
                else:
                    keywordContainer[word] = 1

# 1
print("Step 1: Get absolute directory of all the transcript files inside this folder")
# I got the os.walk snippet from cnblogs without fully understanding it; used as a black box
transPathDoc = open(os.path.join("dreaminghzAnalysedData", "pathDoc.txt"), "w+")
for parent, dirnames, filenames in os.walk(rootDir):
    for filename in filenames:
        if filename[-4:] == ".txt":
            transPathDoc.write(os.path.join(parent, filename) + "\n")
transPathDoc.close()
print("Step 1 finished")

# 2
print("Step 2: Read in all the transcript files and do the word counting")
keywordContainer = {}
pathes = open(os.path.join("dreaminghzAnalysedData", "pathDoc.txt"))
# read in each path and call keywordCounter to process it
for pathTmp in pathes:
    keywordCounter(pathTmp.rstrip("\n"), keywordContainer)
pathes.close()
print("Step 2 finished")

# claim
print("There are in total " + str(len(keywordContainer)) + " keywords")

# 3
print("Step 3: Save keywords into dreaminghzAnalysedData/keyWords.txt")
outfile = open(os.path.join("dreaminghzAnalysedData", "keyWords.txt"), "w+")
for ks in keywordContainer.keys():
    outfile.write(ks + " " + str(keywordContainer[ks]) + "\n")
outfile.close()
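For comparison, the same counting can be done in a single pass with collections.Counter, without the intermediate pathDoc.txt. This is only a sketch; the helper name count_words and the one-pass structure are my additions, not part of the original script.

```python
import os
from collections import Counter

def count_words(root_dir):
    """Walk root_dir, read every .txt transcript, and count words across all of them."""
    counts = Counter()
    for parent, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.endswith(".txt"):
                with open(os.path.join(parent, filename)) as f:
                    for line in f:
                        # drop the leading utterance ID, keep the transcript words
                        counts.update(line.split()[1:])
    return counts
```

Counter.most_common() then gives the words sorted by frequency for free, which the dictionary version has to do by hand.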
"""This script is used for finding high frequency keywords that appears more than given time number"""#use the file generated by LibriWordCounter.pykw = open("keyWords.txt")qualification = Falsewhile not qualification: try: num = int(raw_input("Input the lower bound of frequency as positive integer pls:")) if num <= 0: qualification = False else: qualification = True except: qualification = FalsehighFFilename = "keywords-noless-" + str(num) + "-times.txt"highFkw = open(highFFilename,"w+")#read in and write down qualified keywordsrow = kw.readline()ctr = 0while row != '': unit = row[:-1].split(' ') key, frequency = unit[0], int(unit[1]) if frequency >= num: highFkw.write(row) ctr += 1 row = kw.readline()kw.close()highFkw.close()print("Finished, there's totally " + str(ctr) + " records written into the file")