Studying the ROS Speech Synthesis Code
Source: Internet  Editor: 程序博客网  Date: 2024/06/09 22:49
I tried out the ROS speech synthesis feature; for English it is quite convenient. I then went through how the code implements it.
audio_capture: Provides code to capture audio from a microphone and transport it to a destination for playback.
audio_play: Receives audio messages from an audio_capture node. Outputs the messages to the local speakers.
audio_common_msgs: Message definitions for audio transport.
sound_play: A package to play sound files and synthesize speech
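The application code is already available under the ROS installation path: talkback.py.
The libsoundplay library and soundplay_node require downloading the source separately: https://github.com/ros-drivers/audio_common.git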
ppeix:audio_common$ ll
total 36
drwxr-xr-x 8 ppeix ppeix 4096 4月 2 10:27 ./
drwxrwxr-x 14 ppeix ppeix 4096 4月 2 16:29 ../
drwxrwxr-x 4 ppeix ppeix 4096 4月 2 10:27 audio_capture/
drwxrwxr-x 2 ppeix ppeix 4096 4月 2 10:27 audio_common/
drwxrwxr-x 3 ppeix ppeix 4096 4月 2 10:27 audio_common_msgs/
drwxrwxr-x 4 ppeix ppeix 4096 4月 2 10:27 audio_play/
drwxrwxr-x 8 ppeix ppeix 4096 4月 2 10:27 .git/
-rw-rw-r-- 1 ppeix ppeix 6 4月 2 10:27 .gitignore
drwxrwxr-x 8 ppeix ppeix 4096 4月 2 10:27 sound_play/
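The analysis below focuses on sound_play.
1. It implements the libsoundplay library, which mainly contains the SoundClient class; an application uses the library by creating a SoundClient instance. The application's job is to subscribe to the /recognize/output topic, while the library's job is to publish the /robotsound topic, which is the interface to the playback layer. Applications normally publish messages through a SoundClient handle: handle.play, handle.say, handle.playwav and so on all end up calling sendMsg, which publishes on /robotsound.
2. It implements the soundplay_node node. This node mainly creates a bin element of type playbin, so that audio can be played through the installed gstreamer packages.
Reading this code deepened my understanding of Python's dict, list, tuple and file types, dict in particular.
Look at class soundplay. It defines three dictionaries that start out empty, plus a list and a pre-filled parameter dictionary: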
self.builtinsounds = {}
self.filesounds = {}
self.voicesounds={}
self.hotlist = []
self.builtinsoundparams={:(,), :(,), :(,),}
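Next, look at the definition of def callback(self, data):, which uses data.sound and sound.command.
data is the argument of the callback. From the definition sub = rospy.Subscriber("robotsound", SoundRequest, self.callback) we can see that data is of type SoundRequest; its fields are described later in this post.
data.sound is used as a dictionary key. data.command is a separate parameter; it is passed to the soundtype instance, where it selects among the different operations.
Values of data.sound: SoundRequest.PLAY_FILE, SoundRequest.ALL, SoundRequest.SAY
Values of data.command: SoundRequest.PLAY_STOP, SoundRequest.PLAY_ONCE, SoundRequest.PLAY_START
For filesounds and voicesounds, the key is data.arg and the value is sound: a soundtype is instantiated for the request, and the returned instance is stored as the value.
For builtinsounds, the key is data.sound. The builtinsoundparams dictionary maps data.sound to params, a tuple nested inside the dict; params[0] and params[1] are the arguments used to instantiate soundtype, and the returned instance becomes the value in builtinsounds.
sound is the value fetched from a dictionary by key, and entries are added to the dictionaries from the objects returned by instantiating soundtype.
So sound is both a dictionary value and a soundtype instance, which means soundtype operations such as sound.command can be called on it.
builtinsoundparams is given its values when it is defined. The other three dictionaries are empty at initialization, but in callback the dictionary for the matching type gets soundtype instances added to it.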
The additions look like this: self.filesounds[data.arg] = soundtype(data.arg)
self.voicesounds[data.arg] = soundtype(wavfilename)
self.builtinsounds[data.sound] = soundtype(params[0], params[1])
if not data.sound in self.builtinsounds:
    params = self.builtinsoundparams[data.sound]
    self.builtinsounds[data.sound] = soundtype(params[0], params[1])
sound = self.builtinsounds[]
sound.command(data.command) is the final call. Because sound is a soundtype instance, it naturally has a command method, and data.command is the value that arrives through the callback argument.
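The lookup-or-create pattern described above can be sketched without ROS or gstreamer. This is only an illustration: FakeSound is a hypothetical stand-in for soundtype that merely records the commands it receives, MiniSoundPlay and select_sound are invented names, and treating params as a (file, volume) pair is an assumption rather than something the listing above confirms.

```python
# Constants mirroring SoundRequest.msg
PLAY_FILE, SAY = -2, -3
PLAY_STOP, PLAY_ONCE, PLAY_START = 0, 1, 2


class FakeSound:
    """Hypothetical stand-in for soundtype: it only records commands."""

    def __init__(self, source, volume=1.0):
        self.source = source
        self.volume = volume  # assumption: the second parameter is a volume
        self.history = []

    def command(self, cmd):
        self.history.append(cmd)


class MiniSoundPlay:
    """Sketch of the three lazily filled dictionaries plus the parameter table."""

    def __init__(self, builtin_params):
        self.builtinsounds = {}
        self.filesounds = {}
        self.voicesounds = {}
        self.builtinsoundparams = builtin_params

    def select_sound(self, sound_id, arg):
        if sound_id == PLAY_FILE:
            if arg not in self.filesounds:
                self.filesounds[arg] = FakeSound(arg)
            return self.filesounds[arg]
        if sound_id == SAY:
            if arg not in self.voicesounds:
                # Hypothetical wav name; no speech is synthesized in this sketch.
                wavfilename = "synth-%d.wav" % len(self.voicesounds)
                self.voicesounds[arg] = FakeSound(wavfilename)
            return self.voicesounds[arg]
        if sound_id not in self.builtinsounds:
            params = self.builtinsoundparams[sound_id]
            self.builtinsounds[sound_id] = FakeSound(params[0], params[1])
        return self.builtinsounds[sound_id]

    def callback(self, sound_id, command, arg=""):
        sound = self.select_sound(sound_id, arg)
        sound.command(command)  # the dictionary value is the object that acts
        return sound


node = MiniSoundPlay({1: ("beep.ogg", 0.5)})  # made-up built-in entry
first = node.callback(PLAY_FILE, PLAY_ONCE, "/tmp/a.wav")
second = node.callback(PLAY_FILE, PLAY_STOP, "/tmp/a.wav")
print(first is second, first.history)
```

The second request for the same file reuses the cached object, so a later PLAY_STOP reaches the same instance that received PLAY_ONCE.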
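On the other side, from the publisher's point of view, when an application uses the SoundClient class to publish robotsound, the function that gets called is sendMsg.
It is defined as: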
def sendMsg(self, snd, cmd, s, arg2=""):
    msg = SoundRequest()
    msg.sound = snd
    msg.command = cmd
    msg.arg = s
    msg.arg2 = arg2
    self.pub.publish(msg)
Example call:
self.client.sendMsg(self.snd, SoundRequest.PLAY_ONCE, self.arg)
Here the sound, command and arg parameters are passed explicitly, while arg2 keeps its default value of "".
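To make the publisher side concrete without a running ROS master, here is a minimal sketch. The SoundRequest dataclass and RecordingPublisher are stand-ins invented for this example (the real message type comes from sound_play.msg and the real publisher from rospy), and the say and play_wave wrappers are hypothetical: they only illustrate how a convenience method can funnel into sendMsg, and their exact arguments are an assumption.

```python
from dataclasses import dataclass


@dataclass
class SoundRequest:
    """Stand-in with the same four fields as sound_play/SoundRequest."""
    sound: int = 0
    command: int = 0
    arg: str = ""
    arg2: str = ""

    # Constants copied from SoundRequest.msg (unannotated, so not fields)
    PLAY_FILE = -2
    SAY = -3
    PLAY_ONCE = 1


class RecordingPublisher:
    """Hypothetical publisher that stores messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def publish(self, msg):
        self.sent.append(msg)


class MiniSoundClient:
    def __init__(self, pub):
        self.pub = pub

    def sendMsg(self, snd, cmd, s, arg2=""):
        msg = SoundRequest()
        msg.sound = snd
        msg.command = cmd
        msg.arg = s
        msg.arg2 = arg2
        self.pub.publish(msg)

    # Hypothetical wrappers: each one only picks the sound id and command.
    def say(self, text, voice=""):
        self.sendMsg(SoundRequest.SAY, SoundRequest.PLAY_ONCE, text, voice)

    def play_wave(self, path):
        self.sendMsg(SoundRequest.PLAY_FILE, SoundRequest.PLAY_ONCE, path)


pub = RecordingPublisher()
client = MiniSoundClient(pub)
client.say("hello")
client.play_wave("/tmp/a.wav")
for msg in pub.sent:
    print(msg)
```

Every wrapper produces the same four-field message, which is why the node's callback only needs to look at sound, command and arg.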
sound_play/SoundRequest Message
File: sound_play/SoundRequest.msg
# Use the sound_play::SoundClient C++ helper or the
# sound_play.libsoundplay.SoundClient Python helper.
# Sounds
int8 BACKINGUP = 1
int8 NEEDS_UNPLUGGING = 2
int8 NEEDS_PLUGGING = 3
int8 NEEDS_UNPLUGGING_BADLY = 4
int8 NEEDS_PLUGGING_BADLY = 5
# Sound identifiers that have special meaning
int8 ALL = -1 # Only legal with PLAY_STOP
int8 PLAY_FILE = -2
int8 SAY = -3
int8 sound # Selects which sound to play (see above)
# Commands
int8 PLAY_STOP = 0 # Stop this sound from playing
int8 PLAY_ONCE = 1 # Play the sound once
int8 PLAY_START = 2 # Play the sound in a loop until a stop request occurs
int8 command # Indicates what to do with the sound
string arg # file name or text to say
string arg2 # other arguments
Expanded Definition
int8 NEEDS_UNPLUGGING=2
int8 NEEDS_PLUGGING=3
int8 NEEDS_UNPLUGGING_BADLY=4
int8 NEEDS_PLUGGING_BADLY=5
int8 ALL=-1
int8 PLAY_FILE=-2
int8 SAY=-3
int8 PLAY_STOP=0
int8 PLAY_ONCE=1
int8 PLAY_START=2
int8 sound
int8 command
string arg
string arg2
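As a reading aid for the constants above, the following sketch turns a (sound, command, arg) triple into a readable string. The numeric values are copied from SoundRequest.msg; the describe function and the two lookup tables are hypothetical helpers written for this post, not part of sound_play.

```python
# Values copied from SoundRequest.msg
SOUND_NAMES = {
    1: "BACKINGUP",
    2: "NEEDS_UNPLUGGING",
    3: "NEEDS_PLUGGING",
    4: "NEEDS_UNPLUGGING_BADLY",
    5: "NEEDS_PLUGGING_BADLY",
    -1: "ALL",
    -2: "PLAY_FILE",
    -3: "SAY",
}
COMMAND_NAMES = {0: "PLAY_STOP", 1: "PLAY_ONCE", 2: "PLAY_START"}


def describe(sound, command, arg=""):
    """Return a human-readable summary of a request (hypothetical helper)."""
    if sound == -1 and command != 0:
        # The .msg comment says ALL is only legal with PLAY_STOP.
        raise ValueError("ALL is only legal with PLAY_STOP")
    text = "%s %s" % (COMMAND_NAMES[command], SOUND_NAMES[sound])
    if arg:
        text += " (%r)" % arg
    return text


print(describe(-3, 1, "hi"))
print(describe(-1, 0))
```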
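From the ROS point of view, this node subscribes to the /robotsound topic and publishes /diagnostics messages. In the subscription callback it interprets the robotsound message body, then sets up its state machine through the gstreamer mechanism and plays the sound with gstreamer.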
For Chinese playback, architecturally the sound_play library is not needed at all: the application can simply call ekho directly.
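Alternatively, keep the /robotsound publish/subscribe mechanism between the libsoundplay library and soundplay_node, and change only how soundplay_node plays speech: replace the parts that use gstreamer with a direct call to the ekho command line. That would work too.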
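A rough sketch of what "calling the ekho command line" could look like from Python follows. It assumes that ekho is installed, that it speaks the text given as its last argument, and that it accepts -v to choose a voice; check these against the ekho version you have. build_ekho_command and speak are hypothetical names, and the runner parameter exists only so the command can be inspected without actually producing audio.

```python
import subprocess


def build_ekho_command(text, voice=None):
    """Build the argv list for ekho (assumed CLI: ekho [-v VOICE] TEXT)."""
    cmd = ["ekho"]
    if voice:
        cmd += ["-v", voice]
    cmd.append(text)
    return cmd


def speak(text, voice=None, runner=subprocess.call):
    """Run ekho through `runner` and return whatever the runner returns."""
    return runner(build_ekho_command(text, voice))


# Inspect the command without running it:
print(build_ekho_command("你好", voice="Mandarin"))
```

Inside soundplay_node, a function like speak could be called from the branch that handles SoundRequest.SAY, with data.arg as the text; that wiring is not shown here.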
Finally, if gstreamer had Chinese playback built in, wouldn't that be even simpler?
ppeix:audio_common$ gst-inspect |grep playbin
playback: playbin2: Player Bin 2
playback: playbin: Player Bin
ppeix:audio_common$ gst-inspect |grep playback
playback: subtitleoverlay: Subtitle Overlay
playback: playsink: Player Sink
playback: playbin2: Player Bin 2
playback: playbin: Player Bin
ppeix:audio_common$ gst-inspect |grep fest
festival: festival: Festival Text-to-Speech synthesizer
ppeix:audio_common$ gst-inspect |grep ekho
ppeix:audio_common$ gst-inspect |grep synthesizer
festival: festival: Festival Text-to-Speech synthesizer
Here is an introduction to building a custom bin:
Custom bins
The application programmer can create custom bins packed with elements to perform a specific task. This allows you, for example, to write an Ogg/Vorbis decoder with just the following lines of code:
int
main (int argc, char *argv[])
{
  GstElement *player;

  /* init */
  gst_init (&argc, &argv);

  /* create player */
  player = gst_element_factory_make ("oggvorbisplayer", "player");

  /* set the source audio file */
  g_object_set (player, "location", "helloworld.ogg", NULL);

  /* start playback */
  gst_element_set_state (GST_ELEMENT (player), GST_STATE_PLAYING);
[..]
}