Studying the ROS Speech Synthesis Code

Source: Internet. Editor: 程序博客网. Time: 2024/06/09 22:49

I tried out the ROS speech synthesis feature and found it quite convenient for English. I then went through how the code is implemented.

audio_capture: Provides code to capture audio from a microphone and transport it to a destination for playback.
audio_play: Receives audio messages from an audio_capture node. Outputs the messages to the local speakers.
audio_common_msgs: Message definitions for audio transport.
sound_play: A package to play sound files and synthesize speech.

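The application code already ships under the ROS installation path: talkback.py

The sources for the libsoundplay library and soundplay_node have to be downloaded separately: https://github.com/ros-drivers/audio_common.git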

ppeix:audio_common$ ll
total 36
drwxr-xr-x 8 ppeix ppeix 4096 4月 2 10:27 ./
drwxrwxr-x 14 ppeix ppeix 4096 4月 2 16:29 ../
drwxrwxr-x 4 ppeix ppeix 4096 4月 2 10:27 audio_capture/
drwxrwxr-x 2 ppeix ppeix 4096 4月 2 10:27 audio_common/
drwxrwxr-x 3 ppeix ppeix 4096 4月 2 10:27 audio_common_msgs/
drwxrwxr-x 4 ppeix ppeix 4096 4月 2 10:27 audio_play/
drwxrwxr-x 8 ppeix ppeix 4096 4月 2 10:27 .git/
-rw-rw-r-- 1 ppeix ppeix 6 4月 2 10:27 .gitignore
drwxrwxr-x 8 ppeix ppeix 4096 4月 2 10:27 sound_play/

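The analysis below focuses on sound_play.

1. It implements the libsoundplay library, which mainly contains the SoundClient class. An application uses the library by creating a SoundClient instance. The application's job is to subscribe to the /recognize/output topic, while the library's job is to publish the /robotsound topic, which is the interface to the playback layer. An application normally publishes messages through a SoundClient handle: handle.play, handle.say, handle.playWave and so on all end up calling sendMsg, which publishes on /robotsound.

2. It implements the soundplay_node node. This node mainly creates a bin element of type playbin, so that the GStreamer package can be used for audio playback.

Reading this code deepened my understanding of Python's dict, list, tuple and file types, dict in particular.

Look at class soundplay. It defines three initially empty dictionaries, a list, and a pre-populated dictionary of built-in sound parameters: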

self.builtinsounds = {}

self.filesounds = {}

self.voicesounds={}

self.hotlist = []

self.builtinsoundparams = { <sound id>: (<params[0]>, <params[1]>), ... }

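Next, look at def callback(self, data): the function body uses data.sound and sound.command.

data is the argument of the callback. From the definition sub = rospy.Subscriber("robotsound", SoundRequest, self.callback) it follows that data is of type SoundRequest; its fields are described later in this post.

data.sound decides which dictionary is used, and it is also the key for the built-in sounds. data.command is the other parameter: it is passed to the soundtype instance to select the operation to perform.

Values of data.sound: SoundRequest.PLAY_FILE, SoundRequest.ALL, SoundRequest.SAY

Values of data.command: SoundRequest.PLAY_STOP, SoundRequest.PLAY_ONCE, SoundRequest.PLAY_START

filesounds and voicesounds are keyed by data.arg, and builtinsounds is keyed by data.sound; in every case the value is a sound object. A soundtype is instantiated (from data.arg for a file, from the generated wavfilename for speech) and the resulting instance is stored as the value.

For the builtinsoundparams dictionary, the key is data.sound and the value is params, a tuple nested inside the dictionary. params[0] and params[1] are the arguments used to instantiate soundtype, and the resulting instance becomes the value stored in builtinsounds.

sound is the value fetched from a dictionary by key, and entries are added to the dictionary from the result of instantiating soundtype.

So sound is both a dictionary value and a soundtype instance, which means it supports the soundtype operations, for example sound.command.

builtinsoundparams is given its contents when it is defined. The other three dictionaries are empty at initialization, but in callback the dictionary that matches the request type is filled with soundtype instances.

Entries are added like this: self.filesounds[data.arg] = soundtype(data.arg)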

self.voicesounds[data.arg] = soundtype(wavfilename)

self.builtinsounds[data.sound] = soundtype(params[0], params[1])

if not data.sound in self.builtinsounds:
    params = self.builtinsoundparams[data.sound]
    self.builtinsounds[data.sound] = soundtype(params[0], params[1])
sound = self.builtinsounds[data.sound]

sound.command(data.command) is the final call. Because sound is a soundtype instance, it can of course run the command method. data.command is a field of the message that was passed into the callback.
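
The caching logic described above can be condensed into a small sketch that runs without ROS. Everything ROS-specific is replaced by stand-ins that are assumptions of this example: FakeSound is a hypothetical substitute for soundtype, Request substitutes for the SoundRequest message, and the entry in BUILTIN_PARAMS uses made-up values. The text-to-speech step that produces wavfilename is left out (the text itself is used instead), and so is the handling of SoundRequest.ALL.

```python
from collections import namedtuple

# Constants copied from SoundRequest.msg
PLAY_FILE, SAY = -2, -3
PLAY_STOP, PLAY_ONCE, PLAY_START = 0, 1, 2
BACKINGUP = 1

# Hypothetical stand-in for the SoundRequest message
Request = namedtuple("Request", "sound command arg")


class FakeSound:
    """Hypothetical stand-in for soundtype: records commands instead of playing."""

    def __init__(self, source, param=None):
        self.source = source
        self.param = param
        self.history = []

    def command(self, cmd):
        self.history.append(cmd)


class SoundCache:
    # Illustrative values only; the real node defines its own table.
    BUILTIN_PARAMS = {BACKINGUP: ("BACKINGUP.ogg", 0.1)}

    def __init__(self):
        self.builtinsounds = {}
        self.filesounds = {}
        self.voicesounds = {}

    def callback(self, data):
        # Pick the dictionary by data.sound, create the entry on first use,
        # then forward data.command to the cached sound object.
        if data.sound == PLAY_FILE:
            if data.arg not in self.filesounds:
                self.filesounds[data.arg] = FakeSound(data.arg)
            sound = self.filesounds[data.arg]
        elif data.sound == SAY:
            if data.arg not in self.voicesounds:
                self.voicesounds[data.arg] = FakeSound(data.arg)
            sound = self.voicesounds[data.arg]
        else:
            if data.sound not in self.builtinsounds:
                params = self.BUILTIN_PARAMS[data.sound]
                self.builtinsounds[data.sound] = FakeSound(params[0], params[1])
            sound = self.builtinsounds[data.sound]
        sound.command(data.command)
        return sound
```

Sending two requests for the same file returns the same cached object, which is the point of keeping the three dictionaries around.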

From the publisher's side, when an application uses the SoundClient class to publish on robotsound, the function that gets called is sendMsg:

It is defined as:

def sendMsg(self, snd, cmd, s, arg2=""):
    msg = SoundRequest()
    msg.sound = snd
    msg.command = cmd
    msg.arg = s
    msg.arg2 = arg2
    self.pub.publish(msg)

Example call:

self.client.sendMsg(self.snd, SoundRequest.PLAY_ONCE, self.arg)

This passes the sound, command and arg parameters; arg2 is not given in this call, so it keeps its default value, the empty string.
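
To make the way the convenience methods funnel into sendMsg concrete, here is a minimal sketch that runs without ROS. RecordingPublisher and the Msg tuple are hypothetical stand-ins for rospy.Publisher and SoundRequest, and MiniClient is a simplified illustration rather than the libsoundplay source; only the shape of sendMsg follows the definition quoted above.

```python
from collections import namedtuple

# Constants copied from SoundRequest.msg
PLAY_FILE, SAY = -2, -3
PLAY_STOP, PLAY_ONCE, PLAY_START = 0, 1, 2

# Hypothetical stand-in for the SoundRequest message
Msg = namedtuple("Msg", "sound command arg arg2")


class RecordingPublisher:
    """Hypothetical stand-in for rospy.Publisher: stores messages in a list."""

    def __init__(self):
        self.sent = []

    def publish(self, msg):
        self.sent.append(msg)


class MiniClient:
    """Simplified client: every helper ends in sendMsg."""

    def __init__(self, pub):
        self.pub = pub

    def say(self, text, voice=""):
        self.sendMsg(SAY, PLAY_ONCE, text, voice)

    def playWave(self, path):
        self.sendMsg(PLAY_FILE, PLAY_ONCE, path)

    def sendMsg(self, snd, cmd, s, arg2=""):
        # Fill the four message fields and publish, as in the quoted definition.
        self.pub.publish(Msg(sound=snd, command=cmd, arg=s, arg2=arg2))
```

Because the publisher is injected, the same client could be pointed at a real rospy.Publisher on robotsound; that wiring is not shown here.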

sound_play/SoundRequest Message

File: sound_play/SoundRequest.msg

# IMPORTANT: You should never have to generate this message yourself.
# Use the sound_play::SoundClient C++ helper or the
# sound_play.libsoundplay.SoundClient Python helper.

# Sounds
int8 BACKINGUP = 1
int8 NEEDS_UNPLUGGING = 2
int8 NEEDS_PLUGGING = 3
int8 NEEDS_UNPLUGGING_BADLY = 4
int8 NEEDS_PLUGGING_BADLY = 5

# Sound identifiers that have special meaning
int8 ALL = -1 # Only legal with PLAY_STOP
int8 PLAY_FILE = -2
int8 SAY = -3

int8 sound # Selects which sound to play (see above)

# Commands
int8 PLAY_STOP = 0 # Stop this sound from playing
int8 PLAY_ONCE = 1 # Play the sound once
int8 PLAY_START = 2 # Play the sound in a loop until a stop request occurs

int8 command # Indicates what to do with the sound

string arg # file name or text to say
string arg2 # other arguments

Expanded Definition

int8 BACKINGUP=1
int8 NEEDS_UNPLUGGING=2
int8 NEEDS_PLUGGING=3
int8 NEEDS_UNPLUGGING_BADLY=4
int8 NEEDS_PLUGGING_BADLY=5
int8 ALL=-1
int8 PLAY_FILE=-2
int8 SAY=-3
int8 PLAY_STOP=0
int8 PLAY_ONCE=1
int8 PLAY_START=2
int8 sound
int8 command
string arg
string arg2
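
As a reading aid for the definition above, the following sketch turns the numeric sound and command values back into their names and enforces the one constraint the message file states, that ALL is only legal with PLAY_STOP. The function name describe and its output format are assumptions of this example, not part of sound_play.

```python
# Name tables copied from SoundRequest.msg
SOUNDS = {
    1: "BACKINGUP",
    2: "NEEDS_UNPLUGGING",
    3: "NEEDS_PLUGGING",
    4: "NEEDS_UNPLUGGING_BADLY",
    5: "NEEDS_PLUGGING_BADLY",
    -1: "ALL",
    -2: "PLAY_FILE",
    -3: "SAY",
}
COMMANDS = {0: "PLAY_STOP", 1: "PLAY_ONCE", 2: "PLAY_START"}


def describe(sound, command, arg=""):
    """Return a readable form of a request, or raise ValueError if it is invalid."""
    if sound not in SOUNDS:
        raise ValueError("unknown sound id: %d" % sound)
    if command not in COMMANDS:
        raise ValueError("unknown command: %d" % command)
    if SOUNDS[sound] == "ALL" and COMMANDS[command] != "PLAY_STOP":
        raise ValueError("ALL is only legal with PLAY_STOP")
    text = "%s %s" % (COMMANDS[command], SOUNDS[sound])
    return text + ": " + arg if arg else text
```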

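From the ROS point of view, this node subscribes to the /robotsound topic and publishes diagnostic messages on /diagnostics. In the subscription callback it interprets the body of the robotsound message, then sets the state of its GStreamer element accordingly and plays the sound through GStreamer.

For Chinese speech, architecturally the sound_play library could be dropped altogether: the application can simply call ekho directly.

Alternatively, keep the /robotsound publish/subscribe mechanism between the libsoundplay library and soundplay_node, and change only the playback mechanism inside soundplay_node, replacing the parts that use GStreamer with a direct call to the ekho command line. That would work too.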
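
The idea of calling the ekho command line in place of the GStreamer playback could be sketched roughly as follows. This is an assumption-laden illustration: it assumes ekho is installed and that invoking it with the text as its single argument speaks that text, and the function names build_ekho_command and say_with_ekho are made up for this example. The runner parameter exists so the sketch can be exercised without ekho present.

```python
import subprocess


def build_ekho_command(text):
    """Return the argv list for speaking `text` (assumes `ekho "<text>"` usage)."""
    if not text:
        raise ValueError("nothing to say")
    # Passing a list avoids shell quoting problems with arbitrary text.
    return ["ekho", text]


def say_with_ekho(text, runner=subprocess.call):
    """Run ekho on `text`; `runner` is injectable so tests need not spawn ekho."""
    return runner(build_ekho_command(text))
```

In a modified soundplay_node, the branch of the callback that handles SoundRequest.SAY would be the natural place to call something like say_with_ekho(data.arg), while the rest of the /robotsound plumbing stays unchanged.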

Finally, if GStreamer itself had Chinese speech synthesis built in, wouldn't that be simpler still?

ppeix:audio_common$ gst-inspect |grep playbin
playback: playbin2: Player Bin 2
playback: playbin: Player Bin
ppeix:audio_common$ gst-inspect |grep playback
playback: subtitleoverlay: Subtitle Overlay
playback: playsink: Player Sink
playback: playbin2: Player Bin 2
playback: playbin: Player Bin

ppeix:audio_common$ gst-inspect |grep fest
festival: festival: Festival Text-to-Speech synthesizer
ppeix:audio_common$ gst-inspect |grep ekho
ppeix:audio_common$ gst-inspect |grep synthesizer
festival: festival: Festival Text-to-Speech synthesizer

The following passage describes how to build a custom bin:

Custom bins

The application programmer can create custom bins packed with elements to perform a specific task. This allows you, for example, to write an Ogg/Vorbis decoder with just the following lines of code:

int
main (int   argc,
      char *argv[])
{
  GstElement *player;

  /* init */
  gst_init (&argc, &argv);

  /* create player */
  player = gst_element_factory_make ("oggvorbisplayer", "player");

  /* set the source audio file */
  g_object_set (player, "location", "helloworld.ogg", NULL);

  /* start playback */
  gst_element_set_state (GST_ELEMENT (player), GST_STATE_PLAYING);
[..]
}

0 0