Implementing Speech To Text with the Google Speech API
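Once upon a time, a free, highly accurate, and stable Speech To Text API circulated online: the Google Speech API. Recently, however, requests to it kept returning a 500 error. Reading the source code showed that an extra parameter is now required: key=... , probably to prevent abuse. In addition, Chrome recently released a separate real-time recognition interface built on long-lived connections, which is great news for developers. This post explains how to use both interfaces.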
- Blog: http://www.cnblogs.com/jhzhu
- Email: jhzhuustc@gmail.com
- Author: 知明所以
- Date: 2014-03-28
Keywords
SpeechToText,API,google,STT,ASR,SR,speech,recognition
Applying for Chromium API keys
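The Google Speech API used in this post serves Google's own browser, Chrome. You can try out the actual recognition quality with this demo: Google Speech To Text Demo.
Chrome is built from the open-source project Chromium. To make debugging easier for developers, Google opened up this STT (Speech to Text) interface. However, because the interface is intended for debugging only, it is limited in both traffic and number of requests, and extra quota cannot be purchased.
With the background covered, here is the first step: apply for Chromium developer access.
For the detailed steps, see "how to get chromium API keys".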
Acquiring Keys
- Make sure you are a member of chromium-dev@chromium.org (you can just subscribe to chromium-dev and choose not to receive mail).
  For convenience, the APIs below are only visible to people subscribed to that group.
- Make sure you are logged in with the Google account associated with the email address that you used to subscribe to chromium-dev.
- Go to https://cloud.google.com/console (please use the old version of the console)
- Click the red Create project… button.
- (Optional) You may add other members of your organization or team on the Team tab.
- In the ‘APIs & auth’ > APIs tab, click the On/Off button to turn each of the following APIs to the On position, and read and agree to the Terms of Service that is shown:
  (This list might be out of date; try searching for APIs starting with “Chrome” or having “for Chrome” in the name.)
- Chrome Remote Desktop API
- Chrome Spelling API
- Chrome Suggest API
- Chrome Sync API
- Chrome Translate Element
- Google Maps Geolocation API (requires enabling billing but is free to use; you can skip this one, in which case geolocation features of Chrome will not work)
- Safe Browsing API
- Speech API
- Time Zone API
- Google Cloud Messaging for Chrome
- Google Now For Chrome API
If any of these APIs are not shown, recheck step 1.
- Go to the Credentials tab under the APIs & auth tab.
- Click the red Create New Client ID button in the OAuth section to create an OAuth 2.0 client ID.
- You want “Installed Application” for the Application type section
- You want “Other” for the Installed application type section
- A new box should now appear titled “Client ID for installed applications”. In the next sections, we will refer to the values of the “Client ID” and “Client secret” fields in this box later (below).
- Click the red Create New Key button in the Public API Access section and create a new Browser key.
  You want to leave the box on the “Create a browser key and configure allowed referers” empty.
- A new box should appear titled “Key for browser applications”. The next sections will refer to the value of the “API key” field too.
At this point we have obtained the application key. The rest of this post refers to it as {key}.
One Shot Recognition
We use curl to send the request to the server:
curl -X POST \
  --data-binary @speech.flac \
  --user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7' \
  --header 'Content-Type: audio/x-flac; rate=8000;' \
  'https://www.google.com/speech-api/v1/recognize?client=chromium&lang=zh-CN&maxresults=5&pfilter=0&key=AIzaSyC6Tkf4*****Q0CdISn-qnHhwLaS3cg2a0'
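The parameters are:
- speech.flac (passed with --data-binary): the audio file to upload.
- --user-agent '…': an HTTP parameter that sets the browser's user-agent string.
- --header: an HTTP parameter that specifies the content type (audio/x-flac) and the audio sample rate (8000 Hz). Note that only a few specific sample rates are supported (8000 Hz, 4000 Hz, and a few others I can't recall), and the sample rate of the uploaded FLAC file must match this parameter.
- https://www.google.com/…/&key=AIzaSyC6Tkf*****Q0CdISn-qnHhwLaS3cg2a0: the request URL. Replace the key at the end with the {key} you obtained.

Wait about a minute and, with some luck, the server returns a result.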
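For readers who would rather script the call than use curl, here is a minimal Python sketch that assembles the same request with the standard library. The function name `build_recognize_request`, the shortened user-agent string, and the `YOUR_KEY` placeholder are illustrative assumptions. The URL and query parameters are copied from the curl command above. The sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl command in this post.
RECOGNIZE_URL = "https://www.google.com/speech-api/v1/recognize"


def build_recognize_request(audio, key, lang="zh-CN", rate=8000, max_results=5):
    """Build (but do not send) the POST request described by the curl command.

    `audio` is the raw content of the FLAC file; `rate` must match the
    sample rate of that file.
    """
    query = urlencode({
        "client": "chromium",
        "lang": lang,
        "maxresults": max_results,
        "pfilter": 0,
        "key": key,
    })
    headers = {
        # Content type and sample rate, as in the --header option.
        "Content-Type": f"audio/x-flac; rate={rate}",
        # Shortened placeholder; the curl example sends a full Chrome UA string.
        "User-Agent": "Mozilla/5.0",
    }
    return Request(f"{RECOGNIZE_URL}?{query}", data=audio,
                   headers=headers, method="POST")
```

To actually send it, one would read `speech.flac` in binary mode, pass the bytes as `audio` with the real key in place of `YOUR_KEY`, and hand the returned object to `urllib.request.urlopen`.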
The result has the following format, which should be fairly self-explanatory:
{
  "status": 0,
  "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
  "hypotheses": [
    { "utterance": "i like pickles", "confidence": 0.9012539 },
    { "utterance": "i like pickle" }
  ]
}
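A small Python sketch of how such a response might be consumed. The helper name `best_hypothesis` is made up for illustration. Treating `status` 0 as success is an assumption based only on the sample above. Hypotheses without a `confidence` field are ranked as 0.0.

```python
import json


def best_hypothesis(response_text):
    """Return the most confident utterance, or None if there is none."""
    data = json.loads(response_text)
    # Assumption: status 0 means the recognition succeeded.
    if data.get("status") != 0:
        return None
    hypotheses = data.get("hypotheses", [])
    if not hypotheses:
        return None
    # Only some hypotheses carry a confidence value; treat a missing one as 0.0.
    best = max(hypotheses, key=lambda h: h.get("confidence", 0.0))
    return best.get("utterance")
```

Applied to the sample response above, this picks the first hypothesis, since it is the only one with a confidence value.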
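If your recording is not in the right format, the open-source tool sox makes it easy to convert the format and sample rate. For example, the command below converts an MP3 to FLAC, resamples it to 8000 Hz with -r 8000 so that it matches the rate in the request header, sets 8-bit samples with -b 8, and keeps only the first 15 seconds with trim 0 15:
sox ./speech.mp3 -r 8000 -b 8 speech.flac trim 0 15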
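Because the sample rate of the FLAC file has to match the rate in the Content-Type header, it can help to check the file before uploading. The sketch below reads the sample rate from the STREAMINFO metadata block at the start of a FLAC stream. The function names are illustrative. It assumes the file begins directly with the `fLaC` marker, so a file with a prepended ID3 tag would be rejected.

```python
def flac_sample_rate(data: bytes) -> int:
    """Return the sample rate in Hz stored in a FLAC STREAMINFO block.

    `data` must hold at least the first 21 bytes of the stream.
    """
    if len(data) < 21 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    # Byte 4 is the metadata block header: 1 flag bit + 7-bit block type.
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # STREAMINFO starts at byte 8. After 10 bytes of block and frame sizes,
    # the next 20 bits hold the sample rate.
    b0, b1, b2 = data[18], data[19], data[20]
    return (b0 << 12) | (b1 << 4) | (b2 >> 4)


def rate_matches(path: str, expected: int = 8000) -> bool:
    """Check whether the FLAC file at `path` has the expected sample rate."""
    with open(path, "rb") as f:
        return flac_sample_rate(f.read(42)) == expected
```

For instance, `rate_matches("speech.flac", 8000)` could be called before sending a request whose header declares `rate=8000`.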
Stream Recognition
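Later, Google introduced a more advanced, live, bidirectional recognition interface. It opens two HTTP connections at the same time: one sends the audio stream in real time (POST), and the other receives the results (GET).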
Here is a PHP demo that you can use as a reference when implementing your own Stream Recognition:
Google Speech API – Full Duplex PHP Version
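As a rough illustration of how the two connections relate, the Python sketch below builds one upstream (POST) URL and one downstream (GET) URL that share a random `pair` identifier. None of this is stated in this post: the `full-duplex/v1` base path, the `up` and `down` endpoints, and the `pair`, `lang`, and `client` parameters are assumptions recalled from the referenced PHP demo and the Chromium source, so please verify them there. The function name is illustrative.

```python
import secrets
from urllib.parse import urlencode

# Assumed base path; not given in this post, check it against the PHP demo.
BASE = "https://www.google.com/speech-api/full-duplex/v1"


def build_duplex_urls(key, lang="zh-CN", pair=None):
    """Return (up_url, down_url) tied together by a shared pair id."""
    # A random id lets the server match the upload with the result stream.
    pair = pair or secrets.token_hex(8)
    up = f"{BASE}/up?" + urlencode(
        {"key": key, "pair": pair, "lang": lang, "client": "chromium"})
    down = f"{BASE}/down?" + urlencode({"pair": pair})
    return up, down
```

A client would then open the GET connection on the downstream URL and, concurrently, POST the audio stream to the upstream URL, reading results from the first connection while the second is still sending.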
References:
- Google Speech API – Full Duplex PHP Version: http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/
- Accessing Google Speech API / Chrome 11: http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/
- Google Speech To Text API: https://gist.github.com/alotaiba/1730160
- Implementing Android speech recognition with the Google Speech API, bypassing Google Voice Search: http://my.eoe.cn/sisuer/archive/5960.html
- How to Use Google Speech API (with sox): http://www.x2q.net/blog/2013/09/16/how-to-use-google-speech-api/
- Google Chromium Open Project: http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/ and http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/google_one_shot_remote_engine.cc
Written with StackEdit .
Original article: http://www.tuicool.com/articles/URBjQn