Developing Online Speech Recognition Software in C#

Source: Internet | Editor: 程序博客网 | Posted: 2024/04/29 21:55

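This program is an online speech recognition tool built on the speech recognition API that Baidu provides. Before you start, register a project on the Baidu Speech Recognition website. It is free unless your usage is very high.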
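After that, you have to write code that handles the following:

  • capturing speech from the microphone
  • sending it to Baidu's speech recognition API
  • parsing the recognition result that comes back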

Capturing microphone input

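Capturing microphone input means calling a few Windows APIs. I will go through the pieces one at a time; putting them together into a program is left to you.
First, we access the microphone through winmm.dll: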

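```csharp
// Requires: using System; using System.Runtime.InteropServices;
// These declarations go inside the program's class. Every function returns an
// MMRESULT (0 = success). Parameters that are DWORD_PTR in the Windows headers
// (dwInstance here, dwInstance/dwParam1/dwParam2 in the callback) are declared
// as UInt32, so build the program as x86 (32-bit).

// Returns the number of waveform-audio input devices
[DllImport("winmm.dll")]
public static extern int waveInGetNumDevs();

// Hands a prepared buffer to the input device to be filled
[DllImport("winmm.dll")]
public static extern int waveInAddBuffer(IntPtr hwi, ref WaveHdr pwh, UInt32 cbwh);

// Closes the input device (the microphone)
[DllImport("winmm.dll")]
public static extern int waveInClose(IntPtr hwi);

// Opens the input device (the microphone)
[DllImport("winmm.dll")]
public static extern int waveInOpen(out IntPtr phwi, UInt32 uDeviceID, ref WaveFormatEx lpFormat, WaveDelegate dwCallback, UInt32 dwInstance, UInt32 dwFlags);

// Prepares a buffer header; required before waveInAddBuffer
[DllImport("winmm.dll")]
public static extern int waveInPrepareHeader(IntPtr hWaveIn, ref WaveHdr lpWaveInHdr, UInt32 uSize);

// Cleans up a prepared header after the device has returned the buffer
[DllImport("winmm.dll")]
public static extern int waveInUnprepareHeader(IntPtr hWaveIn, ref WaveHdr lpWaveInHdr, UInt32 uSize);

// Stops input and returns all pending buffers to the application as done
[DllImport("winmm.dll")]
public static extern int waveInReset(IntPtr hwi);

// Starts recording
[DllImport("winmm.dll")]
public static extern int waveInStart(IntPtr hwi);

// Stops recording
[DllImport("winmm.dll")]
public static extern int waveInStop(IntPtr hwi);
```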
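The program in this article ignores the return values of these calls. Each waveIn function returns an MMRESULT where 0 (MMSYSERR_NOERROR) means success, so a failure such as a missing microphone goes unnoticed. Below is a minimal sketch of a checking helper; the class `MmResult` and its `Check` method are hypothetical, and only a few of the error codes from mmsystem.h are named.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the original program): turns a failed
// MMRESULT from a waveIn* call into an exception with a readable name.
static class MmResult
{
    // A few well-known MMRESULT codes from mmsystem.h
    static readonly Dictionary<int, string> Names = new Dictionary<int, string>
    {
        [1] = "MMSYSERR_ERROR",
        [2] = "MMSYSERR_BADDEVICEID",
        [4] = "MMSYSERR_ALLOCATED",
        [5] = "MMSYSERR_INVALHANDLE",
        [6] = "MMSYSERR_NODRIVER",
        [7] = "MMSYSERR_NOMEM",
        [11] = "MMSYSERR_INVALPARAM",
        [32] = "WAVERR_BADFORMAT",
    };

    // Does nothing on success (0); throws for any other code.
    public static void Check(int result, string call)
    {
        if (result == 0) return;
        string name = Names.TryGetValue(result, out var known) ? known : "MMRESULT " + result;
        throw new InvalidOperationException(call + " failed: " + name);
    }
}
```

For example, `MmResult.Check(waveInStart(inputDevice), "waveInStart");` would stop with a clear message instead of silently recording nothing.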

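Next, the recorded waveform data has to go into a buffer, and the audio format has to be described. These two structures do that: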

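```csharp
// Buffer header that receives the recorded waveform data (WAVEHDR)
[StructLayout(LayoutKind.Sequential)]
public struct WaveHdr
{
    public IntPtr lpData;          // pointer to the data buffer
    public UInt32 dwBufferLength;  // buffer length in bytes
    public UInt32 dwBytesRecorded; // bytes of audio actually written into the buffer
    public UInt32 dwUser;          // user data (DWORD_PTR in C)
    public UInt32 dwFlags;         // WHDR_* status flags, maintained by the driver
    public UInt32 dwLoops;         // loop count; only used for playback
    public IntPtr lpNext;          // reserved for the driver (linked list of buffers)
    public UInt32 reserved;        // reserved (DWORD_PTR in C)
}

// Audio format description (WAVEFORMATEX)
[StructLayout(LayoutKind.Sequential)]
public struct WaveFormatEx
{
    public UInt16 wFormatTag;      // format type; 1 = WAVE_FORMAT_PCM
    public UInt16 nChannels;       // number of channels (1 = mono, 2 = stereo)
    public UInt32 nSamplesPerSec;  // sample rate in Hz
    public UInt32 nAvgBytesPerSec; // byte rate = nSamplesPerSec * nBlockAlign
    public UInt16 nBlockAlign;     // bytes per frame = nChannels * wBitsPerSample / 8
    public UInt16 wBitsPerSample;  // bits per sample
    public UInt16 cbSize;          // size of extra format data; 0 for PCM
}
```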
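For uncompressed PCM, two of the WaveFormatEx fields follow from the others: `nBlockAlign` is the size of one sample frame, and `nAvgBytesPerSec` is the sample rate times that frame size. A small sketch of the arithmetic; `PcmMath.Derive` is a hypothetical helper, not part of the original program.

```csharp
using System;

// Hypothetical helper: computes the derived WAVEFORMATEX fields for PCM audio.
static class PcmMath
{
    public static (ushort BlockAlign, uint AvgBytesPerSec) Derive(
        uint samplesPerSec, ushort channels, ushort bitsPerSample)
    {
        // Bytes per sample frame: one sample for every channel
        ushort blockAlign = (ushort)(channels * bitsPerSample / 8);
        // Bytes the device produces per second
        uint avgBytesPerSec = samplesPerSec * blockAlign;
        return (blockAlign, avgBytesPerSec);
    }
}
```

With 8000 Hz, one channel and 16 bits it yields 2 and 16000, the values Main sets below.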

We also need a callback delegate. The driver calls it when the device is opened (waveInOpen), when it is closed (waveInClose), and when a buffer is full.

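```csharp
// Callback invoked by winmm with WIM_OPEN, WIM_DATA or WIM_CLOSE.
// For WIM_DATA, dwParam1 is a pointer to the WaveHdr that was filled.
public delegate void WaveDelegate(IntPtr hwi, UInt32 uMsg, UInt32 dwInstance, UInt32 dwParam1, UInt32 dwParam2);
```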

Uploading to Baidu's recognition API

Once the microphone audio has been captured, we upload the waveform to Baidu's recognition API. A plain HTTP POST request does the job:

```csharp
// Requires: using System.IO; using System.Net; using System.Text;
/// <summary>
/// Sends strPostdata to URL in an HTTP POST request and returns the response body.
/// </summary>
/// <param name="URL">server URL</param>
/// <param name="strPostdata">request body (the JSON request containing the Base64 audio)</param>
/// <param name="strEncoding">encoding used to decode the response, e.g. "utf-8"</param>
/// <returns>the response body as a string</returns>
public static string OpenReadWithHttps(string URL, string strPostdata, string strEncoding)
{
    // The body is JSON, so send it as UTF-8. (Encoding.Default on .NET Framework
    // is the system ANSI code page, e.g. GB2312 on Chinese Windows.)
    Encoding encoding = Encoding.UTF8;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
    request.Method = "POST";
    request.Accept = "*/*";                   // accept any response type
    request.ContentType = "application/json"; // the recognition request body is JSON
    byte[] buffer = encoding.GetBytes(strPostdata);
    request.ContentLength = buffer.Length;    // tell the server how long the body is
    using (Stream body = request.GetRequestStream())
    {
        body.Write(buffer, 0, buffer.Length);
    }
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding(strEncoding)))
    {
        // Return the body the server sent back
        return reader.ReadToEnd();
    }
}
```
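`HttpWebRequest` works on .NET Framework, but `WebRequest.Create` is marked obsolete starting with .NET 6, where `HttpClient` is the recommended replacement. The sketch below builds the same recognition request with `HttpClient` types, assuming the service accepts the JSON fields this article sends (`format`, `rate`, `channel`, `token`, `cuid`, `lan`, `speech`, `len`). `BaiduRequest.Build` is a hypothetical name, and the JSON is assembled by hand on the assumption that the token and cuid contain no quotes or backslashes.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical sketch: the recognition request from this article as an
// HttpRequestMessage with a JSON body.
static class BaiduRequest
{
    public static HttpRequestMessage Build(string url, byte[] pcm, string token,
                                           string cuid, string lan = "en", int rate = 8000)
    {
        string json = new StringBuilder()
            .Append("{\"format\":\"pcm\"")
            .Append(",\"rate\":").Append(rate)
            .Append(",\"channel\":1")
            .Append(",\"token\":\"").Append(token).Append('"')
            .Append(",\"cuid\":\"").Append(cuid).Append('"')
            .Append(",\"lan\":\"").Append(lan).Append('"')
            // Base64 only uses A-Z, a-z, 0-9, '+', '/' and '=', so no escaping is needed
            .Append(",\"speech\":\"").Append(Convert.ToBase64String(pcm)).Append('"')
            // len is the size of the raw audio in bytes, not of the Base64 text
            .Append(",\"len\":").Append(pcm.Length)
            .Append('}')
            .ToString();

        return new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
    }
}
```

To send it, pass the message to `HttpClient.SendAsync` and read the body with `response.Content.ReadAsStringAsync()`; one `HttpClient` instance can be reused for every utterance.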

Checking whether audio is being recorded

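Recorded audio is written into a buffer. When the buffer is full, the driver hands it back through the callback with WIM_DATA, and a program that records continuously has to supply a new buffer at that point. The handler below tells the three messages apart but does not queue a new buffer; Main records into one large buffer instead.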

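```csharp
// Message codes sent to the callback (mmsystem.h)
const UInt32 WIM_OPEN = 0x3BE;  // device opened
const UInt32 WIM_CLOSE = 0x3BF; // device closed
const UInt32 WIM_DATA = 0x3C0;  // a buffer is full (or was returned by waveInReset)

static void waveInHandler(IntPtr hwi, UInt32 uMsg, UInt32 dwInstance, UInt32 dwParam1, UInt32 dwParam2)
{
    switch (uMsg)
    {
        case WIM_OPEN:
            break;
        case WIM_DATA:
            unsafe
            {
                // dwParam1 points to the WaveHdr that was filled; its
                // dwBytesRecorded field says how much audio it holds
                var waveHdr = (WaveHdr*)dwParam1;
            }
            break;
        case WIM_CLOSE:
            break;
    }
}
```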
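When a buffer comes back, only its first `dwBytesRecorded` bytes contain audio. The sketch below shows that copy step on its own; `WaveBuffer.CopyRecorded` is a hypothetical helper. Microsoft's documentation for the waveIn callback warns that calling other wave functions from inside it can deadlock, so a program that keeps recording would normally signal another thread to copy the data and queue a fresh buffer with `waveInAddBuffer`.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: copies the recorded part of an unmanaged wave buffer
// into a managed array. Bytes past bytesRecorded are not audio.
static class WaveBuffer
{
    public static byte[] CopyRecorded(IntPtr lpData, uint bytesRecorded)
    {
        var data = new byte[bytesRecorded];
        Marshal.Copy(lpData, data, 0, (int)bytesRecorded);
        return data;
    }
}
```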

Walking through Main

With the individual pieces covered, let's go through Main, filling in a few more small pieces along the way.
First, we set the waveform format:

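```csharp
// Requires: using System; using System.Runtime.InteropServices;
// using System.Speech.Recognition; using System.Text; using System.Threading;
// Compile with unsafe code allowed (the callback uses a pointer).
static void Main(string[] args)
{
    try
    {
        var inputFormat = new WaveFormatEx(); // waveform format
        inputFormat.wFormatTag = 1;           // PCM
        inputFormat.nChannels = 1;            // mono
        inputFormat.nSamplesPerSec = 8000;    // 8 kHz, matching "rate":8000 in the request to Baidu
        inputFormat.nAvgBytesPerSec = 16000;  // 8000 samples/s * 2 bytes per frame
        inputFormat.nBlockAlign = 2;          // 1 channel * 16 bits / 8
        inputFormat.wBitsPerSample = 16;      // 16-bit samples
        inputFormat.cbSize = 0;               // no extra format data for PCM
```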

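Speech recognition is not a one-shot operation, so the rest of Main runs in an infinite loop.

At the start of each pass, we open the device and wait for speech input: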

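```csharp
        for (;;)
        {
            // winmm calls this delegate from native code, which the garbage
            // collector does not see; in production keep it in a static field
            WaveDelegate callback = waveInHandler;
            // UInt32.MaxValue = WAVE_MAPPER (default device); 0x00030000 = CALLBACK_FUNCTION
            waveInOpen(out IntPtr inputDevice, UInt32.MaxValue, ref inputFormat, callback, 0, 0x00030000);
            // 960000 bytes / 16000 bytes per second = at most 60 seconds of audio
            int bufferSize = 960000;
            var buffer1 = new WaveHdr();
            buffer1.lpData = Marshal.AllocHGlobal(bufferSize);
            buffer1.dwBufferLength = (UInt32)bufferSize;
            buffer1.dwLoops = 1;
            waveInPrepareHeader(inputDevice, ref buffer1, (UInt32)Marshal.SizeOf(typeof(WaveHdr)));
            waveInAddBuffer(inputDevice, ref buffer1, (UInt32)Marshal.SizeOf(typeof(WaveHdr)));

            // The local System.Speech recognizer is only used to detect when the
            // user starts and stops talking (through its AudioState property)
            SpeechRecognitionEngine recognizer = null;
            foreach (var installed in SpeechRecognitionEngine.InstalledRecognizers())
            {
                if (installed.Culture.Name.Equals("zh-CN", StringComparison.CurrentCultureIgnoreCase) && installed.Id.Equals("MS-2052-80-DESK"))
                {
                    recognizer = new SpeechRecognitionEngine(installed);
                    break;
                }
            }
            if (recognizer == null)
                throw new InvalidOperationException("The zh-CN desktop recognizer (MS-2052-80-DESK) is not installed.");
            var grammars = new GrammarBuilder();
            grammars.AppendDictation();
            recognizer.LoadGrammar(new Grammar(grammars));
            recognizer.SetInputToDefaultAudioDevice();
            bool recognizeStarted = false;
            int speechCount = 0;
            int silenceCount = 0;
            Console.WriteLine("Waiting for speech input...");
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
            waveInStart(inputDevice);
```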
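Because Main records each utterance into a single buffer, the buffer size caps how long the speaker can talk: at 16000 bytes per second, 960000 bytes is 60 seconds. If the limit is reached before the silence check fires, the driver returns the full buffer and, with no further buffer queued, the rest of the speech is dropped. The hypothetical `BufferMath` helpers below just make that conversion explicit.

```csharp
using System;

// Hypothetical helpers: convert between a recording duration and the size of
// the single WaveHdr buffer, given nAvgBytesPerSec from the format.
static class BufferMath
{
    // Smallest buffer (in bytes) that holds the given number of seconds
    public static int BytesFor(double seconds, uint avgBytesPerSec) =>
        checked((int)Math.Ceiling(seconds * avgBytesPerSec));

    // Longest recording (in seconds) that fits into a buffer of this size
    public static double SecondsIn(int bufferBytes, uint avgBytesPerSec) =>
        (double)bufferBytes / avgBytesPerSec;
}
```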

Once the user speaks, we start analyzing the audio:

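```csharp
            for (;;)
            {
                // Before recording: count consecutive polls that report speech
                if (!recognizeStarted)
                {
                    if (recognizer.AudioState == AudioState.Speech)
                        speechCount++;
                    else speechCount = 0;
                }
                // Two speech polls in a row: the user has started talking
                if (!recognizeStarted && speechCount >= 2)
                {
                    recognizeStarted = true;
                    speechCount = 0;
                    Console.WriteLine("Speech detected, recording...");
                }
                // While recording: count consecutive polls that report silence
                if (recognizeStarted)
                {
                    if (recognizer.AudioState == AudioState.Silence)
                        silenceCount++;
                    else silenceCount = 0;
                }
                // Long enough silence: stop recording and analyze the audio
                if (recognizeStarted && silenceCount >= 220)
                {
                    silenceCount = 0;
                    unsafe
                    {
                        Console.WriteLine("Analyzing speech data...");
                        // waveInReset returns the buffer with dwBytesRecorded filled in
                        waveInReset(inputDevice);
                        waveInStop(inputDevice);
                        recognizer.RecognizeAsyncStop();
                        // Release the header and close the device; otherwise every pass
                        // of the outer loop opens the device again without closing it.
                        // buffer1.lpData stays allocated until its contents are copied.
                        waveInUnprepareHeader(inputDevice, ref buffer1, (UInt32)Marshal.SizeOf(typeof(WaveHdr)));
                        waveInClose(inputDevice);
```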

These lines in the code above decide when the speaker has gone quiet:

```csharp
if (recognizer.AudioState == AudioState.Silence)
    silenceCount++;
else silenceCount = 0;
```

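silenceCount counts how many consecutive polls have reported silence. Each poll is followed by Thread.Sleep(1), so 220 polls amount to a short pause whose exact length depends on the system timer resolution. Once the threshold is reached, the recording is sent to the recognition service.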
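The start/stop rule can be tested without a microphone by modeling each poll of `recognizer.AudioState` as a boolean. The sketch below is a hypothetical model of the two counters in Main (`speechCount` and `silenceCount`). It simplifies `AudioState` to speech versus not-speech, so `AudioState.Stopped` counts as silence here, whereas the original resets both counters on it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, hardware-free model of the polling loop in Main.
// isSpeech[i] stands for poll i (true = Speech, false = Silence).
static class Endpointing
{
    // Returns the poll index at which Main would stop recording, or -1.
    public static int StopIndex(IReadOnlyList<bool> isSpeech,
                                int speechPolls = 2, int silencePolls = 220)
    {
        bool started = false;
        int speechCount = 0, silenceCount = 0;
        for (int i = 0; i < isSpeech.Count; i++)
        {
            if (!started)
            {
                // Consecutive speech polls before recording starts
                speechCount = isSpeech[i] ? speechCount + 1 : 0;
                if (speechCount >= speechPolls) { started = true; speechCount = 0; }
            }
            if (started)
            {
                // Consecutive silence polls while recording
                silenceCount = isSpeech[i] ? 0 : silenceCount + 1;
                if (silenceCount >= silencePolls) return i;
            }
        }
        return -1;
    }
}
```

Lowering `silencePolls` makes the program react faster to a pause, at the risk of cutting off a speaker who merely hesitates.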

Now we need the API endpoint and the keys that Baidu issues for your project:

```csharp
                        // Placeholders: use the keys of your own Baidu project
                        var apiKey = "<your Baidu API Key>";
                        var secretKey = "<your Baidu Secret Key>";
                        // Exchange the keys for an access token
                        var token = OpenReadWithHttps("<token endpoint URL provided by Baidu>" + $"?grant_type=client_credentials&client_id={apiKey}&client_secret={secretKey}", String.Empty, "utf-8");
                        // The token response contains "access_token":"<token>"
                        var tokenPrefix = "\"access_token\":\"";
                        int i = token.IndexOf(tokenPrefix) + tokenPrefix.Length;
                        token = token.Substring(i, token.IndexOf("\"", i) - i);
                        var postData = new StringBuilder();
                        postData.Append("{").Append($"\"format\":\"pcm\",\"rate\":8000,\"channel\":1,\"token\":\"{token}\",\"cuid\":\"F96625D0-0FBC-491C-B617-9EC0B3A0D5A6\",\"lan\":\"en\",");
                        // Copy the recorded bytes out of unmanaged memory, then free it
                        var base64Data = new byte[buffer1.dwBytesRecorded];
                        Marshal.Copy(buffer1.lpData, base64Data, 0, (int)buffer1.dwBytesRecorded);
                        Marshal.FreeHGlobal(buffer1.lpData);
                        var base64 = Convert.ToBase64String(base64Data);
                        // "len" is the length of the raw audio in bytes, not of the Base64 text
                        postData.Append("\"speech\":\"").Append(base64).Append("\",").Append($"\"len\":{buffer1.dwBytesRecorded}").Append("}");
                        try
                        {
                            Console.Write("\nRecognition result: ");
                            var result = OpenReadWithHttps("http://vop.baidu.com/server_api", postData.ToString(), "utf-8");
                            // The response contains "result":["<text>", ...]; take the first entry
                            var prefix = "\"result\":[\"";
                            i = result.IndexOf(prefix) + prefix.Length;
                            var restlt = result.Substring(i, result.IndexOf("\"", i) - i);
                            Console.WriteLine(restlt);
                            try
                            {
                                // Match against the known phrases, then speak the reply
                                string resultfinally = Recognize(restlt);
                                loading(resultfinally, "word.txt");
                            }
                            catch (Exception ex)
                            {
                                Console.WriteLine(ex.Message);
                            }
                            Console.WriteLine();
                        }
                        catch (Exception)
                        {
                            Console.WriteLine("Could not recognize what was said.\n");
                        }
                    }
                    break;
                }
                Thread.Sleep(1);
            }
        }
    }
    catch (Exception exception)
    {
        Console.WriteLine(exception);
    }
}
```
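The index arithmetic that cuts the token and the result out of the responses is easy to get wrong. The sketch below wraps it in two small functions; `BaiduResponse` and its methods are hypothetical, and they assume the response shapes the code above relies on (`"access_token":"..."` and `"result":["..."]`) with no escaped quotes inside the values. A real JSON parser, such as `System.Text.Json` on newer .NET versions, is the more robust choice.

```csharp
using System;

// Hypothetical helpers for pulling string values out of the service's responses.
static class BaiduResponse
{
    // Returns the text between the prefix and the next double quote, or null.
    static string ValueAfter(string json, string prefix)
    {
        int start = json.IndexOf(prefix, StringComparison.Ordinal);
        if (start < 0) return null;
        start += prefix.Length;
        int end = json.IndexOf('"', start);
        return end < 0 ? null : json.Substring(start, end - start);
    }

    // Value of a string field, e.g. StringField(tokenJson, "access_token")
    public static string StringField(string json, string key) =>
        ValueAfter(json, "\"" + key + "\":\"");

    // First entry of the "result" array, or null if the response has none
    public static string FirstResult(string json) =>
        ValueAfter(json, "\"result\":[\"");
}
```

Returning `null` instead of throwing lets the caller print a clear "could not recognize" message when the service answers without a result.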

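You will notice that the program also answers: it looks up what was said in a text file (word.txt in the working directory) and, when it finds a match, speaks the line that follows it. This takes two custom functions.
The first, loading, checks whether what you said appears in the given text file and, if so, speaks the next line: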

```csharp
// Requires: using System; using System.Collections.Generic; using System.IO;
// using System.Linq; using System.Speech.Synthesis;
public static void loading(string listen, string path)
{
    // Read every line, replacing non-letter characters with spaces
    List<string> include = File.ReadAllLines(path)
        .Select(line => new string(line.Select(c => char.IsLetter(c) ? c : ' ').ToArray()).Trim())
        .ToList();
    // The reply to line i is line i + 1, so the last line has no reply
    for (int i = 0; i + 1 < include.Count; i++)
    {
        if (String.Compare(listen.Trim(), include[i], StringComparison.CurrentCultureIgnoreCase) == 0)
        {
            using (var speaker = new SpeechSynthesizer())
            {
                speaker.SetOutputToDefaultAudioDevice();
                speaker.Speak(include[i + 1]);
            }
            return;
        }
    }
    using (var speak = new SpeechSynthesizer())
    {
        speak.SetOutputToDefaultAudioDevice();
        speak.Speak("Your accent was not recognized, please say it again.");
    }
    throw new Exception("Your accent was not recognized, please say it again.");
}
```
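Without the file access and speech output, `loading` is a lookup in which the reply to a line is the line that follows it. The hypothetical sketch below (`Script.FindReply`) isolates that rule so it can be checked without a sound card. It compares with `StringComparison.OrdinalIgnoreCase` instead of the culture-sensitive comparison in the original, which avoids surprises such as the Turkish dotted and dotless i.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, speech-free version of the lookup in loading().
static class Script
{
    // Replace non-letters with spaces and trim, as loading() does for each line
    static string Normalize(string s) =>
        new string(s.Select(c => char.IsLetter(c) ? c : ' ').ToArray()).Trim();

    // Returns the line after the matching one, or null if there is none
    public static string FindReply(IReadOnlyList<string> lines, string heard)
    {
        for (int i = 0; i + 1 < lines.Count; i++)
            if (string.Equals(Normalize(lines[i]), heard.Trim(), StringComparison.OrdinalIgnoreCase))
                return Normalize(lines[i + 1]);
        return null;
    }
}
```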

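The second, Recognize, compares what you said with a list of predefined phrases and returns the first phrase for which at least half of its words were heard; loading then looks that phrase up in the file and speaks the line after it: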

```csharp
// Requires: using System; using System.Linq;
public static string Recognize(string getin)
{
    var responses = new string[]
    {
        "is the author handsome",
        "of course",
        "is he smart",
        "absolutely",
        // the phrases you expect to hear and the replies you want
    };
    // Normalize the input: lower case, non-letters replaced by spaces
    getin = new string(getin.ToLower().Select(c => char.IsLetter(c) ? c : ' ').ToArray()).Trim();
    var k = getin.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    for (var i = 0; i < responses.Length; i++)
        responses[i] = new string(responses[i].ToLower().Select(c => char.IsLetter(c) ? c : ' ').ToArray()).Trim();
    // Return the first phrase for which at least half of its words were heard
    foreach (var repWord in responses)
    {
        int matches = 0;
        var j = repWord.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (var myword in k)
        {
            if (j.Contains(myword))
            {
                matches++;
                if (((float)matches / j.Length) >= 0.5F)
                    return repWord;
            }
        }
    }
    return "You said it wrong, please say it again";
}
```
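The matching rule in `Recognize` is a word-overlap score: the number of heard words that occur in a candidate phrase, divided by the number of words in that phrase, accepted at 0.5 or more. The sketch below computes the full score for one candidate; `Overlap.Score` is a hypothetical helper, and like the original it counts a repeated heard word every time it occurs.

```csharp
using System;
using System.Linq;

// Hypothetical helper isolating the scoring rule used by Recognize().
static class Overlap
{
    // Lower-case words, split on whitespace, without empty entries
    static string[] Words(string s) =>
        s.ToLowerInvariant().Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

    public static double Score(string heard, string candidate)
    {
        string[] said = Words(heard);
        string[] wanted = Words(candidate);
        if (wanted.Length == 0) return 0;
        // Heard words that also appear in the candidate phrase
        int matches = said.Count(w => wanted.Contains(w));
        return (double)matches / wanted.Length;
    }
}
```

For example, hearing "the author handsome" against "is the author handsome" scores 3 / 4 = 0.75, so Recognize would accept that phrase.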

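One more detail: the text returned by the recognizer contains punctuation, so both functions replace every non-letter character (punctuation, but also digits) with a space before comparing.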
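The same clean-up can live in one normalization helper. In this sketch, `TextClean.Normalize` is a hypothetical name: it lower-cases the text, turns every non-letter into a separator and, unlike the functions above, collapses runs of separators into a single space so that splitting on spaces yields no empty words.

```csharp
using System;
using System.Text;

// Hypothetical helper: normalizes recognized text before matching.
static class TextClean
{
    public static string Normalize(string s)
    {
        var sb = new StringBuilder(s.Length);
        bool pendingSpace = false;
        foreach (char c in s.ToLowerInvariant())
        {
            if (char.IsLetter(c))
            {
                // Emit one space between words, never at the start
                if (pendingSpace && sb.Length > 0) sb.Append(' ');
                sb.Append(c);
                pendingSpace = false;
            }
            else
            {
                // Punctuation, digits, symbols and whitespace all separate words
                pendingSpace = true;
            }
        }
        return sb.ToString();
    }
}
```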
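I used English recognition, so the JSON request sets the `lan` (language) field to `en`; it should not be confused with `len`, the audio length in bytes. The language value is case-insensitive, but only three languages appear to be supported, with Chinese (zh) as the default: Mandarin = zh, Cantonese = ct, English = en.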
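If the language should be configurable instead of hard-coded, the value can be checked before it goes into the request. The sketch below is hypothetical (`BaiduLanguage.Normalize`) and encodes only the rules stated above: case-insensitive, three codes, Chinese as the default. Baidu's current API may accept a different set of values.

```csharp
using System;

// Hypothetical helper: validates the "lan" value described in this article.
static class BaiduLanguage
{
    static readonly string[] Supported = { "zh", "ct", "en" };

    public static string Normalize(string lan)
    {
        if (string.IsNullOrWhiteSpace(lan)) return "zh"; // Chinese is the default
        string code = lan.Trim().ToLowerInvariant();    // the value is case-insensitive
        if (Array.IndexOf(Supported, code) < 0)
            throw new ArgumentException("Unsupported lan value: " + lan, nameof(lan));
        return code;
    }
}
```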
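That's about it. If you have questions, suggestions, or corrections, please let me know; advice from more experienced developers is very welcome.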

Finally, special thanks to the person who gave this program the most technical support: RURI (also known as Azure), the expert in our group.