H.323 AAC Audio Capability Negotiation Issues

Source: Internet. Editor: 程序博客网. Time: 2024/06/14 13:28


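In video conferencing, comparing audio capabilities is usually simple: it is normally enough to compare the codec format. The AAC family is an exception. Its capability description is complex, and the capability exchange does not state the exact sample rate or channel count. Instead, as with H.264, it gives an upper bound expressed as a level, which we have to match and compare ourselves. This article briefly introduces AAC capabilities and summarizes the problems encountered at work.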

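Case description

In video-conference capability negotiation, audio problems are mostly concentrated on AAC, and they are especially prominent in interoperability testing with other vendors. This case describes how AAC capabilities are expressed and illustrates them with examples encountered at work.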

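Case analysis

First, here is how AAC is represented in H.245 (this screenshot was captured while opening a logical channel).

[Figure: AAC LD in H.245]

This is one of the most common AAC-LD capabilities. From packet captures we found that an AAC capability mainly contains five parameters: profileAndLevel, formatType, maxAl-sduAudioFrames, audioObjectType and audioSpecificConfig. There are many other parameters as well, such as maxAudioObjects, muxConfigPresent, streamMuxConfig, errorProtection_SpecificConfig and EP_DataPresent. The first five are the main ones and are used mostly for AAC-LC; the others are usually used for AAC-LD. I only studied these parameters roughly, because many of them do not help with capability interoperability, and understanding them thoroughly requires understanding the AAC encoding and decoding process. The most important ones are profileAndLevel, audioObjectType, audioSpecificConfig, streamMuxConfig and formatType: together they essentially determine how large the AAC capability is, and the exact channel count and sample rate used when the channel is opened. The following sections describe these key parameters and how to look them up in the specifications.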

profileAndLevel:

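The table here was not captured completely in the screenshot; more profile values follow below it. In day-to-day use I found that only a few of these values actually appear. Mono AAC-LC uses 0x02, which is Main Audio Profile/L2, and stereo AAC-LC uses 0x0f, which is High Quality Audio Profile/L2. Mono AAC-LD uses 0x18, Low Delay Audio Profile/L3, and stereo AAC-LD uses 0x19, Low Delay Audio Profile/L4. These values were gathered from what different vendors use in practice. ISO/IEC 14496-3 tells us how large a capability Low Delay Audio Profile/L4 actually is.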
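As a small illustration, the four values mentioned above can be kept in a lookup table. This is only a sketch: the dictionary and the helper name are made up for this example, and only the values named in the text are listed, not the full table from ISO/IEC 14496-3.

```python
# profileAndLevel values mentioned in the text.
# The full table in ISO/IEC 14496-3 is much longer; this is not a complete list.
PROFILE_AND_LEVEL = {
    0x02: "Main Audio Profile/L2",
    0x0F: "High Quality Audio Profile/L2",
    0x18: "Low Delay Audio Profile/L3",
    0x19: "Low Delay Audio Profile/L4",
}


def describe_profile_and_level(value: int) -> str:
    """Return a readable name for a profileAndLevel value (hypothetical helper)."""
    return PROFILE_AND_LEVEL.get(value, f"unknown (0x{value:02x})")


print(describe_profile_and_level(0x19))  # Low Delay Audio Profile/L4
```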

[Figure: AAC LD Level]

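Looking up the table, Low Delay Audio Profile L4 supports at most 2 channels and a maximum sample rate of 48 kHz. Note that the capability is described by its maximum, not by stating whether the stream is mono or stereo; that is specified only when the channel is opened. This is what makes parsing AAC capabilities difficult.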
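Because the level only gives an upper bound, matching comes down to checking whether a concrete channel configuration stays within it. A minimal sketch, assuming the limit quoted above for Low Delay Audio Profile L4; the class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimit:
    max_channels: int
    max_sample_rate: int  # in Hz


# Upper bound quoted in the text for Low Delay Audio Profile L4 (0x19).
LD_L4 = LevelLimit(max_channels=2, max_sample_rate=48000)


def fits(limit: LevelLimit, channels: int, sample_rate: int) -> bool:
    """True if a concrete stream stays within the advertised upper bound."""
    return (1 <= channels <= limit.max_channels
            and 0 < sample_rate <= limit.max_sample_rate)


print(fits(LD_L4, 1, 32000))  # True: mono 32 kHz is within the bound
print(fits(LD_L4, 2, 48000))  # True: exactly at the bound
print(fits(LD_L4, 6, 48000))  # False: too many channels
```

Note that both a 32 kHz mono stream and a 48 kHz stereo stream pass this check, which is why the capability alone does not tell the decoder what it will actually receive.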

[Figure: AAC in H.245 capabilitySet]

In this figure, standard:0 denotes profileAndLevel, and its value 2 means Main Audio Profile/L2.

audioObjectType:

[Figure: AAC Profile definitions – audioObjectType]

This parameter is simple. It is optional: the value is 2 for AAC-LC and 23 for AAC-LD.

[Figure: AAC LC in H.245 capabilitySet]

In this capture, standard:3 denotes the audioObjectType parameter, and the value 2 below it means AAC LC.

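formatType:

This mandatory parameter selects the bitstream format type, choosing between the raw data format and the LATM audio format (see Table H.3). It is usually set to 0, which means the raw data format.

In this capture, standard:1 denotes the formatType parameter, and NULL means the raw data format.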

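audioSpecificConfig:

This parameter is also optional, but it must not appear during capability set exchange; if it appears at all, it can only be among the parameters for opening a channel, and specifically for opening an AAC-LC channel (for AAC-LD, see the next parameter). It gives the exact sample rate and channel count of the AAC-LC stream. For example:

[Figure: AAC LC in H.245 capabilitySet – audioSpecificConfig]

standard:4 denotes audioSpecificConfig. It is a 2-byte value, which can be read from the raw bytes:

[Figure: AAC LC in H.245 capabilitySet – audioSpecificConfig – buf]

The figure above shows that the two bytes are 0x1288. What do these two bytes mean?

This figure explains what the two bytes represent, listing the meaning of every bit starting from the first. Let us decode 0x1288 = 0001 0010 1000 1000. According to the figure, the first 5 bits, 00010, are audioObjectType, so the value is 2, meaning AAC-LC. The next 4 bits, 0101, are the sample rate index, 5, which corresponds to emFs32000, that is 32 kHz; our Keda AAC always uses this default 32 kHz sample rate. The following 4 bits, 0001, equal 1, so the audio is mono (emChnl1). We do not need to care about the last three bits. This gives us the exact channel count and sample rate of the channel being opened, so the decoder knows how to decode.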
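The bit-by-bit decoding above can be sketched in a few lines. This is a minimal example, not a full parser: it only handles the plain 2-byte layout discussed here (5-bit audioObjectType, 4-bit sample rate index, 4-bit channel configuration), and the function name and the returned dictionary keys are made up for illustration. The sample rate table follows the sampling frequency index table in ISO/IEC 14496-3.

```python
# Sampling frequency index -> sample rate in Hz (indices 0..12).
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def parse_audio_specific_config(data: bytes) -> dict:
    """Decode the first 13 bits of a 2-byte audioSpecificConfig."""
    if len(data) < 2:
        raise ValueError("audioSpecificConfig needs at least 2 bytes")
    word = int.from_bytes(data[:2], "big")
    audio_object_type = word >> 11           # first 5 bits
    rate_index = (word >> 7) & 0xF           # next 4 bits
    channel_configuration = (word >> 3) & 0xF  # next 4 bits
    # Escape values (extended object type, explicit frequency) are not handled.
    if audio_object_type == 31 or rate_index >= len(SAMPLE_RATES):
        raise ValueError("extended encoding is not handled by this sketch")
    return {
        "audio_object_type": audio_object_type,
        "sample_rate": SAMPLE_RATES[rate_index],
        "channel_configuration": channel_configuration,
    }


print(parse_audio_specific_config(bytes.fromhex("1288")))
# {'audio_object_type': 2, 'sample_rate': 32000, 'channel_configuration': 1}
```

For 0x1288 this yields AAC-LC (2), 32000 Hz and channel configuration 1 (mono), matching the manual decoding.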

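streamMuxConfig:

This parameter is optional too, and behaves like audioSpecificConfig: it can only be present when a channel is opened. The only difference is that audioSpecificConfig is used for AAC-LC while streamMuxConfig is used for AAC-LD. The bit layouts of the two are related: streamMuxConfig actually embeds an audioSpecificConfig to carry the channel count and sample rate, and adds further bits with other meanings.

[Figure: AAC LD in H.245 capabilitySet – streamMuxConfig]

This is the streamMuxConfig parameter from opening an AAC-LD channel. For the meaning of its 7 bytes, see the description of streamMuxConfig in ISO/IEC 14496-3, which gives the syntax table; the layout varies with the values of individual fields, so it is hard to describe fully here. The figure shows a common case. The part to note is the span between (audioSpecificConfig) and (audioSpecificConfig end): it is the audioSpecificConfig described above and carries the channel count and sample rate. I am not clear about the exact meaning of the other fields, and we do not need to be, because we only need the channel count and sample rate. So, from the 6 bytes handed up by the protocol stack, we selectively parse only the 2 bytes of audioSpecificConfig.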
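As a rough sketch of that selective parsing: assuming audioMuxVersion is 0, the StreamMuxConfig syntax in ISO/IEC 14496-3 puts 15 bits of header (audioMuxVersion, allStreamsSameTimeFraming, numSubFrames, numProgram, numLayer) in front of the first embedded audioSpecificConfig. That offset is my reading of the syntax table and should be checked against the standard; the function names are hypothetical, and the example bytes are constructed for illustration, not taken from the capture above.

```python
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def read_bits(data: bytes, offset: int, count: int) -> int:
    """Read `count` bits starting at bit `offset`, most significant bit first."""
    total = len(data) * 8
    if offset + count > total:
        raise ValueError("not enough data")
    value = int.from_bytes(data, "big")
    return (value >> (total - offset - count)) & ((1 << count) - 1)


def parse_stream_mux_config(data: bytes) -> dict:
    """Extract the first embedded audioSpecificConfig (audioMuxVersion 0 only)."""
    if read_bits(data, 0, 1) != 0:
        raise ValueError("only audioMuxVersion 0 is handled by this sketch")
    # Bits 1..14: allStreamsSameTimeFraming(1), numSubFrames(6),
    # numProgram(4), numLayer(3). The audioSpecificConfig starts at bit 15.
    audio_object_type = read_bits(data, 15, 5)
    rate_index = read_bits(data, 20, 4)
    channel_configuration = read_bits(data, 24, 4)
    if audio_object_type == 31 or rate_index >= len(SAMPLE_RATES):
        raise ValueError("extended encoding is not handled by this sketch")
    return {
        "audio_object_type": audio_object_type,
        "sample_rate": SAMPLE_RATES[rate_index],
        "channel_configuration": channel_configuration,
    }


# Constructed example: object type 23 (AAC-LD), index 3 (48 kHz), 2 channels.
print(parse_stream_mux_config(bytes.fromhex("40017320")))
# {'audio_object_type': 23, 'sample_rate': 48000, 'channel_configuration': 2}
```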

At this point, AAC capability exchange in video conferencing has been mostly covered.

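Resolution process

The sections above introduced the AAC parameters and their use. Now for the problems met in practice. AAC interoperability has produced many problems. Our AAC-LC was reasonably standard, but our AAC-LD was not: when opening an AAC-LD channel we sent the AAC-LC config parameter, so other vendors' terminals rejected our open-channel request and we could not open a stereo AAC-LD channel. After reading ISO/IEC 14496-3 and making our AAC-LD standard-compliant, the channel opened, but neither side could hear any audio. This was later confirmed to be a sample-rate mismatch: our AAC-LD used 32 kHz, whereas most other vendors' AAC-LD used 48 kHz, so neither side could decode the other's stream.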

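Summary

This case describes AAC capabilities in detail and how to use them. The main difficulty is that, unlike other audio codecs, AAC does not specify a sample rate and channel count. During capability exchange it gives a maximum capability, indicating the largest number of channels and the highest sample rate it can support. This is somewhat similar to H.264, which makes it awkward to handle. But with H.245 and ISO/IEC 14496-3 at hand, problems can be resolved by looking them up.

The relevant reference text from H.245 follows: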

Annex H

ISO/IEC 14496-3 Capability Definitions

Table H.1 defines the capability identifier for ISO/IEC 14496-3 [50] and ISO/IEC 14496-3/Amd.1 [51] Capabilities. Tables H.2 to H.11 define the associated capability parameters for ISO/IEC 14496-3. These parameters shall only be included as genericAudioCapability within the AudioCapability structure and as genericAudioMode within the AudioMode structure. For capability exchange, profileAndLevel, formatType and maxAl-sduAudioFrames shall be present, audioObjectType and maxAudioObjects may be present, and all other parameters shall be absent. If formatType indicates ISO/IEC 14496-3 Transport Stream format, maxAudioObjects shall be present for capability exchange. When opening a logical channel (forward or reverse), profileAndLevel, formatType and audioObjectType shall be present and all other parameters may be specified. For mode request, profileAndLevel and formatType shall be present and audioObjectType may be specified.

profileAndLevel of ISO/IEC 14496-3 and ISO/IEC 14496-3/Amd.1 may support several types of audio objects. The audio object shall be carried as one of two bitstream formats which are the raw data format and the ISO/IEC 14496-3 Transport Stream format. formatType indicates the choice of the bitstream format type. In applications using multi-rate or scalable transmission, it is useful to allow changes in the structure of the audio objects in one logical channel. This can be realized with the MPEG-4/Audio format which allows changing the configuration of the stream frame by frame. For low bit-rate transmission, the raw data format may be used to reduce redundancy of transmitting the configuration of the stream every frame.
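The presence rules in the first paragraph above can be turned into a simple check. This is a sketch only: the rule table and function are hypothetical, they cover just capability exchange and logical channel opening as worded in that paragraph, and they do not encode the extra conditions from the per-parameter tables (for example those that depend on formatType).

```python
# Presence rules taken from the introductory paragraph of Annex H.
RULES = {
    "capability_exchange": {
        "required": {"profileAndLevel", "formatType", "maxAl-sduAudioFrames"},
        "optional": {"audioObjectType", "maxAudioObjects"},
        "others_allowed": False,  # all other parameters shall be absent
    },
    "open_logical_channel": {
        "required": {"profileAndLevel", "formatType", "audioObjectType"},
        "optional": set(),
        "others_allowed": True,   # all other parameters may be specified
    },
}


def check_presence(message: str, present: set) -> list:
    """Return a list of rule violations for the given set of parameter names."""
    rule = RULES[message]
    problems = [f"missing {name}" for name in sorted(rule["required"] - present)]
    if not rule["others_allowed"]:
        extra = present - rule["required"] - rule["optional"]
        problems += [f"unexpected {name}" for name in sorted(extra)]
    return problems


print(check_presence(
    "capability_exchange",
    {"profileAndLevel", "formatType", "maxAl-sduAudioFrames", "audioSpecificConfig"},
))  # ['unexpected audioSpecificConfig']
print(check_presence("open_logical_channel", {"profileAndLevel", "formatType"}))
# ['missing audioObjectType']
```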

Table H.1/H.245 – Capability Identifier for ISO/IEC 14496-3 Capability

Capability name: ISO/IEC 14496-3
Capability class: Audio Codec
Capability identifier type: Standard
Capability identifier value: {itu-t (0) recommendation (0) h (8) 245 generic-capabilities (1) audio (1) ISO/IEC 14496-3 (0)}
maxBitRate: This field shall always be included.
nonCollapsingRaw: This field shall not be included.
transport: This field shall not be included.

Table H.2/H.245 – Profile and Level for ISO/IEC 14496-3 Capability

Parameter name: profileAndLevel
Parameter description: This is a nonCollapsing GenericParameter. profileAndLevel indicates the capability of processing the particular profiles in combination with the level as given in ISO/IEC 14496-1 and ISO/IEC 14496-1/Amd.1.
Parameter identifier value:
Parameter status: Mandatory
Parameter type: unsignedMax. Shall be in the range 0..255.
Supersedes: –

Table H.3/H.245 – formatType for ISO/IEC 14496-3 Capability

Parameter name: formatType
Parameter description: This is a nonCollapsing GenericParameter. formatType indicates the choice of the bitstream format type of an audio object between the raw data format and the audio format as follows:
• 0: the raw data format (ISO/IEC 14496-3 and ISO/IEC 14496-3/Amd.1)
• 1: the format defined as Low-overhead MPEG-4 Audio Transport Multiplex (LATM) in ISO/IEC 14496-3/Amd.1.
Parameter identifier value:
Parameter status: Mandatory
Parameter type: logical
Supersedes: –

Table H.4/H.245 – maxAl-sduAudioFrames for ISO/IEC 14496-3 Capability

Parameter name: maxAl-sduAudioFrames
Parameter description: This is a collapsing GenericParameter. It specifies what is the maximum number of audio frames per AL-SDU.
Parameter identifier value:
Parameter status: Shall be present for capability exchange and logical channel signalling. Shall not be present for mode request.
Parameter type: unsignedMin. Shall be in the range 1..256.
Supersedes: –

Table H.5/H.245 – audioObjectType for ISO/IEC 14496-3 Capability

Parameter name: audioObjectType
Parameter description: This is a nonCollapsing GenericParameter. audioObjectType indicates the set of tools to be used by the decoder of the bitstream contained in the logical channel as given in ISO/IEC 14496-3/Amd.1. It can be used to limit the capability within the specified profileAndLevel in capability exchange.
Parameter identifier value:
Parameter status: Optional. May be present for Capability Exchange. Shall be present for Logical Channel Signalling. May be present for Mode Request.
Parameter type: unsignedMax. Shall be in the range 0..31.
Supersedes: –

Table H.6/H.245 – audioSpecificConfig for ISO/IEC 14496-3 Capability

Parameter name: audioSpecificConfig
Parameter description: This is a nonCollapsing GenericParameter. audioSpecificConfig indicates how to configure the decoder for a particular object (refer to ISO/IEC 14496-3/Amd.1).
Parameter identifier value:
Parameter status: Optional. Shall not be present for Capability Exchange and Mode Request. Shall be present for Logical Channel Signalling if formatType equals 0 (raw data format). If not, shall not be present for Logical Channel Signalling.
Parameter type: octetString
Supersedes: –

Table H.7/H.245 – maxAudioObjects for ISO/IEC 14496-3 Capability

Parameter name: maxAudioObjects
Parameter description: This is a Collapsing GenericParameter. It specifies what is the maximum number of multiplexed audio objects in the audio payload.
Parameter identifier value:
Parameter status: Optional. If formatType equals 0 (raw data format), shall not be present for Capability Exchange and Logical Channel Signalling. If not, shall be present for Capability Exchange and Logical Channel Signalling. Shall not be present for Mode Request.
Parameter type: unsignedMin. Shall be in the range 1..16.
Supersedes: –

Table H.8/H.245 – muxConfigPresent for ISO/IEC 14496-3 Capability

Parameter name: muxConfigPresent
Parameter description: This is a nonCollapsing GenericParameter. muxConfigPresent indicates whether audio payload configuration is multiplexed into the audio payload itself as given in ISO/IEC 14496-3/Amd.1:
• 0: audio payload configuration (streamMuxConfig) is not multiplexed into the audio payload.
• 1: streamMuxConfig is multiplexed into the audio payload.
Parameter identifier value:
Parameter status: Optional. Shall not be present for Capability Exchange and Mode Request. Shall be present for Logical Channel Signalling if formatType equals 1 (LATM format). If not, shall not be present for Logical Channel Signalling.
Parameter type: logical
Supersedes: –

Table H.9/H.245 – EP_DataPresent for ISO/IEC 14496-3 Capability

Parameter name: EP_DataPresent
Parameter description: This is a nonCollapsing GenericParameter. EP_DataPresent indicates whether the audio payload has error resiliency for bit error (not packet loss) as given in ISO/IEC 14496-3/Amd.1:
• 0: The audio payload has not error resiliency.
• 1: The audio payload has error resiliency. The configuration for the ISO/IEC 14496-3/Amd.1 EP tool (errorProtection_SpecificConfig) may be present for Logical Channel Signalling.
Parameter identifier value:
Parameter status:
Parameter type: logical
Supersedes: –

Table H.10/H.245 – streamMuxConfig for ISO/IEC 14496-3 Capability

Parameter name: streamMuxConfig
Parameter description: This is a nonCollapsing GenericParameter. streamMuxConfig indicates configuration of the audio payload as given in ISO/IEC 14496-3/Amd.1.
Parameter identifier value:
Parameter status:
Parameter type: octetString
Supersedes: –

Table H.11/H.245 – errorProtection_SpecificConfig for ISO/IEC 14496-3 Capability

Parameter name: errorProtection_SpecificConfig
Parameter description: This is a nonCollapsing GenericParameter. errorProtection_SpecificConfig indicates how to configure the ISO/IEC 14496-3/Amd.1 EP tool as given in the LATM EP_MuxElement() description in ISO/IEC 14496-3/Amd.1.
Parameter identifier value:
Parameter status:
Parameter type: octetString
Supersedes: –

Note: most of this article is reposted from:

http://rg4.net/archives/1480.html 韦国华

0 0