FLV File Format Analysis

Source: Internet | Published by: C language online debugging tool | Editor: 程序博客网 | Time: 2024/06/06 23:54

Overview

Flash Video (FLV for short) is a popular web video format. Most video-sharing sites, in China and abroad, currently use it.

File Structure

Viewed as a whole, an FLV file consists of the FLV header followed by the FLV file body.

1. The FLV header

Field | Type | Comment
Signature | UI8 | Signature byte always 'F' (0x46)
Signature | UI8 | Signature byte always 'L' (0x4C)
Signature | UI8 | Signature byte always 'V' (0x56)
Version | UI8 | File version (for example, 0x01 for FLV version 1)
TypeFlagsReserved | UB [5] | Shall be 0
TypeFlagsAudio | UB [1] | 1 = Audio tags are present
TypeFlagsReserved | UB [1] | Shall be 0
TypeFlagsVideo | UB [1] | 1 = Video tags are present
DataOffset | UI32 | The length of this header in bytes

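Signature: the first 3 bytes of an FLV file are always 'F' 'L' 'V' and identify the file as FLV. During format probing, a file whose first 3 bytes are "FLV" is treated as an FLV file.

Version: the 4th byte is the FLV version number.

Flags: in the 5th byte, bit 0 and bit 2 indicate whether video and audio are present, respectively (1 = present, 0 = absent).

DataOffset: the last 4 bytes give the length of the FLV header.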
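The header fields above can be decoded in a few lines. This is only a minimal sketch: the function name `parse_flv_header` and the returned dictionary keys are made up for illustration, and the nine sample bytes are a typical header (signature, version 1, flags 0x05, DataOffset 9).

```python
import struct

def parse_flv_header(buf: bytes) -> dict:
    """Decode the 9-byte FLV header (hypothetical helper, not from the spec)."""
    if len(buf) < 9:
        raise ValueError("need at least 9 bytes")
    # 3s = signature, B = version, B = flags, I = DataOffset (big-endian UI32)
    signature, version, flags, data_offset = struct.unpack(">3sBBI", buf[:9])
    if signature != b"FLV":
        raise ValueError("not an FLV file")
    return {
        "version": version,
        "has_video": bool(flags & 0x01),  # bit 0
        "has_audio": bool(flags & 0x04),  # bit 2
        "data_offset": data_offset,
    }

header = parse_flv_header(bytes.fromhex("464c56010500000009"))
print(header)
```

A demuxer would normally seek to `data_offset` rather than assume the header is 9 bytes long.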

2. The FLV File Body

Field | Type | Comment
PreviousTagSize0 | UI32 | Always 0
Tag1 | FLVTAG | First tag
PreviousTagSize1 | UI32 | Size of previous tag, including its header, in bytes. For FLV version 1, this value is 11 plus the DataSize of the previous tag.
Tag2 | FLVTAG | Second tag
... | ... | ...
PreviousTagSizeN-1 | UI32 | Size of second-to-last tag, including its header, in bytes.
TagN | FLVTAG | Last tag
PreviousTagSizeN | UI32 | Size of last tag, including its header, in bytes

The FLV file body follows the FLV header.

The body is a sequence of back-pointers and tags. Each back-pointer is a 4-byte value giving the size of the preceding tag.
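As a rough illustration of this layout, the sketch below walks the body tag by tag and checks each back-pointer against 11 + DataSize (the FLV version 1 rule from the table). The names `iter_tags`, `last_tag_offset`, and `make_tag` are hypothetical, and the sample file is synthetic.

```python
import struct

def iter_tags(buf: bytes, data_offset: int = 9):
    """Yield (offset, tag_type, data_size) for each complete tag."""
    pos = data_offset + 4              # skip the header and PreviousTagSize0
    while pos + 11 <= len(buf):
        tag_type = buf[pos] & 0x1F
        data_size = int.from_bytes(buf[pos + 1:pos + 4], "big")
        end = pos + 11 + data_size     # first byte after the tag
        if end + 4 > len(buf):
            break                      # truncated tag or missing back-pointer
        (prev_size,) = struct.unpack(">I", buf[end:end + 4])
        if prev_size != 11 + data_size:
            raise ValueError(f"bad back-pointer at offset {end}")
        yield pos, tag_type, data_size
        pos = end + 4

def last_tag_offset(buf: bytes) -> int:
    """Follow the final back-pointer to the start of the last tag."""
    (prev_size,) = struct.unpack(">I", buf[-4:])
    return len(buf) - 4 - prev_size

def make_tag(tag_type: int, payload: bytes) -> bytes:
    """Build a tag plus its trailing back-pointer (test data only)."""
    head = bytes([tag_type]) + len(payload).to_bytes(3, "big") + bytes(7)
    return head + payload + (11 + len(payload)).to_bytes(4, "big")

sample = (bytes.fromhex("464c56010500000009") + bytes(4)
          + make_tag(18, b"\x05\x05") + make_tag(8, b"\x01\x02\x03"))
tags = list(iter_tags(sample))
print(tags, last_tag_offset(sample))
```

The back-pointers are what make it possible to step backwards through a file, for example to find the last tag without scanning from the start.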

FLV Tag Definition

All data in an FLV file is carried in tags. A tag may contain video, audio, or script data.

The tag structure is shown in the table below:

1. FLVTAG

Field | Type | Comment
Reserved | UB [2] | Reserved for FMS, should be 0
Filter | UB [1] | Indicates if packets are filtered. 0 = No pre-processing required. 1 = Pre-processing (such as decryption) of the packet is required before it can be rendered. Shall be 0 in unencrypted files, and 1 for encrypted tags. See Annex F. FLV Encryption for the use of filters.
TagType | UB [5] | Type of contents in this tag. The following types are defined: 8 = audio; 9 = video; 18 = script data
DataSize | UI24 | Length of the message. Number of bytes after StreamID to end of tag (equal to length of the tag - 11)
Timestamp | UI24 | Time in milliseconds at which the data in this tag applies. This value is relative to the first tag in the FLV file, which always has a timestamp of 0.
TimestampExtended | UI8 | Extension of the Timestamp field to form a SI32 value. This field represents the upper 8 bits, while the previous Timestamp field represents the lower 24 bits of the time in milliseconds.
StreamID | UI24 | Always 0.
AudioTagHeader | IF TagType == 8: AudioTagHeader |
VideoTagHeader | IF TagType == 9: VideoTagHeader |
EncryptionHeader | IF Filter == 1: EncryptionTagHeader |
FilterParams | IF Filter == 1: FilterParams |
Data | IF TagType == 8: AUDIODATA; IF TagType == 9: VIDEODATA; IF TagType == 18: SCRIPTDATA | Data specific for each media type.

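TagType: the low 5 bits of the first byte of the tag give the type of data it contains: 8 = audio, 9 = video, 18 = script data.

DataSize: the length of the data that follows StreamID.

Timestamp and TimestampExtended together form the PTS of the data in this tag. When I first wrote an FLV demuxer I ignored TimestampExtended and simply used Timestamp as the PTS; the symptom was frame skipping during playback. Only after rereading the spec did I see that the real PTS is PTS = Timestamp | (TimestampExtended << 24).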
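To make the bit layout and the timestamp formula concrete, here is a small sketch that decodes the fixed 11-byte tag header. The function name `parse_tag_header` and the dictionary keys are illustrative only, and the combined timestamp is treated as unsigned for simplicity even though the spec calls it an SI32.

```python
def parse_tag_header(buf: bytes) -> dict:
    """Decode the fixed 11-byte part of an FLVTAG (hypothetical helper)."""
    if len(buf) < 11:
        raise ValueError("need at least 11 bytes")
    first = buf[0]
    timestamp = int.from_bytes(buf[4:7], "big")   # lower 24 bits
    timestamp_ext = buf[7]                        # upper 8 bits
    return {
        "filter": (first >> 5) & 0x01,
        "tag_type": first & 0x1F,                 # low 5 bits
        "data_size": int.from_bytes(buf[1:4], "big"),
        "timestamp_ms": (timestamp_ext << 24) | timestamp,
        "stream_id": int.from_bytes(buf[8:11], "big"),
    }

# A script tag with DataSize 0x3f and timestamp 0.
script = parse_tag_header(bytes.fromhex("1200003f00000000000000"))
# A video tag whose timestamp has overflowed 24 bits.
late = parse_tag_header(bytes.fromhex("09000010ffffff01000000"))
print(script["tag_type"], script["data_size"], late["timestamp_ms"])
```

In the second example, dropping TimestampExtended would give 16777215 ms instead of 33554431 ms, which is the kind of jump that shows up as skipped frames.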

What follows StreamID differs for each tag type; each is described in detail below.

Audio Tags

If TagType == 8, the tag carries audio.

The data after StreamID begins with the AudioTagHeader, whose structure is:

Field | Type | Comment
SoundFormat | UB [4] | Format of SoundData. The following values are defined: 0 = Linear PCM, platform endian; 1 = ADPCM; 2 = MP3; 3 = Linear PCM, little endian; 4 = Nellymoser 16 kHz mono; 5 = Nellymoser 8 kHz mono; 6 = Nellymoser; 7 = G.711 A-law logarithmic PCM; 8 = G.711 mu-law logarithmic PCM; 9 = reserved; 10 = AAC; 11 = Speex; 14 = MP3 8 kHz; 15 = Device-specific sound. Formats 7, 8, 14, and 15 are reserved. AAC is supported in Flash Player 9,0,115,0 and higher. Speex is supported in Flash Player 10 and higher.
SoundRate | UB [2] | Sampling rate. The following values are defined: 0 = 5.5 kHz; 1 = 11 kHz; 2 = 22 kHz; 3 = 44 kHz
SoundSize | UB [1] | Size of each audio sample. This parameter only pertains to uncompressed formats. Compressed formats always decode to 16 bits internally. 0 = 8-bit samples; 1 = 16-bit samples
SoundType | UB [1] | Mono or stereo sound. 0 = Mono sound; 1 = Stereo sound
AACPacketType | IF SoundFormat == 10: UI8 | The following values are defined: 0 = AAC sequence header; 1 = AAC raw

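The first byte of the AudioTagHeader, i.e. the byte immediately after StreamID, carries the basic audio information such as format and sampling rate. The table lists the values clearly.

AUDIODATA, i.e. the audio payload, follows the AudioTagHeader. There is one special case: if SoundFormat is 10 (AAC), the AudioTagHeader has one extra byte, AACPacketType, which indicates the type of the AACAUDIODATA: 0 = AAC sequence header, 1 = AAC raw.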
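A minimal sketch of decoding the AudioTagHeader follows. The function name, the dictionary keys, and the `header_len` convenience field are assumptions made for illustration; the rate labels are copied from the table rather than exact sample rates.

```python
SOUND_RATES = ("5.5 kHz", "11 kHz", "22 kHz", "44 kHz")  # labels from the table

def parse_audio_tag_header(data: bytes) -> dict:
    """Decode the AudioTagHeader at the start of an audio tag's data."""
    b = data[0]
    info = {
        "sound_format": b >> 4,                        # upper 4 bits
        "sound_rate": SOUND_RATES[(b >> 2) & 0x03],    # next 2 bits
        "sound_size_bits": 16 if (b >> 1) & 0x01 else 8,
        "stereo": bool(b & 0x01),
        "header_len": 1,                               # bytes before AUDIODATA
    }
    if info["sound_format"] == 10:                     # AAC: one extra byte
        info["aac_packet_type"] = data[1]
        info["header_len"] = 2
    return info

# 0xaf = AAC, 44 kHz, 16-bit, stereo; 0x00 = AAC sequence header.
aac = parse_audio_tag_header(bytes.fromhex("af001208"))
# 0x2e = MP3, 44 kHz, 16-bit, mono; no extra byte.
mp3 = parse_audio_tag_header(bytes.fromhex("2e0000"))
print(aac, mp3)
```

The payload then starts at `data[info["header_len"]:]`.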

Field | Type | Comment
Data | IF AACPacketType == 0: AudioSpecificConfig; ELSE IF AACPacketType == 1: Raw AAC frame data in UI8 [ ] (audio payload) | The AudioSpecificConfig is defined in ISO 14496-3. Note that this is not the same as the contents of the esds box from an MP4/F4V file.

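The AAC sequence header contains the AudioSpecificConfig, which carries more detailed audio information. AudioSpecificConfig is defined in ISO 14496-3, section 1.6.2.1 AudioSpecificConfig, and is not reproduced here. ffmpeg also has a function that parses AudioSpecificConfig, ff_mpeg4audio_get_config(); reading it side by side with the spec helps a lot.

AAC raw packets contain the audio elementary stream (ES), i.e. the audio payload.

In an FLV file the AAC sequence header packet normally appears only once, as the first audio tag. I mention it because, when I wrote the FLV demuxer, AAC audio required a 7-byte ADTS header in front of every AAC ES frame. ADTS will be covered in detail in the notes on audio formats; it is the framing decoders commonly accept, so the raw AAC ES has to be packaged as ADTS before the decoder can play it. Building the ADTS header requires samplingFrequencyIndex, and the most reliable source for it is the AudioSpecificConfig, so I parsed the AudioSpecificConfig to obtain samplingFrequencyIndex.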
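The two steps described above (read samplingFrequencyIndex from the AudioSpecificConfig, then prepend a 7-byte ADTS header to each raw frame) might look roughly like this. It is a sketch under assumptions: only the common 2-byte AudioSpecificConfig layout is handled (no escape values), the ADTS header is written without CRC, and both function names are made up.

```python
def parse_audio_specific_config(asc: bytes) -> tuple:
    """Return (audio_object_type, sampling_frequency_index, channel_configuration)."""
    if len(asc) < 2:
        raise ValueError("need at least 2 bytes")
    bits = (asc[0] << 8) | asc[1]
    aot = bits >> 11             # 5 bits
    sfi = (bits >> 7) & 0x0F     # 4 bits
    chan = (bits >> 3) & 0x0F    # 4 bits
    if aot == 31 or sfi == 15:
        raise ValueError("escape values are not handled in this sketch")
    return aot, sfi, chan

def adts_header(aot: int, sfi: int, chan: int, payload_len: int) -> bytes:
    """Build a 7-byte ADTS header (MPEG-4, no CRC) for one raw AAC frame."""
    frame_len = payload_len + 7  # the length field includes the header itself
    if not 1 <= aot <= 4 or chan > 7 or frame_len > 0x1FFF:
        raise ValueError("value does not fit the ADTS header fields")
    return bytes([
        0xFF,                                        # syncword, high 8 bits
        0xF1,                                        # syncword low 4 bits, MPEG-4, layer 0, no CRC
        ((aot - 1) << 6) | (sfi << 2) | (chan >> 2), # profile, frequency index, channel MSB
        ((chan & 0x03) << 6) | (frame_len >> 11),    # channel LSBs, frame length high bits
        (frame_len >> 3) & 0xFF,                     # frame length middle bits
        ((frame_len & 0x07) << 5) | 0x1F,            # frame length low bits, buffer fullness high bits
        0xFC,                                        # buffer fullness low bits, one raw data block
    ])

aot, sfi, chan = parse_audio_specific_config(bytes.fromhex("1208"))
frame = bytes(100)  # stand-in for one raw AAC frame taken from an AAC raw tag
adts_frame = adts_header(aot, sfi, chan, len(frame)) + frame
print(aot, sfi, chan, adts_frame[:7].hex())
```

For the sample config 0x1208 this gives audio object type 2, samplingFrequencyIndex 4, and 1 channel.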

At this point you can extract the audio parameters and data from an FLV file and feed them to an audio decoder for normal playback.

Video Tags

If TagType == 9, the tag carries video.

The data after StreamID begins with the VideoTagHeader, whose structure is:

Field | Type | Comment
Frame Type | UB [4] | Type of video frame. The following values are defined: 1 = key frame (for AVC, a seekable frame); 2 = inter frame (for AVC, a non-seekable frame); 3 = disposable inter frame (H.263 only); 4 = generated key frame (reserved for server use only); 5 = video info/command frame
CodecID | UB [4] | Codec Identifier. The following values are defined: 2 = Sorenson H.263; 3 = Screen video; 4 = On2 VP6; 5 = On2 VP6 with alpha channel; 6 = Screen video version 2; 7 = AVC
AVCPacketType | IF CodecID == 7: UI8 | The following values are defined: 0 = AVC sequence header; 1 = AVC NALU; 2 = AVC end of sequence (lower level NALU sequence ender is not required or supported)
CompositionTime | IF CodecID == 7: SI24 | IF AVCPacketType == 1: Composition time offset; ELSE: 0. See ISO 14496-12, 8.15.3 for an explanation of composition times. The offset in an FLV file is always in milliseconds.

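The first byte of the VideoTagHeader, i.e. the byte immediately after StreamID, carries the most basic video information: the frame type and the CodecID. The table lists the values clearly.

VIDEODATA, i.e. the video payload, follows the VideoTagHeader. As with AAC audio there is a special case: if the video format is AVC (H.264), the VideoTagHeader carries 4 extra bytes, AVCPacketType and CompositionTime. AVCPacketType indicates the content of the VIDEODATA (AVCVIDEOPACKET) that follows:

IF AVCPacketType == 0: AVCDecoderConfigurationRecord (AVC sequence header)
IF AVCPacketType == 1: One or more NALUs (full frames are required)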
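The VideoTagHeader can be decoded along the same lines. The sketch below is illustrative only: the function name, keys, and the `header_len` field are assumptions. CompositionTime is read as a signed 24-bit big-endian integer, as the table specifies.

```python
def parse_video_tag_header(data: bytes) -> dict:
    """Decode the VideoTagHeader at the start of a video tag's data."""
    b = data[0]
    info = {
        "frame_type": b >> 4,      # upper 4 bits
        "codec_id": b & 0x0F,      # lower 4 bits
        "header_len": 1,           # bytes before VIDEODATA
    }
    if info["codec_id"] == 7:      # AVC: AVCPacketType + CompositionTime
        info["avc_packet_type"] = data[1]
        info["composition_time_ms"] = int.from_bytes(data[2:5], "big", signed=True)
        info["header_len"] = 5
    return info

# Key frame, AVC, sequence header, composition time 0.
seq = parse_video_tag_header(bytes.fromhex("1700000000"))
# Inter frame, AVC, NALU packet, composition time -40 ms.
nalu = parse_video_tag_header(bytes.fromhex("2701ffffd8"))
print(seq, nalu)
```

A demuxer would dispatch on `avc_packet_type`: 0 goes to the configuration-record parser, 1 is handed on as NALUs.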

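The AVCDecoderConfigurationRecord contains the SPS and PPS, which are essential for H.264 decoding. They must be sent to the AVC decoder before any stream data, otherwise the decoder cannot decode correctly. They also have to be sent again whenever the decoder is stopped and restarted, for example on seek or when switching between fast-forward and rewind. Like the AAC sequence header, the AVCDecoderConfigurationRecord normally appears only once in an FLV file, as the first video tag.

The AVCDecoderConfigurationRecord is defined in ISO 14496-15, section 5.2.4.1, and is not reproduced here.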
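For orientation, a rough sketch of pulling the SPS and PPS out of an AVCDecoderConfigurationRecord is shown below. It assumes the basic record layout (no profile-specific extension fields), the function names are made up, and the sample record is synthetic rather than taken from a real file.

```python
def _read_sets(rec: bytes, pos: int, count: int):
    """Read `count` parameter sets, each prefixed by a 16-bit big-endian length."""
    sets = []
    for _ in range(count):
        size = int.from_bytes(rec[pos:pos + 2], "big")
        pos += 2
        if pos + size > len(rec):
            raise ValueError("truncated parameter set")
        sets.append(rec[pos:pos + size])
        pos += size
    return sets, pos

def parse_avc_config(rec: bytes) -> dict:
    """Extract SPS/PPS from an AVCDecoderConfigurationRecord (basic layout only)."""
    if len(rec) < 7 or rec[0] != 1:
        raise ValueError("unexpected configurationVersion or record too short")
    sps, pos = _read_sets(rec, 6, rec[5] & 0x1F)   # low 5 bits = number of SPS
    pps, pos = _read_sets(rec, pos + 1, rec[pos])  # one byte = number of PPS
    return {
        "profile": rec[1],
        "level": rec[3],
        "nal_length_size": (rec[4] & 0x03) + 1,    # bytes per NALU length prefix
        "sps": sps,
        "pps": pps,
    }

# Synthetic record: one 4-byte SPS and one 2-byte PPS.
record = bytes.fromhex("0142001fffe1" "0004" "6742001f" "01" "0002" "68ce")
config = parse_avc_config(record)
print(config)
```

`nal_length_size` matters for the NALU packets that follow: each NALU in an AVC NALU tag is prefixed by a length of that many bytes rather than by a start code.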

SCRIPTDATA

If TagType == 18, the tag carries script data.

The SCRIPTDATA structure is quite complex: it defines many value types, each with its own layout.

Field | Type | Comment
Type | UI8 | Type of the ScriptDataValue. The following types are defined: 0 = Number; 1 = Boolean; 2 = String; 3 = Object; 4 = MovieClip (reserved, not supported); 5 = Null; 6 = Undefined; 7 = Reference; 8 = ECMA array; 9 = Object end marker; 10 = Strict array; 11 = Date; 12 = Long string
ScriptDataValue | IF Type == 0: DOUBLE; IF Type == 1: UI8; IF Type == 2: SCRIPTDATASTRING; IF Type == 3: SCRIPTDATAOBJECT; IF Type == 7: UI16; IF Type == 8: SCRIPTDATAECMAARRAY; IF Type == 10: SCRIPTDATASTRICTARRAY; IF Type == 11: SCRIPTDATADATE; IF Type == 12: SCRIPTDATALONGSTRING | Script data value. The Boolean value is (ScriptDataValue ≠ 0).

Each of these types is described in detail in the official FLV specification.

onMetaData

onMetaData is the piece of SCRIPTDATA that matters most to us. Its properties are listed in the table below:

Property Name | Type | Comment
audiocodecid | Number | Audio codec ID used in the file (see E.4.2.1 for available SoundFormat values)
audiodatarate | Number | Audio bit rate in kilobits per second
audiodelay | Number | Delay introduced by the audio codec in seconds
audiosamplerate | Number | Frequency at which the audio stream is replayed
audiosamplesize | Number | Resolution of a single audio sample
canSeekToEnd | Boolean | Indicating the last video frame is a key frame
creationdate | String | Creation date and time
duration | Number | Total duration of the file in seconds
filesize | Number | Total size of the file in bytes
framerate | Number | Number of frames per second
height | Number | Height of the video in pixels
stereo | Boolean | Indicating stereo audio
videocodecid | Number | Video codec ID used in the file (see E.4.3.1 for available CodecID values)
videodatarate | Number | Video bit rate in kilobits per second
width | Number | Width of the video in pixels

Of these, duration, filesize, and the video width and height are especially useful to us.
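As a rough sketch of how an onMetaData tag can be read, the decoder below handles only the value types needed for typical metadata (Number, Boolean, String, Object, ECMA array, Strict array). The function names are made up, and the sample bytes are a small script-tag payload with just two properties.

```python
import struct

def read_string(buf: bytes, pos: int):
    """SCRIPTDATASTRING: UI16 length followed by that many bytes."""
    size = int.from_bytes(buf[pos:pos + 2], "big")
    pos += 2
    return buf[pos:pos + size].decode("utf-8"), pos + size

def read_value(buf: bytes, pos: int):
    """Decode one ScriptDataValue; returns (value, next_position)."""
    kind = buf[pos]
    pos += 1
    if kind == 0:                                   # Number: 8-byte double
        return struct.unpack(">d", buf[pos:pos + 8])[0], pos + 8
    if kind == 1:                                   # Boolean
        return buf[pos] != 0, pos + 1
    if kind == 2:                                   # String
        return read_string(buf, pos)
    if kind in (3, 8):                              # Object / ECMA array
        if kind == 8:
            pos += 4                                # skip the approximate length
        props = {}
        while buf[pos:pos + 3] != b"\x00\x00\x09":  # until the object end marker
            name, pos = read_string(buf, pos)
            props[name], pos = read_value(buf, pos)
        return props, pos + 3
    if kind == 10:                                  # Strict array: UI32 count
        count = int.from_bytes(buf[pos:pos + 4], "big")
        pos += 4
        items = []
        for _ in range(count):
            item, pos = read_value(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"type {kind} is not handled in this sketch")

def parse_on_metadata(data: bytes):
    """A script tag body is a name value followed by a property container."""
    name, pos = read_value(data, 0)
    props, pos = read_value(data, pos)
    return name, props

payload = bytes.fromhex(
    "02000a6f6e4d65746144617461"       # String "onMetaData"
    "0800000002"                       # ECMA array, 2 entries
    "00086475726174696f6e" "004027c8b439581062"          # duration
    "000c766964656f636f6465636964" "004000000000000000"  # videocodecid
    "000009"                           # object end marker
)
name, props = parse_on_metadata(payload)
print(name, props)
```

The same `read_value` also copes with nested containers, which is what a `keyframes` object with its two arrays would need.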

keyframes

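When I was writing the FLV demuxer I found that the official specification does not describe a keyframes index. Yet FLV tags, unlike TS packets, have no sync header, so without a keyframes index seeking and fast-forward/rewind perform very poorly, because tags have to be read sequentially one by one. After some searching online I found that keyframes information is tucked away inside SCRIPTDATA.

keyframes is essentially an unofficial, community standard. It is now hard to find an FLV file online whose metadata lacks a keyframes entry. Two commonly used metadata tools, flvtool2 and FLVMDI, both treat keyframes as a default metadata item. The FLVMDI home page (http://www.buraks.com/flvmdi/) describes it as follows: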

keyframes: (Object) This object is added only if you specify the /k switch. 'keyframes' is known to FLVMDI and if /k switch is not specified, 'keyframes' object will be deleted.
'keyframes' object has 2 arrays: 'filepositions' and 'times'. Both arrays have the same number of elements, which is equal to the number of key frames in the FLV. Values in times array are in 'seconds'. Each correspond to the timestamp of the n'th key frame. Values in filepositions array are in 'bytes'. Each correspond to the fileposition of the nth key frame video tag (which starts with byte tag type 9).

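In other words, keyframes contains two arrays, 'filepositions' and 'times', holding the file position and the PTS of each key frame. From them you can build your own index and, when seeking or fast-forwarding/rewinding, jump quickly and efficiently to the key frame you want.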
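Once the keyframes object has been decoded, a seek becomes a binary search over 'times'. The sketch below assumes the object is already available as a Python dict with 'times' sorted in ascending order; the function name and the sample numbers are invented for illustration.

```python
from bisect import bisect_right

def seek_position(keyframes: dict, target_seconds: float) -> int:
    """Return the file offset of the last key frame at or before the target time."""
    times = keyframes["times"]
    positions = keyframes["filepositions"]
    if not times or len(times) != len(positions):
        raise ValueError("keyframes arrays are empty or mismatched")
    # Index of the last entry <= target, clamped to the first key frame.
    i = max(bisect_right(times, target_seconds) - 1, 0)
    return int(positions[i])

index = {
    "times": [0.0, 2.0, 4.0, 6.0],                       # seconds
    "filepositions": [1024.0, 50000.0, 98000.0, 151000.0],  # bytes
}
print(seek_position(index, 4.5))
```

The values are stored as Numbers (doubles) in the metadata, hence the conversion to `int` before using the result as a file offset.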

FLV Video File Format Analysis

廖雪峰 / Programming / 2012-2-2 7:02

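FLV is a video stream format introduced by Adobe that Flash can play directly. Two concepts need to be kept apart: the codec format and the container format. The codec format is the "raw" video or audio stream produced by an encoder; the common video codec is H.264, and common audio codecs are AAC and MP3. FLV is a container format that can wrap H.264 and AAC; other common containers include MP4, TS, and MKV. Containers can be converted into one another: unpack one container to recover the raw video and audio streams, then repack them into the other. No re-encoding is needed, so the conversion is very fast.

This article covers the FLV container format. The most authoritative definition of the format is Adobe's official specification.

Annex E of that document describes the FLV container format in detail. FLV uses network byte order (big-endian, most significant byte first) and unsigned integers.

Let's start with a sample FLV file containing H.264 video and AAC audio: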

464c 5601 0500 0000 0900 0000 0012 0000
3f00 0000 0000 0000 0200 0a6f 6e4d 6574
6144 6174 6108 0000 0002 0008 6475 7261
7469 6f6e 0040 27c8 b439 5810 6200 0c76
6964 656f 636f 6465 6369 6400 4000 0000
0000 0000 0000 0900 0000 4a08 0000 0400
0000 0000 0000 af00 1208 0000 000f 0900
0043 0000 0000 0000 0017 0000 0000 0142
001f 0301 002f 6742 801f 9652 0283 f602
a100 0003 0001 0000 0300 32e0 6003 0d40
0046 30ff 18e3 0300 186a 0002 3187 f8c7
0ed0 a152 4001 0004 68cb 8d48 0000 004e
0900 0d1c 0000 0000 0000 0017 0100 029d
0000 0d13 6588 8040 0db1 185c 0008 2d1f
7893 de24 f789 f785 c2c4 f8a6 d3e2 43fa
f177 85ea f377 a930 f991 ea7c 4f2a f0b9

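The FLV container format is fairly simple. The file starts with the 3 signature bytes "FLV" that identify the file type, followed by a version byte, which is currently always 0x01. The next byte, 0x05 here, is a flags byte: counting from the least significant bit, the first bit indicates whether a video stream is present and the third bit whether an audio stream is present; the top 5 bits are reserved. The presence of video and audio is therefore tested like this: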

    has_video = (b & 0x01) == 1
    has_audio = (b & 0x04) == 4

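In the example above, 0x05 means the file contains both video and audio. The following 4 bytes give the length of the FLV header, normally 9, because the header up to and including this field is exactly 9 bytes long. This marks the end of the FLV header.

Everything that remains is the FLV body. The body is made up of tags, laid out as:

size of Tag0 | Tag1 | size of Tag1 | Tag2 | size of Tag2 | ... | TagN | size of TagN

Tag0 does not exist, so its size is always 0. Next comes the content of Tag1, immediately followed by the size of Tag1, and so on. Each tag size is a 4-byte unsigned integer.

There are 3 kinds of tag: ScriptTag = 0x12, also called the Metadata Tag, holds video metadata such as height, width, and key frame information; VideoTag = 0x09 holds video; AudioTag = 0x08 holds audio.

The tag structure is: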

    Field: Type
    Reserved: UB(2)
    Filter: UB(1)
    TagType: UB(5)
    DataSize: UI24
    Timestamp: UI24
    TimestampExtended: UI8
    StreamID: UI24
    if TagType==8:
        AudioTagHeader: variable length
    if TagType==9:
        VideoTagHeader: variable length
    if Filter==1:
        EncryptionTagHeader: variable length
    Data: variable length

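In Adobe's specification, UB means Unsigned Bit, so UB(2) is 2 bits; UI means Unsigned Integer, so UI24 is a 24-bit unsigned integer, i.e. 3 bytes.

In the first byte, the top 2 bits are reserved, the filter bit is normally 0, and the low 5 bits give the TagType, so the code to extract the TagType is: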

tagType = b & 0x1f

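The next 3 bytes are the length of the tag data, i.e. the length of the data after StreamID, which is exactly the total tag length minus 11 bytes.

The next 3 bytes are the timestamp, followed by one more byte of extended timestamp. The time is computed as:

    (extended timestamp << 24) + timestamp

The unit is milliseconds.

The next 3 bytes are the StreamID, which is always 0 in the current specification.

Depending on the TagType, an AudioTagHeader or VideoTagHeader comes next, and the rest is the actual data.

Python code for parsing FLV: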

    # `reader` is assumed to be a big-endian byte reader providing the methods used below.
    # parse_script_tag / parse_audio_tag / parse_video_tag are per-type handlers defined elsewhere.
    def parse_flv(reader):
        if reader.read_bytes(3) != b'FLV':
            raise ValueError('Bad FLV header')
        version = reader.read_byte()  # should be 0x01
        b = reader.read_byte()
        has_video = (b & 0x01) == 0x01
        has_audio = (b & 0x04) == 0x04
        reader.skip(4)  # skip header length (DataOffset)
        reader.skip(4)  # skip tag 0's length
        while not reader.eof():
            tag_type, timestamp, data = parse_tag(reader)
            if tag_type == 0x12:
                parse_script_tag(timestamp, data)
            elif tag_type == 0x08:
                parse_audio_tag(timestamp, data)
            elif tag_type == 0x09:
                parse_video_tag(timestamp, data)
            else:
                raise ValueError('Bad tag type')

    def parse_tag(reader):
        tag_type = reader.read_byte() & 0x1f
        data_size = reader.read_int24()
        timestamp = reader.read_int24()
        timestamp_ext = reader.read_int8()
        reader.skip(3)  # skip stream id
        data = reader.read(data_size)
        reader.read_int32()  # tag size, should be data_size + 11
        return tag_type, (timestamp_ext << 24) + timestamp, data
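The parsing code above calls methods on a `reader` object that the article never defines. A minimal in-memory reader exposing those method names might look like the sketch below; the class name `ByteReader` and its behaviour on short reads are assumptions, not part of the original code.

```python
import io

class ByteReader:
    """Big-endian reader over a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes):
        self._stream = io.BytesIO(data)
        self._size = len(data)

    def read_bytes(self, n: int) -> bytes:
        chunk = self._stream.read(n)
        if len(chunk) != n:
            raise EOFError("unexpected end of data")
        return chunk

    read = read_bytes            # alias used by parse_tag

    def read_byte(self) -> int:
        return self.read_bytes(1)[0]

    read_int8 = read_byte        # alias used by parse_tag

    def read_int24(self) -> int:
        return int.from_bytes(self.read_bytes(3), "big")

    def read_int32(self) -> int:
        return int.from_bytes(self.read_bytes(4), "big")

    def skip(self, n: int) -> None:
        self.read_bytes(n)

    def eof(self) -> bool:
        return self._stream.tell() >= self._size

# Header, PreviousTagSize0, and the fixed part of the first tag header.
demo = ByteReader(bytes.fromhex("464c56010500000009" "00000000" "1200003f00000000000000"))
print(demo.read_bytes(3), demo.read_byte(), demo.read_byte(), demo.read_int32())
```

For a real file, the same interface could wrap an open binary file instead of an in-memory buffer, so that large files are not loaded all at once.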
