The Structure of an MPEG-DASH MPD (translated)

from:  https://www.brendanlong.com/the-structure-of-an-mpeg-dash-mpd.html

March 20, 2015

The MPEG-DASH Media Presentation Description (MPD) is an XML document containing information about media segments, their relationships and information necessary to choose between them, and other metadata that may be needed by clients.

In this post, I describe the most important pieces of the MPD, starting from the top level (Periods) and going to the bottom (Segments). In a later post, I cover common informative metadata. Other topics that I might cover include MPD events, in-band events ('emsg'), and encryption (DRM).

For more information, refer to the latest version of ISO/IEC 23009-1, which ISO makes available for free.

//Periods
Periods, contained in the top-level MPD element, describe a part of the content with a start time and duration. Multiple Periods can be used for scenes or chapters, or to separate ads from program content.
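
As a rough illustration of how consecutive Periods line up on the presentation timeline, here is a minimal Python sketch. It assumes that no Period carries an explicit start attribute, so the first Period starts at 0 and each later one starts where the previous one ends. The helper names `parse_duration` and `period_start_times` are hypothetical, and the duration parser only handles simple hour/minute/second values such as PT30S or PT5M.

```python
import re


def parse_duration(text):
    """Parse a simple ISO 8601 duration (e.g. PT30S, PT5M, PT1H2M3.5S) into seconds."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def period_start_times(durations):
    """Return the start time (in seconds) of each Period, given durations in document order."""
    starts = []
    current = 0.0
    for duration in durations:
        starts.append(current)
        current += parse_duration(duration)
    return starts


# A 30-second ad followed by a 5-minute program: the program starts at 30 seconds.
print(period_start_times(["PT30S", "PT5M"]))  # [0.0, 30.0]
```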

//Adaptation Sets
Adaptation Sets contain a media stream or set of media streams. In the simplest case, a Period could have one Adaptation Set containing all audio and video for the content, but to reduce bandwidth, each stream can be split into a different Adaptation Set. A common case is to have one video Adaptation Set, and multiple audio Adaptation Sets (one for each supported language). Adaptation Sets can also contain subtitles or arbitrary metadata.

Adaptation Sets are usually chosen by the user, or by a user agent (web browser or TV) using the user's preferences (like their language or accessibility needs).
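
A user agent's language-based choice could look roughly like the following Python sketch. The tiny inline MPD, the `lang` values, and the `pick_audio` helper are made up for illustration; a real player would also weigh roles, accessibility descriptors, and codec support.

```python
import xml.etree.ElementTree as ET

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical MPD fragment: one video Adaptation Set and two audio languages.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4"/>
    <AdaptationSet mimeType="audio/mp4" lang="en"/>
    <AdaptationSet mimeType="audio/mp4" lang="fr"/>
  </Period>
</MPD>"""


def pick_audio(period, preferred_languages):
    """Return the first audio Adaptation Set matching the user's language preferences."""
    audio_sets = [
        adaptation_set
        for adaptation_set in period.findall("dash:AdaptationSet", NS)
        if adaptation_set.get("mimeType", "").startswith("audio/")
    ]
    for language in preferred_languages:
        for adaptation_set in audio_sets:
            if adaptation_set.get("lang") == language:
                return adaptation_set
    # No preferred language available: fall back to the first audio set, if any.
    return audio_sets[0] if audio_sets else None


period = ET.fromstring(MPD_XML).find("dash:Period", NS)
chosen = pick_audio(period, ["fr", "en"])
print(chosen.get("lang"))  # fr
```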

//Representations
Representations allow an Adaptation Set to contain the same content encoded in different ways. In most cases, Representations will be provided in multiple screen sizes and bandwidths. This allows clients to request the highest quality content that they can play without waiting to buffer and without wasting bandwidth on unneeded pixels (a 720p TV doesn't need 1080p content). Representations can also be encoded with different codecs, allowing support for clients with different supported codecs (as occurs in browsers, with some supporting MPEG-4 AVC / H.264 and some supporting VP8), or to provide higher quality Representations to newer clients while still supporting legacy clients (providing both H.264 and H.265, for example). Multiple codecs can also be useful on battery-powered devices, where a device might choose an older codec because it has hardware support (lower battery usage), even if it has software support for a newer codec.

Representations are usually chosen automatically, but some players allow users to override the choices (especially the resolution). A user might choose to make their own representation choices if they don't want to waste bandwidth in a particular video (maybe they only care about the audio), or if they're willing to have the video stop and buffer in exchange for higher quality.
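
The automatic choice can be sketched in a few lines of Python. This is a deliberately naive rule, not how any particular player works: take the highest-bandwidth Representation that fits both the measured throughput and the screen height, and fall back to the cheapest one otherwise. The `pick_representation` helper and the dictionary layout are assumptions made for the example.

```python
def pick_representation(representations, throughput_bps, screen_height):
    """Pick the best Representation that fits the measured throughput and the screen."""
    usable = [
        rep
        for rep in representations
        if rep["bandwidth"] <= throughput_bps and rep["height"] <= screen_height
    ]
    if not usable:
        # Nothing fits: take the cheapest Representation and accept some buffering.
        return min(representations, key=lambda rep: rep["bandwidth"])
    return max(usable, key=lambda rep: rep["bandwidth"])


representations = [
    {"id": "720p", "bandwidth": 3_200_000, "height": 720},
    {"id": "1080p", "bandwidth": 6_800_000, "height": 1080},
]

print(pick_representation(representations, 8_000_000, 720)["id"])   # 720p
print(pick_representation(representations, 8_000_000, 1080)["id"])  # 1080p
print(pick_representation(representations, 1_000_000, 1080)["id"])  # 720p
```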

//SubRepresentations
SubRepresentations contain information that only applies to one media stream in a Representation. For example, if a Representation contains both audio and video, it could have a SubRepresentation to give additional information which only applies to the audio. This additional information could be specific codecs, sampling rates, or embedded subtitles. SubRepresentations also provide information necessary to extract one stream from a multiplexed container, or to extract a lower quality version of a stream (like only I-frames, which is useful in fast-forward mode).

//Media Segments
Media segments are the actual media files that the DASH client plays, generally by playing them back-to-back as if they were the same file (although things can get much more complicated when switching between representations). Formats will be covered in more detail by my post on profiles, but the two containers described by MPEG are the ISO Base Media File Format (ISOBMFF), which is similar to the MPEG-4 container format, and MPEG-TS. WebM in DASH is described in a document on the WebM project's wiki.

Media Segment locations can be described using BaseURL for a single-segment Representation, a list of segments (SegmentList) or a template (SegmentTemplate). Information that applies to all segments can be found in a SegmentBase. Segment start times and durations can be described with a SegmentTimeline (especially important for live streaming, so a client can quickly determine the latest segment). This information can also appear at higher levels in the MPD, in which case the information provided is the default unless overridden by information lower in the XML hierarchy. This is particularly useful with SegmentTemplate.
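
Because BaseURL elements can appear at several levels, a client has to chain them to find the final segment URL. The sketch below assumes ordinary relative-URL resolution, applied from the MPD's own URL down through each BaseURL to the segment name. The host `example.com`, the manifest path, and the `resolve_segment_url` helper are hypothetical.

```python
from functools import reduce
from urllib.parse import urljoin


def resolve_segment_url(mpd_url, base_urls, media):
    """Resolve a segment name against the MPD URL and each BaseURL, outermost first."""
    return reduce(urljoin, [*base_urls, media], mpd_url)


# BaseURL values as they might appear at Period, Adaptation Set and Representation level.
url = resolve_segment_url(
    "https://example.com/stream/manifest.mpd",
    ["main/", "video/", "720p/"],
    "segment-1.ts",
)
print(url)  # https://example.com/stream/main/video/720p/segment-1.ts
```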

Segments can be in separate files (common for live streaming), or they can be byte ranges within a single file (common for static, "on-demand" content).

//Index Segments
Index Segments come in two types: one Representation Index Segment for the entire Representation, or a Single Index Segment per Media Segment. A Representation Index Segment is always a separate file, but a Single Index Segment can be a byte range in the same file as the Media Segment.

Index Segments contain ISOBMFF 'sidx' boxes, with information about Media Segment sizes (in bytes) and durations (in time), stream access point types, and optionally subsegment information in 'ssix' boxes (the same information, but within segments). In the case of a Representation Index Segment, the 'sidx' boxes come one after another, but they are preceded by an 'sidx' for the index segment itself.
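
To make the contents of a 'sidx' box more concrete, here is a hedged Python sketch of a parser. The field layout follows my reading of the Segment Index Box in ISO/IEC 14496-12. The sketch ignores boxes that use the 64-bit largesize form, and the sample bytes are synthetic rather than taken from a real stream. The `parse_sidx` helper is hypothetical.

```python
import struct


def parse_sidx(data):
    """Parse a single 'sidx' box located at the start of data."""
    _size, box_type = struct.unpack_from(">I4s", data, 0)
    if box_type != b"sidx":
        raise ValueError("not a sidx box")
    version = data[8]  # FullBox header: 1 byte version, then 3 bytes of flags
    offset = 12
    _reference_id, timescale = struct.unpack_from(">II", data, offset)
    offset += 8
    # earliest_presentation_time and first_offset are 32-bit in version 0, 64-bit otherwise.
    if version == 0:
        offset += 8
    else:
        offset += 16
    _reserved, reference_count = struct.unpack_from(">HH", data, offset)
    offset += 4
    references = []
    for _ in range(reference_count):
        type_and_size, duration, sap = struct.unpack_from(">III", data, offset)
        offset += 12
        references.append({
            "is_index": bool(type_and_size >> 31),      # 1 bit: points at another sidx
            "size_bytes": type_and_size & 0x7FFFFFFF,   # 31 bits: referenced size
            "duration_seconds": duration / timescale,   # subsegment duration
            "starts_with_sap": bool(sap >> 31),         # 1 bit
            "sap_type": (sap >> 28) & 0x7,              # 3 bits
        })
    return {"timescale": timescale, "references": references}


# Synthetic version-0 box with two 60-second references at a 90 kHz timescale.
body = struct.pack(">I", 0)                   # version 0, flags 0
body += struct.pack(">IIII", 1, 90000, 0, 0)  # reference_ID, timescale, earliest time, first offset
body += struct.pack(">HH", 0, 2)              # reserved, reference_count
body += struct.pack(">III", 1000, 5400000, 0x90000000)
body += struct.pack(">III", 2000, 5400000, 0x90000000)
box = struct.pack(">I4s", 8 + len(body), b"sidx") + body

index = parse_sidx(box)
for reference in index["references"]:
    print(reference["size_bytes"], reference["duration_seconds"], reference["sap_type"])
```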

//Example
Before finishing, I'll include a commented example of an MPD, to show how these parts work together.


<?xml version="1.0"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" profiles="urn:mpeg:dash:profile:full:2011" minBufferTime="PT1.5S">
    <!-- Ad -->
    <Period duration="PT30S"><!-- Duration is 30 seconds -->
        <BaseURL>ad/</BaseURL>
        <!-- Everything in one Adaptation Set -->
        <AdaptationSet mimeType="video/mp2t">
            <!-- 720p Representation at 3.2 Mbps -->
            <Representation id="720p" bandwidth="3200000" width="1280" height="720">
                <!-- Just use one segment, since the ad is only 30 seconds long -->
                <BaseURL>720p.ts</BaseURL>
                <SegmentBase>
                    <RepresentationIndex sourceURL="720p.sidx"/>
                </SegmentBase>
            </Representation>
            <!-- 1080p Representation at 6.8 Mbps -->
            <Representation id="1080p" bandwidth="6800000" width="1920" height="1080">
                <BaseURL>1080p.ts</BaseURL>
                <SegmentBase>
                    <RepresentationIndex sourceURL="1080p.sidx"/>
                </SegmentBase>
            </Representation>
        </AdaptationSet>

    </Period>


    <!-- Normal Content -->
    <Period duration="PT10M">
        <BaseURL>main/</BaseURL>
        <!-- Just the video -->
        <AdaptationSet mimeType="video/mp2t">
            <BaseURL>video/</BaseURL>
            <!-- 720p Representation at 3.2 Mbps -->
            <Representation id="720p" bandwidth="3200000" width="1280" height="720">
                <BaseURL>720p/</BaseURL>
                <!-- First, we'll just list all of the segments -->
                <!-- Timescale is "ticks per second", so each segment is 1 minute long (5400000 / 90000 = 60 seconds) -->
                <SegmentList timescale="90000" duration="5400000">
                    <RepresentationIndex sourceURL="representation-index.sidx"/>
                    <SegmentURL media="segment-1.ts"/>
                    <SegmentURL media="segment-2.ts"/>
                    <SegmentURL media="segment-3.ts"/>
                    <SegmentURL media="segment-4.ts"/>
                    <SegmentURL media="segment-5.ts"/>
                    <SegmentURL media="segment-6.ts"/>
                    <SegmentURL media="segment-7.ts"/>
                    <SegmentURL media="segment-8.ts"/>
                    <SegmentURL media="segment-9.ts"/>
                    <SegmentURL media="segment-10.ts"/>
                </SegmentList>
            </Representation>
            <!-- 1080p Representation at 6.8 Mbps -->
            <Representation id="1080p" bandwidth="6800000" width="1920" height="1080">
                <BaseURL>1080/</BaseURL>
                <!-- Since all of our segments have similar names, this time we'll use a SegmentTemplate -->
                <SegmentTemplate media="segment-$Number$.ts" timescale="90000">
                    <RepresentationIndex sourceURL="representation-index.sidx"/>
                    <!-- Let's add a SegmentTimeline so the client can easily see how many segments there are -->
                    <SegmentTimeline>
                        <!-- This reads: Starting from time 0, there are 10 segments (the first plus
                             9 repeats) with a duration of (5400000 / @timescale) seconds -->
                        <S t="0" r="9" d="5400000"/>
                    </SegmentTimeline>
                </SegmentTemplate>
            </Representation>
        </AdaptationSet>
        <!-- Just the audio -->
        <AdaptationSet mimeType="audio/mp2t">
            <BaseURL>audio/</BaseURL>
            <!-- We're just going to offer one audio representation, since audio bandwidth isn't very
                 important. -->
            <Representation id="audio" bandwidth="128000">
                <SegmentTemplate media="segment-$Number$.ts" timescale="90000">
                    <RepresentationIndex sourceURL="representation-index.sidx"/>
                    <SegmentTimeline>
                        <S t="0" r="9" d="5400000"/>
                    </SegmentTimeline>
                </SegmentTemplate>
            </Representation>
        </AdaptationSet>
    </Period>
</MPD>
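
To show how a client might turn a SegmentTemplate with a SegmentTimeline into concrete segments, here is a minimal Python sketch. It assumes the usual reading of the S element: @t is the start time in timescale ticks, @d the duration, and @r the number of repeats after the first segment, so r="9" describes 10 segments. Negative @r values and $Time$ templates are not handled. The `expand_template` helper and the inline fragment are illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Illustrative fragment: ten 60-second segments at a 90 kHz timescale.
TEMPLATE_XML = """<SegmentTemplate xmlns="urn:mpeg:dash:schema:mpd:2011"
    media="segment-$Number$.ts" timescale="90000">
  <SegmentTimeline>
    <S t="0" r="9" d="5400000"/>
  </SegmentTimeline>
</SegmentTemplate>"""


def expand_template(template):
    """Return (file name, start in seconds, duration in seconds) for every segment."""
    timescale = int(template.get("timescale", "1"))
    number = int(template.get("startNumber", "1"))
    media = template.get("media")
    segments = []
    time = 0
    for entry in template.findall("dash:SegmentTimeline/dash:S", NS):
        time = int(entry.get("t", time))       # an explicit @t resets the running time
        duration = int(entry.get("d"))
        repeats = int(entry.get("r", "0"))     # repeats after the first segment
        for _ in range(repeats + 1):
            name = media.replace("$Number$", str(number))
            segments.append((name, time / timescale, duration / timescale))
            time += duration
            number += 1
    return segments


segments = expand_template(ET.fromstring(TEMPLATE_XML))
print(len(segments))  # 10
print(segments[0])    # ('segment-1.ts', 0.0, 60.0)
print(segments[-1])   # ('segment-10.ts', 540.0, 60.0)
```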
//Conclusion
This should provide enough information to understand the structure of an MPD, and the general idea of how a basic DASH client works. Next time, I'll discuss additional metadata, which can be used to make a client much smarter, and provide a better user experience.


//Contact
To respond to this post, send me an email at self@brendanlong.com.