Tutorial 06: Synching Audio

Code: tutorial06.c

Synching Audio
 
So now we have a decent enough player to watch a movie, so let's see what kind of loose ends we have lying around. Last time, we glossed over synchronization a little bit, namely synchronizing audio to a video clock rather than the other way around. We're going to do this the same way as with the video: make an internal video clock to keep track of how far along the video thread is and sync the audio to that. Later we'll look at how to generalize things to sync both audio and video to an external clock, too.
=========================================
Implementing the video clock
Now we want to implement a video clock similar to the audio clock we had last time: an internal value that gives the current time offset of the video currently being played. At first, you would think that this would be as simple as updating the timer with the current PTS of the last frame to be shown. However, don't forget that the time between video frames can be pretty long when we get down to the millisecond level. The solution is to keep track of another value, the time at which we set the video clock to the PTS of the last frame. That way the current value of the video clock will be PTS_of_last_frame + (current_time - time_elapsed_since_PTS_value_was_set). This solution is very similar to what we did with get_audio_clock.
So, in our big struct, we're going to put a double video_current_pts and an int64_t video_current_pts_time. The clock updating is going to take place in the video_refresh_timer function:
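Since the code listing didn't survive in this copy, here is a minimal sketch of the update (it assumes the VideoState/VideoPicture names and the pictq picture queue from the earlier tutorials; av_gettime() returns microseconds):

void video_refresh_timer(void *userdata) {

  VideoState *is = (VideoState *)userdata;
  VideoPicture *vp;

  if(is->video_st) {
    if(is->pictq_size == 0) {
      schedule_refresh(is, 1);
    } else {
      vp = &is->pictq[is->pictq_rindex];

      /* Stamp the clock: the PTS of the frame we're about to show,
         plus the wall-clock time at which we set it. */
      is->video_current_pts = vp->pts;
      is->video_current_pts_time = av_gettime();

      /* ... the timing/display logic from the last tutorial ... */
    }
  } else {
    schedule_refresh(is, 100);
  }
}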
 
 
Don't forget to initialize it in stream_component_open:
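The initialization is a single line in the video branch (a sketch; the surrounding lines follow the setup code from the earlier tutorials, and very old FFmpeg spells the case label CODEC_TYPE_VIDEO):

    case AVMEDIA_TYPE_VIDEO:
      is->videoStream = stream_index;
      is->video_st = pFormatCtx->streams[stream_index];

      /* seed the "time we set the PTS" half of the video clock */
      is->video_current_pts_time = av_gettime();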
  
 
And now all we need is a way to get the information:
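This is just the formula above written out, in the same style as get_audio_clock (a sketch):

double get_video_clock(VideoState *is) {
  double delta;

  /* seconds elapsed since we last stamped the clock */
  delta = (av_gettime() - is->video_current_pts_time) / 1000000.0;
  return is->video_current_pts + delta;
}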
  
=======================================
 
Abstracting the clock
But why force ourselves to use the video clock? We'd have to go and alter our video sync code so that the audio and video aren't trying to sync to each other. Imagine the mess if we tried to make it a command line option like it is in ffplay. So let's abstract things: we're going to make a new wrapper function, get_master_clock, that checks an av_sync_type variable and then calls get_audio_clock, get_video_clock, or whatever other clock we want to use.
We could even use the computer clock, which we'll call get_external_clock:
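A sketch of the dispatcher and the external clock; the AV_SYNC_* constants and the av_sync_type field are assumptions (named to be consistent with ffplay), and is->av_sync_type gets set to the default somewhere early, e.g. in main():

enum {
  AV_SYNC_AUDIO_MASTER,
  AV_SYNC_VIDEO_MASTER,
  AV_SYNC_EXTERNAL_MASTER,
};

#define DEFAULT_AV_SYNC_TYPE AV_SYNC_VIDEO_MASTER

double get_external_clock(VideoState *is) {
  /* the computer clock, in seconds */
  return av_gettime() / 1000000.0;
}

double get_master_clock(VideoState *is) {
  if(is->av_sync_type == AV_SYNC_VIDEO_MASTER) {
    return get_video_clock(is);
  } else if(is->av_sync_type == AV_SYNC_AUDIO_MASTER) {
    return get_audio_clock(is);
  } else {
    return get_external_clock(is);
  }
}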
 
 
====================================================
 
Synchronizing the Audio
Now the hard part: synching the audio to the video clock. Our strategy is going to be to measure where the audio is, compare it to the video clock, and then figure out how many samples we need to adjust by, that is, do we need to speed up by dropping samples or do we need to slow down by adding them?
We're going to run a synchronize_audio function each time we process a set of audio samples, to shrink or expand them properly. However, we don't want to sync every single time it's off, because we process audio a lot more often than video packets. So we're going to set a minimum number of consecutive calls to the synchronize_audio function that have to be out of sync before we bother doing anything. Of course, just like last time, "out of sync" means that the audio clock and the video clock differ by more than our sync threshold.
So now let's say we've gotten N audio sample sets that have been out of sync. The amount we are out of sync can also vary a good deal, so we're going to take an average of how far each of those have been out of sync. So for example, the first call might have shown we were out of sync by 40ms, the next by 50ms, and so on. But we're not going to take a simple average because the most recent values are more important than the previous ones. So we're going to use a fractional coefficient, say c, and sum the differences like this: diff_sum = new_diff + diff_sum*c. When we are ready to find the average difference, we simply calculate avg_diff = diff_sum * (1-c).

Note: What the heck is going on here? This equation looks like magic! Well, it's basically a weighted mean using a geometric series as weights. I don't know if there's a name for this (I even checked Wikipedia!) but for more info, here's an explanation (or at weightedmean.txt).
Here's what our function looks like so far:
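A sketch of the skeleton; the audio_diff_* fields and the AV_NOSYNC_THRESHOLD / AUDIO_DIFF_AVG_NB constants are assumed to be defined in the full source, and audio_diff_avg_coef plays the role of c above:

/* Add or subtract samples to get a better sync, return the new
   audio buffer size */
int synchronize_audio(VideoState *is, short *samples,
                      int samples_size, double pts) {
  int n;
  double ref_clock;

  n = 2 * is->audio_st->codec->channels;   /* bytes per sample set */

  if(is->av_sync_type != AV_SYNC_AUDIO_MASTER) {
    double diff, avg_diff;
    int wanted_size, min_size, max_size;

    ref_clock = get_master_clock(is);
    diff = get_audio_clock(is) - ref_clock;

    if(diff < AV_NOSYNC_THRESHOLD) {
      /* accumulate the diffs with geometric weights:
         diff_sum = new_diff + diff_sum * c */
      is->audio_diff_cum = diff + is->audio_diff_avg_coef
                                * is->audio_diff_cum;
      if(is->audio_diff_avg_count < AUDIO_DIFF_AVG_NB) {
        is->audio_diff_avg_count++;
      } else {
        /* avg_diff = diff_sum * (1 - c) */
        avg_diff = is->audio_diff_cum * (1.0 - is->audio_diff_avg_coef);

        /* Shrinking/expanding buffer code.... */

      }
    } else {
      /* difference is TOO big; reset diff stuff */
      is->audio_diff_avg_count = 0;
      is->audio_diff_cum = 0;
    }
  }
  return samples_size;
}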
 
 
So we're doing pretty well; we know approximately how off the audio is from the video or whatever we're using for a clock. So let's now calculate how many samples we need to add or lop off by putting this code where the "Shrinking/expanding buffer code" section is:
 
Remember that audio_length * (sample_rate * # of channels * 2) is the number of bytes in audio_length seconds of audio. Therefore, the number of samples we want is going to be the number of samples we already have, plus or minus the number of samples that corresponds to the amount of time the audio has drifted. We'll also set a limit on how big or small our correction can be, because if we change our buffer too much, it'll be too jarring to the user.
==================================
Correcting the number of samples
Now we have to actually correct the audio. You may have noticed that our synchronize_audio function returns a sample size, which will then tell us how many bytes to send to the stream. So we just have to adjust the sample size to the wanted_size. This works for making the sample size smaller. But if we want to make it bigger, we can't just make the sample size larger because there's no more data in the buffer! So we have to add it. But what should we add? It would be foolish to try and extrapolate audio, so let's just use the audio we already have by padding out the buffer with the value of the last sample.
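A sketch of that logic; it assumes the output buffer has room past samples_size, which holds for the oversized audio_buf used in these tutorials:

if(wanted_size < samples_size) {
  /* remove samples: just report a smaller size */
  samples_size = wanted_size;
} else if(wanted_size > samples_size) {
  uint8_t *samples_end, *q;
  int nb;

  /* add samples by duplicating the final sample */
  nb = wanted_size - samples_size;                      /* bytes to add */
  samples_end = (uint8_t *)samples + samples_size - n;  /* last sample */
  q = samples_end + n;
  while(nb > 0) {
    memcpy(q, samples_end, n);
    q += n;
    nb -= n;
  }
  samples_size = wanted_size;
}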
 
Now we return the sample size, and we're done with that function. All we need to do now is use it:
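A sketch of the audio callback with the new call in place; audio_decode_frame is assumed to return the decoded size in bytes and the buffer's PTS through its last argument, as in the previous tutorial:

void audio_callback(void *userdata, Uint8 *stream, int len) {

  VideoState *is = (VideoState *)userdata;
  int len1, audio_size;
  double pts;

  while(len > 0) {
    if(is->audio_buf_index >= is->audio_buf_size) {
      /* We have already sent all our data; get more */
      audio_size = audio_decode_frame(is, is->audio_buf,
                                      sizeof(is->audio_buf), &pts);
      if(audio_size < 0) {
        /* If error, output silence */
        is->audio_buf_size = 1024;
        memset(is->audio_buf, 0, is->audio_buf_size);
      } else {
        /* the new call: shrink or expand this batch of samples */
        audio_size = synchronize_audio(is, (short *)is->audio_buf,
                                       audio_size, pts);
        is->audio_buf_size = audio_size;
      }
      is->audio_buf_index = 0;
    }
    len1 = is->audio_buf_size - is->audio_buf_index;
    if(len1 > len)
      len1 = len;
    memcpy(stream, (uint8_t *)is->audio_buf + is->audio_buf_index, len1);
    len -= len1;
    stream += len1;
    is->audio_buf_index += len1;
  }
}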
  
All we did was insert the call to synchronize_audio. (Also, make sure to check the source code, where we initialize the variables I didn't bother to define above.)
One last thing before we finish: we need to add an if clause to make sure we don't sync the video if it is the master clock:
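A sketch of the guard inside video_refresh_timer, wrapped around the skip/repeat logic from the last tutorial (ref_clock, diff, delay and the threshold constants are as defined there):

      /* update delay to sync to the master clock if video isn't master */
      if(is->av_sync_type != AV_SYNC_VIDEO_MASTER) {
        ref_clock = get_master_clock(is);
        diff = vp->pts - ref_clock;

        /* Skip or repeat the frame. Take delay into account.
           FFPlay still doesn't "know if this is the best guess." */
        sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
        if(fabs(diff) < AV_NOSYNC_THRESHOLD) {
          if(diff <= -sync_threshold) {
            delay = 0;
          } else if(diff >= sync_threshold) {
            delay = 2 * delay;
          }
        }
      }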
 
  
 
And that does it! Make sure you check through the source file to initialize any variables that I didn't bother defining or initializing. Then compile it:
gcc -o tutorial06 tutorial06.c -lavutil -lavformat -lavcodec -lz -lm `sdl-config --cflags --libs`
and you'll be good to go.
Next time we'll make it so you can rewind and fast forward your movie.
>> Seeking