iOS Audio and Video Capture and Encoding


Anyone who has worked with audio and video knows that the details fade quickly when you don't use them. So here are my notes on what I learned earlier. (This is not a beginner's tutorial.)

The notes are organized into the following topics:

  • Video capture and encoding
  • Audio capture and encoding
[Figure: video.png]

Video Capture and Encoding

Video Capture

  • Get the input device
NSError *deviceError;
AVCaptureDeviceInput *inputDevice;
for (AVCaptureDevice *device in [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo]) {
    if ([device position] == AVCaptureDevicePositionFront) {
        inputDevice = [AVCaptureDeviceInput deviceInputWithDevice:device error:&deviceError];
    }
}
  • Set up the output
AVCaptureVideoDataOutput *outputDevice = [[AVCaptureVideoDataOutput alloc] init];
NSString *key = (NSString *)kCVPixelBufferPixelFormatTypeKey;
NSNumber *val = [NSNumber numberWithUnsignedInt:kCVPixelFormatType_420YpCbCr8BiPlanarFullRange];
NSDictionary *videoSettings = [NSDictionary dictionaryWithObject:val forKey:key];
outputDevice.videoSettings = videoSettings;
[outputDevice setSampleBufferDelegate:self queue:dispatch_get_main_queue()];
  • Start the camera
AVCaptureSession *captureSession = [[AVCaptureSession alloc] init];
[captureSession addInput:inputDevice];
[captureSession addOutput:outputDevice];

[captureSession beginConfiguration];
captureSession.sessionPreset = AVCaptureSessionPreset640x480;
[outputDevice connectionWithMediaType:AVMediaTypeVideo];
[captureSession commitConfiguration];

AVCaptureVideoPreviewLayer *previewLayer = [[AVCaptureVideoPreviewLayer alloc] initWithSession:captureSession];
previewLayer.videoGravity = AVLayerVideoGravityResizeAspectFill;
[self.view.layer addSublayer:previewLayer];
previewLayer.frame = self.view.bounds;
[captureSession startRunning];
  • Receive the output CMSampleBufferRef
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    CVImageBufferRef imageBuffer = (CVImageBufferRef)CMSampleBufferGetImageBuffer(sampleBuffer);
    //TODO : Send to server
}

Since we live in an appearance-obsessed age, beauty filters and similar effects are a must, which means the frames have to be processed while the video is being captured. GPUImage is the obvious choice. The demo code below creates a camera and displays the filtered image on screen.

// Camera configuration
GPUImageVideoCamera *videoCamera = [[GPUImageVideoCamera alloc] initWithSessionPreset:AVCaptureSessionPreset640x480 cameraPosition:AVCaptureDevicePositionBack];
videoCamera.outputImageOrientation = UIInterfaceOrientationPortrait;
videoCamera.frameRate = 24;

// Filters
GPUImageFilter *brightnessFilter = [[GPUImageBrightnessFilter alloc] init];
GPUImageWhiteBalanceFilter *whiteBalanceFilter = [[GPUImageWhiteBalanceFilter alloc] init];
GPUImageOutput<GPUImageInput> *output = [[GPUImageFilter alloc] init];

// Display
GPUImageView *filteredVideoView = [[GPUImageView alloc] initWithFrame:CGRectMake(0.0, 0.0, viewWidth, viewHeight)];

// Targets
[videoCamera addTarget:brightnessFilter];
[brightnessFilter addTarget:whiteBalanceFilter];
[whiteBalanceFilter addTarget:output];
[whiteBalanceFilter addTarget:filteredVideoView];

// Get the render output
[output setFrameProcessingCompletionBlock:^(GPUImageOutput *output, CMTime time) {
    GPUImageFramebuffer *imageFramebuffer = output.framebufferForOutput;
    CVPixelBufferRef pixelBuffer = [imageFramebuffer pixelBuffer];
    //TODO : Send to server
}];

[videoCamera startCameraCapture];

In GPUImage, filters are central, and the idea is simple: the original image goes in, passes through a chain of filters (adding all sorts of effects), and the result comes out at the end.

[Figure: filters.png]

As the demo code above shows, the system framework hands you a CVImageBufferRef while GPUImage hands you a CVPixelBufferRef; they are in fact the same type:

typedef CVImageBufferRef CVPixelBufferRef;

With the video data in hand, the next step is encoding.

Video Encoding

Video encoding comes in two flavors: hardware encoding and software encoding.

  • Difference:
    Software encoding: performed on the CPU
    Hardware encoding: performed off the CPU, e.g. on the GPU, a dedicated DSP, an FPGA, or an ASIC
  • Comparison:
    Software encoding: straightforward to implement, easy to tune and upgrade, but CPU-heavy and slower than hardware encoding; at low bitrates the quality is usually a little better than hardware encoding.
    Hardware encoding: high performance, but at low bitrates the quality is usually below that of a software encoder; some products port a strong software encoder (such as x264) to the GPU, in which case the quality is essentially on par with software encoding.

The rest of this section covers hardware encoding on iOS. First, configure the encoder; if any of the properties are unfamiliar, Google them.

VTCompressionSessionRef compressionSession = NULL;
OSStatus status = VTCompressionSessionCreate(NULL, width, height, kCMVideoCodecType_H264, NULL, NULL, NULL, VideoCompressonOutputCallback, (__bridge void *)self, &compressionSession);

CGFloat videoBitRate = 800 * 1024;
CGFloat videoFrameRate = 24;
CGFloat videoMaxKeyframeInterval = 48;
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, (__bridge CFTypeRef)@(videoMaxKeyframeInterval));
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration, (__bridge CFTypeRef)@(videoMaxKeyframeInterval / videoFrameRate));
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_ExpectedFrameRate, (__bridge CFTypeRef)@(videoFrameRate));
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_AverageBitRate, (__bridge CFTypeRef)@(videoBitRate));
NSArray *limit = @[@(videoBitRate * 1.5 / 8), @(1)];
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_DataRateLimits, (__bridge CFArrayRef)limit);
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_ProfileLevel, kVTProfileLevel_H264_Main_AutoLevel);
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanTrue);
VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_H264EntropyMode, kVTH264EntropyMode_CABAC);
VTCompressionSessionPrepareToEncodeFrames(compressionSession);

A few concepts matter here:

  • Bitrate
    Roughly speaking, the parameter you hand the encoder to tell it how large you expect the compressed video to be. It is measured in bps (bits per second), i.e. the average number of bits used per second of video.
  • Frame rate
    A measure of how many frames are displayed per unit of time, expressed in frames per second (FPS) or hertz (Hz).

The code above configures a bitrate of 800 * 1024 bit/s and a frame rate of 24. There is also a maximum keyframe interval, which can be set to twice the frame rate.
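As a rough sanity check of those numbers (simple back-of-envelope arithmetic, not part of the original configuration):

    800 * 1024 bit/s ÷ 8 = 102,400 bytes/s ≈ 100 KB of encoded video per second
    102,400 bytes/s ÷ 24 fps ≈ 4,267 bytes ≈ 4.2 KB per encoded frame on average
    48 frames ÷ 24 fps = one keyframe roughly every 2 seconds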

VideoCompressonOutputCallback is the callback invoked after hardware encoding:

static void VideoCompressonOutputCallback(void *VTref, void *VTFrameRef, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer) {
    if (!sampleBuffer) return;
    CFArrayRef array = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true);
    if (!array) return;
    CFDictionaryRef dic = (CFDictionaryRef)CFArrayGetValueAtIndex(array, 0);
    if (!dic) return;

    // A sample without the NotSync attachment is a keyframe.
    BOOL keyframe = !CFDictionaryContainsKey(dic, kCMSampleAttachmentKey_NotSync);
    uint64_t timeStamp = [((__bridge_transfer NSNumber *)VTFrameRef) longLongValue];
    HardwareVideoEncoder *videoEncoder = (__bridge HardwareVideoEncoder *)VTref;
    if (status != noErr) {
        return;
    }

    // On the first keyframe, extract the SPS/PPS parameter sets from the format description.
    if (keyframe && !videoEncoder->sps) {
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
        size_t sparameterSetSize, sparameterSetCount;
        const uint8_t *sparameterSet;
        OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sparameterSet, &sparameterSetSize, &sparameterSetCount, 0);
        if (statusCode == noErr) {
            size_t pparameterSetSize, pparameterSetCount;
            const uint8_t *pparameterSet;
            OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pparameterSet, &pparameterSetSize, &pparameterSetCount, 0);
            if (statusCode == noErr) {
                videoEncoder->sps = [NSData dataWithBytes:sparameterSet length:sparameterSetSize];
                videoEncoder->pps = [NSData dataWithBytes:pparameterSet length:pparameterSetSize];
            }
        }
    }

    // The block buffer holds AVCC-formatted NAL units: a 4-byte big-endian length followed by the NAL unit itself.
    CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length, totalLength;
    char *dataPointer;
    OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);
    if (statusCodeRet == noErr) {
        size_t bufferOffset = 0;
        static const int AVCCHeaderLength = 4;
        while (bufferOffset < totalLength - AVCCHeaderLength) {
            uint32_t NALUnitLength = 0;
            memcpy(&NALUnitLength, dataPointer + bufferOffset, AVCCHeaderLength);
            NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);

            VideoFrame *videoFrame = [VideoFrame new];
            videoFrame.timestamp = timeStamp;
            videoFrame.data = [[NSData alloc] initWithBytes:(dataPointer + bufferOffset + AVCCHeaderLength) length:NALUnitLength];
            videoFrame.isKeyFrame = keyframe;
            videoFrame.sps = videoEncoder->sps;
            videoFrame.pps = videoEncoder->pps;
            if (videoEncoder.h264Delegate && [videoEncoder.h264Delegate respondsToSelector:@selector(videoEncoder:videoFrame:)]) {
                [videoEncoder.h264Delegate videoEncoder:videoEncoder videoFrame:videoFrame];
            }
            bufferOffset += AVCCHeaderLength + NALUnitLength;
        }
    }
}

The program calls encodeVideoData:timeStamp: to perform hardware encoding; once a frame has been processed, it is emitted through VideoCompressonOutputCallback.

- (void)encodeVideoData:(CVPixelBufferRef)pixelBuffer timeStamp:(uint64_t)timeStamp {
    if (_isBackGround) return;
    frameCount++;
    CMTime presentationTimeStamp = CMTimeMake(frameCount, (int32_t)videoFrameRate);
    VTEncodeInfoFlags flags;
    CMTime duration = CMTimeMake(1, (int32_t)videoFrameRate);
    NSDictionary *properties = nil;
    // Force a keyframe every videoMaxKeyframeInterval frames.
    if (frameCount % (int32_t)_configuration.videoMaxKeyframeInterval == 0) {
        properties = @{(__bridge NSString *)kVTEncodeFrameOptionKey_ForceKeyFrame: @YES};
    }
    NSNumber *timeNumber = @(timeStamp);
    OSStatus status = VTCompressionSessionEncodeFrame(compressionSession, pixelBuffer, presentationTimeStamp, duration, (__bridge CFDictionaryRef)properties, (__bridge_retained void *)timeNumber, &flags);
    if (status != noErr) {
        [self resetCompressionSession];
    }
}

The whole pipeline is shown in the figure below: video data from the camera is hardware-encoded by the compression session, and the output is a VideoFrame ready for transmission.

[Figure: videoEncode.png]

One caveat about hardware encoding: when the phone is low on battery or charging, the battery inevitably heats up. Under these conditions the iPhone's H.264 hardware encoder is quite likely to throttle, and the encoded output frame rate drops sharply.

The measured output frame rate of the H.264 encoder then falls below the expected frame rate; for example, the camera captures at 30 fps and the H.264 hardware encoder is expected to output 20 fps, but the actual output is below 15 fps. Once the phone heats up, the hardware encoder slows down further, and keeping the input at 30 fps only adds to its load.

Solution: proactively and smoothly drop frames, lowering the input frame rate to just above the encoder's actual output rate. For example, if the actual output is 15 fps, reduce the input to 18 fps to relieve the encoder; once the encoder's actual output climbs back to 18 fps, raise the input frame rate again until the output meets the design target.
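A minimal sketch of such an adaptive frame dropper is shown below. The class name, property names, and the 3 fps margin are hypothetical (they are not part of the original code); the idea is simply to forward only enough captured frames to keep the encoder's input slightly above its measured output rate.

#import <Foundation/Foundation.h>

// Hypothetical helper: feed the encoder at roughly (measured output fps + margin)
// instead of the full capture fps, dropping the remaining frames smoothly.
@interface FrameRateThrottle : NSObject
@property (nonatomic) double captureFPS;        // camera capture rate, e.g. 30
@property (nonatomic) double measuredOutputFPS; // updated from the encoder's output statistics
- (BOOL)shouldEncodeNextFrame;
@end

@implementation FrameRateThrottle {
    double _accumulator; // fractional frames carried over between calls
}

// Call once per captured frame; returns YES if the frame should be passed on
// to encodeVideoData:timeStamp:, NO if it should be dropped.
- (BOOL)shouldEncodeNextFrame {
    // Until there is a measurement, forward everything.
    if (self.measuredOutputFPS <= 0 || self.captureFPS <= 0) return YES;

    double margin = 3.0; // keep the input slightly above the actual output, e.g. 15 fps -> 18 fps
    double targetFPS = MIN(self.captureFPS, self.measuredOutputFPS + margin);

    _accumulator += targetFPS / self.captureFPS; // fraction of captured frames to keep
    if (_accumulator >= 1.0) {
        _accumulator -= 1.0;
        return YES;  // forward this frame
    }
    return NO;       // drop this frame to relieve the encoder
}
@end

As the encoder recovers and measuredOutputFPS climbs, the forwarded rate rises with it until it reaches the capture rate again.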

Audio Capture and Encoding

Audio Capture

  • Get the input component
AudioComponentDescription acd;
acd.componentType = kAudioUnitType_Output;
acd.componentSubType = kAudioUnitSubType_RemoteIO;
acd.componentManufacturer = kAudioUnitManufacturer_Apple;
acd.componentFlags = 0;
acd.componentFlagsMask = 0;
AudioComponent component = AudioComponentFindNext(NULL, &acd);
  • Create the AudioUnit
AudioComponentInstance componetInstance;
OSStatus status = noErr;
status = AudioComponentInstanceNew(component, &componetInstance);
if (noErr != status) {
    [self handleAudioInitializeError];
}

// Enable input (bus 1) on the remote IO unit.
UInt32 flagOne = 1;
AudioUnitSetProperty(componetInstance, kAudioOutputUnitProperty_EnableIO, kAudioUnitScope_Input, 1, &flagOne, sizeof(flagOne));

// 44.1 kHz, 16-bit interleaved stereo PCM.
AudioStreamBasicDescription desc = {0};
desc.mSampleRate = 44100;
desc.mFormatID = kAudioFormatLinearPCM;
desc.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
desc.mChannelsPerFrame = 2;
desc.mFramesPerPacket = 1;
desc.mBitsPerChannel = 16;
desc.mBytesPerFrame = desc.mBitsPerChannel / 8 * desc.mChannelsPerFrame;
desc.mBytesPerPacket = desc.mBytesPerFrame * desc.mFramesPerPacket;

AURenderCallbackStruct cb;
cb.inputProcRefCon = (__bridge void *)(self);
cb.inputProc = handleInputBuffer;
AudioUnitSetProperty(componetInstance, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Output, 1, &desc, sizeof(desc));
AudioUnitSetProperty(componetInstance, kAudioOutputUnitProperty_SetInputCallback, kAudioUnitScope_Global, 1, &cb, sizeof(cb));

status = AudioUnitInitialize(componetInstance);
if (noErr != status) {
    [self handleAudioInitializeError];
}

Audio data arrives through handleInputBuffer, the AudioUnit's input callback.

static OSStatus handleInputBuffer(void *inRefCon,
                                  AudioUnitRenderActionFlags *ioActionFlags,
                                  const AudioTimeStamp *inTimeStamp,
                                  UInt32 inBusNumber,
                                  UInt32 inNumberFrames,
                                  AudioBufferList *ioData) {
    @autoreleasepool {
        AudioCapture *source = (__bridge AudioCapture *)inRefCon;
        if (!source) return -1;

        AudioBuffer buffer;
        buffer.mData = NULL;
        buffer.mDataByteSize = 0;
        buffer.mNumberChannels = 1;

        AudioBufferList buffers;
        buffers.mNumberBuffers = 1;
        buffers.mBuffers[0] = buffer;

        // Pull the captured samples out of the audio unit into our buffer list.
        OSStatus status = AudioUnitRender(source.componetInstance,
                                          ioActionFlags,
                                          inTimeStamp,
                                          inBusNumber,
                                          inNumberFrames,
                                          &buffers);

        // When muted, zero out the samples instead of skipping the callback.
        if (source.muted) {
            for (int i = 0; i < buffers.mNumberBuffers; i++) {
                AudioBuffer ab = buffers.mBuffers[i];
                memset(ab.mData, 0, ab.mDataByteSize);
            }
        }

        if (!status) {
            if (source.delegate && [source.delegate respondsToSelector:@selector(captureOutput:audioData:)]) {
                [source.delegate captureOutput:source audioData:[NSData dataWithBytes:buffers.mBuffers[0].mData length:buffers.mBuffers[0].mDataByteSize]];
            }
        }
        return status;
    }
}

Audio processing itself mainly relies on WebRTC; a later article will cover WebRTC specifically, including topics such as echo cancellation.

Audio Encoding

  • Create the encoder
AudioConverterRef m_converter;

// Input: 44.1 kHz, 16-bit interleaved stereo PCM.
AudioStreamBasicDescription inputFormat = {0};
inputFormat.mSampleRate = 44100;
inputFormat.mFormatID = kAudioFormatLinearPCM;
inputFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
inputFormat.mChannelsPerFrame = (UInt32)2;
inputFormat.mFramesPerPacket = 1;
inputFormat.mBitsPerChannel = 16;
inputFormat.mBytesPerFrame = inputFormat.mBitsPerChannel / 8 * inputFormat.mChannelsPerFrame;
inputFormat.mBytesPerPacket = inputFormat.mBytesPerFrame * inputFormat.mFramesPerPacket;

// Output audio format.
AudioStreamBasicDescription outputFormat;
memset(&outputFormat, 0, sizeof(outputFormat));
outputFormat.mSampleRate = 44100;                 // keep the same sample rate
outputFormat.mFormatID = kAudioFormatMPEG4AAC;    // AAC encoding: kAudioFormatMPEG4AAC or kAudioFormatMPEG4AAC_HE_V2
outputFormat.mChannelsPerFrame = 2;
outputFormat.mFramesPerPacket = 1024;             // one AAC packet holds 1024 samples per channel

// Request the software codec first, then the hardware codec as a fallback.
const OSType subtype = kAudioFormatMPEG4AAC;
AudioClassDescription requestedCodecs[2] = {
    {
        kAudioEncoderComponentType,
        subtype,
        kAppleSoftwareAudioCodecManufacturer
    },
    {
        kAudioEncoderComponentType,
        subtype,
        kAppleHardwareAudioCodecManufacturer
    }
};
OSStatus result = AudioConverterNewSpecific(&inputFormat, &outputFormat, 2, requestedCodecs, &m_converter);

UInt32 outputBitrate = 96000;
UInt32 propSize = sizeof(outputBitrate);
if (result == noErr) {
    result = AudioConverterSetProperty(m_converter, kAudioConverterEncodeBitRate, propSize, &outputBitrate);
}

With the encoder ready, the captured PCM data can be encoded. Two buffers are involved, audioBuffer and aacBuffer, both of size audioBufferSize. Only when the incoming PCM data plus what is already in audioBuffer exceeds audioBufferSize is a chunk sent to the AAC encoder.
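As a rough check of the sizes involved (assuming the format configured above: 1024 samples per AAC packet, 2 channels, 16 bits per sample), audioBufferSize would be

    1024 samples * 2 channels * 2 bytes = 4096 bytes

which corresponds to 1024 / 44100 ≈ 23 ms of audio per encoded packet.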

[Figure: AudioEncode.png]
- (void)encodeAudioData:(nullable NSData *)audioData timeStamp:(uint64_t)timeStamp {
    if (leftLength + audioData.length >= audioBufferSize) {
        // Enough data for at least one full input buffer: concatenate the leftover PCM with the new PCM.
        NSInteger totalSize = leftLength + audioData.length;
        NSInteger encodeCount = totalSize / audioBufferSize;
        char *totalBuf = malloc(totalSize);
        char *p = totalBuf;

        memset(totalBuf, 0, totalSize);
        memcpy(totalBuf, audioBuffer, leftLength);
        memcpy(totalBuf + leftLength, audioData.bytes, audioData.length);

        // Encode every complete audioBufferSize chunk.
        for (NSInteger index = 0; index < encodeCount; index++) {
            [self encodeBuffer:p timeStamp:timeStamp];
            p += audioBufferSize;
        }

        // Keep the remainder for the next call.
        leftLength = totalSize % audioBufferSize;
        memset(audioBuffer, 0, audioBufferSize);
        memcpy(audioBuffer, totalBuf + (totalSize - leftLength), leftLength);
        free(totalBuf);
    } else {
        // Not enough data yet; just accumulate it.
        memcpy(audioBuffer + leftLength, audioData.bytes, audioData.length);
        leftLength = leftLength + audioData.length;
    }
}

The next step is to encode the data handed over from audioBuffer. inputDataProc is the callback the converter uses to fetch input data.

- (void)encodeBuffer:(char *)buf timeStamp:(uint64_t)timeStamp {
    AudioBuffer inBuffer;
    inBuffer.mNumberChannels = 1;
    inBuffer.mData = buf;
    inBuffer.mDataByteSize = audioBufferSize;

    AudioBufferList buffers;
    buffers.mNumberBuffers = 1;
    buffers.mBuffers[0] = inBuffer;

    // Initialize an output buffer list backed by the AAC buffer.
    AudioBufferList outBufferList;
    outBufferList.mNumberBuffers = 1;
    outBufferList.mBuffers[0].mNumberChannels = inBuffer.mNumberChannels;
    outBufferList.mBuffers[0].mDataByteSize = inBuffer.mDataByteSize;   // output buffer size
    outBufferList.mBuffers[0].mData = aacBuffer;                        // AAC output buffer

    UInt32 outputDataPacketSize = 1;
    if (AudioConverterFillComplexBuffer(m_converter, inputDataProc, &buffers, &outputDataPacketSize, &outBufferList, NULL) != noErr) {
        return;
    }

    AudioFrame *audioFrame = [AudioFrame new];
    audioFrame.timestamp = timeStamp;
    audioFrame.data = [NSData dataWithBytes:aacBuffer length:outBufferList.mBuffers[0].mDataByteSize];

    // FLV audio header (AudioSpecificConfig); for 44100 Hz it is 0x12 0x10.
    char exeData[2];
    exeData[0] = 0x12;
    exeData[1] = 0x10;
    audioFrame.audioInfo = [NSData dataWithBytes:exeData length:2];
    if (self.aacDeleage && [self.aacDeleage respondsToSelector:@selector(audioEncoder:audioFrame:)]) {
        [self.aacDeleage audioEncoder:self audioFrame:audioFrame];
    }
}
OSStatus inputDataProc(AudioConverterRef inConverter, UInt32 *ioNumberDataPackets, AudioBufferList *ioData, AudioStreamPacketDescription **outDataPacketDescription, void *inUserData) {
    // Hand the PCM buffer passed through inUserData to the converter.
    AudioBufferList bufferList = *(AudioBufferList *)inUserData;
    ioData->mBuffers[0].mNumberChannels = 1;
    ioData->mBuffers[0].mData = bufferList.mBuffers[0].mData;
    ioData->mBuffers[0].mDataByteSize = bufferList.mBuffers[0].mDataByteSize;
    return noErr;
}

So where do the bytes 0x12 0x10 for 44100 Hz come from?

  • Get the sampleRateIndex
// https://wiki.multimedia.cx/index.php?title=MPEG-4_Audio
- (NSInteger)sampleRateIndex:(NSInteger)frequencyInHz {
    NSInteger sampleRateIndex = 0;
    switch (frequencyInHz) {
        case 96000:
            sampleRateIndex = 0;
            break;
        case 88200:
            sampleRateIndex = 1;
            break;
        case 64000:
            sampleRateIndex = 2;
            break;
        case 48000:
            sampleRateIndex = 3;
            break;
        case 44100:
            sampleRateIndex = 4;
            break;
        case 32000:
            sampleRateIndex = 5;
            break;
        case 24000:
            sampleRateIndex = 6;
            break;
        case 22050:
            sampleRateIndex = 7;
            break;
        case 16000:
            sampleRateIndex = 8;
            break;
        case 12000:
            sampleRateIndex = 9;
            break;
        case 11025:
            sampleRateIndex = 10;
            break;
        case 8000:
            sampleRateIndex = 11;
            break;
        case 7350:
            sampleRateIndex = 12;
            break;
        default:
            sampleRateIndex = 15;
    }
    return sampleRateIndex;
}

The AudioSpecificConfig packs a 5-bit audio object type (2 for AAC-LC, which is where the constant 0x10 = 2 << 3 comes from), the 4-bit sample rate index, and the 4-bit channel configuration. With the formula below we can compute the asc bytes for 44100 Hz:

asc[0] = 0x10 | ((sampleRateIndex >> 1) & 0x7);
asc[1] = ((sampleRateIndex & 0x1) << 7) | ((numberOfChannels & 0xF) << 3);

asc[0] = 0x10 | ((4>>1) & 0x7) = 0x12
asc[1] = ((4 & 0x1)<<7) | ((2 & 0xF) << 3) = 0x10

Conclusion

After video and audio encoding we end up with VideoFrame and AudioFrame objects, each carrying the encoded data and its timestamp.



Author: vedon_fu
Link: http://www.jianshu.com/p/11bb9f2a9233
Source: 简书 (Jianshu)
Copyright belongs to the author. Please contact the author for permission before commercial reposting; for non-commercial reposting, please credit the source.