Running Caffe Models with NVidia TensorRT

Preface

NVidia has released TensorRT, which supports fp16 and can run at half precision on the TX1 and on Pascal-architecture GPUs such as the GTX 1080. According to NVidia, TensorRT speeds up inference noticeably, often roughly doubling performance. It can also import Caffe models directly.
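For reference, in the builder API as TensorRT shipped, half precision is opted into before the engine is built. A minimal sketch, assuming a `builder` of type `nvinfer1::IBuilder*` has already been created (these method names are from the 1.x/2.x-era headers and were renamed in later releases):

```cpp
// Sketch: request fp16 ("half2") kernels before building the engine.
// `builder` is assumed to be an nvinfer1::IBuilder* from createInferBuilder().
if (builder->platformHasFastFp16())  // true on TX1 and Pascal-class GPUs
    builder->setHalf2Mode(true);     // build the engine with fp16 kernels
```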
The drawback is that custom layers are not yet supported; only a selection of common layers is available, and NVidia has said on its forums that custom-layer support is unlikely in the near term.
There is currently little material online about running Caffe models with TensorRT. I finally found what looks like a clear walkthrough in a slide deck from an NVidia workshop; it is reproduced below.

Code

```cpp
/* Importing a Caffe Model */
// create the network definition
INetworkDefinition* network = infer->createNetwork();

// create a map from caffe blob names to GIE tensors
std::unordered_map<std::string, infer1::Tensor> blobNameToTensor;

// populate the network definition and map
CaffeParser* parser = new CaffeParser;
parser->parse(deployFile, modelFile, *network, blobNameToTensor);

// tell GIE which tensors are required outputs
for (auto& s : outputs)
    network->setOutput(blobNameToTensor[s]);

/* Engine Creation */
// Specify the maximum batch size and scratch size
CudaEngineBuildContext buildContext;
buildContext.maxBatchSize = maxBatchSize;
buildContext.maxWorkspaceSize = 1 << 20;

// create the engine
ICudaEngine* engine =
    infer->createCudaEngine(buildContext, *network);

// serialize to a C++ stream
engine->serialize(gieModelStream);

/* Binding Buffers */
// get array bindings for input and output
int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME),
    outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);

// set array of input and output buffers
void* buffers[2];
buffers[inputIndex]  = gpuInputBuffer;
buffers[outputIndex] = gpuOutputBuffer;  // assumed counterpart; not shown on the slide

/* Running the Engine */
// Specify the batch size
CudaEngineContext context;
context.batchSize = batchSize;

// add GIE kernels to the given stream
engine->enqueue(context, buffers, stream, NULL);
// <…>

// wait on the stream
cudaStreamSynchronize(stream);
```
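Note that the slide code predates the public release: names such as `infer1::Tensor`, `CudaEngineBuildContext`, and `CudaEngineContext` never appeared in shipped TensorRT headers. For comparison, here is a minimal sketch of the same import, build, and run flow written against the Caffe parser API as it actually shipped (the 1.x/2.x-era `nvinfer1`/`nvcaffeparser1` interfaces). The file names, blob names ("data", "prob"), and tensor sizes below are placeholders for a typical classification network, not values from the slide.

```cpp
#include <cuda_runtime_api.h>
#include <iostream>
#include "NvInfer.h"
#include "NvCaffeParser.h"

using namespace nvinfer1;
using namespace nvcaffeparser1;

// TensorRT requires the caller to supply a logger implementation.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // --- Import the Caffe model (file and blob names are placeholders) ---
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();

    ICaffeParser* parser = createCaffeParser();
    const IBlobNameToTensor* blobNameToTensor =
        parser->parse("deploy.prototxt", "model.caffemodel",
                      *network, DataType::kFLOAT);

    // tell TensorRT which tensor is the required output
    network->markOutput(*blobNameToTensor->find("prob"));

    // --- Build the engine ---
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 20);  // 1 MiB of scratch space
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // parser, network, and builder are no longer needed once built
    network->destroy();
    parser->destroy();
    builder->destroy();

    // --- Bind buffers (sizes assume a 3x224x224 input, 1000-class output) ---
    int inputIndex  = engine->getBindingIndex("data");
    int outputIndex = engine->getBindingIndex("prob");

    void* buffers[2];
    cudaMalloc(&buffers[inputIndex],  3 * 224 * 224 * sizeof(float));
    cudaMalloc(&buffers[outputIndex], 1000 * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // --- Run the engine asynchronously, then wait on the stream ---
    IExecutionContext* context = engine->createExecutionContext();
    context->enqueue(1, buffers, stream, nullptr);  // batch size 1
    cudaStreamSynchronize(stream);

    // --- Cleanup ---
    context->destroy();
    engine->destroy();
    cudaStreamDestroy(stream);
    cudaFree(buffers[inputIndex]);
    cudaFree(buffers[outputIndex]);
    return 0;
}
```

The serialization step from the slide corresponds to `engine->serialize()` in the shipped headers (returning an `IHostMemory*` blob in TensorRT 2 and later), with `createInferRuntime()` and `deserializeCudaEngine()` used to reload it; the exact signatures vary across versions, so the samples bundled with your release are the authoritative reference.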