A Beginner's Path Forward: Compiling Caffe for Mobile and JNI Development


Deep learning is very popular right now, but it does not perform well on mobile devices, especially without GPU hardware support, where efficiency becomes a key concern. Even Caffe2, which Facebook recently open-sourced and which claims to be lightweight and highly modular, has few tutorials so far, so I will come back to it some other time.

Background for this article: static hand gesture recognition, a problem for which there is currently no good solution in terms of either datasets or recognition. Given that, this article focuses on finding a suitable way to run deep learning on mobile. Prerequisites: C++, Android development, building Android .so libraries on Linux, the NDK, and Eclipse.

1. Choosing a deep learning framework: Caffe, TensorFlow, MXNet, tiny-cnn. Detection scheme: offline or online.

Since I have been using Caffe for gesture training on the PC all along and know it best, I chose Caffe for the port.

tiny-cnn: a neural network library written in pure C++11; its stability and accuracy still need to be verified.

TensorFlow: the source tree is too large and the build is complex.

caffe-mobile, https://github.com/solrex/caffe-mobile.git: few dependencies, well suited to mobile.

caffe-android-lib, https://github.com/sh1r0/caffe-android-lib.git: too many libraries to build, which makes compilation tedious.

2. Getting started

Requires API level >= 16.

After trying several libraries, I settled on caffe-mobile. Earlier versions did not support some layers, which caused errors when reading the ResNet prototxt; after I filed an issue, the author updated the code, and the model finally loaded and ran correctly.

Build environment: Ubuntu 14.04 + caffe-mobile.

Build process: the build downloads two dependencies, protobuf and OpenBLAS; follow the "Build Android" section of the README.md. One pitfall here: set the API level in the build script, and make sure the source and the dependency libraries are built against the same API level, otherwise compilation fails; that mismatch was exactly why my first builds broke. Choose the target that matches the board you will run on; armeabi, armeabi-v7a, and arm64-v8a are currently supported. Building for armeabi-v7a produces libcaffe.a, libproto.a, and libcaffejni.so. The GitHub tutorial says that in the end you only use libcaffejni.so, meaning you never have to write any JNI code yourself. Our JNI layer, however, has to implement additional functionality, so we use only the libcaffe.a library. For convenience, change static to shared in the script that produces libcaffe.a, so that the Caffe core library is built as a shared library instead.
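Because this API-level mismatch is such an easy trap, here is a small self-check of my own (not part of the caffe-mobile build): the NDK toolchain defines the __ANDROID_API__ macro, so each binary can log the API level it was compiled against, which makes a mismatch between the app and the prebuilt Caffe libraries easy to spot.

#include <android/api-level.h>
#include <android/log.h>

// Logs the API level this translation unit was compiled against.
// Compile this into both the JNI module and the Caffe library build
// and compare the two values at runtime.
void log_build_api_level() {
    __android_log_print(ANDROID_LOG_INFO, "BuildInfo",
                        "Compiled against API level %d", __ANDROID_API__);
}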


Development tools: Eclipse on Windows. Environment: NDK + OpenCV + Caffe.

NDK: for an introduction, see http://blog.chinaunix.net/uid-26524139-id-3376699.html.

Setup: since this article is aimed at beginners, I will walk through the steps in detail.

A: Configuring OpenCV in Eclipse

If you have not downloaded the OpenCV4Android SDK yet, get it from https://sourceforge.net/projects/opencvlibrary/files/opencv-android/; I used 2.4.10. The official documentation is at http://docs.opencv.org/2.4/doc/tutorials/introduction/android_binary_package/O4A_SDK.html. After downloading, extract it to a path containing only ASCII characters, to avoid errors caused by non-ASCII paths; the same applies to the NDK below.

Then import the OpenCV project into Eclipse: select the extracted directory and click OK (the original post illustrated this with a screenshot).


B: NDK setup. Download it from https://developer.android.google.cn/ndk/downloads/index.html and extract it.

With that done, you can create a new Android project or open an existing one.

Switch to the C/C++ perspective, open the project properties, and under Android click Add, select the OpenCV library project, and import it.


The JNI header include paths also need to be set (the original post showed them in a screenshot); they should cover the NDK, OpenCV, and Caffe header directories.



C: Writing the JNI code

Copy the libcaffe.so built above into the jni directory, create an include folder there, and copy Caffe's include headers into it. If the build complains about missing headers, copy them over from the Git sources. You may also see errors about missing CUDA headers; that happens because the CPU macro is not set, so simply define CPU_ONLY. One final reminder: the JNI entry-point function names must match the Java package name, otherwise loading the .so will fail.
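To make those last two points concrete, here is a minimal JNI stub; the package org.example.hand and the class Recognizer are made-up placeholder names, not names from this project:

// CPU_ONLY must be defined before any Caffe header is included,
// otherwise the headers pull in the CUDA declarations.
#define CPU_ONLY
#include <jni.h>
#include "caffe/caffe.hpp"

// For a Java class org.example.hand.Recognizer that declares
//     public static native long nativeInitial(String path);
// the exported symbol must be Java_<package>_<Class>_<method>,
// with the dots of the package name replaced by underscores.
extern "C" JNIEXPORT jlong JNICALL
Java_org_example_hand_Recognizer_nativeInitial(JNIEnv* env, jclass, jstring path) {
    (void)env; (void)path;  // real initialisation would go here
    return 1;
}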

Finally, here is my Android.mk:

LOCAL_PATH := $(call my-dir)

# Prebuilt Caffe core library (the libcaffe.so built above).
include $(CLEAR_VARS)
LOCAL_MODULE    := caffe
LOCAL_SRC_FILES := libcaffe.so
include $(PREBUILT_SHARED_LIBRARY)

# Our own JNI module, linked against OpenCV (static) and Caffe.
include $(CLEAR_VARS)
#OPENCV_CAMERA_MODULES:=off
#OPENCV_INSTALL_MODULES:=off
OPENCV_LIB_TYPE := STATIC
include C:/Works/OpenCV-2.4.10-android-sdk/sdk/native/jni/OpenCV.mk

LOCAL_MODULE           := DetectionBasedTracker
LOCAL_SHARED_LIBRARIES += caffe
LOCAL_LDLIBS           += -llog -ldl
LOCAL_C_INCLUDES       += $(LOCAL_PATH)/include
LOCAL_SRC_FILES        := caffe_jni.cpp
#LOCAL_SRC_FILES       += caffe_mobile.cpp
include $(BUILD_SHARED_LIBRARY)
And here is my jni.cpp:
// Define CPU_ONLY (and the NEON switch) before any Caffe header is
// included, otherwise caffe.hpp pulls in the CUDA declarations, which
// do not exist on Android.
#define CPU_ONLY
#define USE_NEON_MATH

#include <jni.h>
#include <android/log.h>

#include "caffe/caffe.hpp"
#include "caffe_jni.h"
//#include "caffe_mobile.h"

#include <opencv2/core/core.hpp>
#include <opencv2/opencv.hpp>
#include <cascadebuffer.h>

#include <algorithm>
#include <ctime>
#include <fstream>
#include <iosfwd>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#define LOG_TAG "FaceDetection/DetectionBasedTracker"
#define LOGD(...) ((void)__android_log_print(ANDROID_LOG_DEBUG, LOG_TAG, __VA_ARGS__))
#define LOGF(...) ((void)__android_log_print(ANDROID_LOG_FATAL, LOG_TAG, __VA_ARGS__))
#define LOGI(...) ((void)__android_log_print(ANDROID_LOG_INFO, LOG_TAG, __VA_ARGS__))

using namespace cv;
using namespace caffe;
using std::string;

/* Pair (label, confidence) representing a prediction. */
typedef std::pair<string, float> Prediction;

class Classifier {
 public:
  Classifier() {}
  string model_file;
  string trained_file;
  string mean_file;
  string label_file;
  void LoadModel();
  std::vector<Prediction> Classify(const cv::Mat& img, int N = 5);

 private:
  void SetMean(const string& mean_file);
  std::vector<float> Predict(const cv::Mat& img);
  void WrapInputLayer(std::vector<cv::Mat>* input_channels);
  void Preprocess(const cv::Mat& img, std::vector<cv::Mat>* input_channels);

  shared_ptr<Net<float> > net_;
  cv::Size input_geometry_;
  int num_channels_;
  cv::Mat mean_;
  std::vector<string> labels_;
};

Classifier Caffeclassifier;  // the global Caffe classifier

void Classifier::LoadModel() {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif
  /* Load the network definition and the trained weights. */
  LOGI("Loading model from %s", model_file.c_str());
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Set up the mean image (see SetMean below). */
  SetMean(mean_file);

  /* Load the class labels, one per line. */
  std::ifstream labels(label_file.c_str());
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));
  LOGI("Model loaded, %d labels", (int)labels_.size());
}

static bool PairCompare(const std::pair<float, int>& lhs,
                        const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], static_cast<int>(i)));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

/* Return the top N predictions. */
std::vector<Prediction> Classifier::Classify(const cv::Mat& img, int N) {
  std::vector<float> output = Predict(img);

  N = std::min<int>(labels_.size(), N);
  std::vector<int> maxN = Argmax(output, N);
  std::vector<Prediction> predictions;
  for (int i = 0; i < N; ++i) {
    int idx = maxN[i];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
  }
  return predictions;
}

/* Set the mean image. The usual binaryproto loading code from the Caffe
 * classification example is disabled in this build; instead, the global
 * per-channel means of the training set are hard-coded and broadcast
 * into a mean image of the input geometry. */
void Classifier::SetMean(const string& mean_file) {
  (void)mean_file;  // the binaryproto file is not parsed here
  cv::Scalar channel_mean;
  channel_mean[0] = 127.311;
  channel_mean[1] = 127.67;
  channel_mean[2] = 130.743;
  mean_ = cv::Mat(input_geometry_, CV_32FC3, channel_mean);
}

std::vector<float> Classifier::Predict(const cv::Mat& img) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(1, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  std::vector<cv::Mat> input_channels;
  WrapInputLayer(&input_channels);

  Preprocess(img, &input_channels);

  net_->Forward();

  /* Copy the output layer to a std::vector. */
  Blob<float>* output_layer = net_->output_blobs()[0];
  const float* begin = output_layer->cpu_data();
  const float* end = begin + output_layer->channels();
  return std::vector<float>(begin, end);
}

/* Wrap the input layer of the network in separate cv::Mat objects
 * (one per channel). This way we save one memcpy operation and we
 * don't need to rely on cudaMemcpy2D. The last preprocessing
 * operation will write the separate channels directly to the input
 * layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels(); ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

void Classifier::Preprocess(const cv::Mat& img,
                            std::vector<cv::Mat>* input_channels) {
  /* Convert the input image to the input image format of the network. */
  cv::Mat sample;
  if (img.channels() == 3 && num_channels_ == 1)
    cv::cvtColor(img, sample, cv::COLOR_BGR2GRAY);
  else if (img.channels() == 4 && num_channels_ == 1)
    cv::cvtColor(img, sample, cv::COLOR_BGRA2GRAY);
  else if (img.channels() == 4 && num_channels_ == 3)
    cv::cvtColor(img, sample, cv::COLOR_BGRA2BGR);
  else if (img.channels() == 1 && num_channels_ == 3)
    cv::cvtColor(img, sample, cv::COLOR_GRAY2BGR);
  else
    sample = img;

  cv::Mat sample_resized;
  if (sample.size() != input_geometry_)
    cv::resize(sample, sample_resized, input_geometry_);
  else
    sample_resized = sample;

  cv::Mat sample_float;
  if (num_channels_ == 3)
    sample_resized.convertTo(sample_float, CV_32FC3);
  else
    sample_resized.convertTo(sample_float, CV_32FC1);

  cv::Mat sample_normalized;
  cv::subtract(sample_float, mean_, sample_normalized);

  /* This operation will write the separate BGR planes directly to the
   * input layer of the network because it is wrapped by the cv::Mat
   * objects in input_channels. */
  cv::split(sample_normalized, *input_channels);

  CHECK(reinterpret_cast<float*>(input_channels->at(0).data)
        == net_->input_blobs()[0]->cpu_data())
      << "Input channels are not wrapping the input layer of the network.";
}

/* Run the Caffe classifier on one frame and log the result and latency. */
Prediction TestModel(Mat img) {
  double time1 = clock();
  resize(img, img, cv::Size(100, 100));
  std::vector<Prediction> predictions = Caffeclassifier.Classify(img);
  double time2 = clock();
  double delay = (time2 - time1) / CLOCKS_PER_SEC;

  Prediction p = predictions[0];
  LOGI("caffe predict, res=%s, prob=%f, time=%fs", p.first.c_str(), p.second, delay);
  return p;
}

JNIEXPORT jlong JNICALL Java_org_opencv_samples_facedetect_DetectionBasedTracker_nativeInitial(
    JNIEnv* jenv, jclass, jstring jFilePath) {
  const char* jnamestr = jenv->GetStringUTFChars(jFilePath, NULL);
  string strFilePath(jnamestr);
  jenv->ReleaseStringUTFChars(jFilePath, jnamestr);  // don't leak the UTF chars

  Caffeclassifier.model_file   = strFilePath + "/deploy.prototxt";
  Caffeclassifier.trained_file = strFilePath + "/caffemodel.caffemodel";
  Caffeclassifier.mean_file    = strFilePath + "/handnet_mean.binaryproto";
  Caffeclassifier.label_file   = strFilePath + "/synset_words.txt";

  Caffeclassifier.LoadModel();
  LOGI("Caffe model loaded successfully");
  return 1;
}

JNIEXPORT void JNICALL Java_org_opencv_samples_facedetect_DetectionBasedTracker_nativeDetect(
    JNIEnv* jenv, jclass, jlong thiz, jlong imageRgba, jlong facesrect) {
  (void)thiz;
  (void)facesrect;  // unused in this build
  Mat temp = *((Mat*)imageRgba);
  Mat matcolororiginal;
  cvtColor(temp, matcolororiginal, CV_RGBA2BGR);

  /* Classify the current frame and measure the latency. */
  double timebegin = clock();
  Prediction p = TestModel(matcolororiginal);
  double timeend = clock();
  double delaytime = (timeend - timebegin) / CLOCKS_PER_SEC;

  /* Draw the latency and the top prediction onto the frame. */
  string result_text = format("Time=%f s ", delaytime);
  putText(matcolororiginal, result_text, Point(0, 40),
          FONT_HERSHEY_PLAIN, 3.0, CV_RGB(0, 255, 0), 2);
  result_text = format("res=%s , prob=%f", p.first.c_str(), p.second);
  putText(matcolororiginal, result_text, Point(0, 100),
          FONT_HERSHEY_PLAIN, 3.0, CV_RGB(0, 255, 0), 2);

  cvtColor(matcolororiginal, temp, CV_BGR2RGBA);
}
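One note on the code above: SetMean hard-codes the three channel means instead of parsing the binaryproto file. If you need to recompute those numbers for your own training set, a small host-side OpenCV sketch along the following lines will do (reading the image list from the command line is my own assumption):

#include <opencv2/opencv.hpp>
#include <cstdio>

// Averages the per-channel BGR means over a list of training images,
// e.g.  ./channel_mean train/*.jpg
int main(int argc, char** argv) {
    cv::Scalar sum(0, 0, 0);
    int count = 0;
    for (int i = 1; i < argc; ++i) {
        cv::Mat img = cv::imread(argv[i]);  // OpenCV loads images as BGR
        if (img.empty()) continue;
        sum += cv::mean(img);               // per-channel mean of this image
        ++count;
    }
    if (count > 0)
        std::printf("B=%.3f G=%.3f R=%.3f\n",
                    sum[0] / count, sum[1] / count, sum[2] / count);
    return 0;
}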


3. Summary: ways to run Caffe models on Android (borrowed from a colleague's weekly report):

a. Candidates: the OpenCV DNN module, tiny-cnn, caffe-android-lib, caffe-mobile.

b. The OpenCV DNN module runs successfully on the PC; it is 7-8x slower than native Caffe, with comparable accuracy.

c. tiny-cnn runs successfully on the RK3288; it is faster than native Caffe, but its accuracy is far below native Caffe's.

d. caffe-android-lib is a native Android build of Caffe; speed is average. It runs on Android 6.0 but fails to build for 4.4.

e. caffe-mobile is a slimmed-down Caffe runtime. It has been ported to the RK3288 successfully and is fairly fast: roughly 60 ms for a simple network and 600 ms for a complex one.

In my tests on the RK3288, its accuracy matches native Caffe and it runs fast.


P.S. This is my first blog post; I hope it helps someone who needs it.
