张新栋 · April 8, 2020

A Detailed Walkthrough of Deploying a TFLite MobileNet SSD Model with MNN in C++

For an introduction to MNN and MobileNet SSD, you can refer to my earlier article 实战MNN之Mobilenet SSD部署(含源码). This time there will be less analysis and more hands-on instruction: we will walk step by step through running MobileNet SSD inference on-device with MNN. The hardware platform used here is an RK3399 board running Android 8.1.
First published at: https://zhuanlan.zhihu.com/p/71648953
Author: 张新栋

Exporting the frozen pb file

Assuming you have already finished training a MobileNet SSD (MSSD) detector with TensorFlow, the first step is to export a frozen model file for the subsequent processing:

python object_detection/export_tflite_ssd_graph.py   \
        --pipeline_config_path=$CONFIG_FILE          \
        --trained_checkpoint_prefix=$CHECKPOINT_PATH \
        --output_directory=$OUTPUT_DIR               \
        --add_postprocessing_op=false

Here CONFIG_FILE is the configuration file used when training the MSSD model, CHECKPOINT_PATH is the checkpoint produced during training, and OUTPUT_DIR is the directory the frozen pb file is exported to. add_postprocessing_op must be set to false: MNN does not support the TFLite post-processing op, so after the forward pass we will run the post-processing ourselves on the CPU (it is essentially box decoding plus NMS).

Trimming the network

Trimming the network relies on TensorFlow's toco tool. Go into the TensorFlow source directory and run the following command:

bazel run --config=opt tensorflow/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb            \
--output_file=$OUTPUT_DIR/detect.tflite             \
--input_shapes=1,224,224,3                          \
--input_arrays=normalized_input_image_tensor        \
--output_arrays='concat','concat_1'                 \
--inference_type=FLOAT                              \
--change_concat_input_ranges=false        
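
The detect.tflite produced by toco still has to be converted into MNN's own .mnn format before it can be loaded on the device. As a rough sketch (the exact flags may differ between MNN versions), the conversion with the MNNConvert tool built from the MNN source tree looks like the following; the output name face_det.mnn is chosen here only to match the model name used in the C++ code later on:

./MNNConvert -f TFLITE --modelFile $OUTPUT_DIR/detect.tflite --MNNModel face_det.mnn --bizCode MNN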

Configuring the dependencies

The platform I use is an RK3399 running Android 8.1. The dependencies are configured mainly through the following Android.mk file:

LOCAL_PATH := $(call my-dir)

OpenCV_BASE = /Users/xindongzhang/armnn-tflite/OpenCV-android-sdk/
MNN_BASE    = /Users/xindongzhang/mnn/

include $(CLEAR_VARS)
LOCAL_MODULE := MNN
LOCAL_SRC_FILES := $(MNN_BASE)/benchmark/build/libMNN.so
include $(PREBUILT_SHARED_LIBRARY)

include $(CLEAR_VARS)
LOCAL_MODULE := MNN_CL
LOCAL_SRC_FILES := $(MNN_BASE)/benchmark/build/source/backend/opencl/libMNN_CL.so
include $(PREBUILT_SHARED_LIBRARY)

include $(CLEAR_VARS)
LOCAL_MODULE := MNN_Vulkan
LOCAL_SRC_FILES := $(MNN_BASE)/benchmark/build/source/backend/vulkan/libMNN_Vulkan.so
include $(PREBUILT_SHARED_LIBRARY)


include $(CLEAR_VARS)

OPENCV_INSTALL_MODULES := on
OPENCV_LIB_TYPE := STATIC
include $(OpenCV_BASE)/sdk/native/jni/OpenCV.mk
LOCAL_MODULE := mssd

LOCAL_C_INCLUDES += $(OPENCV_INCLUDE_DIR)
LOCAL_C_INCLUDES += $(MNN_BASE)/include
LOCAL_C_INCLUDES += $(MNN_BASE)/tools
LOCAL_C_INCLUDES += $(MNN_BASE)/tools/cpp
LOCAL_C_INCLUDES += $(MNN_BASE)/source
LOCAL_C_INCLUDES += $(MNN_BASE)/source/backend
LOCAL_C_INCLUDES += $(MNN_BASE)/source/core
LOCAL_C_INCLUDES += $(MNN_BASE)/source/cv
LOCAL_C_INCLUDES += $(MNN_BASE)/source/math
LOCAL_C_INCLUDES += $(MNN_BASE)/source/shape

LOCAL_SRC_FILES := mssd.cpp \
        $(MNN_BASE)/tools/cpp/revertMNNModel.cpp


LOCAL_LDLIBS := -landroid -llog -ldl -lz 
LOCAL_CFLAGS   := -O2 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing \
-ffunction-sections -fdata-sections -ffast-math -ftree-vectorize -fPIC -Ofast    \
-ffast-math -w -std=c++14
LOCAL_CPPFLAGS := -O2 -fvisibility=hidden -fvisibility-inlines-hidden            \
-fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections       \
-ffast-math -fPIC -Ofast -ffast-math -std=c++14
LOCAL_LDFLAGS  += -Wl,--gc-sections
LOCAL_CFLAGS   += -fopenmp
LOCAL_CPPFLAGS += -fopenmp
LOCAL_LDFLAGS  += -fopenmp
LOCAL_ARM_NEON := true

APP_ALLOW_MISSING_DEPS = true

LOCAL_SHARED_LIBRARIES := MNN        \
                          MNN_CL     \
                          MNN_Vulkan

include $(BUILD_EXECUTABLE)
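
With the Android.mk in place, the executable can be built with the NDK and run on the device over adb. The following is only a sketch of that workflow; it assumes the Android.mk above lives in a jni/ directory next to an Application.mk that selects the arm64-v8a ABI, and that the model and image file names match the ones used in the C++ code below:

ndk-build
adb push libs/arm64-v8a/mssd /data/local/tmp/
adb push face_det.mnn /data/local/tmp/
adb push image.jpg /data/local/tmp/
# also push libMNN.so, libMNN_CL.so and libMNN_Vulkan.so so the dynamic linker can find them
adb shell "cd /data/local/tmp && LD_LIBRARY_PATH=. ./mssd"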

The C++ application code

The application logic consists of three main parts: data pre-processing, the core network inference, and post-processing. For MobileNet SSD, the pre-processing has to match the pre-processing used during training; the core inference is the MobileNet-based feature extraction plus multi-scale feature fusion that regresses the object locations; and the post-processing is box decoding followed by NMS.

I will demonstrate this in three parts. The first part is model loading and data pre-processing. The point to watch here is the input data layout: you need to know whether the model expects NCHW or NHWC. The converted MNN model expects NCHW input, unlike the original TFLite model's NHWC, so the layout is converted after pre-processing; otherwise the inference results would be far off.

    std::string image_name = "./image.jpg";
    std::string model_name = "./face_det.mnn";
    int forward = MNN_FORWARD_CPU;   // could also be e.g. MNN_FORWARD_OPENCL or MNN_FORWARD_VULKAN
    int precision = 2;               // MNN::BackendConfig::PrecisionMode, 2 == Precision_Low

    // read image 
    cv::Mat raw_image    = cv::imread(image_name.c_str());
    int raw_image_height = raw_image.rows;
    int raw_image_width  = raw_image.cols; 
    cv::Mat image;
    cv::resize(raw_image, image, cv::Size(INPUT_SIZE, INPUT_SIZE));    
    // load and config mnn model
    auto revertor = std::unique_ptr<Revert>(new Revert(model_name.c_str()));
    revertor->initialize();
    auto modelBuffer      = revertor->getBuffer();
    const auto bufferSize = revertor->getBufferSize();
    auto net = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromBuffer(modelBuffer, bufferSize));
    revertor.reset();
    MNN::ScheduleConfig config;
    config.numThread = 4;
    config.type      = static_cast<MNNForwardType>(forward);
    MNN::BackendConfig backendConfig;
    backendConfig.precision = (MNN::BackendConfig::PrecisionMode)precision;
    config.backendConfig = &backendConfig;

    // preprocessing
    float img_mean = 123.0f;
    float img_std  = 58.0f;
    image.convertTo(image, CV_32FC3);
    image = (image - img_mean) / img_std;

    // wrap the input as an NHWC (TENSORFLOW layout) host tensor; copyFromHostTensor
    // below converts it into the session input's NCHW layout
    std::vector<int> dims{1, INPUT_SIZE, INPUT_SIZE, 3};
    auto nhwc_Tensor = MNN::Tensor::create<float>(dims, NULL, MNN::Tensor::TENSORFLOW);
    auto nhwc_data   = nhwc_Tensor->host<float>();
    auto nhwc_size   = nhwc_Tensor->size();
    ::memcpy(nhwc_data, image.data, nhwc_size);

The network inference itself is fairly simple: create a session, copy the input data from host to device, run the inference, and finally copy the output data from device back to host.

    auto session = net->createSession(config);
    std::string input_tensor = "normalized_input_image_tensor";
    auto inputTensor  = net->getSessionInput(session, input_tensor.c_str());
    inputTensor->copyFromHostTensor(nhwc_Tensor);


    // run network
    net->runSession(session);

    // get output data
    std::string output_tensor_name0 = "concat";
    std::string output_tensor_name1 = "concat_1";

    MNN::Tensor *tensor_scores = net->getSessionOutput(session, output_tensor_name0.c_str());
    MNN::Tensor *tensor_boxes  = net->getSessionOutput(session, output_tensor_name1.c_str());


    MNN::Tensor tensor_scores_host(tensor_scores, tensor_scores->getDimensionType());
    MNN::Tensor tensor_boxes_host(tensor_boxes, tensor_boxes->getDimensionType());

    tensor_scores->copyToHostTensor(&tensor_scores_host);
    tensor_boxes->copyToHostTensor(&tensor_boxes_host);

The last part is the post-processing. Here you again need to check MNN's output layout, i.e. whether it is NCHW or NHWC; note that the output layout here is not the same as the one you get from the TensorFlow (pb-based) MNN port.

    // find biggest face
    float maxProb = 0.0f;
    auto scores_dataPtr = tensor_scores_host.host<float>();
    auto boxes_dataPtr  = tensor_boxes_host.host<float>();
    cv::Rect biggest_face;
    for(int i = 0; i < OUTPUT_NUM; ++i)
    {
        // location decoding
        float ycenter =     boxes_dataPtr[i + 0 * OUTPUT_NUM] / Y_SCALE  * anchors[2][i] + anchors[0][i];
        float xcenter =     boxes_dataPtr[i + 1 * OUTPUT_NUM] / X_SCALE  * anchors[3][i] + anchors[1][i];
        float h       = exp(boxes_dataPtr[i + 2 * OUTPUT_NUM] / H_SCALE) * anchors[2][i];
        float w       = exp(boxes_dataPtr[i + 3 * OUTPUT_NUM] / W_SCALE) * anchors[3][i];

        float ymin    = ( ycenter - h * 0.5 ) * raw_image_height;
        float xmin    = ( xcenter - w * 0.5 ) * raw_image_width;
        float ymax    = ( ycenter + h * 0.5 ) * raw_image_height;
        float xmax    = ( xcenter + w * 0.5 ) * raw_image_width;

        // probability decoding, softmax
        float nonface_prob = exp(scores_dataPtr[i*2 + 0]);
        float face_prob    = exp(scores_dataPtr[i*2 + 1]);
        float ss           = nonface_prob + face_prob;
        nonface_prob       /= ss;
        face_prob          /= ss;
        
        if (face_prob > face_prob_thresh && face_prob > maxProb) {
            if (xmin > 0 && ymin > 0 && xmax < raw_image_width && ymax < raw_image_height) {
                maxProb = face_prob;
                biggest_face.x = (int) xmin;
                biggest_face.y = (int) ymin;
                biggest_face.width  = (int) (xmax - xmin);
                biggest_face.height = (int) (ymax - ymin); 
            }
        }
    }
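
The loop above simply keeps the single highest-scoring face, which is enough for this face-detection demo. For a general MobileNet SSD deployment that keeps multiple detections, the decoded boxes would additionally go through NMS, as mentioned earlier. The following is a minimal sketch of a greedy NMS; the Detection struct and the iou helper are illustrative and not part of the demo code:

    // a sketch only: greedy NMS over decoded detections (not part of the original demo)
    #include <algorithm>
    #include <vector>
    #include <opencv2/core.hpp>

    struct Detection {
        cv::Rect2f box;    // decoded box in image coordinates
        float      score;  // class probability after softmax
    };

    // intersection-over-union of two axis-aligned boxes
    static float iou(const cv::Rect2f& a, const cv::Rect2f& b) {
        float inter = (a & b).area();
        float uni   = a.area() + b.area() - inter;
        return uni > 0.0f ? inter / uni : 0.0f;
    }

    // keep the highest-scoring box, drop boxes that overlap it too much, repeat
    static std::vector<Detection> nms(std::vector<Detection> dets, float iou_thresh) {
        std::sort(dets.begin(), dets.end(),
                  [](const Detection& a, const Detection& b) { return a.score > b.score; });
        std::vector<Detection> kept;
        for (const auto& d : dets) {
            bool suppressed = false;
            for (const auto& k : kept) {
                if (iou(d.box, k.box) > iou_thresh) { suppressed = true; break; }
            }
            if (!suppressed) kept.push_back(d);
        }
        return kept;
    }

In the decoding loop, every detection whose face_prob passes the threshold would be pushed into a std::vector<Detection>, and nms() would then be called once after the loop with an IoU threshold of, say, 0.45.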
    

Wrapping up

Once you have gone through the steps above, you can run the MNN MobileNet SSD inference test on Android. With that, the MNN deployment flows based on both TFLite and TensorFlow have now been covered. You are welcome to leave comments, join the discussion and subscribe to the column; this column focuses on AI algorithms and implementations that are friendly to embedded devices. Thank you!

