A Detailed Walkthrough of Deploying tf-MobilenetSSD in C++ with MNN

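For background on MNN and Mobilenet SSD, see my earlier article, 实战MNN之Mobilenet SSD部署（含源码） ("MNN in Practice: Deploying Mobilenet SSD, with Source Code"). This post is lighter on analysis and heavier on examples: we walk step by step through running Mobilenet SSD inference on-device with MNN.
First published at: https://zhuanlan.zhihu.com/p/70610865
Author: 张新栋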

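Exporting the pb file

This assumes you have already trained an MSSD (Mobilenet SSD) detector with TensorFlow. You then need to export a frozen model file for the later steps.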

python object_detection/export_tflite_ssd_graph.py   \
        --pipeline_config_path=$CONFIG_FILE          \
        --trained_checkpoint_prefix=$CHECKPOINT_PATH \
        --output_directory=$OUTPUT_DIR               \
        --add_postprocessing_op=false

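Here CONFIG_FILE is the config file used when training MSSD, CHECKPOINT_PATH is the intermediate ckpt file produced during training, and OUTPUT_DIR is the directory where the exported pb file is written. add_postprocessing_op must be set to false: MNN does not support the Postprocessing op, so after MNN finishes a forward pass we do the post-processing on the CPU ourselves, which in practice means decoding and NMS.

Converting the pb file to an mnn file

MNN provides a tool that parses models from different frontends and generates a model file that MNN can execute.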

./MNNConvert -f TF --modelFile YOU_TF_PB_FILE --MNNModel OUTPUT_MNN_FILE --bizCode MNN

Here YOU_TF_PB_FILE is the path to the exported pb file, and OUTPUT_MNN_FILE is the path where the converted MNN model is written.

Loading the MNN model file in C++

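    // Revert is assumed to be the model-loading helper from MNN's benchmark tooling (revertMNNModel.hpp).
    std::string model_name = "./tf_body_det.mnn";
    auto revertor = std::unique_ptr<Revert>(new Revert(model_name.c_str()));
    revertor->initialize();
    auto modelBuffer      = revertor->getBuffer();
    const auto bufferSize = revertor->getBufferSize();
    auto net = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromBuffer(modelBuffer, bufferSize));
    revertor.reset();

    // `forward` and `precision` are assumed to be integers defined by the caller,
    // e.g. MNN_FORWARD_CPU and a MNN::BackendConfig precision mode.
    MNN::ScheduleConfig config;
    config.numThread = 4;
    config.type      = static_cast<MNNForwardType>(forward);
    MNN::BackendConfig backendConfig;
    backendConfig.precision = (MNN::BackendConfig::PrecisionMode)precision;
    config.backendConfig = &backendConfig;

    // create the session that the inference steps below run on
    auto session = net->createSession(config);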

Data preprocessing

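    int INPUT_SIZE = 300;
    std::string image_name = "./body.jpg";
    // load image (cv::imread returns BGR; reorder channels if the model was trained on RGB)
    cv::Mat raw_image    = cv::imread(image_name.c_str());
    int raw_image_height = raw_image.rows;
    int raw_image_width  = raw_image.cols;
    cv::Mat image;
    cv::resize(raw_image, image, cv::Size(INPUT_SIZE, INPUT_SIZE));

    // preprocessing: scale pixel values from [0, 255] to [-1, 1]
    image.convertTo(image, CV_32FC3);
    image = (image * 2 / 255.0f) - 1;

    // wrap the image in an NHWC host tensor
    std::vector<int> dims{1, INPUT_SIZE, INPUT_SIZE, 3};
    auto nhwc_Tensor = MNN::Tensor::create<float>(dims, NULL, MNN::Tensor::TENSORFLOW);
    auto nhwc_data   = nhwc_Tensor->host<float>();
    auto nhwc_size   = nhwc_Tensor->size();
    ::memcpy(nhwc_data, image.data, nhwc_size);

    // copy into the session input; copyFromHostTensor is expected to convert
    // the NHWC data to the input tensor's own layout (NCHW)
    auto inputTensor = net->getSessionInput(session, nullptr);
    inputTensor->copyFromHostTensor(nhwc_Tensor);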

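Two points deserve attention here. First, the preprocessing applied to the model input must match the preprocessing used when training MSSD. Second, watch the tensor type of the input fed to MNN: while debugging, I found that the input tensor's type is CAFFE. CAFFE tensors use the NCHW layout, whereas the TensorFlow model was trained with NHWC input, so the data must be converted on input. See the code snippet above for the detailed steps.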
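To make the layout difference concrete, here is a minimal sketch of what an NHWC to NCHW reordering does on a plain float buffer. The function name `nhwc_to_nchw` is a hypothetical helper introduced for illustration; it is not part of MNN, and it is only meant to show the index arithmetic involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not MNN API): reorder a dense float buffer from
// NHWC (TensorFlow layout) to NCHW (Caffe layout).
std::vector<float> nhwc_to_nchw(const std::vector<float>& src,
                                int n, int h, int w, int c) {
    assert(src.size() == static_cast<std::size_t>(n) * h * w * c);
    std::vector<float> dst(src.size());
    for (int in = 0; in < n; ++in) {
        for (int ih = 0; ih < h; ++ih) {
            for (int iw = 0; iw < w; ++iw) {
                for (int ic = 0; ic < c; ++ic) {
                    // NHWC: channels are the fastest-varying dimension
                    std::size_t src_idx =
                        ((static_cast<std::size_t>(in) * h + ih) * w + iw) * c + ic;
                    // NCHW: each channel is stored as a contiguous H x W plane
                    std::size_t dst_idx =
                        ((static_cast<std::size_t>(in) * c + ic) * h + ih) * w + iw;
                    dst[dst_idx] = src[src_idx];
                }
            }
        }
    }
    return dst;
}
```

For a 300x300 3-channel image, the interleaved per-pixel triples become three consecutive 300x300 planes, one per channel.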

Running the network and fetching the results

    // run network
    net->runSession(session);

    // get output data
    std::string output_tensor_name0 = "convert_scores";
    std::string output_tensor_name1 = "Squeeze";
    std::string output_tensor_name2 = "anchors";

    MNN::Tensor *tensor_scores  = net->getSessionOutput(session, output_tensor_name0.c_str());
    MNN::Tensor *tensor_boxes   = net->getSessionOutput(session, output_tensor_name1.c_str());
    MNN::Tensor *tensor_anchors = net->getSessionOutput(session, output_tensor_name2.c_str());

    MNN::Tensor tensor_scores_host(tensor_scores, tensor_scores->getDimensionType());
    MNN::Tensor tensor_boxes_host(tensor_boxes, tensor_boxes->getDimensionType());
    MNN::Tensor tensor_anchors_host(tensor_anchors, tensor_anchors->getDimensionType());

    tensor_scores->copyToHostTensor(&tensor_scores_host);
    tensor_boxes->copyToHostTensor(&tensor_boxes_host);
    tensor_anchors->copyToHostTensor(&tensor_anchors_host);

    // post processing steps
    auto scores_dataPtr  = tensor_scores_host.host<float>();
    auto boxes_dataPtr   = tensor_boxes_host.host<float>();
    auto anchors_dataPtr = tensor_anchors_host.host<float>();

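Data post-processing

Because we stripped the Postprocessing op out of MSSD, we have to perform the post-processing ourselves once we have the convert_scores, Squeeze and anchors outputs. Post-processing contains a lot of branching logic, and such operations are expensive to parallelize. convert_scores holds the output scores (note whether its type is Identity, sigmoid or softmax), Squeeze holds the encoded boxes (so we still need to decode them), and anchors holds the prior boxes (needed later for decoding). The first snippet below performs the decoding: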
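The note about the score type matters because raw logits cannot be compared against a probability threshold directly. As a rough sketch, assuming the exported graph ends in Identity (that is, it emits raw logits), the scores could be converted on the CPU with helpers like the ones below. `sigmoid` and `softmax_rows` are hypothetical names introduced here for illustration, not MNN functions; if the graph already applies sigmoid or softmax, this step should be skipped.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: logistic sigmoid for a single logit.
inline float sigmoid(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// Hypothetical helper: in-place softmax over each consecutive group of
// num_classes logits (one group per anchor).
void softmax_rows(std::vector<float>& scores, std::size_t num_classes) {
    assert(num_classes > 0 && scores.size() % num_classes == 0);
    for (std::size_t i = 0; i < scores.size(); i += num_classes) {
        // subtract the row maximum for numerical stability
        float max_logit = scores[i];
        for (std::size_t k = 1; k < num_classes; ++k) {
            if (scores[i + k] > max_logit) max_logit = scores[i + k];
        }
        float sum = 0.0f;
        for (std::size_t k = 0; k < num_classes; ++k) {
            scores[i + k] = std::exp(scores[i + k] - max_logit);
            sum += scores[i + k];
        }
        for (std::size_t k = 0; k < num_classes; ++k) {
            scores[i + k] /= sum;
        }
    }
}
```

With two classes per anchor (background and object), `softmax_rows(scores, 2)` turns each pair of logits into probabilities that sum to 1.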

    int OUTPUT_NUM = 1917;
    float X_SCALE    = 10.0;
    float Y_SCALE    = 10.0;   
    float H_SCALE    = 5.0;  
    float W_SCALE    = 5.0;
    float score_threshold = 0.5f;
    float nms_threshold   = 0.45f;
    // location and score decoding
    std::vector<cv::Rect> tmp_faces;
    for(int i = 0; i < OUTPUT_NUM; ++i)
    {
        // location decoding
        float ycenter =     boxes_dataPtr[i*4 + 0] / Y_SCALE  * anchors_dataPtr[i*4 + 2] + anchors_dataPtr[i*4 + 0];
        float xcenter =     boxes_dataPtr[i*4 + 1] / X_SCALE  * anchors_dataPtr[i*4 + 3] + anchors_dataPtr[i*4 + 1];
        float h       = exp(boxes_dataPtr[i*4 + 2] / H_SCALE) * anchors_dataPtr[i*4 + 2];
        float w       = exp(boxes_dataPtr[i*4 + 3] / W_SCALE) * anchors_dataPtr[i*4 + 3];

        float ymin    = ( ycenter - h * 0.5 ) * raw_image_height;
        float xmin    = ( xcenter - w * 0.5 ) * raw_image_width;
        float ymax    = ( ycenter + h * 0.5 ) * raw_image_height;
        float xmax    = ( xcenter + w * 0.5 ) * raw_image_width;

        // probability decoding, softmax
        float nonface_prob = scores_dataPtr[i*2 + 0];
        float face_prob    = scores_dataPtr[i*2 + 1];

        if (face_prob > nonface_prob && face_prob > score_threshold) {
            cv::Rect tmp_face;
            tmp_face.x = xmin;
            tmp_face.y = ymin;
            tmp_face.width  = xmax - xmin;
            tmp_face.height = ymax - ymin;
            tmp_faces.push_back(tmp_face); 
        }
    }

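The snippet below performs NMS, whose purpose is to deduplicate boxes that have a high IoU (intersection over union) with one another: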

    // perform NMS
    int N = tmp_faces.size();
    std::vector<int> labels(N, -1); 
    for(int i = 0; i < N-1; ++i)
    {
        // a box that is already suppressed should not suppress others
        if (labels[i] == 0)
            continue;
        for (int j = i+1; j < N; ++j)
        {
            cv::Rect pre_box = tmp_faces[i];
            cv::Rect cur_box = tmp_faces[j];
            float iou_ = iou(pre_box, cur_box);
            if (iou_ > nms_threshold) {
                labels[j] = 0;
            }
        }
    }

    std::vector<cv::Rect> faces;
    for (int i = 0; i < N; ++i)
    {
        if (labels[i] == -1)
            faces.push_back(tmp_faces[i]);
    }
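The NMS loop calls an `iou` function that is not shown in this post, and it compares boxes in the order they were pushed rather than by score. As a hedged sketch of one way to fill that gap, the code below defines an IoU helper and a greedy NMS that visits boxes from highest to lowest score. `Box` is a hypothetical stand-in for `cv::Rect` so the example compiles without OpenCV, and `nms` assumes the score of each box was stored alongside it during decoding, which the decoding snippet above does not do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for cv::Rect: top-left corner plus width and height.
struct Box {
    float x, y, w, h;
};

// Intersection over union of two axis-aligned boxes.
float iou(const Box& a, const Box& b) {
    float x1 = std::max(a.x, b.x);
    float y1 = std::max(a.y, b.y);
    float x2 = std::min(a.x + a.w, b.x + b.w);
    float y2 = std::min(a.y + a.h, b.y + b.h);
    float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    float uni   = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: returns the indices of the kept boxes, highest score first.
std::vector<std::size_t> nms(const std::vector<Box>& boxes,
                             const std::vector<float>& scores,
                             float iou_threshold) {
    assert(boxes.size() == scores.size());
    std::vector<std::size_t> order(boxes.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });

    std::vector<bool> suppressed(boxes.size(), false);
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < order.size(); ++i) {
        std::size_t idx = order[i];
        if (suppressed[idx]) continue;
        keep.push_back(idx);
        // suppress every lower-scored box that overlaps the kept one too much
        for (std::size_t j = i + 1; j < order.size(); ++j) {
            std::size_t other = order[j];
            if (!suppressed[other] && iou(boxes[idx], boxes[other]) > iou_threshold) {
                suppressed[other] = true;
            }
        }
    }
    return keep;
}
```

Sorting by score first means that, among a cluster of overlapping boxes, the most confident one is the one that survives.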

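Conclusion

That completes the main deployment steps. For the full project code deployed on rk3399-android, see MNN_MSSD. If you found this helpful, please give the project a star. Thanks!

References

MNN_MSSD: link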
