For an introduction to MNN and MobileNet SSD, see my earlier article 实战MNN之Mobilenet SSD部署(含源码) (Hands-on MNN: deploying MobileNet SSD, with source code). In this article we keep the analysis light and focus on practice, walking step by step through running MobileNet SSD inference on device with MNN. The hardware platform used here is an RK3399 running Android 8.1.
First published at: https://zhuanlan.zhihu.com/p/71648953
Author: 张新栋
Exporting the pb file
This assumes you have already trained a MobileNet SSD (MSSD) detector with TensorFlow; the next step is to export a frozen model file for the subsequent processing.
python object_detection/export_tflite_ssd_graph.py \
--pipeline_config_path=$CONFIG_FILE \
--trained_checkpoint_prefix=$CHECKPOINT_PATH \
--output_directory=$OUTPUT_DIR \
--add_postprocessing_op=false
Here CONFIG_FILE is the pipeline config used to train the MSSD, CHECKPOINT_PATH is the ckpt checkpoint produced during training, and OUTPUT_DIR is the directory the exported pb files are written to. add_postprocessing_op must be set to false: MNN does not support the TFLite postprocessing op, so we run the postprocessing ourselves on the CPU after the forward pass, which is essentially box decoding plus NMS.
Trimming the network
Trimming the graph is done with TensorFlow's toco tool. Change into the tensorflow source directory and run the following command:
bazel run --config=opt tensorflow/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,224,224,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='concat','concat_1' \
--inference_type=FLOAT \
--change_concat_input_ranges=false
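The trimmed detect.tflite still has to be converted into MNN's own model format before it can be loaded on device; that step is not shown in this post. Assuming you have built MNN's converter (MNNConvert) from source, a typical invocation looks like the following, with the output name chosen here only as an example to match the file loaded in the C++ code below:
./MNNConvert -f TFLITE --modelFile $OUTPUT_DIR/detect.tflite \
    --MNNModel face_det.mnn --bizCode biz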
Configuring dependencies
The platform used here is RK3399 with Android 8.1; the dependencies are configured mainly through the following Android.mk file:
LOCAL_PATH := $(call my-dir)
OpenCV_BASE = /Users/xindongzhang/armnn-tflite/OpenCV-android-sdk/
MNN_BASE = /Users/xindongzhang/mnn/
include $(CLEAR_VARS)
LOCAL_MODULE := MNN
LOCAL_SRC_FILES := $(MNN_BASE)/benchmark/build/libMNN.so
include $(PREBUILT_SHARED_LIBRARY)
include $(CLEAR_VARS)
LOCAL_MODULE := MNN_CL
LOCAL_SRC_FILES := $(MNN_BASE)/benchmark/build/source/backend/opencl/libMNN_CL.so
include $(PREBUILT_SHARED_LIBRARY)
include $(CLEAR_VARS)
LOCAL_MODULE := MNN_Vulkan
LOCAL_SRC_FILES := $(MNN_BASE)/benchmark/build/source/backend/vulkan/libMNN_Vulkan.so
include $(PREBUILT_SHARED_LIBRARY)
include $(CLEAR_VARS)
OPENCV_INSTALL_MODULES := on
OPENCV_LIB_TYPE := STATIC
include $(OpenCV_BASE)/sdk/native/jni/OpenCV.mk
LOCAL_MODULE := mssd
LOCAL_C_INCLUDES += $(OPENCV_INCLUDE_DIR)
LOCAL_C_INCLUDES += $(MNN_BASE)/include
LOCAL_C_INCLUDES += $(MNN_BASE)/tools
LOCAL_C_INCLUDES += $(MNN_BASE)/tools/cpp
LOCAL_C_INCLUDES += $(MNN_BASE)/source
LOCAL_C_INCLUDES += $(MNN_BASE)/source/backend
LOCAL_C_INCLUDES += $(MNN_BASE)/source/core
LOCAL_C_INCLUDES += $(MNN_BASE)/source/cv
LOCAL_C_INCLUDES += $(MNN_BASE)/source/math
LOCAL_C_INCLUDES += $(MNN_BASE)/source/shape
LOCAL_SRC_FILES := \
mssd.cpp \
$(MNN_BASE)/tools/cpp/revertMNNModel.cpp
LOCAL_LDLIBS := -landroid -llog -ldl -lz
LOCAL_CFLAGS := -O2 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing \
-ffunction-sections -fdata-sections -ffast-math -ftree-vectorize -fPIC -Ofast \
-ffast-math -w -std=c++14
LOCAL_CPPFLAGS := -O2 -fvisibility=hidden -fvisibility-inlines-hidden \
-fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections \
-ffast-math -fPIC -Ofast -ffast-math -std=c++14
LOCAL_LDFLAGS += -Wl,--gc-sections
LOCAL_CFLAGS += -fopenmp
LOCAL_CPPFLAGS += -fopenmp
LOCAL_LDFLAGS += -fopenmp
LOCAL_ARM_NEON := true
APP_ALLOW_MISSING_DEPS = true
LOCAL_SHARED_LIBRARIES := \
MNN \
MNN_CL \
MNN_Vulkan
include $(BUILD_EXECUTABLE)
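The Android.mk above can be driven directly with ndk-build from the NDK. A minimal invocation, assuming an accompanying Application.mk that sets APP_ABI (e.g. arm64-v8a), APP_PLATFORM and APP_STL, might look like:
ndk-build NDK_PROJECT_PATH=. APP_BUILD_SCRIPT=./Android.mk NDK_APPLICATION_MK=./Application.mk
The resulting mssd executable, the MNN shared libraries and the .mnn model can then be pushed to the device with adb and run from an adb shell.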
C++ application code
The application logic has three parts: data preprocessing, the core network inference, and postprocessing. For MobileNet SSD the preprocessing has to match the preprocessing used at training time; the core inference is MobileNet-based feature extraction plus multi-scale feature fusion and box regression; the postprocessing consists of box decoding and NMS.
The walkthrough below follows the same three parts. The first part is model loading and data preprocessing. Pay attention to the input layout here: you need to determine whether the model expects NCHW or NHWC. The converted MNN model expects NCHW input, unlike the original TFLite model which uses NHWC, so a layout conversion is performed after preprocessing (here by wrapping the data in an NHWC host tensor and letting copyFromHostTensor convert it); otherwise the inference results will be badly off.
// headers and constants (exact MNN header paths can differ slightly between releases)
#include <opencv2/opencv.hpp>
#include <MNN/Interpreter.hpp>
#include <MNN/Tensor.hpp>
#include "revertMNNModel.hpp"   // Revert helper from MNN's tools/cpp
#define INPUT_SIZE 224          // must match the --input_shapes passed to toco
std::string image_name = "./image.jpg";
std::string model_name = "./face_det.mnn";
int forward = MNN_FORWARD_CPU;
int precision = 2;
// read image (cv::imread returns BGR; insert a cv::cvtColor to RGB here if your
// training preprocessing used RGB input)
cv::Mat raw_image = cv::imread(image_name.c_str());
int raw_image_height = raw_image.rows;
int raw_image_width = raw_image.cols;
cv::Mat image;
cv::resize(raw_image, image, cv::Size(INPUT_SIZE, INPUT_SIZE));
// load and config mnn model
auto revertor = std::unique_ptr<Revert>(new Revert(model_name.c_str()));
revertor->initialize();
auto modelBuffer = revertor->getBuffer();
const auto bufferSize = revertor->getBufferSize();
auto net = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromBuffer(modelBuffer, bufferSize));
revertor.reset();
MNN::ScheduleConfig config;
config.numThread = 4;
config.type = static_cast<MNNForwardType>(forward);
MNN::BackendConfig backendConfig;
backendConfig.precision = (MNN::BackendConfig::PrecisionMode)precision;
config.backendConfig = &backendConfig;
// preprocessing
float img_mean = 123.0f;
float img_std = 58.0f;
image.convertTo(image, CV_32FC3);
image = (image - img_mean) / img_std;
// wrapping input tensor, convert nhwc to nchw
std::vector<int> dims{1, INPUT_SIZE, INPUT_SIZE, 3};
auto nhwc_Tensor = MNN::Tensor::create<float>(dims, NULL, MNN::Tensor::TENSORFLOW);
auto nhwc_data = nhwc_Tensor->host<float>();
auto nhwc_size = nhwc_Tensor->size();
::memcpy(nhwc_data, image.data, nhwc_size);
The inference itself is straightforward: create a session, copy the input data from host to device, run the inference, and finally copy the output data back from device to host.
auto session = net->createSession(config);
std::string input_tensor = "normalized_input_image_tensor";
// passing nullptr returns the model's only input; input_tensor.c_str() would work as well
auto inputTensor = net->getSessionInput(session, nullptr);
inputTensor->copyFromHostTensor(nhwc_Tensor);
// run network
net->runSession(session);
// get output data
std::string output_tensor_name0 = "concat";
std::string output_tensor_name1 = "concat_1";
MNN::Tensor *tensor_scores = net->getSessionOutput(session, output_tensor_name0.c_str());
MNN::Tensor *tensor_boxes = net->getSessionOutput(session, output_tensor_name1.c_str());
MNN::Tensor tensor_scores_host(tensor_scores, tensor_scores->getDimensionType());
MNN::Tensor tensor_boxes_host(tensor_boxes, tensor_boxes->getDimensionType());
tensor_scores->copyToHostTensor(&tensor_scores_host);
tensor_boxes->copyToHostTensor(&tensor_boxes_host);
Finally, the postprocessing. Note the layout of MNN's output here as well: you again need to check whether it is NCHW or NHWC, and it differs from the output layout of the MNN model converted from TensorFlow (pb).
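The decode loop below uses a few constants and an anchors table that are not shown in this excerpt: OUTPUT_NUM is the total number of SSD prior boxes produced by the trimmed graph, the *_SCALE values are the box-coder scales from the training pipeline config (the TF Object Detection API defaults are 10 for the centers and 5 for the sizes), face_prob_thresh is the detection threshold, and anchors holds the prior boxes as (ycenter, xcenter, height, width) in normalized coordinates, generated with the same anchor settings as training. A hypothetical declaration, with the anchor count and values as placeholders to be filled from your own config, might look like:
// hypothetical values -- take the real ones from your pipeline config / anchor generator
#define OUTPUT_NUM 1917   // number of prior boxes for your input size (placeholder)
#define Y_SCALE 10.0f     // box-coder scales, TF OD API defaults: 10, 10, 5, 5
#define X_SCALE 10.0f
#define H_SCALE 5.0f
#define W_SCALE 5.0f
static float face_prob_thresh = 0.5f;   // placeholder detection threshold
// anchors[0][i]=ycenter, anchors[1][i]=xcenter, anchors[2][i]=height, anchors[3][i]=width,
// all normalized to [0, 1]; fill from the anchors generated for training
static float anchors[4][OUTPUT_NUM];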
// find biggest face
float maxProb = 0.0f;
auto scores_dataPtr = tensor_scores_host.host<float>();
auto boxes_dataPtr = tensor_boxes_host.host<float>();
cv::Rect biggest_face;
for(int i = 0; i < OUTPUT_NUM; ++i)
{
// location decoding
float ycenter = boxes_dataPtr[i + 0 * OUTPUT_NUM] / Y_SCALE * anchors[2][i] + anchors[0][i];
float xcenter = boxes_dataPtr[i + 1 * OUTPUT_NUM] / X_SCALE * anchors[3][i] + anchors[1][i];
float h = exp(boxes_dataPtr[i + 2 * OUTPUT_NUM] / H_SCALE) * anchors[2][i];
float w = exp(boxes_dataPtr[i + 3 * OUTPUT_NUM] / W_SCALE) * anchors[3][i];
float ymin = ( ycenter - h * 0.5 ) * raw_image_height;
float xmin = ( xcenter - w * 0.5 ) * raw_image_width;
float ymax = ( ycenter + h * 0.5 ) * raw_image_height;
float xmax = ( xcenter + w * 0.5 ) * raw_image_width;
// probability decoding, softmax
float nonface_prob = exp(scores_dataPtr[i*2 + 0]);
float face_prob = exp(scores_dataPtr[i*2 + 1]);
float ss = nonface_prob + face_prob;
nonface_prob /= ss;
face_prob /= ss;
if (face_prob > face_prob_thresh && face_prob > maxProb) {
if (xmin > 0 && ymin > 0 && xmax < raw_image_width && ymax < raw_image_height) {
maxProb = face_prob;
biggest_face.x = (int) xmin;
biggest_face.y = (int) ymin;
biggest_face.width = (int) (xmax - xmin);
biggest_face.height = (int) (ymax - ymin);
}
}
}
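The loop above keeps only the single highest-scoring face, so no NMS is needed in this demo. For the general multi-detection case mentioned earlier, the decoded boxes that pass the score threshold would instead be collected into a list and filtered with non-maximum suppression. A minimal IoU-based NMS sketch (not part of the original code) could look like this:
#include <algorithm>
#include <vector>
#include <opencv2/core.hpp>
struct Detection {
    cv::Rect2f box;   // decoded box in pixel coordinates
    float score;      // class probability after softmax
};
// greedy NMS: keep the best box, drop boxes that overlap it above iou_thresh, repeat
static std::vector<Detection> nms(std::vector<Detection> dets, float iou_thresh) {
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) { return a.score > b.score; });
    std::vector<Detection> kept;
    for (const auto& d : dets) {
        bool keep = true;
        for (const auto& k : kept) {
            float inter = (d.box & k.box).area();
            float uni   = d.box.area() + k.box.area() - inter;
            if (uni > 0.0f && inter / uni > iou_thresh) { keep = false; break; }
        }
        if (keep) kept.push_back(d);
    }
    return kept;
}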
Wrapping up
After completing the steps above, you can run MNN MobileNet SSD inference tests on Android. With that, the MNN deployment flow has been covered for both TFLite- and TensorFlow-converted models. Feel free to leave comments and subscribe to the column; this column focuses on AI algorithms and implementations that are friendly to embedded devices. Thanks for reading!