Model Conversion in Practice: Migrating and Deploying the OpenPose Hand Keypoint Detection Model

Background

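When you find a high-quality deep learning model on an open-source platform and want to use it, you often hit a thorny problem: the model was built with a deep learning framework different from the one you know, which makes it hard to put the model to use quickly.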

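Migrating deep learning models across frameworks has never been easy. There are generally two options: manually port the code to the framework you know and retrain the model, or use a model conversion tool to convert the model directly in one step. The former is difficult, time-consuming, and requires compute resources. The latter is quick and convenient but less general, and for some unusual models a direct tool conversion may not work.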

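Today we walk through an example: using PaddlePaddle's X2Paddle to migrate the OpenPose hand keypoint detection model, trained with Caffe, to the PaddlePaddle framework and deploy it for inference. The example shows how a model conversion tool can solve the cross-framework migration problem.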

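This article includes the code for every stage from conversion to deployment and prediction. It is also published as a tutorial on the AI Studio platform, where you can run the complete code directly. (AI Studio link: https://aistudio.baidu.com/ai...)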

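Introduction to X2Paddle
X2Paddle is a model conversion tool developed officially by PaddlePaddle. It converts models trained with other deep learning frameworks into PaddlePaddle models. X2Paddle currently supports three source formats: TensorFlow, Caffe, and ONNX. Together these cover most mainstream frameworks, either directly (TensorFlow and Caffe) or indirectly through ONNX for frameworks that can export ONNX models (PyTorch, MXNet, and so on).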
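
As a rough illustration of the coverage described above, the sketch below maps a source framework to the X2Paddle input format it would go through. The function name x2paddle_route and the two lookup sets are hypothetical, and they list only the frameworks named in this article.

```python
# Hypothetical lookup sets built only from the frameworks named in this article.
DIRECT_FORMATS = {"tensorflow", "caffe", "onnx"}
VIA_ONNX = {"pytorch", "mxnet"}  # frameworks the article says can export ONNX


def x2paddle_route(framework):
    """Return the X2Paddle input format a model from `framework` would use."""
    name = framework.lower()
    if name in DIRECT_FORMATS:
        return name
    if name in VIA_ONNX:
        return "onnx"  # export to ONNX first, then convert the ONNX model
    raise ValueError("framework not covered by this sketch: " + framework)


if __name__ == "__main__":
    for fw in ["Caffe", "PyTorch"]:
        print(fw, "->", x2paddle_route(fw))
```

A framework outside the two sets raises ValueError rather than guessing a route.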

Installing X2Paddle: X2Paddle is usually installed with pip, using the following command:

pip install x2paddle --index-url https://pypi.python.org/simple/

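Model Overview

Hand keypoint detection aims to locate the finger joints and fingertips in a given image. It is similar to facial landmark detection (Facial Landmark Detection) and human body pose estimation (Human Body Pose Estimation). Its applications include gesture recognition, sign language recognition and understanding, and hand behavior recognition.

The model's detection results are shown in the figure below: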

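Model Conversion

To convert a model, first download the source model. A Caffe model generally consists of two files: the model graph file (.prototxt) and the model weights file (.caffemodel). The model converted here consists of pose_deploy.prototxt and pose_iter_102000.caffemodel.

Once the model is ready, X2Paddle can convert it. The following command converts the Caffe model above into a PaddlePaddle model: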

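# --framework:    source model type (tensorflow, caffe, onnx)
# --prototxt:     path to the Caffe model's proto file
# --weight:       path to the Caffe model's weights file
# --save_dir:     directory in which to save the converted model
# --params_merge: when set, all model parameters in inference_model are merged into a single file, __params__, after conversion
x2paddle --framework=caffe \
         --prototxt=pose_deploy.prototxt \
         --weight=pose_iter_102000.caffemodel \
         --save_dir=pd_model \
         --params_merge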
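
If the conversion needs to be scripted, the same command can be assembled from Python. The sketch below only builds and prints the argument list, using the flags shown in the command above. The helper name build_x2paddle_command is hypothetical.

```python
import shlex


def build_x2paddle_command(prototxt, weight, save_dir, params_merge=True):
    """Assemble the x2paddle argument list used in this article (Caffe source)."""
    cmd = [
        "x2paddle",
        "--framework=caffe",
        "--prototxt=" + prototxt,
        "--weight=" + weight,
        "--save_dir=" + save_dir,
    ]
    if params_merge:
        cmd.append("--params_merge")
    return cmd


if __name__ == "__main__":
    cmd = build_x2paddle_command(
        "pose_deploy.prototxt", "pose_iter_102000.caffemodel", "pd_model"
    )
    # Only print the command here; to execute it, pass `cmd` to
    # subprocess.run(cmd, check=True) on a machine with x2paddle installed.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the arguments in a list rather than one string avoids shell quoting problems if a path contains spaces.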

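After the command finishes, the specified save directory contains a model that PaddlePaddle can load. The directory holds two folders: model_with_code and inference_model. The former contains the Paddle model code and weights that can be loaded for training. The latter is the Paddle inference model. Since this article covers only inference deployment, we focus on the generated inference model.

The inference model folder likewise contains two files: the model graph file (__model__) and the model weights file (__params__). An inference model in this form can be loaded directly by the Paddle Inference high-performance inference engine to carry out deployment.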
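
Before moving on to deployment, it can help to confirm that the conversion produced the two files just described. The following is a minimal sketch. The helper name missing_inference_files is hypothetical, and the directory pd_model is the save directory used in this article.

```python
import os


def missing_inference_files(save_dir):
    """Return the expected inference-model files that are absent under save_dir."""
    expected = [
        os.path.join(save_dir, "inference_model", "__model__"),
        os.path.join(save_dir, "inference_model", "__params__"),
    ]
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == "__main__":
    missing = missing_inference_files("pd_model")
    if missing:
        print("Missing files:", missing)
    else:
        print("Inference model looks complete.")
```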

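Model Deployment

PaddlePaddle inference models are generally deployed with the Paddle Inference high-performance inference engine. The code below shows how to use Paddle Inference, in the following ten steps.

1. Import the required packages.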
import os
import numpy as np 

from paddle.fluid.core import AnalysisConfig, PaddleTensor
from paddle.fluid.core import create_paddle_predictor

2. Set the model paths.
modelpath = 'pd_model/inference_model'
model = os.path.join(modelpath, "__model__")
params = os.path.join(modelpath, "__params__")

3. Configure the model with AnalysisConfig.
config = AnalysisConfig(model, params)
config.disable_gpu() # disable the GPU and run inference on the CPU
config.enable_mkldnn() # enable MKLDNN acceleration
config.disable_glog_info() # suppress glog output during prediction
config.switch_use_feed_fetch_ops(False) # remove the feed and fetch ops
config.switch_specify_input_names(True) 

4. Create the model predictor with create_paddle_predictor.
predictor = create_paddle_predictor(config)

5. Get the model's input and output tensors.
input_names = predictor.get_input_names()
output_names = predictor.get_output_names()
input_tensor = predictor.get_input_tensor(input_names[0])
output_tensor = predictor.get_output_tensor(output_names[0])

6. Prepare the input data.
input_shape = (1, 3, 224, 224)
img = np.zeros(input_shape).astype('float32')
print('input_shape:', input_shape)
print(img)

7. Copy the input data into the input tensor.
input_tensor.copy_from_cpu(img)

8. Run the model prediction.
predictor.zero_copy_run()

9. Copy the output data out of the output tensor.
output = output_tensor.copy_to_cpu()

10. Print the output.
print('output_shape:', output.shape)
print(output)

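These steps complete a simple inference deployment, but they only verify that the model can run a forward pass. To get the full functionality, the code still needs data preprocessing and output post-processing.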
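
A forward-pass smoke test like the one above is easier to trust with a few automatic checks on the output array. The sketch below is one possible set of checks. The helper name check_heatmap_output is hypothetical, and the channel count of 22 is taken from the output shape this article gives for the model.

```python
import numpy as np


def check_heatmap_output(output, expected_channels=22):
    """Raise ValueError if `output` does not look like a batch of heatmaps."""
    if output.ndim != 4:
        raise ValueError("expected 4 dimensions, got %d" % output.ndim)
    if output.shape[1] != expected_channels:
        raise ValueError(
            "expected %d channels, got %d" % (expected_channels, output.shape[1])
        )
    if not np.all(np.isfinite(output)):
        raise ValueError("output contains NaN or Inf")
    return True


if __name__ == "__main__":
    # Stand-in array with an arbitrary spatial size, not a real model output
    fake_output = np.zeros((1, 22, 4, 4), dtype="float32")
    print(check_heatmap_output(fake_output))
```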

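For this model, the input should be an image of a hand, which must be resized and normalized before it becomes data the model accepts. The input shape should be [batch_size, 3, h, w]. The preprocessing code is as follows: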

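import cv2

# Input image height for the network
inHeight = 368
# Read the image (imgfile is the path to the input image)
img_cv2 = cv2.imread(imgfile)
# Get the image height and width
img_height, img_width, _ = img_cv2.shape
# Compute the aspect ratio (width / height)
aspect_ratio = img_width / img_height
# Compute the input width that preserves the aspect ratio
inWidth = int(((aspect_ratio * inHeight) * 8) // 8)
# Resize the image and scale pixel values by 1/255
inpBlob = cv2.dnn.blobFromImage(img_cv2, 1.0 / 255, (inWidth, inHeight), (0, 0, 0), swapRB=False, crop=False)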
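
To make the preprocessing above more concrete, here is a NumPy-only sketch of two of its pieces: the width formula, and the scaling and layout change applied to an image that has already been resized. It does not call OpenCV and does not perform the resize itself. The helper names compute_input_size and to_nchw_blob are hypothetical.

```python
import numpy as np


def compute_input_size(img_width, img_height, in_height=368):
    """Reproduce the article's width formula for a fixed input height."""
    aspect_ratio = img_width / img_height
    in_width = int(((aspect_ratio * in_height) * 8) // 8)
    return in_width, in_height


def to_nchw_blob(img_hwc):
    """Scale an already-resized HWC uint8 image to [0, 1] and lay it out as NCHW."""
    scaled = img_hwc.astype("float32") / 255.0  # same scale factor as 1.0 / 255
    chw = np.transpose(scaled, (2, 0, 1))       # HWC -> CHW, channel order unchanged
    return chw[np.newaxis, ...]                 # add the batch dimension


if __name__ == "__main__":
    in_width, in_height = compute_input_size(640, 480)
    print(in_width, in_height)
    dummy = np.full((in_height, in_width, 3), 255, dtype=np.uint8)
    print(to_nchw_blob(dummy).shape)
```

As written, the width formula multiplies by 8 and then floor-divides by 8, so it amounts to flooring aspect_ratio * inHeight. For a 640 × 480 image this gives a width of 490.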

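The model's output has shape [batch_size, 22, h_out, w_out]: 22 heatmaps of resolution w_out × h_out, each giving the probability of a keypoint at every location. These heatmaps can be visualized by resizing them to the size of the input image. The visualization code is as follows: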

import cv2
import matplotlib.pyplot as plt

def vis_heatmaps(imgfile, net_outputs):
    # Read the input image
    img_cv2 = cv2.imread(imgfile)
    # Create the canvas
    plt.figure(figsize=[10, 10])
    # Iterate over every heatmap
    for pdx in range(22):
        probMap = net_outputs[0, pdx, :, :]
        # Resize to the size of the original image
        probMap = cv2.resize(probMap, (img_cv2.shape[1], img_cv2.shape[0]))
        # Select the subplot in a 5 x 5 grid
        plt.subplot(5, 5, pdx+1)
        # Draw the input image
        plt.imshow(cv2.cvtColor(img_cv2, cv2.COLOR_BGR2RGB))
        # Draw the heatmap on top of the input image
        plt.imshow(probMap, alpha=0.6)
        # Show the heatmap color bar
        plt.colorbar()
        # Hide the axes
        plt.axis("off")
    # Show the whole figure
    plt.show()

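The resulting heatmaps look like this:

These heatmaps can also be used to work out the coordinates of each keypoint: the location of the global maximum in each heatmap gives the position of the corresponding keypoint. The code is as follows: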

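# Wrapped in a function so the fragment can run on its own
# (the name get_points is chosen for illustration)
def get_points(output, img_width, img_height):
    points = []
    # Iterate over the heatmaps of the 21 keypoints
    # (the 22nd map is skipped, as in the complete program)
    for idx in range(21):
        probMap = output[0, idx, :, :]
        # Resize the heatmap to the size of the input image
        probMap = cv2.resize(probMap, (img_width, img_height))

        # Find the location of the global maximum in the heatmap
        minVal, prob, minLoc, point = cv2.minMaxLoc(probMap)

        # Keep the location if its probability exceeds the threshold;
        # otherwise append None as a placeholder
        if prob > 0.1:
            points.append((int(point[0]), int(point[1])))
        else:
            points.append(None)

    return points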
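
The peak search can also be done without OpenCV. The sketch below is a variation, not the article's method: it takes the argmax of each low-resolution heatmap and scales the peak position to image coordinates, instead of resizing the heatmap and calling cv2.minMaxLoc. Because no interpolation is involved, the coordinates may differ slightly from the version above. The function name keypoints_from_heatmaps is hypothetical.

```python
import numpy as np


def keypoints_from_heatmaps(heatmaps, img_width, img_height,
                            num_points=21, threshold=0.1):
    """Pick the peak of each heatmap and scale its position to image coordinates."""
    _, _, h_out, w_out = heatmaps.shape
    points = []
    for idx in range(num_points):
        prob_map = heatmaps[0, idx]
        # Row and column of the global maximum in the low-resolution map
        row, col = np.unravel_index(np.argmax(prob_map), prob_map.shape)
        if prob_map[row, col] > threshold:
            # Map the cell centre to pixel coordinates (x, y) in the image
            x = int((col + 0.5) * img_width / w_out)
            y = int((row + 0.5) * img_height / h_out)
            points.append((x, y))
        else:
            points.append(None)  # placeholder for an undetected keypoint
    return points


if __name__ == "__main__":
    # Synthetic heatmaps: only keypoint 0 has a peak above the threshold
    heatmaps = np.zeros((1, 22, 4, 4), dtype="float32")
    heatmaps[0, 0, 1, 2] = 0.9
    print(keypoints_from_heatmaps(heatmaps, img_width=400, img_height=400)[:2])
```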

With the keypoint coordinates in hand, the keypoints can be drawn with the following visualization code:

# Keypoint pairs to connect with lines
point_pairs = [[0,1],[1,2],[2,3],[3,4],
               [0,5],[5,6],[6,7],[7,8],
               [0,9],[9,10],[10,11],[11,12],
               [0,13],[13,14],[14,15],[15,16],
               [0,17],[17,18],[18,19],[19,20]]

def vis_pose(imgfile, points):
    # Read the input image
    img_cv2 = cv2.imread(imgfile)
    # Make a copy of the image
    img_cv2_copy = np.copy(img_cv2)
    # Iterate over the keypoint coordinates and draw each keypoint
    for idx in range(21):
        if points[idx]:
            cv2.circle(img_cv2_copy, points[idx], 8, (0, 255, 255), thickness=-1,
                       lineType=cv2.FILLED)
            cv2.putText(img_cv2_copy, "{}".format(idx), points[idx], cv2.FONT_HERSHEY_SIMPLEX,
                        1, (0, 0, 255), 2, lineType=cv2.LINE_AA)

    # Iterate over the keypoint pairs and draw the line between each pair
    for pair in point_pairs:
        partA = pair[0]
        partB = pair[1]

        if points[partA] and points[partB]:
            cv2.line(img_cv2, points[partA], points[partB], (0, 255, 255), 3)
            cv2.circle(img_cv2, points[partA], 8, (0, 0, 255), thickness=-1, lineType=cv2.FILLED)
    # Set up the plotting canvas
    plt.figure(figsize=[10, 10])
    plt.subplot(1, 2, 1)
    # Draw the images
    plt.imshow(cv2.cvtColor(img_cv2, cv2.COLOR_BGR2RGB))
    plt.axis("off")
    plt.subplot(1, 2, 2)
    plt.imshow(cv2.cvtColor(img_cv2_copy, cv2.COLOR_BGR2RGB))
    plt.axis("off")
    # Show the figure
    plt.show()
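
The point_pairs list follows a regular pattern that is easier to see when it is generated. The sketch below reads the list as five chains of four points, each chain starting from point 0. That reading is an interpretation of the list, not something the article states, and both helper names are hypothetical.

```python
def build_point_pairs(num_chains=5, points_per_chain=4):
    """Generate the pair list: each chain starts at point 0 and runs through its points."""
    pairs = []
    for chain in range(num_chains):
        first = 1 + chain * points_per_chain
        pairs.append([0, first])  # link from point 0 to the chain's first point
        for idx in range(first, first + points_per_chain - 1):
            pairs.append([idx, idx + 1])  # link consecutive points in the chain
    return pairs


def drawable_segments(points, point_pairs):
    """Return the endpoint pairs for which both keypoints were detected."""
    return [
        (points[a], points[b])
        for a, b in point_pairs
        if points[a] is not None and points[b] is not None
    ]


if __name__ == "__main__":
    pairs = build_point_pairs()
    print(len(pairs), pairs[:4])
    # Only points 0, 1 and 2 detected: two segments can be drawn
    points = [(10, 10), (20, 20), (30, 30)] + [None] * 18
    print(drawable_segments(points, pairs))
```

Filtering the segments first keeps the drawing loop free of None checks.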

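The code above draws the keypoint coordinates onto the input image. The result is shown below. The positions of the 21 hand keypoints output by the model are fairly accurate:

With all the steps covered, combining the model deployment with the data pre- and post-processing gives a complete program for detecting hand keypoints in images. The complete code is as follows: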

#!/usr/bin/python3
# -*- coding: utf-8 -*-
from __future__ import division
import os
import cv2
import time
import numpy as np
import matplotlib.pyplot as plt
from paddle.fluid.core import AnalysisConfig, PaddleTensor
from paddle.fluid.core import create_paddle_predictor

class general_pose_model(object):
    # Initialization
    def __init__(self, modelpath):
        self.num_points = 21
        self.inHeight = 368
        self.threshold = 0.1
        self.point_pairs = [[0,1],[1,2],[2,3],[3,4],
                            [0,5],[5,6],[6,7],[7,8],
                            [0,9],[9,10],[10,11],[11,12],
                            [0,13],[13,14],[14,15],[15,16],
                            [0,17],[17,18],[18,19],[19,20]]

        self.hand_net = self.get_hand_model(modelpath)
        self.input_names = self.hand_net.get_input_names()
        self.output_names = self.hand_net.get_output_names()
        self.input_tensor = self.hand_net.get_input_tensor(self.input_names[0])
        self.output_tensor = self.hand_net.get_output_tensor(self.output_names[0])

    # Load the model
    def get_hand_model(self, modelpath):
        model = os.path.join(modelpath, "__model__")
        params = os.path.join(modelpath, "__params__")
        config = AnalysisConfig(model, params)
        config.disable_gpu()
        config.enable_mkldnn()
        config.disable_glog_info()
        config.switch_ir_optim(True)
        config.switch_use_feed_fetch_ops(False)
        config.switch_specify_input_names(True)
        predictor = create_paddle_predictor(config)
        return predictor

    # Model inference and prediction
    def predict(self, imgfile):
        # Image preprocessing
        img_cv2 = cv2.imread(imgfile)
        img_height, img_width, _ = img_cv2.shape
        aspect_ratio = img_width / img_height
        inWidth = int(((aspect_ratio * self.inHeight) * 8) // 8)
        inpBlob = cv2.dnn.blobFromImage(img_cv2, 1.0 / 255, (inWidth, self.inHeight), (0, 0, 0), swapRB=False, crop=False)

        # Model inference
        self.input_tensor.copy_from_cpu(inpBlob)
        self.hand_net.zero_copy_run()
        output = self.output_tensor.copy_to_cpu()

        # Visualize the heatmaps
        self.vis_heatmaps(imgfile, output)

        # Compute the keypoints
        points = []
        for idx in range(self.num_points):
            # confidence map
            probMap = output[0, idx, :, :]
            probMap = cv2.resize(probMap, (img_width, img_height))

            # Find global maxima of the probMap.
            minVal, prob, minLoc, point = cv2.minMaxLoc(probMap)

            if prob > self.threshold:
                points.append((int(point[0]), int(point[1])))
            else:
                points.append(None)

        return points

    # Heatmap visualization function
    def vis_heatmaps(self, imgfile, net_outputs):
        img_cv2 = cv2.imread(imgfile)
        plt.figure(figsize=[10, 10])

        for pdx in range(self.num_points+1):
            probMap = net_outputs[0, pdx, :, :]
            probMap = cv2.resize(probMap, (img_cv2.shape[1], img_cv2.shape[0]))
            plt.subplot(5, 5, pdx+1)
            plt.imshow(cv2.cvtColor(img_cv2, cv2.COLOR_BGR2RGB))
            plt.imshow(probMap, alpha=0.6)
            plt.colorbar()
            plt.axis("off")
        plt.show()

    # Hand pose visualization function
    def vis_pose(self, imgfile, points):
        img_cv2 = cv2.imread(imgfile)
        img_cv2_copy = np.copy(img_cv2)
        for idx in range(len(points)):
            if points[idx]:
                cv2.circle(img_cv2_copy, points[idx], 8, (0, 255, 255), thickness=-1,
                           lineType=cv2.FILLED)
                cv2.putText(img_cv2_copy, "{}".format(idx), points[idx], cv2.FONT_HERSHEY_SIMPLEX,
                            1, (0, 0, 255), 2, lineType=cv2.LINE_AA)

        # Draw Skeleton
        for pair in self.point_pairs:
            partA = pair[0]
            partB = pair[1]

            if points[partA] and points[partB]:
                cv2.line(img_cv2, points[partA], points[partB], (0, 255, 255), 3)
                cv2.circle(img_cv2, points[partA], 8, (0, 0, 255), thickness=-1, lineType=cv2.FILLED)

        plt.figure(figsize=[10, 10])
        plt.subplot(1, 2, 1)
        plt.imshow(cv2.cvtColor(img_cv2, cv2.COLOR_BGR2RGB))
        plt.axis("off")
        plt.subplot(1, 2, 2)
        plt.imshow(cv2.cvtColor(img_cv2_copy, cv2.COLOR_BGR2RGB))
        plt.axis("off")
        plt.show()

if __name__ == '__main__':
    print('Hand pose estimation')

    # Directory of test images (placeholder: replace with your own path)
    imgs_path = "path/to/imgs"
    img_files = [os.path.join(imgs_path, img_file) for img_file in os.listdir(imgs_path) if 'jpg' in img_file]

    # Load the model
    start = time.time()
    modelpath = "pd_model/inference_model"
    pose_model = general_pose_model(modelpath)
    print("Model loads time: ", time.time() - start)

    # Run inference and output the results
    for img_file in img_files:
        start = time.time()
        res_points = pose_model.predict(img_file)
        print("Model predicts time: ", time.time() - start)
        pose_model.vis_pose(img_file, res_points)

    print("Done.")
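
One detail in the main block is the file filter: 'jpg' in img_file matches any file name containing the text jpg, including non-image files. A stricter alternative is sketched below. It checks the file extension and, unlike the original filter, ignores case and also accepts .jpeg. The helper name list_jpg_files is hypothetical.

```python
import os


def list_jpg_files(imgs_path):
    """Return sorted paths of files in imgs_path whose extension is .jpg or .jpeg."""
    files = []
    for name in sorted(os.listdir(imgs_path)):
        ext = os.path.splitext(name)[1].lower()
        if ext in (".jpg", ".jpeg"):
            files.append(os.path.join(imgs_path, name))
    return files


if __name__ == "__main__":
    # Lists JPEG files in the current directory as a quick demonstration
    print(list_jpg_files("."))
```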

Running the complete program produces the following image:

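That completes a full model deployment workflow, and the logic is fairly clear. For most models the inference code is similar. What usually differs is the preprocessing of the input data and the post-processing of the output. When deploying a model, focus on these two parts: once they are done, the deployment is essentially complete.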
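
One way to reflect this observation in code is to keep the inference step fixed and pass the preprocessing and post-processing in as functions. The sketch below is a hypothetical structure, not part of the article's program: InferencePipeline and fake_run_model are illustrative names, and fake_run_model stands in for a real predictor.

```python
import numpy as np


class InferencePipeline(object):
    """Hypothetical wrapper: fixed run step, swappable pre- and post-processing."""

    def __init__(self, preprocess, run_model, postprocess):
        self.preprocess = preprocess
        self.run_model = run_model
        self.postprocess = postprocess

    def __call__(self, raw_input):
        model_input = self.preprocess(raw_input)
        model_output = self.run_model(model_input)
        return self.postprocess(model_output)


def fake_run_model(batch):
    """Stand-in for the predictor: returns the per-channel mean of an NCHW batch."""
    return batch.mean(axis=(2, 3))


if __name__ == "__main__":
    pipeline = InferencePipeline(
        # HWC uint8 image -> NCHW float32 batch scaled to [0, 1]
        preprocess=lambda img: (img.astype("float32") / 255.0).transpose(2, 0, 1)[np.newaxis],
        run_model=fake_run_model,
        # Take the first item in the batch as a plain list
        postprocess=lambda out: out[0].tolist(),
    )
    print(pipeline(np.full((4, 4, 3), 255, dtype=np.uint8)))
```

Moving to a different model would then mean replacing the preprocess and postprocess arguments while the call sequence stays the same.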