Lightweight Mobile Networks for OCR: An Overview with PyTorch Implementations

Reposted from: AI人工智能初学者
Author: ChaucerG

1 Introduction to Lightweight Networks

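The networks covered earlier keep growing larger and deeper, yet in real applications compute is limited while business demands remain strong. The usual way to address efficiency is to compress and prune the model, reducing its parameter count and therefore its computation so that inference runs faster. Lightweight model design takes a different path from such post-processing of a trained model.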

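Lightweight models aim for more efficient computation inside the network itself, reducing the number of parameters while losing little or no accuracy.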

Four commonly used families of lightweight models are _SqueezeNet, MobileNet, ShuffleNet and Xception_. All of them are widely used in practice.

1.1 SqueezeNet

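SqueezeNet is a hand-designed lightweight network that reaches AlexNet-level accuracy on ImageNet with 1/50 of AlexNet's parameters. Combined with model compression techniques, SqueezeNet can be shrunk to less than 0.5MB, 1/510 the size of AlexNet.
The paper introduces two terms, CNN microarchitecture and CNN macroarchitecture:

CNN microarchitecture: the organization of individual modules built from a few convolutional layers, such as the Inception module.
CNN macroarchitecture: the organization of layers or modules into a complete network; here depth is an important parameter.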

Design strategies:

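(1) Replace 3x3 filters with 1x1 filters; a 1x1 filter has 9x fewer parameters than a 3x3 filter.
(2) Decrease the number of input channels to the 3x3 filters, which cuts parameters further; the paper does this with squeeze layers.
(3) Downsample late in the network so that convolution layers have large activation maps; larger maps retain more information and can improve accuracy.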

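Strategies (1) and (2) reduce the number of parameters, while strategy (3) maximizes accuracy under a limited parameter budget. As shown in the figure below, the authors introduce the Fire module as the building block of the CNN: the module itself applies strategies (1) and (2), and strategy (3) is realized in the overall network layout.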

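The module consists of a squeeze layer and an expand layer. The squeeze layer contains only 1x1 convolutions and reduces the number of channels fed into the expand layer. The expand layer mixes 1x1 and 3x3 convolutions. The number of squeeze filters is set smaller than the total number of expand filters (s1x1 < e1x1 + e3x3), so the expand layer widens the feature map again.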
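To make strategies (1) and (2) concrete, here is a rough parameter count for one Fire module against a single plain 3x3 convolution with the same 96 -> 128 channel mapping. This is a sketch: the helpers `conv_params`, `fire_params` and `torch_params` are hypothetical, and the channel sizes are those of the first Fire module of SqueezeNet v1.0 (`Fire(96, 16, 64, 64)`).

```python
import torch.nn as nn


def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution (hypothetical helper)."""
    return c_in * c_out * k * k + c_out


def fire_params(c_in, s1x1, e1x1, e3x3):
    """Parameter count of a Fire module (hypothetical helper)."""
    squeeze = conv_params(c_in, s1x1, 1)   # strategy (1): only 1x1 filters
    expand1 = conv_params(s1x1, e1x1, 1)   # strategy (1): 1x1 filters
    expand3 = conv_params(s1x1, e3x3, 3)   # strategy (2): 3x3 filters see only s1x1 channels
    return squeeze + expand1 + expand3


def torch_params(module):
    """Cross-check against PyTorch's own parameter count."""
    return sum(p.numel() for p in module.parameters())


fire = fire_params(96, 16, 64, 64)
plain = conv_params(96, 128, 3)  # one 3x3 conv, 96 -> 128 channels
print(fire, plain, round(plain / fire, 1))  # 11920 110720 9.3

# The formula matches the parameters nn.Conv2d actually allocates
assert plain == torch_params(nn.Conv2d(96, 128, kernel_size=3))
```

In this example the Fire module needs roughly 1/9 of the parameters of the plain convolution, most of the saving coming from feeding the 3x3 filters only 16 squeezed channels instead of 96.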

The Fire module in PyTorch:

import torch
import torch.nn as nn


class Fire(nn.Module):
  
    def __init__(self, inplanes, squeeze_planes, expand1x1_planes, expand3x3_planes):  
        super(Fire, self).__init__()  
        self.inplanes = inplanes  
        self.squeeze = nn.Conv2d(inplanes, squeeze_planes, kernel_size=1)  
        self.squeeze_activation = nn.ReLU(inplace=True)  
        self.expand1x1 = nn.Conv2d(squeeze_planes, expand1x1_planes, kernel_size=1)  
        self.expand1x1_activation = nn.ReLU(inplace=True)  
        self.expand3x3 = nn.Conv2d(squeeze_planes, expand3x3_planes, kernel_size=3, padding=1)  
        self.expand3x3_activation = nn.ReLU(inplace=True)  
  
    def forward(self, x):  
        x = self.squeeze_activation(self.squeeze(x))  
        return torch.cat([  
            self.expand1x1_activation(self.expand1x1(x)),  
            self.expand3x3_activation(self.expand3x3(x))  
        ], 1)

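The overall network starts with a standard convolution layer (conv1), followed by 8 Fire modules (fire2 to fire9), and ends with a convolution layer (conv10). The number of filters per Fire module increases gradually through the network, and max-pooling with stride 2 is applied after conv1, fire4, fire8 and conv10; placing pooling relatively late in this way follows strategy (3). The authors also compare SqueezeNet variants with bypass (skip) connections. The code below supports both version 1.0, which follows this layout, and version 1.1, which uses a smaller first convolution and pools earlier to reduce computation. The PyTorch implementation: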

import torch.nn as nn
import torch.nn.init as init


class SqueezeNet(nn.Module):
  
    def __init__(self, version=1.0, num_classes=1000):  
        super(SqueezeNet, self).__init__()  
        if version not in [1.0, 1.1]:  
            raise ValueError("Unsupported SqueezeNet version {version}: 1.0 or 1.1 expected".format(version=version))  
        self.num_classes = num_classes  
        if version == 1.0:  
            self.features = nn.Sequential(  
                nn.Conv2d(3, 96, kernel_size=7, stride=2),  
                nn.ReLU(inplace=True),  
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),  
                Fire(96, 16, 64, 64),  
                Fire(128, 16, 64, 64),  
                Fire(128, 32, 128, 128),  
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),  
                Fire(256, 32, 128, 128),  
                Fire(256, 48, 192, 192),  
                Fire(384, 48, 192, 192),  
                Fire(384, 64, 256, 256),  
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),  
                Fire(512, 64, 256, 256),  
            )  
        else:  
            self.features = nn.Sequential(  
                nn.Conv2d(3, 64, kernel_size=3, stride=2),  
                nn.ReLU(inplace=True),  
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),  
                Fire(64, 16, 64, 64),  
                Fire(128, 16, 64, 64),  
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),  
                Fire(128, 32, 128, 128),  
                Fire(256, 32, 128, 128),  
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),  
                Fire(256, 48, 192, 192),  
                Fire(384, 48, 192, 192),  
                Fire(384, 64, 256, 256),  
                Fire(512, 64, 256, 256),  
            )  
        # Final convolution is initialized differently from the rest
        final_conv = nn.Conv2d(512, self.num_classes, kernel_size=1)  
        self.classifier = nn.Sequential(  
            nn.Dropout(p=0.5), final_conv, nn.ReLU(inplace=True),  
            nn.AdaptiveAvgPool2d((1, 1))  
        )  
  
        for m in self.modules():  
            if isinstance(m, nn.Conv2d):  
                if m is final_conv:  
                    init.normal_(m.weight, mean=0.0, std=0.01)  
                else:  
                    init.kaiming_uniform_(m.weight)  
                if m.bias is not None:  
                    init.constant_(m.bias, 0)  
  
    def forward(self, x):  
        x = self.features(x)  
        x = self.classifier(x)  
        return x.view(x.size(0), self.num_classes)
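To see how the delayed pooling of strategy (3) plays out, the sketch below traces the spatial size through SqueezeNet v1.0 for a 224x224 input, using the standard output-size formulas for a convolution and for max-pooling with `ceil_mode=True`. The helper names are hypothetical; the layer settings come from the v1.0 branch of the code above, and the Fire modules keep the spatial size unchanged because their 3x3 convolutions use `padding=1`.

```python
import math


def conv_out(size, k, s, p=0):
    # floor((size + 2p - k) / s) + 1, the usual convolution output size
    return (size + 2 * p - k) // s + 1


def pool_out_ceil(size, k, s):
    # Max-pooling with ceil_mode=True rounds up instead of down
    # (valid here because the last window always starts inside the input)
    return math.ceil((size - k) / s) + 1


size = conv_out(224, k=7, s=2)          # conv1: 7x7, stride 2, no padding
trace = [("conv1", size)]
size = pool_out_ceil(size, k=3, s=2)    # pool after conv1
trace.append(("fire2-fire4", size))
size = pool_out_ceil(size, k=3, s=2)    # pool after fire4
trace.append(("fire5-fire8", size))
size = pool_out_ceil(size, k=3, s=2)    # pool after fire8
trace.append(("fire9/conv10", size))

for name, hw in trace:
    print(f"{name}: {hw}x{hw}")
```

Most Fire modules therefore run on 54x54 or 27x27 maps; only fire9 and conv10 work at 13x13, after which the adaptive average pooling in the classifier reduces each class map to a single value.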

1.2 MobileNet

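MobileNet essentially factorizes the standard convolution kernel, using depthwise separable convolutions: a depthwise convolution followed by a pointwise (1x1) convolution. See the figure below: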

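A brief explanation of depthwise separable convolution: suppose the input feature map has C1 channels. A separate 3x3 convolution is first applied to each channel, giving a new feature map with C1 channels; a 1x1 convolution is then applied to this feature map to produce an output feature map with C2 channels. The total number of parameters is therefore 3x3xC1 + 1x1xC1xC2, compared with 3x3xC1xC2 for a standard 3x3 convolution.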
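The parameter formula above can be checked directly in PyTorch. In this sketch C1 = 32 and C2 = 64 are arbitrary example values; the depthwise step is an `nn.Conv2d` with `groups=C1`, as in the `conv_dw` helper of the MobileNet code, and bias and BatchNorm are left out so that only convolution weights are counted.

```python
import torch
import torch.nn as nn

C1, C2 = 32, 64  # example channel counts

standard = nn.Conv2d(C1, C2, kernel_size=3, padding=1, bias=False)
separable = nn.Sequential(
    # depthwise: one 3x3 filter per input channel -> 3*3*C1 weights
    nn.Conv2d(C1, C1, kernel_size=3, padding=1, groups=C1, bias=False),
    # pointwise: 1x1 convolution mixing channels -> 1*1*C1*C2 weights
    nn.Conv2d(C1, C2, kernel_size=1, bias=False),
)


def n_params(module):
    return sum(p.numel() for p in module.parameters())


x = torch.randn(1, C1, 16, 16)
assert standard(x).shape == separable(x).shape  # both (1, 64, 16, 16)
print(n_params(standard), n_params(separable))  # 18432 2336
```

Here the separable version needs about 1/8 of the weights; for a 3x3 kernel the ratio is 1/C2 + 1/9, so the saving approaches 9x as C2 grows.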

A PyTorch implementation of MobileNet (v1):

import torch.nn as nn


class MobileNet(nn.Module):
    def __init__(self):  
        super(MobileNet, self).__init__()  
  
        def conv_bn(inp, oup, stride):  
            return nn.Sequential(  
                nn.Conv2d(inp, oup, 3, stride, 1, bias=False),  
                nn.BatchNorm2d(oup),  
                nn.ReLU(inplace=True)  
            )  
  
        def conv_dw(inp, oup, stride):  
            return nn.Sequential(  
                nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),  
                nn.BatchNorm2d(inp),  
                nn.ReLU(inplace=True),  
      
                nn.Conv2d(inp, oup, 1, 1, 0, bias=False),  
                nn.BatchNorm2d(oup),  
                nn.ReLU(inplace=True),  
            )  
  
        self.model = nn.Sequential(  
            conv_bn(  3,  32, 2),   
            conv_dw( 32,  64, 1),  
            conv_dw( 64, 128, 2),  
            conv_dw(128, 128, 1),  
            conv_dw(128, 256, 2),  
            conv_dw(256, 256, 1),  
            conv_dw(256, 512, 2),  
            conv_dw(512, 512, 1),  
            conv_dw(512, 512, 1),  
            conv_dw(512, 512, 1),  
            conv_dw(512, 512, 1),  
            conv_dw(512, 512, 1),  
            conv_dw(512, 1024, 2),  
            conv_dw(1024, 1024, 1),  
            nn.AvgPool2d(7), )  
        self.fc = nn.Linear(1024, 1000)  
  
    def forward(self, x):  
        x = self.model(x)  
        x = x.view(-1, 1024)  
        x = self.fc(x)  
        return x
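The `nn.AvgPool2d(7)` at the end of the MobileNet code above assumes a 224x224 input. This sketch replays the stride of each stage (copied from the `conv_bn`/`conv_dw` calls, all 3x3 convolutions with padding 1) to show why: the network downsamples by 32 in total. The helper `final_size` is hypothetical.

```python
# Strides of conv_bn followed by the 13 conv_dw stages in the MobileNet code above
STRIDES = [2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1]


def final_size(size):
    for s in STRIDES:
        # 3x3 conv with padding 1: floor((size + 2 - 3) / s) + 1
        size = (size + 2 - 3) // s + 1
    return size


for side in (224, 192):
    print(side, "->", final_size(side))  # 224 -> 7, then 192 -> 6
```

At 192x192 the final map is only 6x6, so a fixed 7x7 average pool would fail; replacing it with `nn.AdaptiveAvgPool2d(1)` is a common way to accept other input sizes.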

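In the follow-up MobileNet v2, the authors further improve on MobileNet v1. They observe that a ReLU activation should not follow layers with few channels, because it zeroes out many activations and loses information. They therefore adopt a ResNet-style residual connection that adds the block's input, which has not been zeroed out, directly to its output. In addition, the layers with few channels use a linear activation, dropping the nonlinear ReLU.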
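A minimal sketch of a MobileNet v2 style block, to make the two ideas above concrete. It assumes the inverted-residual layout of the MobileNet v2 paper (1x1 expansion, 3x3 depthwise, 1x1 linear projection, ReLU6 activations); the class name `InvertedResidual` and the `expand_ratio` parameter are illustrative, not taken from this article.

```python
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """Sketch of a MobileNet v2 style block: expand -> depthwise -> linear projection."""

    def __init__(self, inp, oup, stride, expand_ratio=6):
        super().__init__()
        hidden = inp * expand_ratio
        # ResNet-style shortcut only when input and output shapes match
        self.use_res = stride == 1 and inp == oup
        self.block = nn.Sequential(
            # 1x1 expansion to a wider space, where the nonlinearity loses less information
            nn.Conv2d(inp, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 projection back to few channels: linear bottleneck, no ReLU
            nn.Conv2d(hidden, oup, 1, bias=False),
            nn.BatchNorm2d(oup),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_res else out


x = torch.randn(2, 32, 16, 16)
same = InvertedResidual(32, 32, stride=1)   # uses the residual connection
down = InvertedResidual(32, 64, stride=2)   # shapes differ, so no residual
print(same(x).shape)  # torch.Size([2, 32, 16, 16])
print(down(x).shape)  # torch.Size([2, 64, 8, 8])
```

The residual is only added when stride and channel count leave the shape unchanged; otherwise the block simply returns the projected features.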

1.3 ShuffleNet

ShuffleNet is a lightweight network released in July 2017, designed for mobile devices and proposed after MobileNet. Its main innovations are pointwise group convolution and channel shuffle.

Group convolution

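Group convolution was first used in AlexNet. Because hardware resources were limited at the time, the convolutions of AlexNet could not all be processed on a single GPU during training, so the authors split the feature maps across two GPUs, processed them separately, and finally concatenated the results.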

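Given an input feature map with C channels, split it along the channel dimension into g groups, each with C/g channels; the depth of each convolution kernel therefore becomes C/g instead of C. Each group can then be viewed as N/g kernels that process that group's C/g channels and produce N/g output channels. If the output feature map should have N channels, the total number of kernels is still N, but each kernel is g times thinner, so the parameter count drops by a factor of g.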
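The channel arithmetic above maps directly onto the `groups` argument of `nn.Conv2d`. In this sketch C = 24 input channels, N = 48 output channels and g = 3 groups are example values. It also shows the limitation that motivates channel shuffle: when only the first group of input channels is non-zero, only the first N/g output channels respond.

```python
import torch
import torch.nn as nn

C, N, g = 24, 48, 3  # input channels, output channels, groups (example values)

dense = nn.Conv2d(C, N, kernel_size=1, bias=False)
grouped = nn.Conv2d(C, N, kernel_size=1, groups=g, bias=False)

# Each grouped kernel has depth C/g instead of C, so the weight count drops by g
print(dense.weight.shape)    # torch.Size([48, 24, 1, 1])
print(grouped.weight.shape)  # torch.Size([48, 8, 1, 1])

# Feed a map in which only the first group (channels 0..C/g-1) is non-zero
x = torch.zeros(1, C, 4, 4)
x[:, : C // g] = torch.randn(1, C // g, 4, 4)
y = grouped(x)

# Only the first group's outputs (channels 0..N/g-1) can be non-zero
active = (y.abs().sum(dim=(0, 2, 3)) > 0).nonzero().flatten().tolist()
print(active)  # channels 0..15 only
```

Stacking several such layers keeps each group's information isolated, which is exactly the problem the channel shuffle described next is meant to solve.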

Channel shuffle

Channel shuffle is not a random permutation; it rearranges the channels in a fixed, regular pattern.

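(a): two stacked group convolutions applied to feature maps without shuffling. Each output channel only relates to the input channels within its own group.
(b): the channels of the feature map produced by group convolution GConv1 are shuffled as described above, so most output channels now relate to input channels from different groups.
(c): an equivalent implementation of (b) that moves the channels to the appropriate positions (channel shuffle).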
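The regular pattern of channel shuffle is a reshape, a transpose and a flatten. This standalone sketch (the function name `channel_shuffle` is our own) labels each channel with its index so the resulting order is visible; it performs the same operation as the `ShuffleBlock` in the ShuffleNet code below.

```python
import torch


def channel_shuffle(x, groups):
    """Reshape [N, C, H, W] -> [N, g, C/g, H, W], swap the two group axes, flatten back."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channels must be divisible by groups"
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()  # [N, C/g, g, H, W]
    return x.reshape(n, c, h, w)


# Use the channel index as the value so the new order is easy to read
x = torch.arange(6, dtype=torch.float32).reshape(1, 6, 1, 1)
print(channel_shuffle(x, 2).flatten().tolist())  # [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]
print(channel_shuffle(x, 3).flatten().tolist())  # [0.0, 2.0, 4.0, 1.0, 3.0, 5.0]
```

With g = 2, the shuffled order alternates between group 0 (channels 0, 1, 2) and group 1 (channels 3, 4, 5), so each group of the next group convolution receives channels from both input groups.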

ShuffleNet unit

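(a): the bottleneck unit proposed in ResNet, with the 3x3 convolution replaced by a depthwise (DW) convolution.
(b): the 1x1 convolutions become pointwise group convolutions (GConv), and a channel shuffle is added. Following the original depthwise convolution paper, no nonlinearity is applied after the 3x3 DW convolution. The final pointwise group convolution restores the channel count so that the result can be added to the shortcut.
(c): stride=2 is used to reduce the feature-map size; a 3x3 average-pooling layer with stride 2 on the shortcut keeps the two branches the same size, and the feature maps are concatenated along the channel dimension instead of added.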

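Below is a PyTorch implementation of ShuffleNet. This version is set up for small 32x32 inputs with 10 output classes (a CIFAR-10 style setting), as the `test()` function shows: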

import torch  
import torch.nn as nn  
import torch.nn.functional as F  
  
  
class ShuffleBlock(nn.Module):  
    def __init__(self, groups):  
        super(ShuffleBlock, self).__init__()  
        self.groups = groups  
  
    def forward(self, x):  
        '''Channel shuffle: [N,C,H,W] -> [N,g,C/g,H,W] -> [N,C/g,g,H,W] -> [N,C,H,W]'''
        N, C, H, W = x.size()
        g = self.groups
        # After permute() the tensor is no longer contiguous in memory, so .contiguous() is required before view()
        return x.view(N, g, C // g, H, W).permute(0, 2, 1, 3, 4).contiguous().view(N, C, H, W)
  
  
class Bottleneck(nn.Module):  
    def __init__(self, in_planes, out_planes, stride, groups):  
        super(Bottleneck, self).__init__()  
        self.stride = stride  
  
        # The bottleneck (middle) layer has 1/4 as many channels as the output
        mid_planes = out_planes // 4

        # The authors do not use group convolution in the first pointwise layer of Stage 2,
        # because its input has only 24 channels
        g = 1 if in_planes == 24 else groups
        self.conv1 = nn.Conv2d(in_planes, mid_planes, kernel_size=1, groups=g, bias=False)  
        self.bn1 = nn.BatchNorm2d(mid_planes)  
        self.shuffle1 = ShuffleBlock(groups=g)  
        self.conv2 = nn.Conv2d(mid_planes, mid_planes, kernel_size=3, stride=stride, padding=1, groups=mid_planes, bias=False)  
        self.bn2 = nn.BatchNorm2d(mid_planes)  
        self.conv3 = nn.Conv2d(mid_planes, out_planes, kernel_size=1, groups=groups, bias=False)  
        self.bn3 = nn.BatchNorm2d(out_planes)  
  
        self.shortcut = nn.Sequential()  
        if stride == 2:  
            self.shortcut = nn.Sequential(nn.AvgPool2d(3, stride=2, padding=1))  
  
    def forward(self, x):  
        out = F.relu(self.bn1(self.conv1(x)))  
        out = self.shuffle1(out)  
        out = self.bn2(self.conv2(out))  # no ReLU after the 3x3 depthwise conv, per the ShuffleNet unit design
        out = self.bn3(self.conv3(out))  
        res = self.shortcut(x)  
        out = F.relu(torch.cat([out,res], 1)) if self.stride==2 else F.relu(out+res)  
        return out  
  
  
class ShuffleNet(nn.Module):  
    def __init__(self, cfg):  
        super(ShuffleNet, self).__init__()  
        out_planes = cfg['out_planes']  
        num_blocks = cfg['num_blocks']  
        groups = cfg['groups']  
  
        self.conv1 = nn.Conv2d(3, 24, kernel_size=1, bias=False)  
        self.bn1 = nn.BatchNorm2d(24)  
        self.in_planes = 24  
        self.layer1 = self._make_layer(out_planes[0], num_blocks[0], groups)  
        self.layer2 = self._make_layer(out_planes[1], num_blocks[1], groups)  
        self.layer3 = self._make_layer(out_planes[2], num_blocks[2], groups)  
        self.linear = nn.Linear(out_planes[2], 10)  
  
    def _make_layer(self, out_planes, num_blocks, groups):  
        layers = []  
        for i in range(num_blocks):  
            if i == 0:  
                layers.append(Bottleneck(self.in_planes, out_planes-self.in_planes, stride=2, groups=groups))  
            else:  
                layers.append(Bottleneck(self.in_planes, out_planes, stride=1, groups=groups))  
            self.in_planes = out_planes  
        return nn.Sequential(*layers)  
  
    def forward(self, x):  
        out = F.relu(self.bn1(self.conv1(x)))  
        out = self.layer1(out)  
        out = self.layer2(out)  
        out = self.layer3(out)  
        out = F.avg_pool2d(out, 4)  
        out = out.view(out.size(0), -1)  
        out = self.linear(out)  
        return out  
  
  
def ShuffleNetG2():  
    cfg = {  
        'out_planes': [200,400,800],  
        'num_blocks': [4,8,4],  
        'groups': 2  
    }  
    return ShuffleNet(cfg)  
  
def ShuffleNetG3():  
    cfg = {  
        'out_planes': [240,480,960],  
        'num_blocks': [4,8,4],  
        'groups': 3  
    }  
    return ShuffleNet(cfg)  
  
  
def test():  
    net = ShuffleNetG2()  
    x = torch.randn(1,3,32,32)  
    y = net(x)  
    print(y)  
  
if __name__ == "__main__":
    test()

1.4 Xception

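Xception is not a lightweight model in the strict sense, but it builds on depth-wise convolution, which is the key ingredient of the lightweight models above, so its ideas are well worth studying.
Xception is based on Inception-V3 combined with depth-wise convolution. This improves network efficiency, and with the same number of parameters it outperforms Inception-V3 on large-scale datasets. It also suggests another kind of "lightweight" design: for a given hardware budget, make the network as efficient and accurate as possible, in other words, make full use of the available hardware.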

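Below is a PyTorch implementation of Xception (AlignedXception, the modified Xception backbone used in DeepLabv3+; it returns both the final feature map and a low-level feature map):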

import math  
import torch  
import torch.nn as nn  
import torch.nn.functional as F  
import torch.utils.model_zoo as model_zoo  
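try:
    from modeling.sync_batchnorm.batchnorm import SynchronizedBatchNorm2d
except ImportError:
    # Fallback when the project's sync_batchnorm package is not available (e.g. single-GPU use)
    SynchronizedBatchNorm2d = nn.BatchNorm2d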
  
def fixed_padding(inputs, kernel_size, dilation):  
    kernel_size_effective = kernel_size + (kernel_size - 1) * (dilation - 1)  
    pad_total = kernel_size_effective - 1  
    pad_beg = pad_total // 2  
    pad_end = pad_total - pad_beg  
    padded_inputs = F.pad(inputs, (pad_beg, pad_end, pad_beg, pad_end))  
    return padded_inputs  
  
  
class SeparableConv2d(nn.Module):  
    def __init__(self, inplanes, planes, kernel_size=3, stride=1, dilation=1, bias=False, BatchNorm=nn.BatchNorm2d):
        super(SeparableConv2d, self).__init__()  
        self.conv1 = nn.Conv2d(inplanes, inplanes, kernel_size, stride, 0, dilation, groups=inplanes, bias=bias)  
        self.bn = BatchNorm(inplanes)  
        self.pointwise = nn.Conv2d(inplanes, planes, 1, 1, 0, 1, 1, bias=bias)  
  
    def forward(self, x):  
        x = fixed_padding(x, self.conv1.kernel_size[0], dilation=self.conv1.dilation[0])  
        x = self.conv1(x)  
        x = self.bn(x)  
        x = self.pointwise(x)  
        return x  
  
  
class Block(nn.Module):  
    def __init__(self, inplanes, planes, reps, stride=1, dilation=1, BatchNorm=nn.BatchNorm2d,
                 start_with_relu=True, grow_first=True, is_last=False):  
        super(Block, self).__init__()  
  
        if planes != inplanes or stride != 1:  
            self.skip = nn.Conv2d(inplanes, planes, 1, stride=stride, bias=False)  
            self.skipbn = BatchNorm(planes)  
        else:  
            self.skip = None  
        self.relu = nn.ReLU(inplace=True)  
        rep = []  
  
        filters = inplanes  
        if grow_first:  
            rep.append(self.relu)  
            rep.append(SeparableConv2d(inplanes, planes, 3, 1, dilation, BatchNorm=BatchNorm))  
            rep.append(BatchNorm(planes))  
            filters = planes  
  
        for i in range(reps - 1):  
            rep.append(self.relu)  
            rep.append(SeparableConv2d(filters, filters, 3, 1, dilation, BatchNorm=BatchNorm))  
            rep.append(BatchNorm(filters))  
  
        if not grow_first:  
            rep.append(self.relu)  
            rep.append(SeparableConv2d(inplanes, planes, 3, 1, dilation, BatchNorm=BatchNorm))  
            rep.append(BatchNorm(planes))  
  
        if stride != 1:  
            rep.append(self.relu)  
            rep.append(SeparableConv2d(planes, planes, 3, 2, BatchNorm=BatchNorm))  
            rep.append(BatchNorm(planes))  
  
        if stride == 1 and is_last:  
            rep.append(self.relu)  
            rep.append(SeparableConv2d(planes, planes, 3, 1, BatchNorm=BatchNorm))  
            rep.append(BatchNorm(planes))  
  
        if not start_with_relu:  
            rep = rep[1:]  
  
        self.rep = nn.Sequential(*rep)  
  
    def forward(self, inp):  
        x = self.rep(inp)  
        if self.skip is not None:  
            skip = self.skip(inp)  
            skip = self.skipbn(skip)  
        else:  
            skip = inp  
        x = x + skip  
        return x  
  
  
class AlignedXception(nn.Module):  
    def __init__(self, output_stride, BatchNorm,  
                 pretrained=True):  
        super(AlignedXception, self).__init__()  
        if output_stride == 16:  
            entry_block3_stride = 2  
            middle_block_dilation = 1  
            exit_block_dilations = (1, 2)  
        elif output_stride == 8:  
            entry_block3_stride = 1  
            middle_block_dilation = 2  
            exit_block_dilations = (2, 4)  
        else:  
            raise NotImplementedError  
  
        # Entry flow  
        self.conv1 = nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False)  
        self.bn1 = BatchNorm(32)  
        self.relu = nn.ReLU(inplace=True)  
  
        self.conv2 = nn.Conv2d(32, 64, 3, stride=1, padding=1, bias=False)  
        self.bn2 = BatchNorm(64)  
  
        self.block1 = Block(64, 128, reps=2, stride=2, BatchNorm=BatchNorm, start_with_relu=False)  
        self.block2 = Block(128, 256, reps=2, stride=2, BatchNorm=BatchNorm, start_with_relu=False, grow_first=True)  
        self.block3 = Block(256, 728, reps=2, stride=entry_block3_stride, BatchNorm=BatchNorm, start_with_relu=True, grow_first=True, is_last=True)  
  
        # Middle flow  
        self.block4  = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block5  = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block6  = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block7  = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block8  = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block9  = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block10 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block11 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block12 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block13 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block14 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block15 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block16 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block17 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block18 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
        self.block19 = Block(728, 728, reps=3, stride=1, dilation=middle_block_dilation,  
                             BatchNorm=BatchNorm, start_with_relu=True, grow_first=True)  
  
        # Exit flow  
        self.block20 = Block(728, 1024, reps=2, stride=1, dilation=exit_block_dilations[0], BatchNorm=BatchNorm, start_with_relu=True, grow_first=False, is_last=True)  
  
        self.conv3 = SeparableConv2d(1024, 1536, 3, stride=1, dilation=exit_block_dilations[1], BatchNorm=BatchNorm)  
        self.bn3 = BatchNorm(1536)  
  
        self.conv4 = SeparableConv2d(1536, 1536, 3, stride=1, dilation=exit_block_dilations[1], BatchNorm=BatchNorm)  
        self.bn4 = BatchNorm(1536)  
  
        self.conv5 = SeparableConv2d(1536, 2048, 3, stride=1, dilation=exit_block_dilations[1], BatchNorm=BatchNorm)  
        self.bn5 = BatchNorm(2048)  
  
        # Init weights  
        self._init_weight()  
  
        # Load pretrained model  
        if pretrained:  
            self._load_pretrained_model()  
  
    def forward(self, x):  
        # Entry flow  
        x = self.conv1(x)  
        x = self.bn1(x)  
        x = self.relu(x)  
  
        x = self.conv2(x)  
        x = self.bn2(x)  
        x = self.relu(x)  
  
        x = self.block1(x)  
        # add relu here  
        x = self.relu(x)  
        low_level_feat = x  
        x = self.block2(x)  
        x = self.block3(x)  
  
        # Middle flow  
        x = self.block4(x)  
        x = self.block5(x)  
        x = self.block6(x)  
        x = self.block7(x)  
        x = self.block8(x)  
        x = self.block9(x)  
        x = self.block10(x)  
        x = self.block11(x)  
        x = self.block12(x)  
        x = self.block13(x)  
        x = self.block14(x)  
        x = self.block15(x)  
        x = self.block16(x)  
        x = self.block17(x)  
        x = self.block18(x)  
        x = self.block19(x)  
  
        # Exit flow  
        x = self.block20(x)  
        x = self.relu(x)  
        x = self.conv3(x)  
        x = self.bn3(x)  
        x = self.relu(x)  
  
        x = self.conv4(x)  
        x = self.bn4(x)  
        x = self.relu(x)  
  
        x = self.conv5(x)  
        x = self.bn5(x)  
        x = self.relu(x)  
  
        return x, low_level_feat  
  
    def _init_weight(self):  
        for m in self.modules():  
            if isinstance(m, nn.Conv2d):  
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels  
                m.weight.data.normal_(0, math.sqrt(2. / n))  
            elif isinstance(m, SynchronizedBatchNorm2d):  
                m.weight.data.fill_(1)  
                m.bias.data.zero_()  
            elif isinstance(m, nn.BatchNorm2d):  
                m.weight.data.fill_(1)  
                m.bias.data.zero_()  
  
  
    def _load_pretrained_model(self):  
        pretrain_dict = model_zoo.load_url('http://data.lip6.fr/cadene/pretrainedmodels/xception-b5690688.pth')  
        model_dict = {}  
        state_dict = self.state_dict()  
  
        for k, v in pretrain_dict.items():  
            if k in state_dict:  
                if 'pointwise' in k:  
                    v = v.unsqueeze(-1).unsqueeze(-1)  
                if k.startswith('block11'):  
                    model_dict[k] = v  
                    model_dict[k.replace('block11', 'block12')] = v  
                    model_dict[k.replace('block11', 'block13')] = v  
                    model_dict[k.replace('block11', 'block14')] = v  
                    model_dict[k.replace('block11', 'block15')] = v  
                    model_dict[k.replace('block11', 'block16')] = v  
                    model_dict[k.replace('block11', 'block17')] = v  
                    model_dict[k.replace('block11', 'block18')] = v  
                    model_dict[k.replace('block11', 'block19')] = v  
                elif k.startswith('block12'):  
                    model_dict[k.replace('block12', 'block20')] = v  
                elif k.startswith('bn3'):  
                    model_dict[k] = v  
                    model_dict[k.replace('bn3', 'bn4')] = v  
                elif k.startswith('conv4'):  
                    model_dict[k.replace('conv4', 'conv5')] = v  
                elif k.startswith('bn4'):  
                    model_dict[k.replace('bn4', 'bn5')] = v  
                else:  
                    model_dict[k] = v  
        state_dict.update(model_dict)  
        self.load_state_dict(state_dict)  
  
  
if __name__ == "__main__":  
    import torch  
    model = AlignedXception(BatchNorm=nn.BatchNorm2d, pretrained=True, output_stride=16)  
    input = torch.rand(1, 3, 512, 512)  
    output, low_level_feat = model(input)  
    print(output.size())  
    print(low_level_feat.size())

Summary of lightweight models

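Lightweight models owe most of their efficiency to depth-wise convolution, so depth-wise convolution is a natural choice when designing your own lightweight network. Be aware, however, that it restricts information flow between channels, since each depth-wise filter sees only one channel.
Ways to fix this "poor information flow" problem: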

MobileNet uses point-wise convolution
ShuffleNet uses channel shuffle

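Compared with ShuffleNet, MobileNet uses more dense convolution, which costs more computation and parameters, but it adds more nonlinear layers, so in principle its features can be more abstract and higher-level. ShuffleNet instead replaces the dense point-wise convolutions with point-wise group convolutions and restores cross-group information flow with channel shuffle, a parameter-free rearrangement that is simple and cheap.
The figure below summarizes this article together with the classic convolutional neural networks covered in the previous note.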
