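Overview

As demand grows for deep learning on mobile and edge devices, efficient CNN architectures have become a key research direction. This document surveys the design principles and core techniques of modern efficient convolutional neural networks, from MobileNet to ConvNeXt. [1][2]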


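Depthwise Separable Convolution

Computational Cost of Standard Convolution

A standard convolution layer costs (in multiply-adds):

$$D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$$

where $D_F$ is the output feature map size, $D_K$ is the kernel size, $M$ is the number of input channels, and $N$ is the number of output channels.

Depthwise Separable Convolution

A depthwise separable convolution factors a standard convolution into two steps:

  1. Depthwise convolution (Depthwise Conv): each input channel is convolved independently
  2. Pointwise convolution (Pointwise Conv): a $1 \times 1$ convolution mixes information across channels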

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise conv: each channel is convolved independently
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size,
            stride=stride, padding=kernel_size//2,
            groups=in_channels  # key: groups=in_channels gives one filter per channel
        )
        # Pointwise conv: 1x1 convolution that mixes channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1)
    
    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)
        return x

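Cost Comparison

Convolution type | Cost (multiply-adds) | Relative cost
--- | --- | ---
Standard convolution | $D_K^2 \cdot M \cdot N \cdot D_F^2$ | 1x
Depthwise separable | $D_K^2 \cdot M \cdot D_F^2 + M \cdot N \cdot D_F^2$ | $\frac{1}{N} + \frac{1}{D_K^2}$

Key insight: when the number of output channels $N$ is large, a depthwise separable convolution costs about $1/D_K^2$ of a standard convolution. For a $3 \times 3$ kernel that is roughly $1/9$, an 8x to 9x reduction.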
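The ratio can be checked numerically. The following is a minimal sketch in plain Python; the function names and the layer sizes are illustrative assumptions, not taken from any library.

```python
def standard_conv_macs(k, m, n, f):
    """Multiply-adds of a standard k x k conv: k^2 * M * N * F^2."""
    return k * k * m * n * f * f


def depthwise_separable_macs(k, m, n, f):
    """Multiply-adds of a depthwise k x k conv plus a 1x1 pointwise conv."""
    depthwise = k * k * m * f * f   # one k x k filter per input channel
    pointwise = m * n * f * f       # 1x1 conv mixing M channels into N
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 256 -> 256 channels, 14x14 output map
k, m, n, f = 3, 256, 256, 14
ratio = depthwise_separable_macs(k, m, n, f) / standard_conv_macs(k, m, n, f)
print(f"measured ratio: {ratio:.4f}")
print(f"1/N + 1/k^2:    {1 / n + 1 / k ** 2:.4f}")
```

Both printed values agree, and the $1/k^2$ term dominates once the output channel count is in the hundreds.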


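MobileNet Series

MobileNet V1: Pioneering Depthwise Separable Convolution

MobileNet V1 (2017) was among the first to build a large-scale network design around depthwise separable convolutions:

Key Contributions

  • Width multiplier: uniformly scales the number of channels in every layer
  • Resolution multiplier: scales the input resolution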
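To see how the two multipliers affect cost, here is a small sketch in plain Python. It assumes the per-layer cost of a depthwise separable convolution with the width multiplier (called `alpha` here) applied to both channel counts and the resolution multiplier (called `rho` here) applied to the feature map size; the function name and layer sizes are hypothetical.

```python
def mobilenet_layer_macs(k, m, n, f, alpha=1.0, rho=1.0):
    """Multiply-adds of one depthwise separable layer under both multipliers."""
    m_s = round(alpha * m)   # thinned input channels
    n_s = round(alpha * n)   # thinned output channels
    f_s = round(rho * f)     # reduced feature map size
    depthwise = k * k * m_s * f_s * f_s
    pointwise = m_s * n_s * f_s * f_s
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 -> 128 channels, 112x112 feature map
base = mobilenet_layer_macs(3, 64, 128, 112)
half_width = mobilenet_layer_macs(3, 64, 128, 112, alpha=0.5)
half_res = mobilenet_layer_macs(3, 64, 128, 112, rho=0.5)

print(f"alpha=0.5 keeps {half_width / base:.2f} of the cost")
print(f"rho=0.5 keeps {half_res / base:.2f} of the cost")
```

Halving the resolution cuts the cost by exactly four, while halving the width cuts it by slightly less than four because the depthwise term scales only linearly with the channel count.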

Architecture

Input → Conv 3×3 (stride=2) → [DW Conv + PW Conv] × N → AvgPool → FC → Softmax

Accuracy and Parameter Count

Architecture | ImageNet Top-1 | Parameters
--- | --- | ---
VGG-16 | 71.5% | 138M
MobileNet-224 | 70.6% | 4.2M

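MobileNet V2: Inverted Residuals and Linear Bottlenecks

Key Innovations

  1. Inverted residual structure (Inverted Residual)

    • Standard residual: wide → narrow → wide
    • Inverted residual: narrow → wide → narrow
  2. Linear bottleneck (Linear Bottleneck)

    • The final projection layer has no ReLU6, which avoids losing information in the low-dimensional output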

class InvertedResidual(nn.Module):
    """MobileNetV2 inverted residual block."""
    def __init__(self, in_channels, out_channels, stride, expand_ratio=6):
        super().__init__()
        mid_channels = in_channels * expand_ratio
        
        self.use_residual = (stride == 1 and in_channels == out_channels)
        
        layers = []
        # Pointwise conv: expand channels
        if expand_ratio != 1:
            layers.extend([
                nn.Conv2d(in_channels, mid_channels, 1, bias=False),
                nn.BatchNorm2d(mid_channels),
                nn.ReLU6(inplace=True)
            ])
        
        # Depthwise conv
        layers.extend([
            nn.Conv2d(mid_channels, mid_channels, 3, stride, 1,
                      groups=mid_channels, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU6(inplace=True)
        ])
        
        # Pointwise conv: project channels down, with no activation (linear bottleneck)
        layers.append(nn.Conv2d(mid_channels, out_channels, 1, bias=False))
        layers.append(nn.BatchNorm2d(out_channels))
        
        self.conv = nn.Sequential(*layers)
    
    def forward(self, x):
        if self.use_residual:
            return x + self.conv(x)
        return self.conv(x)

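Parameter Count of Inverted Residuals

Let the block's input and output width be $d$ and the expansion (or reduction) factor be $t$. Assuming $3 \times 3$ spatial kernels and ignoring biases and normalization layers:

Structure | Intermediate width | Parameters
--- | --- | ---
Standard residual | $d / t$ | $2d^2/t + 9d^2/t^2$
Inverted residual | $t \cdot d$ | $2td^2 + 9td$

The inverted residual allows richer feature transformations in the expanded space, while the depthwise convolution keeps the spatial stage cheap: its parameter count grows linearly in the expanded width rather than quadratically.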
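As a rough check of the parameter counts, the sketch below counts convolution weights for both block types in plain Python. It assumes 3×3 spatial kernels, equal input and output width, and no biases or normalization parameters; the function names and the example widths are hypothetical.

```python
def bottleneck_params(d, t, k=3):
    """Classic residual bottleneck: 1x1 reduce, k x k standard conv, 1x1 restore."""
    mid = d // t                    # narrow intermediate width
    return d * mid + k * k * mid * mid + mid * d


def inverted_residual_params(d, t, k=3):
    """Inverted residual: 1x1 expand, k x k depthwise conv, 1x1 project."""
    mid = d * t                     # wide intermediate width
    return d * mid + k * k * mid + mid * d


# Hypothetical sizes: a 256-wide bottleneck (t=4) and a 24-wide inverted block (t=6)
print(bottleneck_params(256, 4))          # 1x1 convs plus a 64-wide 3x3 conv
print(inverted_residual_params(24, 6))    # 1x1 convs plus a 144-wide depthwise conv
```

In the inverted block the depthwise term contributes only `k * k * mid` weights, so widening the intermediate representation is cheap compared with a standard 3×3 convolution at the same width.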

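MobileNet V3: Neural Architecture Search and NetAdapt

AutoML-Driven Design

  1. Platform-aware search: optimizes the architecture for the target hardware (CPU, GPU, NPU)
  2. NetAdapt: adjusts the network layer by layer until the latency constraint is met

New Components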

class SqueezeExcitation(nn.Module):
    """Squeeze-and-Excitation (SE) module."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.squeeze(x).view(b, c)
        y = self.excitation(y).view(b, c, 1, 1)
        return x * y.expand_as(x)

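h-swish Activation

h-swish approximates swish with a piecewise-linear gate that is cheaper to compute:

$$\text{h-swish}(x) = x \cdot \frac{\text{ReLU6}(x + 3)}{6}$$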
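A scalar sketch in plain Python shows how closely the hard version tracks swish. It uses the standard definitions, h-swish(x) = x · ReLU6(x + 3) / 6 and swish(x) = x · sigmoid(x); the helper names are illustrative only.

```python
import math


def relu6(x):
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def h_swish(x):
    """Hard swish: the sigmoid gate is replaced by a clamped linear ramp."""
    return x * relu6(x + 3.0) / 6.0


def swish(x):
    """Swish: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


# Compare the two functions on a grid over [-6, 6]
grid = [i / 10 for i in range(-60, 61)]
max_gap = max(abs(h_swish(x) - swish(x)) for x in grid)
print(f"largest gap on [-6, 6]: {max_gap:.3f}")
```

The hard version needs only an add, a clamp, and a multiply, which is why it suits fixed-point and mobile inference better than an exponential.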

MobileNet V4: Universal Mobile Models

Universal Inverted Bottleneck (UIB)

UIB unifies several block variants in one structure:

class UniversalInvertedBottleneck(nn.Module):
    """UIB block (simplified sketch): a 1x1 expand / 1x1 project pair with two
    optional depthwise convs. The flag combinations give four variants:

      use_start_dw=False, use_middle_dw=True   -> inverted bottleneck (IB)
      use_start_dw=True,  use_middle_dw=True   -> ExtraDW
      use_start_dw=True,  use_middle_dw=False  -> ConvNeXt-like
      use_start_dw=False, use_middle_dw=False  -> FFN (no spatial mixing)
    """
    def __init__(self, in_channels, out_channels, stride,
                 use_start_dw=False, use_middle_dw=True, expand_ratio=4):
        super().__init__()
        if stride != 1 and not (use_start_dw or use_middle_dw):
            raise ValueError("the FFN variant has no spatial conv to carry a stride")
        
        self.use_residual = (stride == 1 and in_channels == out_channels)
        mid_channels = in_channels * expand_ratio
        
        layers = []
        # Optional depthwise conv before expansion. It carries the stride
        # only when there is no middle depthwise conv to do so.
        if use_start_dw:
            start_stride = 1 if use_middle_dw else stride
            layers += [
                nn.Conv2d(in_channels, in_channels, 5, start_stride, 2,
                          groups=in_channels, bias=False),
                nn.BatchNorm2d(in_channels),
            ]
        
        # 1x1 expansion
        layers += [
            nn.Conv2d(in_channels, mid_channels, 1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.GELU(),
        ]
        
        # Optional depthwise conv on the expanded features
        if use_middle_dw:
            layers += [
                nn.Conv2d(mid_channels, mid_channels, 3, stride, 1,
                          groups=mid_channels, bias=False),
                nn.BatchNorm2d(mid_channels),
                nn.GELU(),
            ]
        
        # 1x1 projection, kept linear (no activation)
        layers += [
            nn.Conv2d(mid_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        ]
        self.block = nn.Sequential(*layers)
    
    def forward(self, x):
        out = self.block(x)
        # Add the skip connection only when input and output shapes match
        return x + out if self.use_residual else out

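Mobile MQA Attention

An attention mechanism optimized for mobile accelerators. Multi-query attention (MQA) shares a single key head and a single value head across all query heads, which reduces memory traffic: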

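class MobileMQA(nn.Module):
    """Mobile MQA (simplified): multi-query attention over a token sequence."""
    def __init__(self, dim, num_heads=4, window_size=7):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        
        # One query projection per head; a single shared key/value head
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * self.head_dim)
        self.proj = nn.Linear(dim, dim)
        
        # Local window size; stored but not used in this simplified version
        self.window_size = window_size
    
    def forward(self, x):
        B, N, C = x.shape
        # q: (B, heads, N, head_dim)
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        # k, v: (B, 1, N, head_dim), broadcast across all query heads
        k, v = self.kv(x).unsqueeze(1).chunk(2, dim=-1)
        
        # Scaled dot-product attention over tokens: (B, heads, N, N)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        
        # Merge heads back into the channel dimension: (B, N, C)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)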
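The saving from sharing key and value heads can be estimated with simple arithmetic. The sketch below, in plain Python, counts only the key and value projection weights (no biases); the function name and the sizes are hypothetical.

```python
def kv_projection_params(dim, num_heads, multi_query):
    """Weights in the key and value projections of one attention layer."""
    head_dim = dim // num_heads
    kv_heads = 1 if multi_query else num_heads   # MQA keeps a single K/V head
    return 2 * dim * kv_heads * head_dim         # factor 2: one K and one V projection


dim, num_heads = 256, 4
mha = kv_projection_params(dim, num_heads, multi_query=False)
mqa = kv_projection_params(dim, num_heads, multi_query=True)
print(f"multi-head K/V weights:  {mha}")
print(f"multi-query K/V weights: {mqa}")
print(f"reduction factor:        {mha // mqa}")
```

The key and value tensors shrink by the same factor as the weights, namely the number of heads, which is where the reduced memory traffic comes from.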

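EfficientNet: Compound Scaling

Compound Scaling Formula

EfficientNet (2019) scales depth, width, and resolution together with a single compound coefficient $\phi$:

$$d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi}$$

$$\text{s.t.} \quad \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2, \quad \alpha \ge 1, \ \beta \ge 1, \ \gamma \ge 1$$

where $d$, $w$, and $r$ are the depth, width, and resolution multipliers.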
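The effect of compound scaling can be illustrated with a few lines of plain Python. This sketch assumes the base coefficients reported in the EfficientNet paper for the B0 baseline (depth 1.2, width 1.1, resolution 1.15); the function names are illustrative only.

```python
# Base coefficients reported in the EfficientNet paper (found by grid search on B0)
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Depth, width, and resolution multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi):
    """Approximate FLOPs growth: linear in depth, quadratic in width and resolution."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPs x{flops_multiplier(phi):.2f}")
```

Because the product of the depth coefficient with the squared width and resolution coefficients is close to 2, each unit increase in the compound coefficient roughly doubles the FLOPs budget.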

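NAS-Optimized Baseline Network

EfficientNet-B0 Baseline Architecture

Stage | Operator | Channels | Layers | Input resolution
--- | --- | --- | --- | ---
1 | Conv3×3 | 32 | 1 | 224
2 | MBConv1, k3×3 | 16 | 1 | 112
3 | MBConv6, k3×3 | 24 | 2 | 112
4 | MBConv6, k5×5 | 40 | 2 | 56
5 | MBConv6, k3×3 | 80 | 3 | 28
6 | MBConv6, k5×5 | 112 | 3 | 14
7 | MBConv6, k5×5 | 192 | 4 | 14
8 | MBConv6, k3×3 | 320 | 1 | 7
9 | Conv1×1 + Pool + FC | 1280 | 1 | 7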
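The way strided stages shrink a 224×224 input down to 7×7 can be traced with the usual convolution output-size formula. The sketch below is plain Python; the per-stage kernel sizes and strides are assumptions based on the commonly used EfficientNet-B0 configuration, and padding of half the kernel size is assumed throughout.

```python
def conv_out_size(size, kernel, stride):
    """Output size of a conv with padding = kernel // 2."""
    padding = kernel // 2
    return (size + 2 * padding - kernel) // stride + 1


def trace_resolutions(input_size, stages):
    """Return the resolution entering the network and after each stage."""
    sizes = [input_size]
    for kernel, stride in stages:
        sizes.append(conv_out_size(sizes[-1], kernel, stride))
    return sizes


# Assumed (kernel, first-layer stride) per stage: stem conv, then seven MBConv stages
stages = [(3, 2), (3, 1), (3, 2), (5, 2), (3, 2), (5, 1), (5, 2), (3, 1)]
resolutions = trace_resolutions(224, stages)
print(resolutions)
```

Five stride-2 stages give a total downsampling factor of 32, so the final feature map is 224 / 32 = 7 pixels on a side.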

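EfficientNetV2: Training-Aware Optimization

Improvements

  1. Improved scaling strategy
  2. Smaller expansion ratios: less overhead from the expanded depthwise separable layers
  3. Progressive learning: image size and regularization strength are increased together during training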

class ProgressiveLearningSchedule:
    """Progressive learning schedule."""
    def __init__(self, min_size=128, max_size=380, 
                 min_r=0.8, max_r=1.0):
        self.min_size = min_size
        self.max_size = max_size
        self.min_r = min_r
        self.max_r = max_r
    
    def get_config(self, epoch, total_epochs):
        # Fraction of training completed
        progress = epoch / total_epochs
        # Image size grows linearly from min_size toward max_size
        size = int(
            self.min_size + (self.max_size - self.min_size) * progress
        )
        # Regularization grows with image size, from min_r toward max_r
        reg = self.min_r + (self.max_r - self.min_r) * progress
        return {'image_size': size, 'reg_strength': reg}

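EfficientNet Series Performance Comparison

Model | Top-1 | Parameters | FLOPs
--- | --- | --- | ---
EfficientNet-B0 | 77.1% | 5.3M | 0.39B
EfficientNet-B4 | 82.6% | 19M | 4.2B
EfficientNet-B7 | 84.4% | 66M | 37B
EfficientNetV2-S | 83.9% | 22M | 8.8B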

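ConvNeXt: A Modern CNN Revival

From Transformer to CNN

ConvNeXt (2022) systematically ports Transformer design choices into a CNN:

Transformer component | ConvNeXt counterpart
--- | ---
Self-attention over large windows | 7×7 depthwise conv
LayerNorm | LayerNorm in place of BatchNorm
GELU | GELU
FFN | Inverted bottleneck
Class token | Global average pooling

ConvNeXt Architecture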

from torchvision.ops import StochasticDepth  # stochastic depth, also known as DropPath

class ConvNeXtBlock(nn.Module):
    """ConvNeXt block."""
    def __init__(self, dim, drop_path=0.):
        super().__init__()
        # 7x7 depthwise conv (spatial mixing with a large receptive field)
        self.dwconv = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim, eps=1e-6)
        # Channel MLP (inverted bottleneck); nn.Linear on channels-last data acts as a 1x1 conv
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        
        # Stochastic depth regularization: randomly drops the residual branch per sample
        self.drop_path = StochasticDepth(drop_path, mode="row") if drop_path > 0. else nn.Identity()
    
    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.pwconv1(x)
        x = self.act(x)
        x = self.pwconv2(x)
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        
        return shortcut + self.drop_path(x)

ConvNeXt Variants

Variant | Channels | Layers | Parameters
--- | --- | --- | ---
ConvNeXt-T | [96, 192, 384, 768] | [3, 3, 9, 3] | 28M
ConvNeXt-S | [96, 192, 384, 768] | [3, 3, 27, 3] | 50M
ConvNeXt-B | [128, 256, 512, 1024] | [3, 3, 27, 3] | 89M
ConvNeXt-L | [192, 384, 768, 1536] | [3, 3, 27, 3] | 198M
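The parameter counts can be approximated from the channel and depth lists alone. The sketch below is plain Python and rests on several assumptions not spelled out in this document: a 4×4 stride-4 stem convolution followed by LayerNorm, a LayerNorm plus 2×2 stride-2 convolution between stages, a per-channel layer-scale vector in each block, and a final LayerNorm with a 1000-class linear head. All function names are hypothetical.

```python
def convnext_block_params(dim):
    """Parameters of one block at width `dim` (weights and biases)."""
    dwconv = 7 * 7 * dim + dim           # 7x7 depthwise conv
    norm = 2 * dim                       # LayerNorm scale and shift
    pwconv1 = dim * 4 * dim + 4 * dim    # 1x1 expand
    pwconv2 = 4 * dim * dim + dim        # 1x1 project
    layer_scale = dim                    # assumed per-channel layer scale
    return dwconv + norm + pwconv1 + pwconv2 + layer_scale


def convnext_params(dims, depths, num_classes=1000, in_chans=3):
    """Approximate total parameter count under the assumptions stated above."""
    # Assumed stem: 4x4 stride-4 conv followed by LayerNorm
    total = 4 * 4 * in_chans * dims[0] + dims[0] + 2 * dims[0]
    # Assumed downsampling between stages: LayerNorm, then 2x2 stride-2 conv
    for c_in, c_out in zip(dims[:-1], dims[1:]):
        total += 2 * c_in + 2 * 2 * c_in * c_out + c_out
    # Blocks of every stage
    for dim, depth in zip(dims, depths):
        total += depth * convnext_block_params(dim)
    # Assumed head: final LayerNorm plus linear classifier
    total += 2 * dims[-1] + dims[-1] * num_classes + num_classes
    return total


variants = {
    "ConvNeXt-T": ([96, 192, 384, 768], [3, 3, 9, 3]),
    "ConvNeXt-S": ([96, 192, 384, 768], [3, 3, 27, 3]),
    "ConvNeXt-B": ([128, 256, 512, 1024], [3, 3, 27, 3]),
    "ConvNeXt-L": ([192, 384, 768, 1536], [3, 3, 27, 3]),
}
for name, (dims, depths) in variants.items():
    print(f"{name}: {convnext_params(dims, depths) / 1e6:.1f}M")
```

Most of the parameters sit in the two pointwise layers of the widest stages, since their cost grows with the square of the channel count.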

ConvNeXt V2: Global Response Normalization

Global Response Normalization (GRN)

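class GRN(nn.Module):
    """Global Response Normalization."""
    def __init__(self, dim):
        super().__init__()
        # Zero initialization makes the layer start as an identity mapping
        self.gamma = nn.Parameter(torch.zeros(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
    
    def forward(self, x):
        # x: (N, H, W, C)
        # Global aggregation: per-channel L2 norm over the spatial dimensions
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)
        # Divisive normalization: each channel's norm relative to the mean over channels
        nx = gx / (gx.mean(dim=-1, keepdim=True) + 1e-6)
        # Calibrate the input and add a residual connection
        return self.gamma * (x * nx) + self.beta + x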

ConvNeXt V2 Performance

Model | ImageNet Top-1 | MAE (COCO) | Box AP
--- | --- | --- | ---
ConvNeXt-B | 83.8% | 25.8 | 53.7
ConvNeXt-V2-B | 84.9% | 24.7 | 54.8

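Summary of Efficient Architecture Design Principles

1. Exploiting Spatial Redundancy

  • Depthwise separable convolution: separates spatial mixing from channel mixing
  • Pooling: deliberately reduces spatial resolution

2. Exploiting Channel Redundancy

  • Inverted residuals: expand, filter, then project back down, so only narrow representations pass between blocks
  • Pointwise convolution: a $1 \times 1$ convolution mixes channels efficiently

3. Compute-Accuracy Trade-offs

  • Compound scaling: jointly tunes depth, width, and resolution
  • Progressive learning: trains at low resolution early on, then raises resolution and regularization together

4. Hardware Co-Design

  • Mobile MQA: optimized for mobile accelerators
  • NAS-driven search: platform-aware design

PyTorch Implementation Example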

import torch
import torch.nn as nn
 
class MobileNetV3Like(nn.Module):
    """Simplified MobileNetV3-style network."""
    def __init__(self, num_classes=1000):
        super().__init__()
        
        # Stem convolution
        self.stem = nn.Sequential(
            nn.Conv2d(3, 16, 3, 2, 1, bias=False),
            nn.BatchNorm2d(16),
            nn.Hardswish(inplace=True)
        )
        
        # Intermediate stages
        self.stages = nn.ModuleList([
            self._make_stage(16, 16, 1, 1, 'h-swish'),
            self._make_stage(16, 24, 6, 2, 'h-swish'),
            self._make_stage(24, 40, 6, 2, 'relu'),
            self._make_stage(40, 80, 6, 2, 'h-swish'),
            self._make_stage(80, 112, 6, 1, 'h-swish'),
            self._make_stage(112, 160, 6, 2, 'h-swish'),
        ])
        
        # Head
        self.head = nn.Sequential(
            nn.Conv2d(160, 960, 1, bias=False),
            nn.BatchNorm2d(960),
            nn.Hardswish(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(960, 1280, 1),
            nn.Hardswish(inplace=True),
            nn.Conv2d(1280, num_classes, 1)
        )
    
    def _make_stage(self, in_ch, out_ch, exp_ratio, stride, act):
        mid_ch = in_ch * exp_ratio
        layers = [
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU() if act == 'relu' else nn.Hardswish(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride, 1, 
                      groups=mid_ch, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU() if act == 'relu' else nn.Hardswish(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch)
        ]
        return nn.Sequential(*layers)
    
    def forward(self, x):
        x = self.stem(x)
        for stage in self.stages:
            x = stage(x)
        x = self.head(x)
        return x.flatten(1)

References

  1. Howard, A., et al. (2017). “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”. arXiv:1704.04861.

  2. Tan, M., & Le, Q. V. (2019). “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”. ICML 2019.