Overview
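As demand grows for deep learning on mobile and edge devices, efficient CNN architectures have become a key research direction. This document systematically introduces the design principles and key techniques of modern efficient convolutional neural networks, from MobileNet to ConvNeXt.[1][2]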
Depthwise Separable Convolution
Computational Cost of Standard Convolution
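The computational cost (multiply-accumulates) of a standard convolution layer is:

$$D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$$

where $D_F$ is the spatial size of the (square) output feature map, $D_K$ is the kernel size, and $M$ and $N$ are the numbers of input and output channels.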
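Depthwise Separable Factorization
A depthwise separable convolution factorizes a standard convolution into two steps:
- Depthwise convolution: each input channel is convolved independently with its own $D_K \times D_K$ filter (spatial filtering)
- Pointwise convolution: a $1 \times 1$ convolution that combines information across channels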
```python
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution."""

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise: one filter per channel (groups=in_channels is the key setting)
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size,
            stride=stride, padding=kernel_size // 2,
            groups=in_channels,
        )
        # Pointwise: 1x1 conv mixes information across channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1)

    def forward(self, x):
        x = self.depthwise(x)
        x = self.pointwise(x)
        return x
```

Cost Comparison
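| Convolution type | Cost (multiply-accumulates) | Relative cost |
|---|---|---|
| Standard | $D_K^2 \cdot M \cdot N \cdot D_F^2$ | 1× |
| Depthwise separable | $D_K^2 \cdot M \cdot D_F^2 + M \cdot N \cdot D_F^2$ | $\frac{1}{N} + \frac{1}{D_K^2}$ |

Key insight: the relative cost is $\frac{1}{N} + \frac{1}{D_K^2}$ ($N$: output channels, $D_K$: kernel size). When $N$ is large, the first term is negligible, so a depthwise separable convolution costs roughly $1/D_K^2$ of a standard one. For a $3 \times 3$ kernel this is about $1/9$, i.e. 8 to 9 times less computation.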
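A minimal sketch that counts multiply-accumulates (MACs) for a single layer. The standard and depthwise separable costs are written out in the code. The layer shape (14×14 output, 3×3 kernel, 512 → 512 channels) is an arbitrary illustrative choice. Bias and BatchNorm costs are ignored.

```python
def standard_conv_macs(d_f, d_k, m, n):
    """MACs of a standard d_k x d_k conv producing a d_f x d_f map, m -> n channels."""
    return d_k * d_k * m * n * d_f * d_f


def depthwise_separable_macs(d_f, d_k, m, n):
    """Depthwise (one d_k x d_k filter per input channel) plus 1x1 pointwise (m -> n)."""
    depthwise = d_k * d_k * m * d_f * d_f
    pointwise = m * n * d_f * d_f
    return depthwise + pointwise


# Illustrative layer: 14x14 output, 3x3 kernel, 512 -> 512 channels (bias/BN ignored)
d_f, d_k, m, n = 14, 3, 512, 512
std = standard_conv_macs(d_f, d_k, m, n)
sep = depthwise_separable_macs(d_f, d_k, m, n)
ratio = sep / std

print(f"standard:  {std:,} MACs")    # standard:  462,422,016 MACs
print(f"separable: {sep:,} MACs")    # separable: 52,283,392 MACs
print(f"ratio: {ratio:.4f} vs 1/n + 1/d_k^2 = {1 / n + 1 / d_k**2:.4f}")
```

For this shape the separable layer needs about 8.8× fewer MACs. The pointwise term (`m * n * d_f * d_f`) dominates what remains.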
The MobileNet Family
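MobileNet V1: Bringing Depthwise Separable Convolution to Mobile
MobileNet V1 (2017) built an entire large-scale network out of depthwise separable convolutions:
Core contributions:
- Width multiplier $\alpha$: uniformly thins the number of channels in every layer
- Resolution multiplier $\rho$: scales the input resolution (and therefore every feature map)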
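Under a width multiplier $\alpha$ (channels $M \to \alpha M$, $N \to \alpha N$) and a resolution multiplier $\rho$ (map size $D_F \to \rho D_F$), the depthwise term scales by $\alpha\rho^2$ and the pointwise term by $\alpha^2\rho^2$. The total cost therefore drops roughly as $\alpha^2\rho^2$. A small sketch under the MobileNet V1 cost model, with bias and BatchNorm ignored. The layer shape is an illustrative assumption:

```python
def mobilenet_layer_macs(d_f, d_k, m, n, alpha=1.0, rho=1.0):
    """MACs of one depthwise separable layer under width multiplier alpha
    and resolution multiplier rho (bias/BN ignored)."""
    m_a, n_a = alpha * m, alpha * n   # thinned channel counts
    f = rho * d_f                     # reduced feature-map size
    depthwise = d_k * d_k * m_a * f * f
    pointwise = m_a * n_a * f * f
    return depthwise + pointwise


base = mobilenet_layer_macs(14, 3, 512, 512)
for alpha, rho in [(1.0, 1.0), (0.75, 1.0), (0.5, 1.0), (1.0, 160 / 224), (0.5, 160 / 224)]:
    cost = mobilenet_layer_macs(14, 3, 512, 512, alpha, rho)
    print(f"alpha={alpha:<5} rho={rho:.3f}  relative cost={cost / base:.3f}")
```

Because the pointwise term dominates, halving the width cuts the cost by nearly 4×, not 2×.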
Architecture:
Input → Conv 3×3 (stride=2) → [DW Conv + PW Conv] × 13 → AvgPool → FC → Softmax
Accuracy and parameter comparison:
| Architecture | ImageNet Top-1 | Params |
|---|---|---|
| VGG-16 | 71.5% | 138M |
| 1.0 MobileNet-224 | 70.6% | 4.2M |
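MobileNet V2: Inverted Residuals and Linear Bottlenecks
Core innovations:
- Inverted residual structure:
  - Standard residual: wide → narrow → wide
  - Inverted residual: narrow → wide → narrow (the shortcut connects the narrow bottlenecks)
- Linear bottleneck:
  - The final 1×1 projection uses no ReLU6, because a nonlinearity in the low-dimensional bottleneck destroys information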
```python
import torch.nn as nn


class InvertedResidual(nn.Module):
    """MobileNetV2 inverted residual block."""

    def __init__(self, in_channels, out_channels, stride, expand_ratio=6):
        super().__init__()
        mid_channels = in_channels * expand_ratio
        self.use_residual = (stride == 1 and in_channels == out_channels)
        layers = []
        # Pointwise: expand channels
        if expand_ratio != 1:
            layers.extend([
                nn.Conv2d(in_channels, mid_channels, 1, bias=False),
                nn.BatchNorm2d(mid_channels),
                nn.ReLU6(inplace=True),
            ])
        # Depthwise
        layers.extend([
            nn.Conv2d(mid_channels, mid_channels, 3, stride, 1,
                      groups=mid_channels, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU6(inplace=True),
        ])
        # Pointwise: project back down, linear (no activation)
        layers.append(nn.Conv2d(mid_channels, out_channels, 1, bias=False))
        layers.append(nn.BatchNorm2d(out_channels))
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_residual:
            return x + self.conv(x)
        return self.conv(x)
```

Mathematical interpretation of the inverted residual:
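Let the block's input (and output) width be $d$ and the factor be $t$. The standard bottleneck divides the width by $t$; the inverted residual multiplies it by $t$. Counting weights with a 3×3 middle layer, and ignoring BatchNorm and biases:

| Structure | Intermediate width | Parameters |
|---|---|---|
| Standard residual (bottleneck) | $d/t$ | $2d^2/t + 9d^2/t^2$ (regular 3×3 conv) |
| Inverted residual | $td$ | $2td^2 + 9td$ (depthwise 3×3 conv) |

The expansion stage gives the block room for richer feature transformations in a high-dimensional space. The depthwise convolution keeps that space affordable: the 3×3 layer adds only $9td$ weights, so the block's cost grows only linearly in $t$.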
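To compare the two block types concretely, the sketch below counts weights layer by layer. It assumes equal input and output width, a 3×3 middle layer, and no BatchNorm or bias parameters. The example widths (256 with $t=4$ for the bottleneck, 64 with $t=6$ for the inverted residual) are illustrative choices:

```python
def standard_bottleneck_params(d, t):
    """ResNet-style bottleneck: 1x1 d->d/t, regular 3x3 d/t->d/t, 1x1 d/t->d."""
    mid = d // t
    return d * mid + 9 * mid * mid + mid * d


def inverted_residual_params(d, t):
    """MobileNetV2-style block: 1x1 d->t*d, depthwise 3x3 on t*d, 1x1 t*d->d."""
    mid = t * d
    return d * mid + 9 * mid + mid * d


# Weights only; BatchNorm and biases are ignored.
print(standard_bottleneck_params(256, 4))   # 69632
print(inverted_residual_params(64, 6))      # 52608
```

In the inverted residual, the two 1×1 layers account for 49,152 of the 52,608 weights. The depthwise layer contributes only 3,456 despite operating on the 384-channel expanded tensor.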
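MobileNet V3: Neural Architecture Search and NetAdapt
AutoML-driven design:
- Platform-aware NAS: searches the block-level structure using latency measured on a specific hardware target (the paper targets mobile phone CPUs)
- NetAdapt: then tunes the network layer by layer (e.g. the number of filters per layer) to meet a latency budget
New components include the Squeeze-and-Excitation (SE) module: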
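```python
import torch.nn as nn


class SqueezeExcitation(nn.Module):
    """Squeeze-and-Excitation (SE) module, MobileNetV3 flavor.

    MobileNetV3 squeezes to 1/4 of the channels and gates with hard-sigmoid;
    the original SENet uses reduction=16 and a regular sigmoid.
    """

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)   # global spatial average: (B, C, 1, 1)
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Hardsigmoid(),                    # per-channel gate in [0, 1]
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.squeeze(x).view(b, c)
        y = self.excitation(y).view(b, c, 1, 1)
        return x * y                             # broadcast over H and W
```

h-swish activation:

$$\text{h-swish}(x) = x \cdot \frac{\text{ReLU6}(x + 3)}{6}$$

It approximates swish, $x \cdot \sigma(x)$, but avoids the exponential. This makes it cheaper to compute and friendlier to quantization.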
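A scalar sketch comparing the two functions. PyTorch provides both as `nn.Hardswish` and `nn.SiLU`. Plain Python is used here only to make the formulas explicit:

```python
import math


def swish(x):
    """swish(x) = x * sigmoid(x)"""
    return x / (1.0 + math.exp(-x))


def relu6(x):
    return min(max(x, 0.0), 6.0)


def h_swish(x):
    """h-swish(x) = x * ReLU6(x + 3) / 6: piecewise, no exponential."""
    return x * relu6(x + 3.0) / 6.0


for x in [-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0]:
    print(f"x={x:5.1f}  swish={swish(x):7.4f}  h-swish={h_swish(x):7.4f}")
```

Outside $[-3, 3]$, h-swish is exactly 0 or the identity. Inside that range it is a quadratic that tracks swish closely.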
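MobileNet V4: Universal Models for the Mobile Ecosystem
Universal Inverted Bottleneck (UIB):
UIB extends the inverted bottleneck with two optional depthwise convolutions: one before the expansion layer and one between expansion and projection. Switching them on or off yields four variants:
- Inverted Bottleneck (IB): middle DW only
- ConvNeXt-like: start DW only
- ExtraDW: both
- FFN: neither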
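```python
import torch.nn as nn


class UniversalInvertedBottleneck(nn.Module):
    """UIB block (simplified sketch).

    A kernel size of 0 disables that depthwise conv:
      IB:            start_dw_kernel=0, middle_dw_kernel=3
      ConvNeXt-like: start_dw_kernel=3, middle_dw_kernel=0
      ExtraDW:       start_dw_kernel=3, middle_dw_kernel=3
      FFN:           start_dw_kernel=0, middle_dw_kernel=0
    """

    def __init__(self, in_channels, out_channels, stride=1,
                 start_dw_kernel=0, middle_dw_kernel=3, expand_ratio=4):
        super().__init__()
        assert stride == 1 or start_dw_kernel or middle_dw_kernel, \
            "the FFN variant has no depthwise conv and cannot downsample"
        mid_channels = in_channels * expand_ratio
        self.use_residual = (stride == 1 and in_channels == out_channels)
        # Downsample in the middle DW if it exists, otherwise in the start DW
        middle_stride = stride if middle_dw_kernel else 1
        start_stride = 1 if middle_dw_kernel else stride

        layers = []
        if start_dw_kernel:   # optional DW before expansion
            layers += [nn.Conv2d(in_channels, in_channels, start_dw_kernel, start_stride,
                                 start_dw_kernel // 2, groups=in_channels, bias=False),
                       nn.BatchNorm2d(in_channels)]
        # 1x1 expansion
        layers += [nn.Conv2d(in_channels, mid_channels, 1, bias=False),
                   nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True)]
        if middle_dw_kernel:  # optional DW between expansion and projection
            layers += [nn.Conv2d(mid_channels, mid_channels, middle_dw_kernel, middle_stride,
                                 middle_dw_kernel // 2, groups=mid_channels, bias=False),
                       nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True)]
        # 1x1 linear projection
        layers += [nn.Conv2d(mid_channels, out_channels, 1, bias=False),
                   nn.BatchNorm2d(out_channels)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```

Mobile MQA attention:
This attention block is optimized for mobile accelerators. It uses multi-query attention (MQA): every query head shares a single key head and a single value head, which reduces memory traffic for K and V.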
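```python
import torch.nn as nn


class MobileMQA(nn.Module):
    """Multi-query attention (MQA) sketch for mobile accelerators.

    All query heads share a single key head and a single value head.
    Simplified: MobileNetV4's Mobile MQA also downsamples K and V
    spatially; that step is omitted here, so the input is a token
    sequence of shape (B, N, C).
    """

    def __init__(self, dim, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)                 # one query projection per head
        self.kv = nn.Linear(dim, 2 * self.head_dim)  # one shared key + value head
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        # Queries: (B, N, C) -> (B, heads, N, head_dim)
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        # Shared K and V: (B, 1, N, head_dim), broadcast across all query heads
        k, v = self.kv(x).unsqueeze(1).chunk(2, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, heads, N, N)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

EfficientNet: Compound Scaling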
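Compound Scaling Formula
EfficientNet (2019) scales depth, width, and resolution jointly with a single compound coefficient $\phi$:

$$d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi} \quad \text{s.t.} \quad \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2, \quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1$$

FLOPs grow linearly with depth and quadratically with width and resolution, so total FLOPs grow by roughly $2^{\phi}$. A grid search on the baseline gives $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$.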
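A minimal sketch of the compound-scaling arithmetic, using the grid-searched coefficients $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$. It is meant to show the formula, not to reproduce the released models. The published B1 to B7 multipliers were adjusted and do not follow these exact powers:

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution bases


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_factor(phi):
    """FLOPs grow ~linearly in depth and quadratically in width and resolution."""
    d, w, r = compound_scale(phi)
    return d * w ** 2 * r ** 2


for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution {round(224 * r)}px, FLOPs x{flops_factor(phi):.2f}")
```

With these coefficients, $\alpha\beta^2\gamma^2 \approx 1.92$, so each unit increase of $\phi$ roughly doubles the FLOPs.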
The NAS-Optimized Baseline
EfficientNet-B0 baseline architecture:
| Stage | Operator | Channels | Layers | Input resolution |
|---|---|---|---|---|
| 1 | Conv3×3 | 32 | 1 | 224×224 |
| 2 | MBConv1, k3×3 | 16 | 1 | 112×112 |
| 3 | MBConv6, k3×3 | 24 | 2 | 112×112 |
| 4 | MBConv6, k5×5 | 40 | 2 | 56×56 |
| 5 | MBConv6, k3×3 | 80 | 3 | 28×28 |
| 6 | MBConv6, k5×5 | 112 | 3 | 14×14 |
| 7 | MBConv6, k5×5 | 192 | 4 | 14×14 |
| 8 | MBConv6, k3×3 | 320 | 1 | 7×7 |
| 9 | Conv1×1 & Pooling & FC | 1280 | 1 | 7×7 |
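Compound coefficients are turned into concrete channel and layer counts by rounding. The sketch below applies B4-style multipliers (width 1.4, depth 1.8, as listed in common EfficientNet implementations) to the B0 MBConv stages. The rounding rules follow the reference TensorFlow implementation as we understand it: channels go to a multiple of 8 and never drop more than 10%, and layer counts round up. Treat these rules as an assumption:

```python
import math

# EfficientNet-B0 MBConv stages: (operator, output channels, layers)
MBCONV_STAGES = [
    ("MBConv1, k3x3", 16, 1),
    ("MBConv6, k3x3", 24, 2),
    ("MBConv6, k5x5", 40, 2),
    ("MBConv6, k3x3", 80, 3),
    ("MBConv6, k5x5", 112, 3),
    ("MBConv6, k5x5", 192, 4),
    ("MBConv6, k3x3", 320, 1),
]


def round_filters(filters, width_mult, divisor=8):
    """Scale channels, round to a multiple of `divisor`, never drop more than 10%."""
    scaled = filters * width_mult
    new = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    if new < 0.9 * scaled:
        new += divisor
    return int(new)


def round_repeats(repeats, depth_mult):
    """Scale the number of layers in a stage, rounding up."""
    return int(math.ceil(depth_mult * repeats))


width, depth = 1.4, 1.8   # B4-style multipliers (assumed, see lead-in)
print("stem:", round_filters(32, width), "channels")
for op, ch, layers in MBCONV_STAGES:
    print(f"{op:<14} {round_filters(ch, width):>4} ch  x{round_repeats(layers, depth)}")
print("head:", round_filters(1280, width), "channels")
```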
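EfficientNetV2: Training-Aware Optimization
Improvements:
- Non-uniform scaling: more layers are added to later stages, and the maximum image size is capped (at 480) to limit memory use and training time
- Smaller expansion ratios: MBConv blocks use smaller expansion ratios, which reduce memory-access overhead
- Progressive learning: image size grows during training, and regularization strength (dropout, RandAugment, mixup) is increased along with it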
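```python
class ProgressiveLearningSchedule:
    """Progressive learning (EfficientNetV2 style).

    Training is split into `num_stages` stages. Image size and regularization
    strength grow together: small images get weak regularization, large images
    get strong regularization. Default values are illustrative only.
    """

    def __init__(self, min_size=128, max_size=380,
                 min_r=0.1, max_r=0.3, num_stages=4):
        self.min_size = min_size
        self.max_size = max_size
        self.min_r = min_r          # e.g. dropout rate in the first stage
        self.max_r = max_r          # e.g. dropout rate in the last stage
        self.num_stages = num_stages

    def get_config(self, epoch, total_epochs):
        # Current stage index in [0, num_stages - 1]
        stage = min(epoch * self.num_stages // total_epochs, self.num_stages - 1)
        progress = stage / max(self.num_stages - 1, 1)
        size = int(self.min_size + (self.max_size - self.min_size) * progress)
        reg = self.min_r + (self.max_r - self.min_r) * progress   # grows with image size
        return {'image_size': size, 'reg_strength': reg}
```

EfficientNet Family Performance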
| Model | ImageNet Top-1 | Params | FLOPs |
|---|---|---|---|
| EfficientNet-B0 | 77.1% | 5.3M | 0.39B |
| EfficientNet-B4 | 82.6% | 19M | 4.2B |
| EfficientNet-B7 | 84.4% | 66M | 37B |
| EfficientNetV2-S | 84.9% (with ImageNet-21k pretraining; 83.9% with ImageNet-1k only) | 21M | 8.4B |
ConvNeXt: A Modern Revival of CNNs
From Transformers to CNNs
ConvNeXt (2022) modernizes a standard ResNet step by step, borrowing design choices from (Swin) Transformers:
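| Transformer design | ConvNeXt counterpart |
|---|---|
| Local window self-attention (7×7 windows in Swin) | 7×7 depthwise conv |
| LayerNorm | BatchNorm → LayerNorm |
| GELU | ReLU → GELU |
| FFN (MLP with 4× expansion) | Inverted bottleneck |
| Class token (ViT) | Global average pooling |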
ConvNeXt Architecture
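```python
import torch
import torch.nn as nn


class DropPath(nn.Module):
    """Stochastic depth: drop the whole residual branch per sample during training."""
    def __init__(self, p=0.):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training or self.p == 0.:
            return x
        keep = 1.0 - self.p
        mask = x.new_empty((x.shape[0],) + (1,) * (x.ndim - 1)).bernoulli_(keep)
        return x * mask / keep


class ConvNeXtBlock(nn.Module):
    """ConvNeXt block."""
    def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6):
        super().__init__()
        # 7x7 depthwise conv (plays the role of windowed self-attention)
        self.dwconv = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim, eps=1e-6)
        # Channel MLP (inverted bottleneck); Linear acts as a 1x1 conv in channels-last
        self.pwconv1 = nn.Linear(dim, 4 * dim)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)
        # Layer scale: learnable per-channel scaling of the residual branch
        self.gamma = nn.Parameter(layer_scale_init_value * torch.ones(dim))
        # DropPath (stochastic depth) regularization
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        x = self.norm(x)
        x = self.act(self.pwconv1(x))
        x = self.gamma * self.pwconv2(x)
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        return shortcut + self.drop_path(x)
```

ConvNeXt Variants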
| Variant | Channels per stage | Blocks per stage | Params |
|---|---|---|---|
| ConvNeXt-T | [96, 192, 384, 768] | [3, 3, 9, 3] | 28M |
| ConvNeXt-S | [96, 192, 384, 768] | [3, 3, 27, 3] | 50M |
| ConvNeXt-B | [128, 256, 512, 1024] | [3, 3, 27, 3] | 89M |
| ConvNeXt-L | [192, 384, 768, 1536] | [3, 3, 27, 3] | 198M |
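The parameter column can be reproduced by counting weights. The sketch below assumes the layout of the official ConvNeXt code: a 4×4 stride-4 patchify stem with LayerNorm, LayerNorm plus 2×2 stride-2 conv for downsampling, layer scale in every block, and a final LayerNorm with a 1000-class linear head:

```python
def block_params(c):
    """One ConvNeXt block: 7x7 depthwise conv, LayerNorm, 4x MLP, layer scale."""
    dwconv = 49 * c + c                            # depthwise weights + bias
    norm = 2 * c                                   # LayerNorm weight + bias
    mlp = (c * 4 * c + 4 * c) + (4 * c * c + c)    # pwconv1 + pwconv2, with biases
    layer_scale = c                                # per-channel gamma
    return dwconv + norm + mlp + layer_scale


def convnext_params(dims, depths, num_classes=1000):
    total = 3 * dims[0] * 4 * 4 + dims[0] + 2 * dims[0]   # 4x4 stem conv + LayerNorm
    for i, (c, n) in enumerate(zip(dims, depths)):
        if i > 0:   # downsampling: LayerNorm + 2x2 stride-2 conv from the previous width
            prev = dims[i - 1]
            total += 2 * prev + prev * c * 2 * 2 + c
        total += n * block_params(c)
    total += 2 * dims[-1] + dims[-1] * num_classes + num_classes   # final norm + head
    return total


tiny = convnext_params([96, 192, 384, 768], [3, 3, 9, 3])
small = convnext_params([96, 192, 384, 768], [3, 3, 27, 3])
print(f"ConvNeXt-T: {tiny / 1e6:.1f}M, ConvNeXt-S: {small / 1e6:.1f}M")   # 28.6M, 50.2M
```

Almost all weights sit in the $8C^2$ MLP term of the last two stages. This is why going from T to S (9 → 27 blocks in stage 3) nearly doubles the size.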
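ConvNeXt V2: Global Response Normalization
Global Response Normalization (GRN) is added to the block's MLP, after the GELU. It works in three steps:
1. Aggregate each channel's global L2 norm over the spatial dimensions.
2. Normalize these norms by their mean across channels.
3. Use the result to recalibrate each channel.

This channel competition encourages feature diversity.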
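```python
import torch
import torch.nn as nn


class GRN(nn.Module):
    """Global Response Normalization (channels-last input)."""

    def __init__(self, dim):
        super().__init__()
        # Zero init: with the residual term below, GRN starts as an identity
        self.gamma = nn.Parameter(torch.zeros(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        # x: (N, H, W, C)
        # 1) Aggregate: per-channel L2 norm over the spatial dimensions -> (N, 1, 1, C)
        gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)
        # 2) Normalize: divide by the mean norm across channels
        nx = gx / (gx.mean(dim=-1, keepdim=True) + 1e-6)
        # 3) Calibrate the input, plus a residual connection
        return self.gamma * (x * nx) + self.beta + x
```

ConvNeXt V2 performance: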
| 模型 | ImageNet Top-1 | MAE (COCO) | Box AP |
|---|---|---|---|
| ConvNeXt-B | 83.8% | 25.8 | 53.7 |
| ConvNeXt-V2-B | 84.9% | 24.7 | 54.8 |
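Summary of Efficient Architecture Design Principles
1. Exploiting spatial redundancy
- Depthwise separable convolution: separates spatial mixing (depthwise) from channel mixing (pointwise)
- Pooling: actively reduces spatial resolution
2. Exploiting channel redundancy
- Inverted residuals: keep the tensors passed between blocks narrow and expand only inside the block
- Pointwise convolution: 1×1 convolutions mix channels cheaply
3. Compute/accuracy trade-offs
- Compound scaling: scale depth, width, and resolution jointly
- Progressive learning: train with smaller images (and weaker regularization) early on, increasing both over training
4. Hardware co-design
- Mobile MQA: attention tailored to mobile accelerators
- NAS-driven search: platform-aware design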
PyTorch Implementation Example
```python
import torch
import torch.nn as nn


def _act(kind):
    """Fresh activation module: 'relu' or 'h-swish'."""
    return nn.ReLU(inplace=True) if kind == 'relu' else nn.Hardswish(inplace=True)


class MobileNetV3Like(nn.Module):
    """Simplified MobileNetV3-style network.

    For brevity, each stage is a single inverted bottleneck without SE,
    5x5 kernels, or residual connections.
    """

    def __init__(self, num_classes=1000):
        super().__init__()
        # Stem
        self.stem = nn.Sequential(
            nn.Conv2d(3, 16, 3, 2, 1, bias=False),
            nn.BatchNorm2d(16),
            nn.Hardswish(inplace=True),
        )
        # Stages: (in, out, expansion, stride, activation).
        # As in MobileNetV3, early stages use ReLU and later stages use h-swish.
        self.stages = nn.ModuleList([
            self._make_stage(16, 16, 1, 1, 'relu'),
            self._make_stage(16, 24, 6, 2, 'relu'),
            self._make_stage(24, 40, 6, 2, 'relu'),
            self._make_stage(40, 80, 6, 2, 'h-swish'),
            self._make_stage(80, 112, 6, 1, 'h-swish'),
            self._make_stage(112, 160, 6, 2, 'h-swish'),
        ])
        # Head
        self.head = nn.Sequential(
            nn.Conv2d(160, 960, 1, bias=False),
            nn.BatchNorm2d(960),
            nn.Hardswish(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(960, 1280, 1),
            nn.Hardswish(inplace=True),
            nn.Conv2d(1280, num_classes, 1),
        )

    def _make_stage(self, in_ch, out_ch, exp_ratio, stride, act):
        mid_ch = in_ch * exp_ratio
        layers = []
        if exp_ratio != 1:  # 1x1 expansion (skipped when there is nothing to expand)
            layers += [nn.Conv2d(in_ch, mid_ch, 1, bias=False),
                       nn.BatchNorm2d(mid_ch), _act(act)]
        layers += [
            # 3x3 depthwise conv
            nn.Conv2d(mid_ch, mid_ch, 3, stride, 1, groups=mid_ch, bias=False),
            nn.BatchNorm2d(mid_ch), _act(act),
            # 1x1 linear projection (no activation)
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        ]
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.stem(x)
        for stage in self.stages:
            x = stage(x)
        x = self.head(x)     # (B, num_classes, 1, 1)
        return x.flatten(1)  # (B, num_classes)
```
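A quick way to sanity-check the network above is to trace the spatial resolution through each strided convolution. In this pure-Python sketch, the 1×1 convolutions are skipped because they do not change resolution. The stride list is copied by hand from `MobileNetV3Like` and would need updating if the stages change:

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resolution(input_size=224):
    # Stem: 3x3 conv, stride 2, padding 1
    sizes = [conv_out(input_size, 3, 2, 1)]
    # One 3x3 depthwise conv (padding 1) per stage; strides as in MobileNetV3Like
    for stride in [1, 2, 2, 2, 1, 2]:
        sizes.append(conv_out(sizes[-1], 3, stride, 1))
    return sizes


print(trace_resolution(224))   # [112, 112, 56, 28, 14, 14, 7]
```

The final 7×7 map is what `nn.AdaptiveAvgPool2d(1)` in the head averages away. Because of that pooling, the same model also accepts other input sizes.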