# Adversarial Training

## Introduction

**Adversarial training** is one of the most effective methods for improving the adversarial robustness of neural networks. Its core idea is to inject adversarial examples during training so that the model learns to resist adversarial perturbations.[^1]
## Formalizing Adversarial Training

### The Robust-Optimization View

Madry et al. (2018) formalized adversarial training as a min-max optimization problem:[^1]

$$
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\|\delta\|_{\infty} \le \epsilon} \mathcal{L}\big(f_{\theta}(x + \delta), y\big) \right]
$$

where:

- The inner maximization finds the strongest adversarial perturbation $\delta$ within the $\ell_\infty$ ball of radius $\epsilon$.
- The outer minimization trains robust model parameters $\theta$.
### Comparison with Standard Training

| Training scheme | Objective | Optimization problem |
|---|---|---|
| Standard training | Minimize the standard loss | $\min_{\theta} \mathbb{E}_{(x,y)} \left[ \mathcal{L}(f_{\theta}(x), y) \right]$ |
| Adversarial training | Minimize the adversarial loss | $\min_{\theta} \mathbb{E}_{(x,y)} \left[ \max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}(f_{\theta}(x + \delta), y) \right]$ |
## PGD Adversarial Training

### Standard PGD-AT

```python
import torch
import torch.nn.functional as F

def pgd_adversarial_training(model, optimizer, x, y,
                             epsilon=0.03, alpha=0.003, steps=7):
    """
    PGD-based adversarial training.

    Args:
        epsilon: perturbation bound (l-inf radius)
        alpha: attack step size
        steps: number of PGD iterations
    """
    model.train()

    # Generate adversarial examples, starting from a random point in the epsilon-ball
    x_adv = x.clone().detach()
    x_adv = x_adv + torch.empty_like(x_adv).uniform_(-epsilon, epsilon)
    x_adv = torch.clamp(x_adv, 0, 1)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        output = model(x_adv)
        loss = F.cross_entropy(output, y)
        # Differentiate w.r.t. the input only; leave parameter grads untouched
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # Project back into the epsilon-ball and the valid pixel range
            x_adv = torch.maximum(x_adv, x - epsilon)
            x_adv = torch.minimum(x_adv, x + epsilon)
            x_adv = torch.clamp(x_adv, 0, 1)

    # Train on the adversarial examples
    output = model(x_adv.detach())
    loss_adv = F.cross_entropy(output, y)
    optimizer.zero_grad()
    loss_adv.backward()
    optimizer.step()
    return loss_adv.item()
```

### Full Training Loop
```python
def train_adversarial(model, train_loader, optimizer,
                      epsilon=8/255, alpha=2/255, steps=7, epochs=100):
    """Complete adversarial-training loop."""
    for epoch in range(epochs):
        for x, y in train_loader:
            x, y = x.cuda(), y.cuda()
            # Generate PGD adversarial examples
            x_adv = pgd_attack(model, x, y, epsilon, alpha, steps)
            # Train on them
            optimizer.zero_grad()
            output = model(x_adv)
            loss = F.cross_entropy(output, y)
            loss.backward()
            optimizer.step()
```

## The TRADES Method
Zhang et al. (2019) proposed TRADES, which strikes a better balance between standard accuracy and robustness:[^2]

### Objective Function

$$
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \mathcal{L}\big(f_{\theta}(x), y\big) + \beta \max_{\|x' - x\|_{\infty} \le \epsilon} \mathrm{KL}\big(f_{\theta}(x) \,\|\, f_{\theta}(x')\big) \right]
$$

where $x'$ is the adversarial example associated with $x$, and $\beta$ controls the accuracy-robustness trade-off.

### Theoretical Motivation for TRADES

Through its KL-divergence regularizer, TRADES forces the model's predictive distribution under adversarial perturbation to stay close to its distribution on the original example:

> "The robustness can be achieved by forcing the model to output similar predictions for nearby points."
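Consistent with this motivation, TRADES generates its adversarial examples by maximizing the KL term rather than the cross-entropy. A minimal sketch of that inner maximization (the function name `trades_attack` and its defaults are illustrative, not taken from the paper's reference code):

```python
import torch
import torch.nn.functional as F

def trades_attack(model, x, epsilon=8/255, alpha=2/255, steps=10):
    """Find x_adv maximizing KL(p(x) || p(x_adv)) inside the epsilon-ball."""
    p_clean = model(x).softmax(dim=1).detach()
    # Small random start, as in KL-based PGD
    x_adv = (x + 0.001 * torch.randn_like(x)).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # F.kl_div expects log-probs as input and probs as target
        loss = F.kl_div(model(x_adv).log_softmax(dim=1), p_clean,
                        reduction='batchmean')
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```

The resulting `x_adv` is then fed to the TRADES loss below in place of a cross-entropy-maximizing PGD example.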
### TRADES Implementation

```python
def trades_loss(model, x, y, x_adv, beta=6.0):
    """
    TRADES loss function.

    TRADES = TRADE-off between Accuracy and Robustness.
    """
    # Standard cross-entropy on the clean examples
    loss_clean = F.cross_entropy(model(x), y)

    # KL-divergence regularizer: keep the adversarial prediction close to
    # the clean one. F.kl_div takes log-probs as input, probs as target.
    loss_robust = F.kl_div(
        model(x_adv).log_softmax(dim=1),
        model(x).softmax(dim=1),
        reduction='batchmean'
    )

    # Combined loss
    return loss_clean + beta * loss_robust
```

## The MART Method
Wang et al. (2020) proposed Misclassification-Aware adveRsarial Training (MART), which emphasizes that different samples should play different roles, paying extra attention to clean examples the model misclassifies:[^3]

$$
\mathcal{L}_{\mathrm{MART}} = \mathrm{BCE}\big(f_{\theta}(x'), y\big) + \lambda \, \mathrm{KL}\big(f_{\theta}(x) \,\|\, f_{\theta}(x')\big) \cdot \big(1 - p_{y}(x)\big)
$$

where $p_{y}(x)$ is the model's probability estimate for the true class on the clean example, so the KL term is weighted more heavily for examples the model already gets wrong.
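A hedged sketch of the MART loss (helper arithmetic and the `lam` default are illustrative; the boosted cross-entropy adds a margin term on the largest wrong-class probability, and the KL term is down-weighted for confidently correct clean examples):

```python
import torch
import torch.nn.functional as F

def mart_loss(model, x, x_adv, y, lam=5.0):
    logits_adv = model(x_adv)
    probs_adv = logits_adv.softmax(dim=1)
    probs_clean = model(x).softmax(dim=1)

    # Boosted cross-entropy on the adversarial example: standard CE plus a
    # margin term that pushes down the largest wrong-class probability
    ce = F.cross_entropy(logits_adv, y)
    wrong = probs_adv.scatter(1, y.unsqueeze(1), 0.0)
    margin = -torch.log(1.0001 - wrong.max(dim=1).values).mean()

    # KL(clean || adv), weighted by the clean misclassification probability
    kl = (probs_clean * (probs_clean.clamp_min(1e-12).log()
                         - probs_adv.clamp_min(1e-12).log())).sum(dim=1)
    weight = 1.0 - probs_clean.gather(1, y.unsqueeze(1)).squeeze(1)
    return ce + margin + lam * (kl * weight).mean()
```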
## Technical Details of Adversarial Training

### 1. Random Initialization

```python
def random_start_pgd(x, epsilon):
    """Random initialization so the attack does not start from the trivial point x."""
    x_adv = x + torch.empty_like(x).uniform_(-epsilon, epsilon)
    return torch.clamp(x_adv, 0, 1)
```

### 2. Early Stopping
For each example, only update the model once the attack has succeeded:

```python
# Per-example sketch; for a batch this becomes a boolean mask over examples
if model(x_adv).argmax() != y:  # attack succeeded
    # update the model on this example
    loss.backward()
```

### 3. Curriculum Adversarial Training
Gradually increase the perturbation strength:

```python
epsilon_schedule = [0.01, 0.02, 0.03, 0.05]  # curriculum
for epoch, eps in enumerate(epsilon_schedule):
    # train with the current epsilon
    adversarial_train(model, ..., epsilon=eps)
```

### 4. Label Smoothing
Combine with label smoothing to improve generalization:

```python
def smoothed_loss(logits, y, smoothing=0.1):
    # Build soft targets from the integer labels, then take cross-entropy
    n_classes = logits.size(1)
    y_onehot = F.one_hot(y, n_classes).float()
    y_smooth = y_onehot * (1 - smoothing) + smoothing / n_classes
    return -(y_smooth * logits.log_softmax(dim=1)).sum(dim=1).mean()
```

(PyTorch 1.10+ also provides this directly via `F.cross_entropy(logits, y, label_smoothing=smoothing)`.)

## Recent Advances
### 1. AWP (Adversarial Weight Perturbation)

Adversarial weight perturbation regularizes in weight space and input space simultaneously:[^4]
```python
def awp_loss(model, x, y, epsilon_w=1e-2, epsilon=8/255):
    """Adversarial Weight Perturbation (schematic sketch).

    get_params / set_params are placeholder helpers for reading and
    writing the model's parameter list.
    """
    # Step 0: backprop a clean loss so that parameter gradients exist
    F.cross_entropy(model(x), y).backward()

    # Step 1: perturb in weight space along the gradient sign
    params = get_params(model)
    params_adv = [p + epsilon_w * p.grad.sign() for p in params]
    set_params(model, params_adv)

    # Step 2: attack in input space under the perturbed weights
    x_adv = pgd_attack(model, x, y, epsilon)

    # Step 3: compute the loss on the adversarial examples
    output = model(x_adv)
    loss = F.cross_entropy(output, y)

    # Restore the original weights
    set_params(model, params)
    return loss
```

### 2. Fast Is Better than Free (FIB)
Cuts the cost of adversarial training by replacing multi-step PGD with a single FGSM step from a random initialization:[^5]
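A minimal sketch of one such fast training step, assuming the single-step scheme of Wong et al. (2020) (the function name and step sizes are illustrative):

```python
import torch
import torch.nn.functional as F

def fast_fgsm_train_step(model, optimizer, x, y,
                         epsilon=8/255, alpha=10/255):
    # Random start inside the epsilon-ball, then a single FGSM step
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)
    loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = torch.clamp(delta + alpha * grad.sign(), -epsilon, epsilon).detach()

    # One parameter update on the resulting adversarial examples
    optimizer.zero_grad()
    loss_adv = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
    loss_adv.backward()
    optimizer.step()
    return loss_adv.item()
```

Note the step size `alpha` exceeds `epsilon`; the final clamp projects the perturbation back into the epsilon-ball, mirroring the paper's choice of roughly 1.25ε.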
### 3. AT-PR (Adversarial Training for Probabilistic Robustness)

A method proposed at ICCV 2025 that directly optimizes probabilistic robustness:[^6]
## Challenges of Adversarial Training

### 1. Computational Overhead

Adversarial training must generate adversarial examples in an inner loop, making it roughly 3-10x as expensive as standard training.

Mitigations:

- Use few PGD steps (e.g., 7)
- Terminate attacks early (early stopping)
- Distributed training
### 2. Overfitting

Adversarial training tends to overfit to the specific attack used:

```python
# Mitigate robust overfitting with clean fine-tuning
def clean_finetune(model, clean_loader, epochs=10, lr=1e-5):
    """Fine-tune on clean data to mitigate robust overfitting."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for x, y in clean_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
```

### 3. The Generalization-Robustness Trade-off
There is a fundamental tension between standard accuracy and robustness:[^7]

| Metric | Standard training | Adversarial training |
|---|---|---|
| Clean accuracy | High | Slightly lower |
| Robust accuracy | Low | High |
| Clean-robust gap | Large | Smaller |
## Experimental Comparison

Typical results on CIFAR-10 ($\ell_\infty$, $\epsilon = 8/255$):
| 方法 | Clean Acc | Robust Acc |
|---|---|---|
| Standard | 95.0% | 0.0% |
| PGD-AT | 87.3% | 56.1% |
| TRADES | 88.6% | 55.8% |
| MART | 88.1% | 57.0% |
| AWP | 89.2% | 57.8% |
## Chapter Summary

Adversarial training is the core approach to adversarial robustness:

- Formalization: a min-max optimization problem
- PGD-AT: the standard adversarial-training method
- TRADES: balances accuracy and robustness
- MART: misclassification-aware adversarial training
- Technical details: random initialization, curriculum learning, early stopping
- Recent advances: AWP, FIB, AT-PR

Adversarial training is effective but computationally expensive. The next chapter covers certified robustness methods, which provide provable robustness guarantees.
## References

[^1]: Madry, A., et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR 2018.
[^2]: Zhang, H., et al. (2019). Theoretically Principled Trade-off between Robustness and Accuracy. ICML 2019.
[^3]: Wang, Y., et al. (2020). Improving Adversarial Robustness Requires Revisiting Misclassified Examples. ICLR 2020.
[^4]: Wu, D., et al. (2020). Adversarial Weight Perturbation Helps Robust Generalization. NeurIPS 2020.
[^5]: Wong, E., Rice, L., & Kolter, J. Z. (2020). Fast Is Better than Free: Revisiting Adversarial Training. ICLR 2020.
[^6]: Zhang, Y., et al. (2025). Adversarial Training for Probabilistic Robustness. ICCV 2025.
[^7]: Tsipras, D., et al. (2019). Robustness May Be at Odds with Accuracy. ICLR 2019.