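PGD Attack Explained
Overview
The Projected Gradient Descent (PGD) attack is one of the most important methods in adversarial attacks on deep learning. Proposed by Madry et al. in 2018, PGD is widely regarded as the strongest first-order attack under an $\ell_\infty$ constraint (a claim supported by experiments rather than a formal proof), and it has become a standard baseline for evaluating model robustness. [1]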
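Mathematical Framework
Problem Formulation
PGD solves the following constrained optimization problem, where $f_\theta$ is the model, $\mathcal{L}$ is the loss, $x$ is the clean input, $y$ is its label, and $\delta$ is the perturbation:

$$\max_{\delta \in \mathcal{S}} \; \mathcal{L}\big(f_\theta(x + \delta),\, y\big)$$

where the feasible set of perturbations is:

$$\mathcal{S} = \{\delta : \|\delta\|_p \le \epsilon\}$$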
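Iterative Optimization
PGD solves this problem with the following iteration:

$$\delta^{t+1} = \Pi_{\mathcal{S}}\Big(\delta^{t} + \alpha \cdot \operatorname{sign}\big(\nabla_{\delta}\, \mathcal{L}(f_\theta(x + \delta^{t}),\, y)\big)\Big)$$

where $\delta^{t}$ is the perturbation at step $t$, $\alpha$ is the step size, and $\Pi_{\mathcal{S}}$ is the projection operator onto the feasible set $\mathcal{S}$.
For the $\ell_\infty$ norm, the projection has an explicit form:

$$\Pi_{\mathcal{S}}(\delta) = \operatorname{clip}(\delta,\, -\epsilon,\, \epsilon)$$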
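To make the update and projection concrete, here is a minimal sketch of a single $\ell_\infty$ step on a three-element perturbation. The helper name `linf_pgd_step` and the gradient values are made up for illustration; in a real attack the gradient would come from backpropagation through the model.

```python
import torch


def linf_pgd_step(delta, grad, alpha, epsilon):
    """One signed-gradient ascent step, then projection onto the l_inf ball."""
    delta = delta + alpha * grad.sign()
    return torch.clamp(delta, -epsilon, epsilon)


epsilon, alpha = 0.03, 0.02
delta = torch.tensor([0.02, -0.01, 0.00])   # current perturbation
grad = torch.tensor([1.5, -0.2, 0.7])       # made-up gradient values

delta_next = linf_pgd_step(delta, grad, alpha, epsilon)
print(delta_next)
```

The first coordinate overshoots to 0.04 and is clipped back to 0.03, the second lands on the boundary at -0.03, and the third moves freely to 0.02. Only the sign of each gradient entry matters, not its magnitude.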
Implementation
Basic PGD Implementation

```python
import torch
import torch.nn.functional as F


def pgd_attack(
    model,
    x,
    y,
    epsilon=8/255,
    alpha=2/255,
    num_iter=10,
    random_start=True,
    loss_fn=None,
):
    """
    L-inf PGD attack.

    Args:
        model: Target model
        x: Input images [B, C, H, W], scaled to [0, 1]
        y: True labels [B]
        epsilon: Maximum perturbation (in [0, 1] scale)
        alpha: Step size per iteration
        num_iter: Number of iterations
        random_start: Random initialization inside the epsilon-ball
        loss_fn: Loss function (default: cross-entropy)

    Returns:
        Adversarial examples with the same shape as x
    """
    if loss_fn is None:
        loss_fn = F.cross_entropy

    # Initialization
    x = x.detach()
    x_adv = x.clone()
    if random_start:
        # Uniform random start inside the epsilon-ball
        x_adv = x_adv + torch.empty_like(x_adv).uniform_(-epsilon, epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)

    for _ in range(num_iter):
        x_adv.requires_grad_(True)

        # Forward pass
        output = model(x_adv)
        loss = loss_fn(output, y)

        # Gradient w.r.t. the input only; parameter .grad buffers are left untouched
        grad, = torch.autograd.grad(loss, x_adv)

        # Gradient ascent step (the attacker maximizes the loss)
        x_adv = x_adv.detach() + alpha * grad.sign()

        # Project back onto the feasible set
        delta = torch.clamp(x_adv - x, -epsilon, epsilon)
        x_adv = torch.clamp(x + delta, 0, 1)

    return x_adv.detach()
```

Generic Projection Operators
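
```python
def project_linf(x, x_orig, epsilon):
    """L-inf projection onto the epsilon-ball around x_orig."""
    return torch.clamp(x, x_orig - epsilon, x_orig + epsilon)


def project_l2(x, x_orig, epsilon):
    """L2 projection onto the epsilon-ball around x_orig."""
    delta = x - x_orig
    delta_norm = delta.reshape(delta.size(0), -1).norm(dim=1)
    # Scale factor >= 1: only perturbations with norm > epsilon are shrunk
    scale = torch.clamp(delta_norm / epsilon, min=1.0)
    # Reshape to [B, 1, ..., 1] so the division broadcasts per sample
    return x_orig + delta / scale.view(-1, *([1] * (delta.dim() - 1)))


def project_l0(x, x_orig, epsilon):
    """L0 projection: keep the epsilon largest-magnitude entries of the perturbation."""
    delta = x - x_orig
    flat_delta = delta.reshape(delta.size(0), -1)
    k = min(int(epsilon), flat_delta.size(1))
    # Indices of the k largest-magnitude entries per sample
    _, indices = flat_delta.abs().topk(k, dim=1, largest=True)
    # Zero everywhere except at the selected entries
    mask = torch.zeros_like(flat_delta)
    mask.scatter_(1, indices, 1.0)
    mask = mask.view_as(delta)
    return x_orig + delta * mask
```

Theoretical Properties of PGD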
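The Strongest First-Order Attack
Claim (Madry et al., 2018):
Under an $\ell_\infty$ constraint, a model that is robust to PGD is expected to be robust to all first-order attacks. Madry et al. support this with experiments rather than a formal proof, so it is best read as a well-tested conjecture.
Supporting argument:
- PGD finds local maxima of the non-concave inner maximization in the min-max (saddle-point) training objective
- Across many random restarts, the local maxima that PGD reaches have very similar loss values, so other first-order methods are unlikely to find substantially stronger perturbations
- PGD therefore gives a practical estimate of the worst-case loss: strictly a lower bound on it, and hence an upper bound on the true robust accuracy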
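Convergence Analysis
Write $x^{t} = x + \delta^{t}$ and $\Delta^{t} = x^{t+1} - x^{t}$. Assuming the loss is $\beta$-smooth in the input, each PGD step satisfies the standard ascent inequality:

$$\mathcal{L}(x^{t+1}) \ge \mathcal{L}(x^{t}) + \big\langle \nabla_x \mathcal{L}(x^{t}),\, \Delta^{t} \big\rangle - \frac{\beta}{2}\,\|\Delta^{t}\|_2^2$$

For a signed-gradient step followed by projection onto a box that contains $x^{t}$, every component of $\Delta^{t}$ is either zero or shares the sign of the gradient, with magnitude at most $\alpha$. The inner-product term is therefore non-negative and scales linearly with $\alpha$, while the penalty term scales quadratically. When $\alpha$ is sufficiently small, the loss is non-decreasing from one iteration to the next (the objective is being maximized).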
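A quick way to check this behaviour in practice is to log the loss after every iteration. The sketch below does so on a hypothetical linear classifier with random weights (the sizes, seed, and variable names are arbitrary choices). Note that a linear model makes cross-entropy convex in the input, so the loss is non-decreasing for any step size here; for a deep network the same check would be expected to pass only when $\alpha$ is small.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy setup: linear classifier, 5 features, 3 classes.
W = torch.randn(3, 5, dtype=torch.float64)
b = torch.zeros(3, dtype=torch.float64)
x = torch.rand(4, 5, dtype=torch.float64)   # batch of 4 inputs in [0, 1]
y = torch.tensor([0, 1, 2, 0])


def loss_at(x_in):
    """Mean cross-entropy of the linear classifier at x_in."""
    return F.cross_entropy(x_in @ W.T + b, y)


epsilon, alpha, num_iter = 0.1, 0.01, 20
x_adv = x.clone()
losses = [loss_at(x_adv).item()]
for _ in range(num_iter):
    x_adv.requires_grad_(True)
    grad, = torch.autograd.grad(loss_at(x_adv), x_adv)
    x_adv = x_adv.detach() + alpha * grad.sign()
    # l_inf projection around x, then clamp to the valid input range
    x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    losses.append(loss_at(x_adv).item())

is_monotone = all(later >= earlier - 1e-9 for earlier, later in zip(losses, losses[1:]))
print("non-decreasing:", is_monotone)
```

Plotting `losses` for a real model is a cheap diagnostic: a curve that oscillates suggests the step size is too large, while one that is still rising at the last iteration suggests more iterations are needed.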
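The Importance of Random Starts

```python
def pgd_with_multiple_starts(model, x, y, num_restarts=10, **kwargs):
    """
    PGD with multiple random restarts.

    For each sample, keeps the adversarial example that reached the
    highest loss across restarts.
    """
    best_adv = x.clone().detach()
    best_loss = torch.full((x.size(0),), float("-inf"), device=x.device)

    for _ in range(num_restarts):
        adv = pgd_attack(model, x, y, random_start=True, **kwargs)
        with torch.no_grad():
            loss = F.cross_entropy(model(adv), y, reduction="none")

        # Per-sample update: replace where this restart reached a higher loss
        improved = loss > best_loss
        best_adv[improved] = adv[improved]
        best_loss[improved] = loss[improved].to(best_loss.dtype)

    return best_adv
```

Random starts help to:
- Reduce the risk of getting stuck in a poor local maximum
- Increase the attack success rate
- Cover more of the perturbation space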
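The effect of the starting point can be measured directly by comparing the per-sample loss reached from several independent starts. The sketch below assumes a small randomly initialised MLP and random data purely for illustration; `pgd_linf` is a compact helper written for this example, not a library function.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pgd_linf(model, x, y, epsilon, alpha, num_iter):
    """Compact l_inf PGD with a uniform random start (illustrative helper)."""
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon)
    x_adv = (x + delta).clamp(0, 1)
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 3)).eval()
x = torch.rand(8, 10)
y = torch.randint(0, 3, (8,))

# Per-sample loss reached by each of 5 independent random starts.
per_start = []
for _ in range(5):
    x_adv = pgd_linf(model, x, y, epsilon=0.1, alpha=0.02, num_iter=10)
    with torch.no_grad():
        per_start.append(F.cross_entropy(model(x_adv), y, reduction="none"))
per_start = torch.stack(per_start)            # shape [num_starts, batch]

best_of_n = per_start.max(dim=0).values       # strongest result per sample
spread = best_of_n - per_start.min(dim=0).values
print("single start, mean loss:", per_start[0].mean().item())
print("best of 5, mean loss   :", best_of_n.mean().item())
print("largest per-sample gap :", spread.max().item())
```

By construction the best-of-n loss is never lower than any single run. How large the gap is depends on the model: a small spread across starts is consistent with the observation that PGD's local maxima tend to have similar loss values.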
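PGD Variants
1. PGD with Target Loss
A targeted variant that pushes the prediction toward a chosen target class. `pgd_targeted` minimizes the cross-entropy with respect to the target label; the DLR (Difference of Logits Ratio) loss is an alternative objective for this setting.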
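For reference, the targeted DLR loss can be sketched as follows. This follows the formulation used by Croce and Hein in AutoAttack as best I can reconstruct it, so treat the exact normalisation as an assumption; the function name and the example logits are illustrative. It needs at least four classes because the denominator uses the third and fourth largest logits.

```python
import torch


def dlr_loss_targeted(logits, y, y_target):
    """Targeted DLR loss per sample (to be maximized); needs >= 4 classes."""
    z_sorted, _ = logits.sort(dim=1)                      # ascending order
    idx = torch.arange(logits.shape[0])
    # Gap between the true-class logit and the target-class logit
    numerator = logits[idx, y] - logits[idx, y_target]
    # Largest logit minus the mean of the 3rd and 4th largest
    denominator = z_sorted[:, -1] - 0.5 * (z_sorted[:, -3] + z_sorted[:, -4]) + 1e-12
    return -numerator / denominator


logits = torch.tensor([[5.0, 1.0, 0.0, -1.0],    # true class 0 still wins
                       [1.0, 5.0, 0.0, -1.0]])   # target class 1 wins
y = torch.tensor([0, 0])
y_target = torch.tensor([1, 1])

loss = dlr_loss_targeted(logits, y, y_target)
print(loss)
```

The loss is negative while the true class beats the target and positive once the target logit is larger. Because it is maximized, using it in a targeted PGD loop means stepping along `+grad.sign()`, the opposite sign to a loop that minimizes target-class cross-entropy.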

```python
def pgd_targeted(model, x, y_target, epsilon=8/255, alpha=2/255, num_iter=10):
    """Targeted PGD attack: pushes predictions toward y_target."""
    x = x.detach()
    x_adv = x.clone()
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        output = model(x_adv)
        # Cross-entropy with respect to the target class
        loss = F.cross_entropy(output, y_target)
        grad, = torch.autograd.grad(loss, x_adv)
        # Note the minus sign: the target-class loss is minimized
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    return x_adv.detach()
```

2. PGD with Momentum

```python
def pgd_momentum(model, x, y, epsilon=8/255, alpha=2/255, num_iter=10, decay=1.0):
    """PGD with momentum (MI-FGSM style)."""
    x = x.detach()
    x_adv = x.clone()
    momentum = torch.zeros_like(x)
    reduce_dims = tuple(range(1, x.dim()))
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        output = model(x_adv)
        loss = F.cross_entropy(output, y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Update momentum with the gradient normalized per sample
        # by its mean absolute value
        grad = grad / (grad.abs().mean(dim=reduce_dims, keepdim=True) + 1e-10)
        momentum = decay * momentum + grad
        # Step along the sign of the accumulated momentum
        x_adv = x_adv.detach() + alpha * momentum.sign()
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    return x_adv.detach()
```

3. PGD with Input Diversity
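
```python
def pgd_with_diversity(model, x, y, epsilon=8/255, alpha=2/255, num_iter=10, prob=0.5):
    """PGD with input diversity (random resize and pad)."""
    x = x.detach()
    x_adv = x.clone()
    h, w = x.shape[-2:]
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        # With probability `prob`, shrink the image to 90% and pad it back to (h, w)
        x_in = x_adv
        if torch.rand(1).item() < prob:
            new_h, new_w = int(h * 0.9), int(w * 0.9)
            x_in = F.interpolate(x_adv, size=(new_h, new_w), mode="nearest")
            pad_h, pad_w = h - new_h, w - new_w
            # F.pad order for the last two dims: [left, right, top, bottom]
            pad = [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2]
            x_in = F.pad(x_in, pad, mode="constant", value=0.5)
        output = model(x_in)
        loss = F.cross_entropy(output, y)
        # Differentiate w.r.t. the original-resolution x_adv, not the transformed copy
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    return x_adv.detach()
```

Step Size Selection Strategies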
Adaptive Step Size

```python
def pgd_adaptive(model, x, y, epsilon=8/255, num_iter=40):
    """
    PGD with a linearly decaying step size.

    Large steps early on for exploration, small steps later for refinement.
    """
    x = x.detach()
    x_adv = x.clone()
    for i in range(num_iter):
        # Step size decays linearly from about epsilon / 4 to epsilon / (2 * num_iter)
        alpha = (epsilon / 4) * (1 - i / num_iter) + epsilon / (4 * num_iter)
        x_adv.requires_grad_(True)
        output = model(x_adv)
        loss = F.cross_entropy(output, y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    return x_adv.detach()
```

Second-Order Statistics Enhancement (S2O)
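Recent work suggests that second-order statistics of the weights can improve the robustness of adversarial training. [2]
The function here is a momentum PGD loop that also accumulates the squared norm of the weight gradients at each step. That statistic is computed for inspection only and does not enter the update.

```python
def s2o_attack(model, x, y, epsilon=8/255, alpha=2/255, num_iter=10):
    """
    Momentum PGD that also tracks a weight-gradient statistic.
    """
    x = x.detach()
    x_adv = x.clone()
    momentum = torch.zeros_like(x)
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        output = model(x_adv)
        loss = F.cross_entropy(output, y)
        # backward() is used here because the parameter gradients are needed too
        model.zero_grad()
        loss.backward()
        grad = x_adv.grad
        # Squared norm of the weight gradients (computed but not used in the update)
        weight_grad_norm = 0
        for name, param in model.named_parameters():
            if 'weight' in name and param.grad is not None:
                weight_grad_norm += (param.grad ** 2).sum()
        # Normalize the input gradient and accumulate momentum
        grad = grad / (grad.abs().mean() + 1e-10)
        momentum = 0.9 * momentum + grad
        x_adv = x_adv.detach() + alpha * momentum.sign()
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    return x_adv.detach()
```

Practical Guide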
Recommended Parameters
| Parameter | MNIST | CIFAR-10 | ImageNet |
|---|---|---|---|
| $\epsilon$ ($\ell_\infty$) | 0.3 | 8/255 | 4/255 or 8/255 |
| Iterations | 40-100 | 10-40 | 10-50 |
| Random start | Yes | Yes | Yes |
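One way to carry these settings around in code is a small preset table. The sketch below is an assumption-laden example: `PGDConfig` and `PGD_PRESETS` are hypothetical names, the iteration counts are taken from the low end of each range, ImageNet uses the smaller of the two $\epsilon$ values, and the step-size rule $\alpha = 2.5\,\epsilon / \text{iterations}$ is a commonly used heuristic that does not come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PGDConfig:
    """Hypothetical container for PGD hyperparameters."""
    epsilon: float
    num_iter: int
    random_start: bool = True

    @property
    def alpha(self) -> float:
        # Assumed heuristic (not from the table): the steps can travel
        # 2.5 * epsilon in total, so the boundary is reachable from any start.
        return 2.5 * self.epsilon / self.num_iter


PGD_PRESETS = {
    "mnist": PGDConfig(epsilon=0.3, num_iter=40),
    "cifar10": PGDConfig(epsilon=8 / 255, num_iter=10),
    "imagenet": PGDConfig(epsilon=4 / 255, num_iter=10),
}

for name, cfg in PGD_PRESETS.items():
    print(f"{name}: epsilon={cfg.epsilon:.4f}, alpha={cfg.alpha:.4f}, steps={cfg.num_iter}")
```

For CIFAR-10 with 10 iterations this heuristic gives $\alpha = 2/255$, which matches the default step size used by the attack functions in this article.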
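Robustness Evaluation

```python
def evaluate_robustness(model, loader, epsilon=8/255):
    """Evaluate clean accuracy and PGD-robust accuracy."""
    model.eval()
    # Run on whichever device the model's parameters live on
    device = next(model.parameters()).device
    clean_correct = 0
    robust_correct = 0
    total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Clean accuracy
        with torch.no_grad():
            clean_pred = model(x).argmax(dim=1)
        clean_correct += (clean_pred == y).sum().item()
        # Generate adversarial examples (needs gradients, so outside no_grad)
        x_adv = pgd_attack(model, x, y, epsilon=epsilon)
        # Robust accuracy
        with torch.no_grad():
            robust_pred = model(x_adv).argmax(dim=1)
        robust_correct += (robust_pred == y).sum().item()
        total += x.size(0)
    return clean_correct / total, robust_correct / total
```

Relationship to Other Attacks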
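- FGSM: the single-step version of PGD (one iteration with $\alpha = \epsilon$ and no random start)
- BIM: a simplified version of PGD (no random start)
- MI-FGSM: PGD with momentum added
- C&W: an optimization-based method with no explicit $\epsilon$-ball constraint (the perturbation norm is penalized in the objective instead)
- DeepFool: a geometric attack based on the decision boundary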
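The first two relationships can be checked numerically. The sketch below assumes a small randomly initialised MLP and random inputs for illustration; `fgsm`, `pgd_no_random_start`, and `input_grad` are helpers written for this example. It confirms that one PGD step with $\alpha = \epsilon$ and no random start reproduces FGSM, and that the multi-step version without a random start (BIM) stays inside the $\epsilon$-ball.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def input_grad(model, x, y):
    """Gradient of the cross-entropy loss with respect to the input."""
    x = x.clone().detach().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    return grad


def fgsm(model, x, y, epsilon):
    """Single signed-gradient step of size epsilon."""
    return (x + epsilon * input_grad(model, x, y).sign()).clamp(0, 1)


def pgd_no_random_start(model, x, y, epsilon, alpha, num_iter):
    """l_inf PGD started from the clean input (this is BIM)."""
    x_adv = x.clone()
    for _ in range(num_iter):
        x_adv = x_adv + alpha * input_grad(model, x_adv, y).sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 4)).eval()
x, y = torch.rand(5, 6), torch.randint(0, 4, (5,))
eps = 8 / 255

x_fgsm = fgsm(model, x, y, eps)
x_pgd1 = pgd_no_random_start(model, x, y, eps, alpha=eps, num_iter=1)
x_bim = pgd_no_random_start(model, x, y, eps, alpha=eps / 4, num_iter=4)

print("FGSM == 1-step PGD:", torch.allclose(x_fgsm, x_pgd1))
print("BIM max perturbation:", (x_bim - x).abs().max().item())
```

The equivalence holds because a single step of size $\epsilon$ from the clean input already lands on the surface of the $\ell_\infty$ ball, so the projection leaves it unchanged.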
Related Topics
- adversarial-attack-methods: survey of adversarial attack methods
- adversarial-training-methods: adversarial training as a defense
- certified-robustness-theory: theory of certified robustness
References
[1] Madry, A., et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR 2018. https://arxiv.org/abs/1706.06083
[2] Chen, Y., et al. (2026). S2O: Enhancing Adversarial Training with Second-Order Statistics of Weights. arXiv:2603.01264. https://arxiv.org/abs/2603.01264