PGD Attack Explained

Overview

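Projected Gradient Descent (PGD) is one of the most important adversarial attack methods in deep learning. Proposed by Madry et al. (2018), PGD was argued, on the basis of extensive experiments, to be the strongest first-order attack under ℓ∞ constraints, and it has become the standard baseline for evaluating model robustness.[1] Despite the name, as an attack it performs projected gradient ascent on the loss; "descent" refers to the equivalent minimization of the negated loss.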

Mathematical Framework

Problem Formulation

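PGD solves the following constrained optimization problem (the inner maximization of the saddle-point formulation of Madry et al.):

$$\max_{\delta \in \mathcal{S}} \; \mathcal{L}\big(f_\theta(x + \delta),\, y\big)$$

where the feasible set of perturbations is:

$$\mathcal{S} = \{\delta : \|\delta\|_\infty \le \epsilon,\; x + \delta \in [0,1]^d\}$$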

Iterative Optimization

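PGD solves this problem with the following iteration:

$$x^{t+1} = \Pi_{x+\mathcal{S}}\Big(x^{t} + \alpha \cdot \operatorname{sign}\big(\nabla_{x^{t}} \mathcal{L}(f_\theta(x^{t}), y)\big)\Big)$$

where $\Pi_{x+\mathcal{S}}$ is the projection operator onto the feasible set $x + \mathcal{S}$ and $\alpha$ is the step size.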

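Explicit form of the projection (ℓ∞ norm):

$$\Pi_{x+\mathcal{S}}(x') = \operatorname{clip}_{[0,1]}\Big(\min\big(\max(x',\, x - \epsilon),\, x + \epsilon\big)\Big)$$

with min, max, and clip applied elementwise.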
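
To make the update and the ℓ∞ projection concrete, here is a minimal sketch of a single PGD iteration on a toy linear classifier. The helper name pgd_step_linf and the toy model and data are hypothetical, chosen only for illustration; the projection is written as the elementwise min/max clip followed by clipping to [0, 1].

```python
import torch
import torch.nn.functional as F

def pgd_step_linf(model, x_adv, x, y, epsilon, alpha):
    """One L-inf PGD iteration: sign-gradient ascent, then projection onto the feasible set."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    x_next = x_adv.detach() + alpha * grad.sign()
    # Projection: elementwise min(max(x', x - eps), x + eps), then clip to [0, 1]
    x_next = torch.min(torch.max(x_next, x - epsilon), x + epsilon)
    return x_next.clamp(0.0, 1.0)

# Hypothetical toy setup: linear classifier on 4-dimensional inputs in [0, 1]
torch.manual_seed(0)
model = torch.nn.Linear(4, 3)
x = torch.rand(2, 4)
y = torch.tensor([0, 2])
epsilon, alpha = 8 / 255, 2 / 255

x1 = pgd_step_linf(model, x, x, y, epsilon, alpha)
print((x1 - x).abs().max().item())  # never exceeds alpha after a single step
```

Because α is smaller than ε here, one step alone cannot reach the boundary of the ε-ball; the ε-clip only becomes active after several iterations.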

Algorithm Implementation

Basic PGD Implementation

import torch
import torch.nn.functional as F
 
def pgd_attack(
    model,
    x,
    y,
    epsilon=8/255,
    alpha=2/255,
    num_iter=10,
    random_start=True,
    loss_fn=None
):
    """
    PGD Attack Implementation (L-inf, untargeted)
    
    Args:
        model: Target model (switch it to eval mode before attacking)
        x: Input images [B, C, H, W], values in [0, 1]
        y: True labels [B]
        epsilon: Maximum perturbation (in [0,1] scale)
        alpha: Step size per iteration
        num_iter: Number of iterations
        random_start: Random initialization in epsilon-ball
        loss_fn: Loss function (default: cross-entropy)
    
    Returns:
        Adversarial examples
    """
    if loss_fn is None:
        loss_fn = F.cross_entropy
    
    # Initialization (detach so no graph leaks in from the caller)
    x = x.clone().detach()
    x_adv = x.clone()
    
    if random_start:
        # Uniform random initialization inside the epsilon-ball
        x_adv = x_adv + torch.empty_like(x).uniform_(-epsilon, epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        
        # Forward pass
        output = model(x_adv)
        loss = loss_fn(output, y)
        
        # Gradient w.r.t. the input only (avoids accumulating parameter grads)
        grad, = torch.autograd.grad(loss, x_adv)
        
        # Gradient ascent (the attacker maximizes the loss)
        x_adv = x_adv.detach() + alpha * grad.sign()
        
        # Project back onto the feasible set: epsilon-ball, then valid pixel range
        delta = torch.clamp(x_adv - x, -epsilon, epsilon)
        x_adv = torch.clamp(x + delta, 0, 1).detach()
    
    return x_adv
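
Since pgd_attack must return points inside both the ε-ball and the valid pixel range, a small sanity check after any attack run is cheap insurance. The helper check_linf_adversarial below is a hypothetical utility, not part of any library, and its tol argument is an assumed tolerance for floating-point rounding.

```python
import torch

def check_linf_adversarial(x_adv, x, epsilon, tol=1e-6):
    """True if x_adv is within the L-inf epsilon-ball around x and inside [0, 1].

    tol absorbs floating-point rounding from the clamp/add operations.
    """
    within_ball = (x_adv - x).abs().max().item() <= epsilon + tol
    within_range = x_adv.min().item() >= -tol and x_adv.max().item() <= 1 + tol
    return within_ball and within_range

# Synthetic batch in [0.25, 0.75] so that +/- 2 * eps never hits the [0, 1] bounds
torch.manual_seed(0)
x = 0.25 + 0.5 * torch.rand(4, 3, 8, 8)
eps = 8 / 255
good = torch.clamp(x + eps * torch.randn_like(x).sign(), 0, 1)
bad = torch.clamp(x + 2 * eps, 0, 1)
print(check_linf_adversarial(good, x, eps))  # True
print(check_linf_adversarial(bad, x, eps))   # False
```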

General Projection Operators

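import torch

def project_linf(x, x_orig, epsilon):
    """L-inf projection: clip each element to [x_orig - epsilon, x_orig + epsilon]"""
    return torch.clamp(x, x_orig - epsilon, x_orig + epsilon)
 
def project_l2(x, x_orig, epsilon):
    """L2 projection: rescale each sample's perturbation so its L2 norm is at most epsilon"""
    delta = x - x_orig
    delta_norm = delta.reshape(delta.size(0), -1).norm(dim=1)
    # scale >= 1, so dividing by it only shrinks perturbations that are too large
    scale = torch.clamp(delta_norm / epsilon, min=1.0)
    # Reshape to [B, 1, 1, ...] so it broadcasts over any input shape
    scale = scale.view(-1, *([1] * (delta.dim() - 1)))
    return x_orig + delta / scale
 
def project_l0(x, x_orig, epsilon):
    """L0 projection: keep the k = int(epsilon) largest-magnitude perturbation entries
    (entries are individual tensor elements, not whole pixels across channels)"""
    delta = x - x_orig
    flat_delta = delta.reshape(delta.size(0), -1)
    k = int(epsilon)
    
    # Indices of the k largest |delta| entries in each sample; zero out all others
    _, indices = flat_delta.abs().topk(k, dim=1, largest=True)
    mask = torch.zeros_like(flat_delta)
    mask.scatter_(1, indices, 1.0)
    mask = mask.view_as(delta)
    
    return x_orig + (delta * mask)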
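
The projections above also support other threat models. For ℓ2, PGD usually replaces the sign of the gradient with the gradient divided by its per-sample ℓ2 norm, which is the steepest-ascent direction for that norm. A minimal sketch, assuming 4-D image batches in [0, 1]; pgd_l2_step and the toy model are hypothetical, and the final [0, 1] clip keeps the point feasible without being the exact projection onto the intersection of the ℓ2 ball and the box.

```python
import torch
import torch.nn.functional as F

def pgd_l2_step(model, x_adv, x, y, epsilon, alpha):
    """One L2 PGD iteration: normalized-gradient ascent, then L2-ball projection and [0, 1] clip."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    shape = (-1,) + (1,) * (x.dim() - 1)  # broadcast per-sample norms over C, H, W
    # Steepest-ascent direction for the L2 norm: gradient / its per-sample L2 norm
    g_norm = grad.reshape(grad.size(0), -1).norm(dim=1).view(shape)
    x_next = x_adv.detach() + alpha * grad / (g_norm + 1e-12)
    # Shrink perturbations whose L2 norm exceeds epsilon back onto the ball
    delta = x_next - x
    d_norm = delta.reshape(delta.size(0), -1).norm(dim=1).view(shape)
    delta = delta / torch.clamp(d_norm / epsilon, min=1.0)
    # Clipping to [0, 1] keeps the point feasible (it can only shrink |delta| per element)
    return torch.clamp(x + delta, 0.0, 1.0)

# Hypothetical toy model and data
torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
x = torch.rand(4, 3, 8, 8)
y = torch.randint(0, 10, (4,))
x_adv = x.clone()
for _ in range(10):
    x_adv = pgd_l2_step(model, x_adv, x, y, epsilon=0.5, alpha=0.1)
print((x_adv - x).reshape(4, -1).norm(dim=1))  # every entry is at most 0.5
```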

Theoretical Properties of PGD

Strongest First-Order Attack

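Claim (Madry et al., 2018)

Under ℓ∞ constraints, if a model is robust to PGD, it is expected to be robust to all first-order attacks, i.e., attacks that only use the gradient of the loss with respect to the input. This is an empirical claim backed by experiments, not a formal theorem.

Reasoning

  1. PGD searches for local maxima of the non-convex inner maximization problem over the feasible set.
  2. Over many random restarts, Madry et al. observed that the local maxima reached by PGD have similar, well-concentrated loss values, suggesting that other first-order methods are unlikely to find substantially better ones.
  3. Every PGD output is a feasible perturbation, so the loss it reaches is a lower bound on the true worst-case loss (equivalently, PGD robust accuracy is an upper bound on true robust accuracy).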

Convergence Analysis

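Convergence can be characterized by a sufficient-increase inequality. Write $\mathcal{L}(z)$ for $\mathcal{L}(f_\theta(z), y)$ and consider the plain (non-sign) update $z^{t+1} = \Pi_{x+\mathcal{S}}\big(z^{t} + \alpha \nabla \mathcal{L}(z^{t})\big)$ on a loss whose gradient is $\beta$-Lipschitz (the feasible set is a box, hence convex):

$$\mathcal{L}(z^{t+1}) \ge \mathcal{L}(z^{t}) + \frac{1}{2\alpha}\,\|z^{t+1} - z^{t}\|_2^2, \qquad 0 < \alpha \le \frac{1}{\beta}$$

So when the step size α is small enough, the loss increases monotonically (the attacker maximizes it). Neural-network losses are not globally smooth and the sign update is a different step, so for PGD in practice this is a guideline rather than a guarantee.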
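
The monotonic-increase claim can be checked on a toy problem. The sketch below runs plain (non-sign) projected gradient ascent on the hypothetical smooth objective L(z) = -||z - c||², whose gradient is 2-Lipschitz (β = 2), with α = 0.25 ≤ 1/β, and asserts the bound L(z_{t+1}) ≥ L(z_t) + ||z_{t+1} - z_t||² / (2α) at every step. It covers the smooth case only and says nothing about sign-gradient PGD on a real network.

```python
import torch

# Hypothetical smooth toy objective: L(z) = -||z - c||^2 (gradient is 2-Lipschitz, so beta = 2)
c = torch.tensor([0.9, -0.4, 0.3])
x = torch.tensor([0.5, 0.5, 0.5])  # clean point
epsilon, alpha = 0.2, 0.25         # alpha <= 1 / beta = 0.5

def loss(z):
    return -((z - c) ** 2).sum()

def project(z):
    # Projection onto {z : |z - x|_inf <= epsilon} intersected with [0, 1]^3
    return torch.clamp(torch.clamp(z, x - epsilon, x + epsilon), 0.0, 1.0)

z = x.clone()
history = [loss(z).item()]
for _ in range(20):
    grad = -2 * (z - c)                 # analytic gradient of L at z
    z_next = project(z + alpha * grad)  # plain (non-sign) projected gradient ascent
    # Sufficient-increase bound: L(z_next) >= L(z) + ||z_next - z||^2 / (2 * alpha)
    gain = ((z_next - z) ** 2).sum() / (2 * alpha)
    assert loss(z_next).item() >= loss(z).item() + gain.item() - 1e-6
    z = z_next
    history.append(loss(z).item())

print(history[0], history[-1])  # the loss rises from the clean value toward the constrained maximum
```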

Importance of Random Starts

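def pgd_with_multiple_starts(model, x, y, num_restarts=10, **kwargs):
    """
    Multi-restart PGD: per sample, keep the first restart that fools the model;
    for samples never fooled, keep the restart with the highest loss.
    """
    best_adv = x.clone().detach()
    best_loss = torch.full((x.size(0),), float('-inf'), device=x.device)
    fooled = torch.zeros(x.size(0), dtype=torch.bool, device=x.device)
    
    for _ in range(num_restarts):
        adv = pgd_attack(model, x, y, random_start=True, **kwargs)
        
        with torch.no_grad():
            output = model(adv)
            loss = F.cross_entropy(output, y, reduction='none')
            
            # Samples on which this restart succeeds
            success = output.argmax(dim=1) != y
            
            # Update samples not yet fooled if this restart fools them or raises their loss
            update = ~fooled & (success | (loss > best_loss))
            best_adv[update] = adv[update]
            best_loss = torch.where(update, loss, best_loss)
            fooled |= success
    
    return best_adv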

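Random starts help to:

  • avoid getting stuck in poor local optima;
  • increase the attack success rate;
  • cover a larger part of the perturbation space.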
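
A toy illustration of why random starts matter: if the gradient vanishes at the clean point, sign-gradient PGD started there never moves, because sign(0) = 0. The objective L(δ) = Σδ² below is hypothetical and chosen only because its gradient 2δ is zero at δ = 0; a random start inside the ε-ball breaks the tie, and every coordinate climbs to a corner of the ball.

```python
import torch

def run_sign_pgd(delta0, epsilon=0.1, alpha=0.025, num_iter=10):
    """Sign-gradient PGD on the toy objective L(delta) = sum(delta ** 2); returns the final loss."""
    delta = delta0.clone()
    for _ in range(num_iter):
        grad = 2 * delta                        # analytic gradient of sum(delta ** 2)
        delta = delta + alpha * grad.sign()     # torch.sign(0) is 0, so a zero gradient means no move
        delta = delta.clamp(-epsilon, epsilon)  # projection onto the L-inf ball
    return (delta ** 2).sum().item()

torch.manual_seed(0)
n, eps = 100, 0.1
loss_no_start = run_sign_pgd(torch.zeros(n), epsilon=eps)
loss_rand_start = run_sign_pgd(torch.empty(n).uniform_(-eps, eps), epsilon=eps)
print(loss_no_start)    # 0.0: stuck at the stationary point
print(loss_rand_start)  # about n * eps**2 = 1.0: every coordinate ends at +eps or -eps
```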

PGD Variants

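1. Targeted PGD

The targeted version steers the input toward a chosen class y_target by minimizing the cross-entropy loss for that class, as in the code below. The DLR (Difference of Logits Ratio) loss from AutoAttack (Croce & Hein, 2020) also has a targeted variant and is a common alternative.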

def pgd_targeted(model, x, y_target, epsilon=8/255, alpha=2/255, num_iter=10):
    """Targeted PGD attack"""
    x_adv = x.clone().detach()
    
    for _ in range(num_iter):
        x_adv.requires_grad = True
        output = model(x_adv)
        
        # Cross-entropy toward the target class
        loss = F.cross_entropy(output, y_target)
        
        model.zero_grad()
        loss.backward()
        
        # Note the minus sign: we minimize the loss of the target class
        x_adv = x_adv.detach() - alpha * x_adv.grad.sign()
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    
    return x_adv
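
For reference, a sketch of the targeted DLR loss following the definition given for AutoAttack (Croce & Hein, 2020): −(z_y − z_t) / (z_π1 − (z_π3 + z_π4)/2), where z are the logits, y the true class, t the target class, and π orders the logits decreasingly. It needs at least four classes. Mind the sign convention: this loss is meant to be maximized, so it belongs in an ascent step; plugging it into pgd_targeted, which subtracts the gradient, would require negating it. The function name and the small eps in the denominator are our own choices.

```python
import torch

def dlr_loss_targeted(logits, y, y_target, eps=1e-12):
    """Targeted DLR loss (sketch after the AutoAttack definition):
        -(z_y - z_t) / (z_pi1 - (z_pi3 + z_pi4) / 2)
    with z_pi1 >= z_pi2 >= ... the sorted logits. Needs at least 4 classes.
    Larger values mean the target logit is closer to (or above) the true-class logit.
    """
    z_sorted, _ = logits.sort(dim=1, descending=True)
    z_y = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    z_t = logits.gather(1, y_target.unsqueeze(1)).squeeze(1)
    # The denominator makes the loss (ignoring eps) invariant to shifting and
    # positively rescaling the logits
    denom = z_sorted[:, 0] - 0.5 * (z_sorted[:, 2] + z_sorted[:, 3])
    return -(z_y - z_t) / (denom + eps)

logits = torch.tensor([[4.0, 1.0, 2.0, 0.0, -1.0]])
y = torch.tensor([0])
t = torch.tensor([2])
print(dlr_loss_targeted(logits, y, t).item())  # -(4 - 2) / (4 - (1 + 0) / 2) = -2 / 3.5
```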

2. PGD with Momentum

def pgd_momentum(model, x, y, epsilon=8/255, alpha=2/255, num_iter=10, decay=1.0):
    """PGD with momentum (MI-FGSM style)"""
    x_adv = x.clone().detach()
    momentum = torch.zeros_like(x)
    
    for _ in range(num_iter):
        x_adv.requires_grad = True
        output = model(x_adv)
        loss = F.cross_entropy(output, y)
        
        model.zero_grad()
        loss.backward()
        
        # Update momentum; normalize the gradient per sample (mean |g| over C, H, W)
        grad = x_adv.grad.detach()
        grad_scale = grad.abs().mean(dim=tuple(range(1, grad.dim())), keepdim=True)
        momentum = decay * momentum + grad / (grad_scale + 1e-10)
        
        # Step along the sign of the momentum
        x_adv = x_adv.detach() + alpha * momentum.sign()
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    
    return x_adv

3. PGD with Input Diversity

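def input_diversity(x, prob=0.5, scale=0.9, pad_value=0.5):
    """Simplified input-diversity transform: with probability prob, shrink the
    batch by `scale` and pad it back to the original size at a random offset"""
    if torch.rand(1).item() >= prob:
        return x
    h, w = x.shape[-2:]
    new_h, new_w = int(h * scale), int(w * scale)
    x_small = F.interpolate(x, size=(new_h, new_w), mode='bilinear', align_corners=False)
    pad_h, pad_w = h - new_h, w - new_w
    top = torch.randint(0, pad_h + 1, (1,)).item()
    left = torch.randint(0, pad_w + 1, (1,)).item()
    # F.pad takes (left, right, top, bottom) for the last two dims
    return F.pad(x_small, [left, pad_w - left, top, pad_h - top], mode='constant', value=pad_value)

def pgd_with_diversity(model, x, y, epsilon=8/255, alpha=2/255, num_iter=10, prob=0.5):
    """PGD with input diversity"""
    x_adv = x.clone().detach()
    
    for _ in range(num_iter):
        x_adv.requires_grad = True
        
        # Random resize and pad is applied to the model input only; x_adv keeps its
        # shape, and gradients flow back through the transform to x_adv
        output = model(input_diversity(x_adv, prob))
        loss = F.cross_entropy(output, y)
        
        model.zero_grad()
        loss.backward()
        
        x_adv = x_adv.detach() + alpha * x_adv.grad.sign()
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    
    return x_adv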

Step-Size Selection Strategies

Decaying Step Size

def pgd_adaptive(model, x, y, epsilon=8/255, num_iter=40):
    """
    自适应步长 PGD
    初期大步探索,后期小步精修
    """
    x_adv = x.clone().detach()
    
    for i in range(num_iter):
        # 线性衰减的步长
        alpha = (epsilon / 4) * (1 - i / num_iter) + epsilon / (4 * num_iter)
        
        x_adv.requires_grad = True
        output = model(x_adv)
        loss = F.cross_entropy(output, y)
        
        model.zero_grad()
        loss.backward()
        
        x_adv = x_adv.detach() + alpha * x_adv.grad.sign()
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    
    return x_adv
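
Worked values for the schedule used in pgd_adaptive, α_i = (ε/4)(1 − i/T) + ε/(4T), with ε = 8/255 and T = 40. The helper linear_decay_schedule is hypothetical and simply evaluates the same formula in plain Python.

```python
def linear_decay_schedule(epsilon, num_iter):
    """Step sizes of pgd_adaptive: alpha_i = (eps/4)(1 - i/T) + eps/(4T), i = 0..T-1."""
    return [(epsilon / 4) * (1 - i / num_iter) + epsilon / (4 * num_iter)
            for i in range(num_iter)]

eps, T = 8 / 255, 40
alphas = linear_decay_schedule(eps, T)
# Print in pixel units (multiply by 255) for readability
print(round(alphas[0] * 255, 4))    # 2.05  (= 2 + 0.05)
print(round(alphas[-1] * 255, 4))   # 0.1   (= eps / (2T) in pixel units)
print(round(sum(alphas) * 255, 4))  # 43.0  (= eps * (T + 3) / 8 in pixel units)
```

The steps sum to ε(T + 3)/8, about 5.4ε, so the iterate can travel well past the ε-ball boundary and relies on the projection to stay feasible.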

Second-Order Statistics Enhancement (S2O)

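Recent work reports that using second-order statistics of the weights can improve the robustness obtained by adversarial training.[2] The sketch below is momentum PGD that additionally computes a weight-gradient statistic; as written, that statistic does not affect the perturbation update, so it does not reproduce the S2O method itself.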

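def s2o_attack(model, x, y, epsilon=8/255, alpha=2/255, num_iter=10):
    """
    Second-Order Statistics Enhanced PGD (illustrative sketch).
    The weight statistic is computed but does not influence the update,
    so the attack itself behaves like momentum PGD.
    """
    x_adv = x.clone().detach()
    momentum = torch.zeros_like(x)
    
    for _ in range(num_iter):
        x_adv.requires_grad = True
        output = model(x_adv)
        loss = F.cross_entropy(output, y)
        
        model.zero_grad()
        loss.backward()
        grad = x_adv.grad.detach()
        
        # Second-order statistic of the weights: sum of squared weight gradients
        # (NOTE: not used by the update below)
        weight_grad_norm = 0
        for name, param in model.named_parameters():
            if 'weight' in name and param.grad is not None:
                weight_grad_norm += (param.grad ** 2).sum()
        
        # Normalize the input gradient per sample, then accumulate momentum
        grad_scale = grad.abs().mean(dim=tuple(range(1, grad.dim())), keepdim=True)
        grad = grad / (grad_scale + 1e-10)
        momentum = 0.9 * momentum + grad
        
        x_adv = x_adv.detach() + alpha * momentum.sign()
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
        x_adv = torch.clamp(x_adv, 0, 1)
    
    return x_adv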

Practical Guide

Recommended Parameters

Parameter      MNIST     CIFAR-10   ImageNet
ε (ℓ∞)         0.3       8/255      4/255 or 8/255
Iterations     40-100    10-40      10-50
Random start   Yes       Yes        Yes
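
The ε values in the table are on the [0, 1] pixel scale. If the model instead receives inputs normalized as (x − mean) / std, the same budget becomes ε / std per channel, and the valid range becomes [(0 − mean)/std, (1 − mean)/std]. A sketch assuming per-channel normalization; the helper name is hypothetical, and the mean/std values are the commonly used ImageNet statistics from torchvision examples.

```python
def normalized_linf_budget(epsilon, mean, std):
    """Per-channel L-inf budget and valid range for inputs normalized as (x - mean) / std.

    epsilon is on the [0, 1] pixel scale, as in the table above.
    """
    eps_norm = [epsilon / s for s in std]
    lower = [(0.0 - m) / s for m, s in zip(mean, std)]
    upper = [(1.0 - m) / s for m, s in zip(mean, std)]
    return eps_norm, lower, upper

# Commonly used ImageNet channel statistics (as in torchvision examples)
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)
eps_norm, lower, upper = normalized_linf_budget(8 / 255, mean, std)
print([round(e, 4) for e in eps_norm])  # budget per channel in normalized units
```

A often simpler design is to fold the normalization into the model (for example as its first layer), so that the attack and all clamping keep operating on [0, 1] inputs unchanged.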

Robustness Evaluation

def evaluate_robustness(model, loader, epsilon=8/255):
    """Evaluate clean accuracy and PGD robust accuracy"""
    model.eval()
    # Run on the same device as the model's parameters
    device = next(model.parameters()).device
    clean_acc = 0
    robust_acc = 0
    total = 0
    
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        
        # Clean accuracy
        with torch.no_grad():
            clean_pred = model(x).argmax(dim=1)
            clean_acc += (clean_pred == y).sum().item()
        
        # Generate adversarial examples (needs gradients, so outside no_grad)
        x_adv = pgd_attack(model, x, y, epsilon=epsilon)
        
        # Robust accuracy
        with torch.no_grad():
            robust_pred = model(x_adv).argmax(dim=1)
            robust_acc += (robust_pred == y).sum().item()
        
        total += x.size(0)
    
    return clean_acc / total, robust_acc / total
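
evaluate_robustness counts a sample as robust whenever its adversarial prediction is correct, so a sample that is misclassified on the clean input but happens to be classified correctly after the attack still counts. A stricter variant, sketched below with hypothetical logits, counts a sample as robust only if it is correct both before and after the attack; the helper name robust_accuracy_counts is an assumption.

```python
import torch

def robust_accuracy_counts(clean_logits, adv_logits, y):
    """Count correct predictions on clean inputs and under attack.

    A sample counts as robust only if it is classified correctly both before and
    after the attack, which guarantees robust accuracy <= clean accuracy.
    """
    clean_ok = clean_logits.argmax(dim=1) == y
    adv_ok = adv_logits.argmax(dim=1) == y
    return clean_ok.sum().item(), (clean_ok & adv_ok).sum().item()

y = torch.tensor([0, 1, 2, 1])
clean_logits = torch.tensor([[3., 0., 0.], [0., 2., 0.], [1., 0., 0.], [0., 3., 0.]])
adv_logits = torch.tensor([[0., 2., 0.], [0., 2., 0.], [0., 0., 5.], [0., 3., 0.]])
print(robust_accuracy_counts(clean_logits, adv_logits, y))  # (3, 2)
```

Here the third sample is wrong on the clean input but "correct" after the attack; the strict count excludes it, while the loop above would count it.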

Relation to Other Attacks

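  • FGSM: the single-step version of PGD (one step with α = ε, no random start)
  • BIM: simplified PGD (no random start)
  • MI-FGSM: PGD with momentum
  • C&W: optimization-based method without an explicit ℓp-ball constraint (the perturbation size is penalized in the objective instead)
  • DeepFool: geometric attack based on the decision boundary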
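
FGSM can be written as the single-step special case from the list above: one sign-gradient step with α = ε and no random start. A minimal sketch; the fgsm function and the toy model are hypothetical.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=8/255):
    """FGSM as single-step PGD: one sign-gradient step of size epsilon, no random start."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    # A single step of size epsilon cannot leave the L-inf ball,
    # so only the [0, 1] clip is needed
    return torch.clamp(x.detach() + epsilon * grad.sign(), 0.0, 1.0)

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 4 * 4, 5))  # hypothetical toy model
x = torch.rand(8, 3, 4, 4)
y = torch.randint(0, 5, (8,))
x_adv = fgsm(model, x, y)
print((x_adv - x).abs().max().item() <= 8 / 255 + 1e-6)  # True
```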

References

  1. Madry, A., et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR 2018. https://arxiv.org/abs/1706.06083

  2. Chen, Y., et al. (2026). S2O: Enhancing Adversarial Training with Second-Order Statistics of Weights. arXiv:2603.01264. https://arxiv.org/abs/2603.01264