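A Survey of Adversarial Attack Methods

Overview

An adversarial attack crafts an input perturbation that causes a deep learning model to make a wrong prediction. Depending on the attacker's knowledge and capabilities, attacks are commonly divided into white-box, black-box, and transfer-based attacks. This survey covers the mainstream attack methods: first-order methods, iterative methods, and optimization-based methods.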

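First-Order Attack Methods

FGSM (Fast Gradient Sign Method)

FGSM is one of the earliest and simplest effective adversarial attacks, proposed by Goodfellow et al. in 2015.¹

Core idea: use the gradient of the loss with respect to the input and take a single step of size $\epsilon$ along the sign of that gradient.

$$\tilde{x} = x + \epsilon \cdot \operatorname{sign}(\nabla_{x} L(x, y))$$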

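import torch
import torch.nn.functional as F

def fgsm(image, label, model, epsilon=8/255):
    """
    Fast Gradient Sign Method.

    Args:
        image: input batch [B, C, H, W], values in [0, 1]
        label: ground-truth labels [B]
        model: target model (returns logits)
        epsilon: L_inf perturbation budget on the [0, 1] pixel scale
    """
    # Work on a detached copy so the caller's tensor is left untouched
    image = image.clone().detach().requires_grad_(True)
    output = model(image)
    loss = F.cross_entropy(output, label)

    # Gradient of the loss with respect to the input only
    grad = torch.autograd.grad(loss, image)[0]

    # Step along the gradient sign, then clip to the valid pixel range
    perturbation = epsilon * grad.sign()
    adversarial = (image + perturbation).clamp(0, 1).detach()

    return adversarial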

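Characteristics:

Fast to compute (a single gradient step)
Often effective against undefended models
Serves as the baseline for other attacks

FGM (Fast Gradient Method)

FGM is the $L_{2}$-norm counterpart of FGSM:

$$\tilde{x} = x + \epsilon \cdot \frac{\nabla_{x} L(x, y)}{\| \nabla_{x} L(x, y) \|_{2}}$$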

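import torch
import torch.nn.functional as F

def fgm(image, label, model, epsilon=8/255):
    """Fast Gradient Method: one step of L2 length epsilon along the gradient."""
    # Work on a detached copy so the caller's tensor is left untouched
    image = image.clone().detach().requires_grad_(True)
    output = model(image)
    loss = F.cross_entropy(output, label)

    # Gradient of the loss with respect to the input only
    grad = torch.autograd.grad(loss, image)[0]

    # Normalize each sample's gradient to unit L2 norm
    grad_norm = grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    perturbation = epsilon * grad / (grad_norm + 1e-10)
    adversarial = (image + perturbation).clamp(0, 1).detach()

    return adversarial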

R-FGSM (Randomized FGSM)

R-FGSM takes a small random step before the FGSM step, which improves the attack success rate:

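$$x' = x + \alpha \cdot \operatorname{sign}(\mathcal{N}(0, I)), \qquad \tilde{x} = x' + (\epsilon - \alpha) \cdot \operatorname{sign}(\nabla_{x'} L(x', y))$$

where $\alpha < \epsilon$. Because the gradient step has size $\epsilon - \alpha$, the total $L_{\infty}$ perturbation never exceeds $\epsilon$.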
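
The random step followed by the gradient step can be sketched in plain Python. This is a minimal illustration rather than a reference implementation. It assumes a flattened input stored as a list of floats in [0, 1], a hypothetical `grad_fn` callback that returns the loss gradient, a gradient step of size $\epsilon - \alpha$ so that the total perturbation stays within $\epsilon$, and an illustrative default of $\alpha = \epsilon / 2$.

```python
import random

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def clip01(v):
    """Clip a pixel value to the valid range [0, 1]."""
    return min(1.0, max(0.0, v))

def r_fgsm(x, grad_fn, epsilon=8/255, alpha=4/255, rng=None):
    """R-FGSM sketch on a flat list of pixel values.

    grad_fn is a hypothetical callback: grad_fn(x) -> dL/dx as a list.
    """
    rng = rng or random.Random()
    # Random step: alpha * sign(N(0, 1)) for every coordinate
    x_rand = [clip01(xi + alpha * sign(rng.gauss(0.0, 1.0))) for xi in x]
    # Gradient step of size (epsilon - alpha), evaluated at the randomized point
    grad = grad_fn(x_rand)
    return [clip01(xi + (epsilon - alpha) * sign(gi))
            for xi, gi in zip(x_rand, grad)]

# Toy usage: a linear loss L(x) = w . x, whose gradient is the constant w
w = [0.5, -1.0, 2.0]
x = [0.2, 0.5, 0.8]
x_adv = r_fgsm(x, lambda z: w, rng=random.Random(0))
```

In the toy example every coordinate of `x_adv` stays within `8/255` of `x`, since the two steps add up to at most $\alpha + (\epsilon - \alpha) = \epsilon$.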

Iterative Attack Methods

PGD (Projected Gradient Descent)

PGD is the iterative version of FGSM and one of the strongest first-order $L_{\infty}$ attacks.²

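import torch
import torch.nn.functional as F

def pgd_attack(image, label, model, epsilon=8/255, alpha=2/255, iterations=10):
    """
    Projected Gradient Descent attack (L_inf).

    Args:
        image: input batch [B, C, H, W], values in [0, 1]
        label: ground-truth labels [B]
        model: target model (returns logits)
        epsilon: radius of the L_inf ball
        alpha: step size per iteration
        iterations: number of iterations
    """
    image = image.clone().detach()

    # Random start: a uniformly sampled point inside the epsilon ball
    x_adv = image + torch.empty_like(image).uniform_(-epsilon, epsilon)
    x_adv = torch.clamp(x_adv, 0, 1)

    for _ in range(iterations):
        x_adv.requires_grad_(True)
        output = model(x_adv)
        loss = F.cross_entropy(output, label)

        # Gradient with respect to the current adversarial input
        grad = torch.autograd.grad(loss, x_adv)[0]

        # Gradient ascent on the loss (the attacker's objective)
        x_adv = x_adv.detach() + alpha * grad.sign()

        # Project back onto the epsilon ball, then onto the valid pixel range
        delta = torch.clamp(x_adv - image, -epsilon, epsilon)
        x_adv = torch.clamp(image + delta, 0, 1)

    return x_adv.detach()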

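Properties of PGD reported by Madry et al.:²

Empirically, PGD behaves as a "universal" first-order adversary under the $L_{\infty}$ constraint
Models that are robust to PGD tend to be robust to other first-order attacks; this is an empirical observation, not a formal guarantee
Random starting points let the attack explore different local maxima of the loss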

BIM (Basic Iterative Method)

BIM can be viewed as PGD without random initialization; it starts from the clean input $x_{0} = x$:

$$x_{t+1} = \operatorname{Clip}_{\epsilon}\bigl(x_{t} + \alpha \cdot \operatorname{sign}(\nabla_{x} L(x_{t}, y))\bigr)$$
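
The update rule can be traced with a short plain-Python sketch. It is an illustration under stated assumptions, not a reference implementation: the input is a flat list of floats in [0, 1], `grad_fn` is a hypothetical callback returning the loss gradient, and the clip operation is taken to mean "stay within $\epsilon$ of the clean input and inside the valid pixel range".

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def clip(value, clean, epsilon):
    """Keep value within epsilon of the clean pixel and inside [0, 1]."""
    value = min(clean + epsilon, max(clean - epsilon, value))
    return min(1.0, max(0.0, value))

def bim(x, grad_fn, epsilon=8/255, alpha=2/255, iterations=10):
    """BIM sketch; grad_fn(x) -> dL/dx is a hypothetical callback."""
    x_adv = list(x)  # no random start: begin at the clean input
    for _ in range(iterations):
        grad = grad_fn(x_adv)
        x_adv = [clip(xi + alpha * sign(gi), x0, epsilon)
                 for xi, gi, x0 in zip(x_adv, grad, x)]
    return x_adv

# Toy usage: a linear loss L(x) = w . x, whose gradient is the constant w
w = [1.0, -2.0, 0.0]
x_clean = [0.5, 0.5, 0.5]
x_adv = bim(x_clean, lambda z: w)
```

With these defaults the first coordinate reaches the upper edge of the $\epsilon$-ball after four steps and is held there by the clip, the second reaches the lower edge, and the third does not move because its gradient is zero.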

MI-FGSM (Momentum Iterative FGSM)

MI-FGSM adds a momentum term that stabilizes the update direction across iterations:³

$$g_{t+1} = \mu \cdot g_{t} + \frac{\nabla_{x} L(x_{t}, y)}{\| \nabla_{x} L(x_{t}, y) \|_{1}}$$

$$x_{t+1} = x_{t} + \alpha \cdot \operatorname{sign}(g_{t+1})$$
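
The effect of the momentum term is easiest to see on a toy problem. The sketch below is plain Python under stated assumptions: a flat list of floats in [0, 1] as input, a hypothetical `grad_fn` callback, a step size of $\alpha = \epsilon / T$ so that $T$ steps stay within the budget, and a decay factor of $\mu = 1.0$ as the illustrative default.

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def mi_fgsm(x, grad_fn, epsilon=8/255, iterations=10, mu=1.0):
    """MI-FGSM sketch; grad_fn(x) -> dL/dx is a hypothetical callback."""
    alpha = epsilon / iterations      # T steps of size alpha sum to epsilon
    g = [0.0] * len(x)                # accumulated (momentum) gradient
    x_adv = list(x)
    for _ in range(iterations):
        grad = grad_fn(x_adv)
        l1 = sum(abs(gi) for gi in grad) + 1e-12
        # g_{t+1} = mu * g_t + grad / ||grad||_1
        g = [mu * g_prev + gi / l1 for g_prev, gi in zip(g, grad)]
        # x_{t+1} = x_t + alpha * sign(g_{t+1}), kept inside [0, 1]
        x_adv = [min(1.0, max(0.0, xi + alpha * sign(gi)))
                 for xi, gi in zip(x_adv, g)]
    return x_adv

def make_noisy_grad():
    """Gradient that points one way except for a single outlier at step 3."""
    state = {"calls": 0}
    def grad_fn(z):
        state["calls"] += 1
        return [-0.2, 1.8] if state["calls"] == 3 else [1.0, 1.0]
    return grad_fn

x = [0.5, 0.5]
with_momentum = mi_fgsm(x, make_noisy_grad(), mu=1.0)
without_momentum = mi_fgsm(x, make_noisy_grad(), mu=0.0)
```

With $\mu = 1.0$ the accumulated gradient absorbs the outlier at step 3, so the first coordinate keeps moving in the same direction and ends at `0.5 + 8/255`. With $\mu = 0$ that step is reversed and the coordinate ends short of the budget.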

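EOT (Expectation over Transformation)

EOT targets physical-world attacks: the adversarial perturbation is optimized in expectation over a distribution of transformations: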

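import random

import torch
import torch.nn.functional as F

def eot_attack(image, target_label, model, num_samples=100, epsilon=8/255):
    """EOT attack framework (targeted): minimize the expected loss toward
    target_label over randomly sampled transformations."""
    delta = torch.zeros_like(image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=0.01)

    transforms = [
        lambda x: F.interpolate(x, scale_factor=0.9),
        lambda x: F.interpolate(x, scale_factor=1.1),
        lambda x: torch.rot90(x, k=1, dims=(2, 3)),
        # more transforms...
    ]

    for _ in range(1000):
        # Sample transformations
        sampled_transforms = random.choices(transforms, k=num_samples)

        # Average the gradient with respect to delta over the samples.
        # The gradient is taken w.r.t. delta, not the transformed image,
        # whose shape differs from one transform to the next.
        avg_grad = torch.zeros_like(delta)
        for T in sampled_transforms:
            x_transformed = T(image + delta)
            loss = F.cross_entropy(model(x_transformed), target_label)
            avg_grad += torch.autograd.grad(loss, delta)[0]
        avg_grad /= num_samples

        # Descend on the averaged gradient
        optimizer.zero_grad()
        delta.grad = avg_grad
        optimizer.step()

        # Project back onto the feasible set (L_inf ball of radius epsilon)
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)

    return (image + delta).clamp(0, 1).detach()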

Optimization-Based Attacks

C&W Attack (Carlini & Wagner)

The C&W attack formulates adversarial example generation as an optimization problem:

$$\min_{\delta} \; \| \delta \|_{p} + c \cdot \max\Bigl(Z(x + \delta)_{y} - \max_{i \neq y} Z(x + \delta)_{i},\; -\kappa\Bigr)$$

where:

$Z(\cdot)$ is the model's logit output
$\kappa$ is the confidence parameter
$c$ is the trade-off coefficient
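
To make the role of $\kappa$ and $c$ concrete, the objective can be evaluated by hand on a small logit vector. The plain-Python sketch below assumes the untargeted form with $p = 2$; the helper names `cw_margin` and `cw_objective` are illustrative and do not come from any library.

```python
def cw_margin(logits, y, kappa=0.0):
    """Untargeted margin term: max(Z_y - max_{i != y} Z_i, -kappa)."""
    best_other = max(z for i, z in enumerate(logits) if i != y)
    return max(logits[y] - best_other, -kappa)

def cw_objective(delta, logits, y, c=1.0, kappa=0.0):
    """||delta||_2 + c * margin, for logits computed at x + delta."""
    l2_norm = sum(d * d for d in delta) ** 0.5
    return l2_norm + c * cw_margin(logits, y, kappa)

# Still classified as class 0: the margin is positive and is penalized
still_correct = cw_margin([2.0, 1.0, 0.5], y=0)

# Already misclassified by a gap of 2: the term saturates at -kappa
saturated = cw_margin([1.0, 3.0, 0.5], y=0, kappa=0.5)

# Full objective for a perturbation of L2 norm 5 and a margin of 1
total = cw_objective([3.0, 4.0], [2.0, 1.0, 0.5], y=0, c=2.0)
```

Once the true-class logit falls more than $\kappa$ below the best other logit, the margin term stops decreasing, so the optimizer spends the rest of its effort shrinking $\| \delta \|$.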

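import torch
import torch.nn.functional as F

def cw_attack(image, label, model, target=None, c=1.0, kappa=0, max_iter=1000):
    """
    Carlini & Wagner L2 attack with a fixed c (no binary search over c).

    Untargeted when target is None (push away from label), otherwise targeted
    (push toward target). label and target are LongTensors of shape [B];
    images are assumed to lie in [0, 1].
    """
    # Change of variables: x_adv = (tanh(w) + 1) / 2 always stays in [0, 1]
    def to_image(w):
        return (torch.tanh(w) + 1) / 2

    # Start at the clean image: w = atanh(2x - 1), clamped for numerical safety
    w = torch.atanh((2 * image - 1).clamp(-1 + 1e-6, 1 - 1e-6)).detach()
    w.requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=0.01)

    # Class whose logit is compared against the best of all other classes
    ref = label if target is None else target

    for _ in range(max_iter):
        adv_image = to_image(w)

        # Compute logits
        logits = model(adv_image)
        one_hot = F.one_hot(ref, logits.shape[1]).bool()
        ref_logit = logits[one_hot]
        other_max = logits.masked_fill(one_hot, float("-inf")).max(dim=1).values

        if target is None:
            # Untargeted: the true-class logit should fall below the best other class
            margin = ref_logit - other_max
        else:
            # Targeted: the target logit should rise above the best other class
            margin = other_max - ref_logit

        loss_cls = margin.clamp(min=-kappa).sum()
        loss_dist = ((adv_image - image) ** 2).sum()

        # Distance term plus c times the classification term
        loss = loss_dist + c * loss_cls

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return to_image(w).detach()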

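Characteristics of the C&W attack:

Variants exist for the $L_{0}$, $L_{2}$, and $L_{\infty}$ norms
Can bypass some defenses
Computationally expensive

EOT Attack Variant

This variant combines the EOT framework with an optimization-based formulation: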

$$\delta^{*} = \arg\min_{\delta} \; \mathbb{E}_{t \sim T,\, m \sim M} \bigl[\, \| \nabla_{x} L(x \odot t + \delta \odot m,\; y) \| \,\bigr]$$

DeepFool Attack

DeepFool finds a minimal perturbation by iteratively linearizing the decision boundary:⁴

import torch

def deepfool(image, model, num_classes=10, max_iter=50, overshoot=0.02):
    """
    DeepFool L2 attack for a single image [1, C, H, W].

    The original class is taken to be the model's prediction on the clean image.
    """
    image = image.clone().detach()
    with torch.no_grad():
        true_label = model(image).argmax(dim=1).item()

    r_total = torch.zeros_like(image)
    perturbed = image.clone()

    for _ in range(max_iter):
        perturbed.requires_grad_(True)
        output = model(perturbed)

        # Stop as soon as the prediction changes
        if output.argmax(dim=1).item() != true_label:
            break

        # Gradient of the original-class logit
        grad_true = torch.autograd.grad(output[0, true_label], perturbed, retain_graph=True)[0]

        # Find the closest linearized decision boundary
        best_dist, best_w, best_f = float("inf"), None, None
        for c in range(num_classes):
            if c == true_label:
                continue
            grad_c = torch.autograd.grad(output[0, c], perturbed, retain_graph=True)[0]
            w = grad_c - grad_true                                     # boundary normal
            f = (output[0, c] - output[0, true_label]).abs().item()    # logit gap
            dist = f / (w.norm().item() + 1e-10)
            if dist < best_dist:
                best_dist, best_w, best_f = dist, w, f

        # Minimal step that reaches that boundary under the linear approximation
        r_i = (best_f + 1e-4) / (best_w.norm() ** 2 + 1e-10) * best_w
        r_total = r_total + r_i

        # Apply the accumulated perturbation with overshoot
        perturbed = torch.clamp(image + (1 + overshoot) * r_total, 0, 1).detach()

    return perturbed.detach()

AutoAttack

AutoAttack is an ensemble of four complementary attacks:⁵

APGD-CE: PGD with adaptive step size (cross-entropy loss)
APGD-T: APGD with the DLR loss (targeted)
FAB: Fast Adaptive Boundary attack
Square: a black-box attack

# AutoAttack usage example
from autoattack import AutoAttack
 
adversary = AutoAttack(model, norm='Linf', eps=8/255, version='standard')
adversarial = adversary.run_standard_evaluation(images, labels)

AutoAttack is widely used as the de facto standard for evaluating adversarial robustness.

Comparison of Adversarial Attacks

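Attack	Threat model	$L_{p}$ norm	Attack strength	Compute cost
FGSM	White-box	$L_{\infty}$	Medium	Low
FGM	White-box	$L_{2}$	Medium	Low
PGD	White-box	$L_{\infty}$	High	Medium
BIM	White-box	$L_{\infty}$	Medium-high	Medium
MI-FGSM	White-box	$L_{\infty}$	High	Medium
C&W	White-box	$L_{0}$, $L_{2}$, $L_{\infty}$	Highest	High
DeepFool	White-box	$L_{2}$	High	Medium
EOT	White-box	Physical	High	High
AutoAttack	White-box and black-box	$L_{\infty}$	Highest	High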

References

1. Goodfellow, I. J., et al. (2015). Explaining and Harnessing Adversarial Examples. ICLR 2015. https://arxiv.org/abs/1412.6572
2. Madry, A., et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR 2018. https://arxiv.org/abs/1706.06083
3. Dong, Y., et al. (2018). Boosting Adversarial Attacks with Momentum. CVPR 2018. https://arxiv.org/abs/1710.06081
4. Moosavi-Dezfooli, S. M., et al. (2016). DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. CVPR 2016. https://arxiv.org/abs/1511.04599
5. Croce, F., & Hein, M. (2020). Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-free Attacks. ICML 2020. https://arxiv.org/abs/2003.01690
