A Survey of Adversarial Attack Methods

Overview

An adversarial attack is a technique for generating adversarial examples. Depending on the information available to the attacker, attacks fall into three categories:

| Attack Type | Attacker Capability | Threat Model |
| --- | --- | --- |
| White-box | Full access to model weights, gradients, and architecture | Strong attack; lower bound for defense evaluation |
| Black-box | Can only query the model's inputs and outputs | Relies on transferability |
| Physical | Can manipulate physical objects that get photographed | Real-world threat |

First-Order Attack Methods

1. FGSM (Fast Gradient Sign Method)

FGSM is the earliest and simplest first-order attack: it uses the gradient of the loss function to generate the adversarial perturbation.[1]

Mathematical Derivation

$$x_{\text{adv}} = x + \epsilon \cdot \text{sign}(\nabla_x L(\theta, x, y))$$

where:

  • $\epsilon$ is the perturbation step size
  • $L$ is the loss function (e.g., cross-entropy)
  • $\text{sign}(\cdot)$ is the sign function

Core Idea

FGSM exploits the locally linear behavior of deep networks in high-dimensional space:

“For high-dimensional problems, even a small linear perturbation can cause the prediction to change dramatically.”

Python Implementation

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast Gradient Sign Method attack."""
    x_adv = x.clone().detach()
    x_adv.requires_grad = True

    # Forward pass
    output = model(x_adv)
    loss = F.cross_entropy(output, y)

    # Backward pass to obtain the input gradient
    model.zero_grad()
    loss.backward()

    # Single sign step generates the adversarial example
    grad = x_adv.grad.data
    x_adv = x_adv.detach() + epsilon * grad.sign()

    # Clip to the valid pixel range (a single sign step already
    # lies on the boundary of the L_inf ball)
    x_adv = torch.clamp(x_adv, 0, 1)

    return x_adv
```

2. FGM (Fast Gradient Method)

FGM is the $L_2$-norm variant of FGSM: instead of taking the sign, it steps along the $L_2$-normalized gradient:

$$x_{\text{adv}} = x + \epsilon \cdot \frac{\nabla_x L(\theta, x, y)}{\|\nabla_x L(\theta, x, y)\|_2}$$

```python
import torch
import torch.nn.functional as F

def fgm_attack(model, x, y, epsilon=0.03):
    """Fast Gradient Method attack (L2 norm)."""
    x_adv = x.clone().detach()
    x_adv.requires_grad = True
    output = model(x_adv)
    loss = F.cross_entropy(output, y)
    loss.backward()

    # Normalize the gradient per sample by its L2 norm
    grad = x_adv.grad.data
    grad_norm = torch.norm(grad.flatten(1), p=2, dim=1)
    grad_norm = grad_norm.view(-1, *([1] * (grad.dim() - 1)))
    grad_normalized = grad / (grad_norm + 1e-10)

    x_adv = x_adv.detach() + epsilon * grad_normalized
    x_adv = torch.clamp(x_adv, 0, 1)

    return x_adv
```

Iterative Attack Methods

1. BIM / PGD (Basic Iterative Method / Projected Gradient Descent)

The PGD attack proposed by Madry et al. (2018) is the standard method for evaluating robustness.[2]

Mathematical Form

$$x^{t+1} = \Pi_{B_\epsilon(x)}\left(x^t + \alpha \cdot \text{sign}(\nabla_x L(\theta, x^t, y))\right)$$

where $\Pi_{B_\epsilon(x)}$ is the projection operator that maps the iterate back into the $\epsilon$-perturbation ball around $x$.

Implementation Details

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.03, alpha=0.003, steps=10):
    """Projected Gradient Descent attack."""
    x_adv = x.clone().detach()

    # Random start inside the epsilon-ball
    x_adv = x_adv + torch.empty_like(x_adv).uniform_(-epsilon, epsilon)
    x_adv = torch.clamp(x_adv, 0, 1).detach()

    for _ in range(steps):
        x_adv.requires_grad = True

        output = model(x_adv)
        loss = F.cross_entropy(output, y)
        loss.backward()

        # Gradient ascent step
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()

            # Project back into the epsilon-ball and the valid pixel range
            x_adv = torch.maximum(x_adv, x - epsilon)
            x_adv = torch.minimum(x_adv, x + epsilon)
            x_adv = torch.clamp(x_adv, 0, 1)

    return x_adv
```

2. MIM (Momentum Iterative Method)

Momentum is accumulated across iterations to stabilize the update direction and improve the transferability of the attack:[3]

$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x L(x_t, y)}{\|\nabla_x L(x_t, y)\|_1}, \qquad x_{t+1} = \Pi_{B_\epsilon(x)}\left(x_t + \alpha \cdot \text{sign}(g_{t+1})\right)$$
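A minimal PyTorch sketch of the momentum iteration (hyperparameters `mu`, `alpha`, and `steps` are illustrative defaults, not values from the original paper):

```python
import torch
import torch.nn.functional as F

def mim_attack(model, x, y, epsilon=0.03, alpha=0.003, steps=10, mu=1.0):
    """Momentum Iterative Method (MI-FGSM), untargeted L_inf version."""
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)  # momentum accumulator

    for _ in range(steps):
        x_adv.requires_grad = True
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]

        with torch.no_grad():
            # Accumulate the L1-normalized gradient into the momentum term
            norm = grad.abs().flatten(1).sum(dim=1)
            norm = norm.view(-1, *([1] * (grad.dim() - 1)))
            g = mu * g + grad / (norm + 1e-10)

            # Sign step, then project back into the L_inf ball
            x_adv = x_adv + alpha * g.sign()
            x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
            x_adv = torch.clamp(x_adv, 0, 1)

    return x_adv
```

With `mu=0` this degenerates to BIM; `mu=1.0` is the setting reported to help transferability most.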

Optimization-Based Attacks

C&W Attack (Carlini & Wagner)

The attack proposed by Carlini and Wagner directly minimizes the perturbation through optimization:[4]

$$\min_{\delta} \; \|\delta\|_p + c \cdot f(x + \delta) \quad \text{s.t.} \quad x + \delta \in [0, 1]^n$$

where the loss function is defined as:

$$f(x') = \max\left(\max_{i \neq t} Z(x')_i - Z(x')_t,\; -\kappa\right)$$

Here $Z$ denotes the logits output, $t$ the target class, and $\kappa$ the confidence parameter.
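A minimal sketch of the targeted C&W $L_2$ attack, using the paper's tanh change of variables to keep pixels in $[0, 1]$ (the fixed trade-off constant `c`, step count, and learning rate are simplifications; the full attack binary-searches over `c`):

```python
import torch

def cw_l2_attack(model, x, target, c=1.0, kappa=0.0, steps=100, lr=0.01):
    """Minimal targeted C&W L2 sketch with tanh change of variables."""
    # w parameterizes x' = 0.5 * (tanh(w) + 1), keeping pixels in [0, 1]
    w = torch.atanh((2 * x - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)

    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)

        # f(x') = max(max_{i != t} Z_i - Z_t, -kappa): margin toward target t
        target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
        masked = logits.scatter(1, target.unsqueeze(1), float('-inf'))
        f_loss = torch.clamp(masked.max(dim=1).values - target_logit, min=-kappa)

        # ||delta||_2^2 + c * f(x'), summed over the batch
        l2 = ((x_adv - x) ** 2).flatten(1).sum(dim=1)
        loss = (l2 + c * f_loss).sum()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return (0.5 * (torch.tanh(w) + 1)).detach()
```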

DeepFool

DeepFool, proposed by Moosavi-Dezfooli et al., iteratively finds the closest decision boundary:[5]

```python
import torch

def deepfool_attack(model, x, num_classes=10, overshoot=0.02, max_iter=50):
    """DeepFool attack for untargeted classification (single image, L2)."""
    x_adv = x.clone().detach()
    orig_pred = model(x_adv).argmax(dim=1).item()

    for _ in range(max_iter):
        x_adv.requires_grad = True
        output = model(x_adv)[0]

        # Stop once the predicted label has flipped
        if output.argmax().item() != orig_pred:
            break

        grad_orig = torch.autograd.grad(output[orig_pred], x_adv,
                                        retain_graph=True)[0]

        # Find the closest decision boundary over all other classes
        min_dist, best_w = float('inf'), None
        for c in range(num_classes):
            if c == orig_pred:
                continue
            grad_c = torch.autograd.grad(output[c], x_adv,
                                         retain_graph=True)[0]
            w = grad_c - grad_orig                      # boundary normal
            f = (output[c] - output[orig_pred]).item()  # logit gap
            dist = abs(f) / (w.norm() + 1e-10)
            if dist < min_dist:
                min_dist, best_w = dist, w

        # Minimal L2 step to cross the closest boundary, slightly overshot
        r = (min_dist + 1e-4) * best_w / (best_w.norm() + 1e-10)
        x_adv = torch.clamp(x_adv.detach() + (1 + overshoot) * r, 0, 1)

    return x_adv
```

Decision-Based Boundary Attacks

HopSkipJumpAttack (HSJA)

A decision-based boundary attack that requires only the model's predicted labels, estimating gradient directions at the boundary from label queries alone.[6]
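The full HSJA algorithm is involved; the sketch below illustrates only its core decision-based primitive, a binary search toward the boundary that uses nothing but predicted labels (function name and the assumption of a batch of size 1 are ours, not from the paper):

```python
import torch

def boundary_binary_search(model, x, x_adv_init, steps=20):
    """Decision-based primitive: binary search toward the decision boundary.

    Uses only predicted labels, no gradients. `x_adv_init` must already
    be misclassified relative to x.
    """
    orig_label = model(x).argmax(dim=1).item()
    low, high = 0.0, 1.0  # interpolation weight: 0 -> x, 1 -> x_adv_init

    for _ in range(steps):
        mid = (low + high) / 2.0
        x_mid = (1 - mid) * x + mid * x_adv_init
        if model(x_mid).argmax(dim=1).item() != orig_label:
            high = mid   # still adversarial: can move closer to x
        else:
            low = mid    # prediction recovered: move back out
    return (1 - high) * x + high * x_adv_init
```

HSJA alternates this search with a Monte Carlo estimate of the boundary normal and a geometric step size schedule.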

SignHunter

Attacks efficiently under the $L_\infty$ constraint by searching directly over the signs of the gradient.

Physical-World Attacks

EOT (Expectation over Transformation)

The EOT framework proposed by Athalye et al. generates adversarial examples that remain robust under physical transformations:[7]

$$x_{\text{adv}} = \arg\max_{x'} \; \mathbb{E}_{t \sim T}\left[L(f(t(x')), y)\right]$$

where $T$ is a distribution over physical transformations (rotation, translation, scaling, etc.).
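A minimal sketch of the idea: run PGD on a Monte Carlo estimate of the expected loss over random transformations. The `transform` callable, function name, and hyperparameters are illustrative assumptions; a real physical attack would use differentiable rotations, scalings, and lighting changes.

```python
import torch
import torch.nn.functional as F

def eot_attack(model, x, y, transform, epsilon=0.03, alpha=0.003,
               steps=10, samples=4):
    """PGD on a Monte Carlo estimate of E_t[L(f(t(x')), y)].

    `transform` applies a freshly sampled random (differentiable)
    transformation on each call.
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad = True
        # Average the loss over `samples` randomly transformed copies
        loss = sum(F.cross_entropy(model(transform(x_adv)), y)
                   for _ in range(samples)) / samples
        grad = torch.autograd.grad(loss, x_adv)[0]

        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon)
            x_adv = torch.clamp(x_adv, 0, 1)
    return x_adv
```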

3D Adversarial Objects

By optimizing a 3D model with EOT, adversarial objects can be produced that still fool the classifier after being 3D-printed and photographed.

AutoAttack

AutoAttack is an ensemble evaluation tool combining several complementary attacks:[8]

  1. APGD-CE: adaptive PGD with the cross-entropy loss
  2. APGD-T: adaptive PGD with the DLR loss (targeted)
  3. FAB: Fast Adaptive Boundary attack
  4. Square: query-efficient black-box attack based on random search

```python
# Robustness evaluation with AutoAttack
from autoattack import AutoAttack

model.eval()
adversary = AutoAttack(model, norm='Linf', eps=8/255)
x_adv = adversary.run_standard_evaluation(x, y)
```

Comparison of Attack Methods

| Method | White/Black-box | Norm | Attack Strength | Speed |
| --- | --- | --- | --- | --- |
| FGSM | White-box | $L_\infty$ | Moderate | Very fast |
| PGD | White-box | $L_\infty$ | Strongest | Slower |
| C&W | White-box | Multiple | Strongest | Slow |
| DeepFool | White-box | $L_2$ | Moderate | Fast |
| HSJA | Black-box | $L_2$ / $L_\infty$ | Moderate | Slow (many queries) |
| AutoAttack | White-box | $L_\infty$ / $L_2$ | Strongest | Slower |

Chapter Summary

Adversarial attack methods have evolved from simple to sophisticated:

  1. First-order attacks: FGSM exploits a linear approximation and is computationally efficient
  2. Iterative attacks: PGD refines the perturbation over multiple steps and has become the standard benchmark
  3. Optimization-based attacks: C&W directly optimizes the perturbation and is among the strongest attacks
  4. Decision-based attacks: require only predicted labels, suitable for black-box settings
  5. Physical attacks: the EOT framework enables real-world attacks

Understanding attack methods is the foundation for designing defenses; the next chapter covers defense methods such as adversarial training.

References

  1. Goodfellow, I. J., et al. (2015). Explaining and Harnessing Adversarial Examples. ICLR 2015.

  2. Madry, A., et al. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR 2018.

  3. Dong, Y., et al. (2018). Boosting Adversarial Attacks with Momentum. CVPR 2018.

  4. Carlini, N., & Wagner, D. (2017). Towards Evaluating the Robustness of Neural Networks. IEEE S&P 2017.

  5. Moosavi-Dezfooli, S. M., et al. (2016). DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. CVPR 2016.

  6. Chen, J., et al. (2020). HopSkipJumpAttack: A Query-Efficient Decision-Based Attack. IEEE S&P 2020.

  7. Athalye, A., et al. (2018). Synthesizing Robust Adversarial Examples. ICML 2018.

  8. Croce, F., & Hein, M. (2020). Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-free Attacks. ICML 2020.