Adversarial Robustness of Vision Transformers

Introduction

Vision Transformers (ViTs) achieve excellent performance on computer vision tasks, but their adversarial robustness has drawn wide attention. Early work suggested that ViTs might be more robust than CNNs, but follow-up studies revealed a more complicated picture.[1]

ViT vs. CNN Robustness

Early Views

Some early studies reported that ViTs hold up better under adversarial perturbations, for reasons that may include:

  1. Global attention: long-range dependencies are captured
  2. Stronger feature representations: the pretrain-finetune paradigm
  3. Weaker inductive biases: less reliance on local patterns

Follow-up Studies

Stronger attacks revealed ViTs' vulnerability:[2]

| Model     | Clean Acc | PGD-20 Acc | AutoAttack Acc |
|-----------|-----------|------------|----------------|
| ResNet-50 | 76.1%     | 49.2%      | 48.7%          |
| DeiT-S    | 79.9%     | 47.3%      | 46.1%          |
| ViT-B/16  | 81.1%     | 44.8%      | 43.5%          |

Finding: under strong attacks, ViTs are not more robust than CNNs.
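
Robust accuracies like those in the table are typically measured with the public AutoAttack suite. A minimal evaluation sketch, assuming the autoattack package (pip install autoattack) and an in-memory test batch x_test, y_test:

import torch
from autoattack import AutoAttack

def evaluate_autoattack(model, x_test, y_test, eps=8/255):
    """Report robust accuracy under the standard AutoAttack ensemble."""
    model.eval()
    adversary = AutoAttack(model, norm='Linf', eps=eps, version='standard')
    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)
    with torch.no_grad():
        robust_acc = (model(x_adv).argmax(1) == y_test).float().mean()
    return robust_acc.item()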

Why ViTs Are Adversarially Vulnerable

1. Fragility of the Patch Embedding

ViT's patch partitioning may introduce a new attack surface:

import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, patch_size=16, embed_dim=768):
        super().__init__()
        # Non-overlapping patches: stride equals the kernel size
        self.proj = nn.Conv2d(3, embed_dim, kernel_size=patch_size,
                              stride=patch_size)

    def forward(self, x):
        # Split the image into patches and project each to embed_dim
        patches = self.proj(x)  # [B, 768, H/16, W/16]
        patches = patches.flatten(2).transpose(1, 2)  # [B, N, C]
        return patches

Attack strategy: a targeted attack on specific patches may be more effective, as the sketch below illustrates.
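
A minimal sketch of such a patch-restricted attack: PGD whose perturbation is masked to a single 16×16 patch. The mask construction and the function name here are illustrative, not from the original:

import torch
import torch.nn.functional as F

def patch_targeted_pgd(model, x, y, patch_idx, patch_size=16,
                       epsilon=8/255, alpha=2/255, steps=10):
    """PGD restricted to one patch: only pixels inside the chosen
    patch are perturbed, probing patch-level vulnerability."""
    B, C, H, W = x.shape
    n_cols = W // patch_size
    row, col = divmod(patch_idx, n_cols)

    # Binary mask selecting the target patch
    mask = torch.zeros_like(x)
    mask[:, :, row*patch_size:(row+1)*patch_size,
            col*patch_size:(col+1)*patch_size] = 1.0

    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign() * mask
            x_adv = torch.maximum(x_adv, x - epsilon)
            x_adv = torch.minimum(x_adv, x + epsilon)
            x_adv = torch.clamp(x_adv, 0, 1)
    return x_adv.detach()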

2. Fragility of the Attention Mechanism

The distribution of attention weights can expose attack vectors:

import torch

def analyze_attention_pattern(model, x):
    """Analyze attention patterns to locate vulnerable regions.
    Assumes each block's attn module returns (output, attention weights)."""
    attention_maps = []
    with torch.no_grad():
        for block in model.blocks:
            h = block.norm1(x)
            out, attn_weights = block.attn(h)
            attention_maps.append(attn_weights)
            # Standard pre-norm ViT block with residual connections
            x = x + out
            x = x + block.mlp(block.norm2(x))

    # Average over heads to expose highly attended regions
    return [w.mean(dim=1) for w in attention_maps]

3. Concentration on the Class Token

The ViT [CLS] token can itself become an attack target:

An attacker can apply targeted interference to the [CLS] token's representation, as sketched below.
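
A hedged sketch of what such an attack could look like: perturb the input so as to push the [CLS] embedding away from its clean value, rather than attacking the classification loss directly. The forward_features call and its output layout follow timm-style ViTs, which is an assumption; adjust for other implementations:

import torch

def cls_token_attack(model, x, epsilon=8/255, alpha=2/255, steps=10):
    """Maximize the displacement of the [CLS] embedding."""
    with torch.no_grad():
        # timm-style: forward_features returns [B, 1 + N, C],
        # with the [CLS] token at index 0
        cls_clean = model.forward_features(x)[:, 0]

    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        cls_adv = model.forward_features(x_adv)[:, 0]
        loss = (cls_adv - cls_clean).norm(dim=-1).mean()
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.maximum(x_adv, x - epsilon)
            x_adv = torch.minimum(x_adv, x + epsilon)
            x_adv = torch.clamp(x_adv, 0, 1)
    return x_adv.detach()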

The SAFER Method

Core Idea

SAFER (Sharpness-Aware layer-selective Finetuning for Enhanced Robustness) addresses the overfitting problem in ViT finetuning:[3]

How It Works

Standard finetuning: update all layers
┌────────────────────────────────────┐
│ Layer 1 → Layer 2 → ... → Layer N  │
│   ↓         ↓            ↓         │
│ update    update       update      │
└────────────────────────────────────┘

SAFER: selectively update the layers most prone to overfitting
┌────────────────────────────────────┐
│ Layer 1 → Layer 2 → ... → Layer N  │
│ frozen    frozen  ← update ← update│
└────────────────────────────────────┘

Algorithm Steps

import torch
import torch.nn.functional as F

def safer_training(model, x, y, validation_loader,
                   epsilon=4/255, alpha=2/255, rho=0.05):
    """
    SAFER: selective sharpness-aware finetuning (a sketch).
    identify_overfitting_layers, pgd_attack and extract_layer_id
    are assumed helpers.
    """
    model.train()

    # 1. Identify overfitting-prone layers (based on validation
    #    performance degradation)
    overfitting_layers = identify_overfitting_layers(
        model, validation_loader
    )

    # 2. Craft adversarial examples
    x_adv = pgd_attack(model, x, y, epsilon, alpha)

    # Parameters of the selected layers get the SAM treatment
    sam_params = [p for name, p in model.named_parameters()
                  if extract_layer_id(name) in overfitting_layers]

    # 3. First pass: gradient at the current weights
    model.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()

    # Ascend to the locally worst-case weights (selected layers only)
    with torch.no_grad():
        originals = [p.data.clone() for p in sam_params]
        grads = [p.grad for p in sam_params if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        for p in sam_params:
            if p.grad is not None:
                p.data += rho * p.grad / (grad_norm + 1e-12)

    # 4. Second pass: gradient at the perturbed weights
    model.zero_grad()
    sam_loss = F.cross_entropy(model(x_adv), y)
    sam_loss.backward()

    # Restore weights; the optimizer step then uses the SAM gradients
    with torch.no_grad():
        for p, orig in zip(sam_params, originals):
            p.data = orig

    return sam_loss
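
The two backward passes mirror standard SAM: the first locates the locally worst-case nearby weights, the second takes the gradient there, so the update favors flat minima for exactly the layers that overfit. How the overfitting-prone layers are identified (here an assumed identify_overfitting_layers helper, based on validation performance degradation as noted above) is a design choice; see the paper for details.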

Results

| Method              | Clean | PGD-20 | AA    | Δ     |
|---------------------|-------|--------|-------|-------|
| Standard finetuning | 82.3% | 42.1%  | 41.5% | -     |
| SAFER               | 83.1% | 48.7%  | 47.9% | +6.6% |

PROSAC and ViT

Findings from AAAI 2025

Feng et al. (2025) found that ViTs outperform CNNs under certified-robustness testing:[4]

def prosac_comparison_vit_cnn(models, test_set):
    """
    Compare PROSAC certification rates for ViTs and CNNs.
    prosac_certification (assumed helper) returns a per-sample
    boolean tensor indicating certified safety.
    """
    results = {}
    for name, model in models.items():
        is_safe = prosac_certification(model, test_set)
        certified_rate = is_safe.float().mean().item()
        results[name] = certified_rate

    return results
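
A usage sketch (the model variables are illustrative):

models = {
    'resnet50': resnet50_model,
    'vit_s16': vit_s16_model,
    'vit_l16': vit_l16_model,
}
results = prosac_comparison_vit_cnn(models, test_set)
for name, rate in results.items():
    print(f'{name}: certified rate = {rate:.3f}')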

Key Findings

  1. Larger models are more robust: ViT-L is more robust than ViT-S
  2. Pretraining helps: ImageNet pretraining significantly improves robustness
  3. Attention advantage: attention-based models fare better in certification tests

Adversarial Defense Strategies for ViTs

1. Attention Dropping

Randomly drop parts of the attention map (or entire heads) during training to improve robustness:

import torch.nn.functional as F

def attention_dropout_hook(module, inputs, output):
    """Forward-hook sketch; assumes the attention module returns
    (attn_output, attn_weights)."""
    if module.training:
        attn_output, attn_probs = output
        # Drop attention links seen by downstream consumers; to perturb
        # attn_output itself, apply the dropout inside the module,
        # before the weights multiply the values.
        dropped = F.dropout(attn_probs, p=0.1, training=True)
        return (attn_output, dropped)
    return output
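
To activate the hook, register it on every attention module. The model.blocks[i].attn naming below assumes a timm-style ViT:

handles = [
    block.attn.register_forward_hook(attention_dropout_hook)
    for block in model.blocks
]

# Remove the handles to disable the defense, e.g. at eval time:
for handle in handles:
    handle.remove()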

2. Token Dropout

Randomly drop patch tokens:

import torch
import torch.nn as nn

class TokenDropout(nn.Module):
    """Randomly zero patch tokens during training."""
    def __init__(self, drop_prob=0.1):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        if self.training:
            B, N, _ = x.shape
            # Per-token keep mask, broadcast across channels
            mask = torch.rand(B, N, 1, device=x.device) > self.drop_prob
            return x * mask
        return x
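
Note that, unlike standard dropout, this sketch does not rescale the surviving tokens by 1/(1 - drop_prob); whether to rescale, and whether to always keep the [CLS] token, are implementation choices worth validating against clean accuracy.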

3. Adapted Adversarial Training

Adversarial training tuned for ViTs:

import torch
import torch.nn.functional as F

def vit_adversarial_training(model, x, y,
                             epsilon=4/255, alpha=2/255, steps=7):
    """
    PGD adversarial training adapted to ViTs.
    ViTs are more sensitive to large perturbations, so a smaller
    epsilon is used (4/255 rather than the common 8/255).
    """
    # Random start within the epsilon ball (ViTs are sensitive
    # to the initialization)
    x_adv = x.detach() + torch.empty_like(x).uniform_(-epsilon, epsilon)
    x_adv = torch.clamp(x_adv, 0, 1)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        # Gradient w.r.t. the input only; model grads stay clean
        grad = torch.autograd.grad(loss, x_adv)[0]

        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # Project back into the epsilon ball and valid pixel range
            x_adv = torch.maximum(x_adv, x - epsilon)
            x_adv = torch.minimum(x_adv, x + epsilon)
            x_adv = torch.clamp(x_adv, 0, 1)

    # Train on the adversarial examples
    output = model(x_adv.detach())
    loss = F.cross_entropy(output, y)

    return loss
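
A typical outer training loop (the optimizer, loader, and device names are illustrative):

for x, y in train_loader:
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    loss = vit_adversarial_training(model, x, y)
    loss.backward()
    optimizer.step()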

Defenses for Multi-Head Attention

Attention Head Importance Analysis

def analyze_head_importance(model, x, y):
    """
    Score the importance of each attention head by the accuracy
    when it is ablated (delete_head and evaluate are assumed
    helpers).
    """
    model.eval()
    importance = {}

    for block_idx, block in enumerate(model.blocks):
        for head_idx in range(block.attn.num_heads):
            # Ablate this head and measure the impact on accuracy:
            # the larger the drop, the more important the head
            modified_model = delete_head(model, block_idx, head_idx)
            acc = evaluate(modified_model, x, y)

            importance[(block_idx, head_idx)] = acc

    return importance
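
Note the cost: one evaluation pass per head, i.e. 12 layers × 12 heads = 144 passes for ViT-B, so in practice the scoring is usually run on a small held-out batch.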

Protecting Critical Heads

Extra protection for critical attention heads:

def protect_critical_heads(model, importance, threshold=0.5):
    """Apply stronger regularization to critical heads
    (apply_extra_regularization is an assumed helper)."""
    # A head is critical if ablating it drops accuracy below threshold
    critical_heads = [
        k for k, v in importance.items()
        if v < threshold
    ]

    for block_idx, head_idx in critical_heads:
        apply_extra_regularization(
            model.blocks[block_idx].attn,
            head_idx
        )

Practical Recommendations

ViT Robustness Evaluation Checklist

def comprehensive_robustness_evaluation(model, test_loader):
    """
    Comprehensive ViT robustness evaluation; the evaluate_* helpers
    (adversarial attacks plus common corruptions) are assumed.
    """
    results = {
        'clean': evaluate_accuracy(model, test_loader),
        'fgsm': evaluate_fgsm(model, test_loader),
        'pgd': evaluate_pgd(model, test_loader),
        'aa': evaluate_autoattack(model, test_loader),
        'noise': evaluate_gaussian_noise(model, test_loader),
        'blur': evaluate_blur(model, test_loader),
        'jpeg': evaluate_jpeg(model, test_loader),
    }

    return results
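
Printing the checklist makes gaps easy to spot; a large clean-versus-AutoAttack gap, for instance, signals weak robustness:

results = comprehensive_robustness_evaluation(model, test_loader)
for metric, acc in results.items():
    print(f'{metric:>6}: {acc:.2%}')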

Defense Selection Guide

| Scenario                    | Recommended method       |
|-----------------------------|--------------------------|
| Resource-constrained        | Attention dropout        |
| Certified guarantees needed | PROSAC                   |
| Severe overfitting          | SAFER                    |
| Comprehensive defense       | Combine multiple methods |

Chapter Summary

Research on ViT adversarial robustness highlights the following key points:

  1. Robustness comparison: under strong attacks, ViTs are not more robust than CNNs
  2. Sources of vulnerability: patch embedding, the attention mechanism, and the class token
  3. SAFER: selective sharpness-aware finetuning improves ViT robustness
  4. PROSAC: large-scale ViTs perform well in certification tests
  5. Defenses: attention dropout, token dropout, and adapted adversarial training

References

  1. Bai, Y., et al. (2021). Evolutionary Adversarial Attack on Vision Transformers. arXiv.

  2. Ghiasi, A., et al. (2022). A Closer Look at Adversarial Robustness of Vision Transformers. arXiv.

  3. Zhang, Y., et al. (2025). SAFER: Sharpness-Aware layer-selective Finetuning for Enhanced Robustness. arXiv:2501.01529.

  4. Feng, C., et al. (2025). PROSAC: Provably Safe Certification for ML Models Under Adversarial Attacks. AAAI 2025.