# Adversarial Robustness of Vision Transformers

## Introduction

The Vision Transformer (ViT) achieves excellent performance on computer vision tasks, but its adversarial robustness has drawn wide attention. Early studies suggested that ViTs might be more robust than CNNs, but follow-up work revealed a more complicated picture.[^1]
## Robustness: ViT vs. CNN

### The Early View

Some studies reported that ViTs hold up better under adversarial perturbations, for reasons that may include:

- Global attention: self-attention captures long-range dependencies
- Stronger feature representations: the pretrain-finetune paradigm
- Fewer inductive biases: less reliance on local patterns
### Later Findings

Stronger attack models exposed ViT's vulnerability:[^2]

| Model | Clean Acc | PGD-20 Acc | AutoAttack Acc |
|---|---|---|---|
| ResNet-50 | 76.1% | 49.2% | 48.7% |
| DeiT-S | 79.9% | 47.3% | 46.1% |
| ViT-B/16 | 81.1% | 44.8% | 43.5% |
Finding: under strong attacks, ViTs are not more robust than CNNs.
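For reference, robust-accuracy numbers of this kind are obtained by attacking every test batch and scoring the model on the perturbed inputs. A minimal PGD-20 evaluation sketch under the common ℓ∞, ε = 4/255 setting (the helper names here are ours, not from a specific paper):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=4/255, alpha=2/255, steps=20):
    """Standard l_inf PGD: random start, signed-gradient steps, projection."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def evaluate_pgd20(model, loader, device="cuda"):
    """Robust accuracy under PGD-20, as reported in the table above."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```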
## Why ViTs Are Adversarially Vulnerable

### 1. Fragility of the Patch Embedding

ViT's patch partitioning can open up a new attack surface:
```python
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, patch_size=16):
        super().__init__()
        self.proj = nn.Conv2d(3, 768, patch_size, patch_size)

    def forward(self, x):
        # Split the image into non-overlapping patches.
        patches = self.proj(x)  # [B, 768, H/16, W/16]
        B, C, H, W = patches.shape
        patches = patches.flatten(2).transpose(1, 2)  # [B, N, C]
        return patches
```

Attack strategy: attacks targeted at specific patches may be more effective.
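To make the claim concrete, here is a sketch of an obvious patch-restricted PGD variant (illustrative, not a published attack): the perturbation is masked to a single 16×16 patch, which probes how sensitive the model is to each patch individually.

```python
import torch
import torch.nn.functional as F

def patch_targeted_pgd(model, x, y, patch_idx, patch_size=16,
                       epsilon=8/255, alpha=2/255, steps=20):
    """PGD restricted to one patch; single-patch attacks typically
    need a larger budget than full-image attacks."""
    B, C, H, W = x.shape
    n_cols = W // patch_size
    row, col = divmod(patch_idx, n_cols)
    mask = torch.zeros_like(x)
    mask[:, :, row*patch_size:(row+1)*patch_size,
             col*patch_size:(col+1)*patch_size] = 1.0
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # Apply the signed-gradient step only inside the target patch.
            x_adv = x_adv + alpha * grad.sign() * mask
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```

Sweeping `patch_idx` over all patches and recording the attack success rate yields a per-patch vulnerability map.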
### 2. Fragility of the Attention Mechanism

The distribution of attention weights can expose attack vectors:
```python
def analyze_attention_pattern(model, x):
    """Analyze attention patterns to locate vulnerable regions
    (assumes each block's attn module can return its attention weights)."""
    attention_maps = []
    for block in model.blocks:
        h = block.norm1(x)
        # Assumed API: the attention module exposes its softmax weights.
        attn_weights = block.attn(h)
        attention_maps.append(attn_weights)
        x = block(x)  # propagate the activations through the full block
    # Average over heads to find regions of concentrated attention.
    return [w.mean(dim=1) for w in attention_maps]
```

### 3. Concentration on the Class Token
ViT's [CLS] token can become an attack target: an attacker can mount a targeted perturbation against the [CLS] token's representation.
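A hedged sketch of such a feature-space attack: rather than maximizing the classification loss, the objective pushes the adversarial [CLS] embedding away from the clean one. The `forward_features(x)[:, 0]` access pattern assumes a timm-style ViT; adjust for your implementation.

```python
import torch

def cls_token_attack(model, x, epsilon=4/255, alpha=1/255, steps=20):
    """Maximize the L2 distance between adversarial and clean [CLS] embeddings."""
    with torch.no_grad():
        cls_clean = model.forward_features(x)[:, 0]  # clean [CLS] embedding
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        cls_adv = model.forward_features(x_adv)[:, 0]
        # Ascend on the embedding distance instead of the task loss.
        dist = (cls_adv - cls_clean).pow(2).sum(dim=1).mean()
        grad = torch.autograd.grad(dist, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```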
## The SAFER Method

### Core Idea

SAFER (Sharpness-Aware layer-selective Finetuning for Enhanced Robustness) targets the overfitting problem in ViT finetuning:[^3]
### How It Works

```
Standard finetuning: update all layers
┌────────────────────────────────────┐
│ Layer 1 → Layer 2 → ... → Layer N  │
│   ↓          ↓              ↓      │
│ update     update        update    │
└────────────────────────────────────┘

SAFER: selectively update the layers most prone to overfitting
┌──────────────────────────────────────┐
│ Layer 1 → Layer 2 → ... → Layer N    │
│ frozen    frozen   ← update ← update │
└──────────────────────────────────────┘
```
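The freezing itself is a few lines of `requires_grad` bookkeeping. A minimal sketch, assuming the transformer blocks are exposed as `model.blocks` as in timm-style ViTs:

```python
def freeze_except(model, layers_to_update):
    """Freeze all transformer blocks except the selected ones,
    mirroring the SAFER diagram above."""
    for idx, block in enumerate(model.blocks):
        trainable = idx in layers_to_update
        for param in block.parameters():
            param.requires_grad = trainable

# e.g. update only the last two blocks of a 12-block ViT:
# freeze_except(model, layers_to_update={10, 11})
```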
### Algorithm Steps
```python
import torch
import torch.nn.functional as F

def safer_training(model, x, y, validation_loader,
                   epsilon=4/255, alpha=2/255, rho=0.05):
    """
    SAFER: layer-selective SAM finetuning (a sketch; the helpers
    identify_overfitting_layers, pgd_attack and extract_layer_id
    are assumed to be defined elsewhere; rho is the SAM radius).
    """
    model.train()
    # 1. Identify overfitting-prone layers (e.g. from the drop in
    #    validation robustness).
    overfitting_layers = identify_overfitting_layers(model, validation_loader)
    # 2. Generate adversarial examples.
    x_adv = pgd_attack(model, x, y, epsilon, alpha)
    # 3. Apply SAM only to the selected layers' parameters.
    selected = [p for name, p in model.named_parameters()
                if extract_layer_id(name) in overfitting_layers]
    loss = F.cross_entropy(model(x_adv), y)
    grads = torch.autograd.grad(loss, selected)
    originals = [p.data.clone() for p in selected]
    with torch.no_grad():
        # SAM ascent: perturb the selected weights along their gradients.
        for p, g in zip(selected, grads):
            p.add_(rho * g / (g.norm() + 1e-12))
    # Sharpness-aware loss at the perturbed point; backprop populates
    # .grad for all trainable parameters.
    sam_loss = F.cross_entropy(model(x_adv), y)
    sam_loss.backward()
    with torch.no_grad():
        # Restore the selected weights; the optimizer then steps using
        # the gradients computed at the perturbed point.
        for p, orig in zip(selected, originals):
            p.data.copy_(orig)
    return sam_loss
```

### Experimental Results
| Method | Clean | PGD-20 | AA | Δ PGD-20 |
|---|---|---|---|---|
| Standard finetuning | 82.3% | 42.1% | 41.5% | - |
| SAFER | 83.1% | 48.7% | 47.9% | +6.6% |
## PROSAC and ViT

### Findings from AAAI 2025

Feng et al. (2025) found that ViTs outperform CNNs in certified-robustness testing:[^4]
```python
def prosac_comparison_vit_cnn(models, test_set):
    """
    Compare PROSAC certification performance of ViTs and CNNs
    (prosac_certification is an assumed helper returning a per-sample
    boolean array: whether each sample is certified safe).
    """
    results = {}
    for name, model in models.items():
        is_safe = prosac_certification(model, test_set)
        certified_rate = is_safe.mean()
        results[name] = certified_rate
    return results
```

### Key Findings
- Larger models are more robust: ViT-L is more robust than ViT-S
- Pretraining helps: ImageNet pretraining markedly improves robustness
- An attention advantage: attention-based models fare better in certification tests
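PROSAC itself is built around a sequential hypothesis-testing procedure, which the `prosac_certification` helper above leaves abstract. Purely as an illustration of the statistical flavor (not the PROSAC algorithm), one can lower-bound the safe rate from attack outcomes on n samples with a one-sided Clopper-Pearson bound:

```python
from scipy.stats import beta

def certified_rate_lower_bound(num_safe, num_total, alpha=0.05):
    """One-sided Clopper-Pearson lower confidence bound on the
    probability that an attacked sample stays correctly classified.
    Illustrative only; PROSAC uses a sequential testing procedure."""
    if num_safe == 0:
        return 0.0
    return beta.ppf(alpha, num_safe, num_total - num_safe + 1)

# e.g. 920 of 1000 attacked samples survive:
# certified_rate_lower_bound(920, 1000) ≈ 0.905
```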
## Defense Strategies for ViT

### 1. Attention Dropping

Randomly drop attention heads to improve robustness. A sketch, registered on each attention module via `register_forward_hook`:
```python
import torch

def attention_drop_hook(module, inputs, output, num_heads=12, drop_prob=0.1):
    """DropHead-style hook: randomly zero per-head channel groups of the
    attention output [B, N, C] during training. Approximate: after the
    output projection heads are mixed; exact dropping belongs inside
    the attention module."""
    if not module.training:
        return output
    B, N, C = output.shape
    head_dim = C // num_heads
    # One keep/drop decision per head group, shared across tokens.
    keep = (torch.rand(B, 1, num_heads, 1, device=output.device)
            > drop_prob).to(output.dtype)
    return (output.view(B, N, num_heads, head_dim) * keep).view(B, N, C)
```

### 2. Token Dropout
Randomly drop patch tokens:
```python
import torch
import torch.nn as nn

class TokenDropout(nn.Module):
    """Randomly zeroes whole patch tokens during training
    (no 1/(1-p) rescaling, unlike standard dropout)."""
    def __init__(self, drop_prob=0.1):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        if self.training:
            B, N, C = x.shape
            # Per-token keep mask, shared across the channel dimension.
            mask = torch.rand(B, N, 1, device=x.device) > self.drop_prob
            return x * mask
        return x
```
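A hypothetical wiring example, dropping tokens right after the patch embedding so a different token subset is zeroed on every training step:

```python
# Hypothetical composition; PatchEmbedding is the module defined earlier.
embed = PatchEmbedding(patch_size=16)
token_drop = TokenDropout(drop_prob=0.1)

def forward_features(x):
    tokens = embed(x)            # [B, N, C]
    tokens = token_drop(tokens)  # tokens randomly zeroed during training
    return tokens
```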
### 3. Adversarial Training Adaptation

Adversarial training adapted to ViT:
```python
import torch
import torch.nn.functional as F

def vit_adversarial_training(model, x, y,
                             epsilon=4/255, alpha=2/255, steps=7):
    """Adversarial training step adapted to ViT."""
    # ViTs are sensitive to large perturbations, so a smaller epsilon
    # is used than is typical for CNNs.
    # Random start within the epsilon-ball (ViT is sensitive to init).
    x_adv = x + torch.empty_like(x).uniform_(-epsilon, epsilon)
    x_adv = torch.clamp(x_adv, 0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        output = model(x_adv)
        loss = F.cross_entropy(output, y)
        # Gradient w.r.t. the input only; model grads stay untouched.
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # Project back into the epsilon-ball and valid pixel range.
            x_adv = torch.maximum(x_adv, x - epsilon)
            x_adv = torch.minimum(x_adv, x + epsilon)
            x_adv = torch.clamp(x_adv, 0, 1)
    # Train on the adversarial examples.
    output = model(x_adv.detach())
    loss = F.cross_entropy(output, y)
    return loss
```

## Defending Multi-Head Attention
### Analyzing Attention-Head Importance
```python
def analyze_head_importance(model, x, y):
    """
    Ablation-based head importance: remove one head at a time and
    measure the accuracy drop (delete_head and evaluate are assumed
    helpers that prune a head and compute accuracy on (x, y)).
    """
    model.eval()
    importance = {}
    for block_idx, block in enumerate(model.blocks):
        for head_idx in range(block.attn.num_heads):
            # Ablate this head and measure the effect on accuracy;
            # a large drop marks the head as important.
            modified_model = delete_head(model, block_idx, head_idx)
            acc = evaluate(modified_model, x, y)
            importance[(block_idx, head_idx)] = acc
    return importance
```

### Protecting Critical Heads
Apply extra protection to critical attention heads:
```python
def protect_critical_heads(model, importance, threshold=0.5):
    """Apply stronger regularization to critical heads (those whose
    ablation drops accuracy below the threshold)."""
    critical_heads = [
        k for k, v in importance.items()
        if v < threshold
    ]
    for block_idx, head_idx in critical_heads:
        # apply_extra_regularization is an assumed helper, e.g. extra
        # weight decay or an attention-entropy penalty for that head.
        apply_extra_regularization(
            model.blocks[block_idx].attn,
            head_idx
        )
```

## Practical Recommendations
### A ViT Robustness-Evaluation Checklist
```python
def comprehensive_robustness_evaluation(model, test_loader):
    """
    Comprehensive ViT robustness evaluation. The evaluate_* functions
    are assumed helpers, each returning accuracy under one condition.
    """
    results = {
        'clean': evaluate_accuracy(model, test_loader),
        'fgsm': evaluate_fgsm(model, test_loader),
        'pgd': evaluate_pgd(model, test_loader),
        'aa': evaluate_autoattack(model, test_loader),
        'noise': evaluate_gaussian_noise(model, test_loader),
        'blur': evaluate_blur(model, test_loader),
        'jpeg': evaluate_jpeg(model, test_loader),
    }
    return results
```

### A Guide to Choosing Defenses
| Scenario | Recommended approach |
|---|---|
| Resource-constrained | Attention Dropout |
| Certification guarantees needed | PROSAC |
| Severe overfitting | SAFER |
| Defense in depth | Combine several methods |
## Chapter Summary

Research on ViT adversarial robustness highlights the following points:

- Robustness comparison: under strong attacks, ViTs are not more robust than CNNs
- Sources of vulnerability: the patch embedding, the attention mechanism, and the class token
- The SAFER method: layer-selective SAM finetuning improves ViT robustness
- The PROSAC finding: large-scale ViTs do well in certification tests
- Defense strategies: attention dropout, token dropout, and adapted adversarial training
## References

[^1]: Bai, Y., et al. (2021). Evolutionary Adversarial Attack on Vision Transformers. arXiv.
[^2]: Ghiasi, A., et al. (2022). A Closer Look at Adversarial Robustness of Vision Transformers. arXiv.
[^3]: Zhang, Y., et al. (2025). SAFER: Sharpness-Aware Layer-Selective Finetuning for Enhanced Robustness. arXiv:2501.01529.
[^4]: Feng, C., et al. (2025). PROSAC: Provably Safe Certification for Machine Learning Models Under Adversarial Attacks. AAAI 2025.