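Universal Adversarial Perturbations
Overview
A universal adversarial perturbation (UAP) is an input-agnostic adversarial perturbation: a single perturbation that fools the classifier on a large fraction of natural images. Unlike per-sample adversarial perturbations, a UAP can be computed once in advance and then added to any input, which makes it a powerful tool in practical attack scenarios. [1]
Definition and Formalization
Mathematical Definition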
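Given a classifier $f$ and a distribution $\mu$ over natural images, a UAP is a single perturbation $v$ that satisfies

$$\|v\|_p \le \epsilon, \qquad \mathbb{P}_{x \sim \mu}\big[f(x + v) \ne f(x)\big] \ge 1 - \delta,$$

where $\epsilon$ bounds the norm of the perturbation and $1 - \delta$ is the target fooling rate threshold.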
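The two conditions of the definition can be checked empirically on a held-out batch. The sketch below is a minimal illustration, assuming an L-infinity budget; the helper names `fooling_rate` and `is_valid_uap` and the parameters `epsilon` and `delta` are chosen here for illustration, and `model` is any PyTorch classifier that returns logits.

```python
import torch

def fooling_rate(model, images, v):
    """Fraction of inputs whose predicted label changes when v is added."""
    model.eval()
    with torch.no_grad():
        clean = model(images).argmax(dim=1)
        perturbed = model(images + v).argmax(dim=1)
    return (clean != perturbed).float().mean().item()

def is_valid_uap(model, images, v, epsilon, delta):
    """Check the norm bound and the fooling-rate condition (>= 1 - delta)."""
    within_budget = v.abs().max().item() <= epsilon  # L-infinity norm (assumed)
    return within_budget and fooling_rate(model, images, v) >= 1 - delta
```

The fooling rate compares against the model's own clean predictions rather than ground-truth labels, which matches the definition: a sample counts as fooled when its predicted label changes.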
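Geometric Interpretation
The existence of UAPs stems from the particular geometry of the decision boundaries of deep networks:
- Near different samples the decision boundary is approximately parallel in input space, so its local normals point in similar directions
- A small number of directions are nearly orthogonal to the data manifold at almost every sample
- These directions are the candidates for a UAP
Generation Methods
The UAP Algorithm (Moosavi-Dezfooli et al., 2017)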
import torch
import torch.nn.functional as F

def uap_attack(model, dataloader, epsilon=10/255, num_iterations=10, target_fooling_rate=0.8):
    """
    Universal Adversarial Perturbation (UAP), simplified.

    A normalized gradient step on the cross-entropy loss stands in for the
    DeepFool inner step used by Moosavi-Dezfooli et al. (2017).

    Args:
        model: Target classifier
        dataloader: Data samples for crafting the UAP
        epsilon: Maximum L-infinity norm of the perturbation
        num_iterations: Update passes per batch
        target_fooling_rate: Stop early once this fooling rate is reached
    """
    device = next(model.parameters()).device
    model.eval()
    uap = None  # created from the first batch so any input size works
    total, fooled = 0, 0

    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)
        if uap is None:
            uap = torch.zeros_like(images[:1])
        with torch.no_grad():
            preds_clean = model(images).argmax(dim=1)

        # Iteratively update the UAP
        for _ in range(num_iterations):
            with torch.no_grad():
                preds_pert = model(images + uap).argmax(dim=1)
            # Correctly classified samples that the current UAP does not fool yet
            mask = (preds_clean == labels) & (preds_pert == preds_clean)
            if not mask.any():
                break
            for img, p in zip(images[mask], preds_clean[mask]):
                x = (img.unsqueeze(0) + uap).requires_grad_(True)
                loss = F.cross_entropy(model(x), p.unsqueeze(0))
                grad, = torch.autograd.grad(loss, x)
                # L2-normalized step of length epsilon in the loss-increasing direction
                r = epsilon * grad / (grad.norm() + 1e-10)
                with torch.no_grad():
                    new_pred = model(img.unsqueeze(0) + uap + r).argmax(dim=1)
                # Keep the step only if it pushes this sample across the boundary
                if new_pred.item() != p.item():
                    uap = torch.clamp(uap + r, -epsilon, epsilon)

        # Fooling rate is measured against the clean predictions
        total += images.size(0)
        with torch.no_grad():
            preds_after = model(images + uap).argmax(dim=1)
        fooled += (preds_after != preds_clean).sum().item()
        if fooled / total >= target_fooling_rate:
            break

    print(f"Fooling rate: {fooled/total:.2%}")
    return uap

Data-Driven UAP
import torch
import torch.nn.functional as F

def data_driven_uap(model, images, epsilon=10/255, momentum=1.0, iterations=10):
    """
    UAP generation with momentum.
    """
    model.eval()
    delta = torch.zeros_like(images[:1])
    velocity = torch.zeros_like(delta)
    step_size = epsilon / iterations  # small steps so delta does not saturate at once
    with torch.no_grad():
        clean_preds = model(images).argmax(dim=1)  # reference labels from clean inputs

    for _ in range(iterations):
        delta.requires_grad_(True)
        # Gradient of the loss over all samples, computed in one batched pass
        loss = F.cross_entropy(model(images + delta), clean_preds)
        total_grad, = torch.autograd.grad(loss, delta)
        # Momentum update on the L1-normalized gradient
        velocity = momentum * velocity + total_grad / (total_grad.abs().sum() + 1e-12)
        # Signed step, then projection onto the L-infinity ball
        delta = (delta.detach() + step_size * velocity.sign()).clamp(-epsilon, epsilon)

    return delta

CLIP-Driven UAP
Recent work has found that exploiting CLIP's text-image alignment can produce stronger UAPs: [2]
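import torch
import torch.nn.functional as F

def clip_uap(model, clip_model, tokenizer, images, epsilon=10/255, iterations=20):
    """
    CLIP textual-concept guided UAP.

    `model` is the image encoder under attack; it must expose `encode_image`
    and share its embedding space with `clip_model`.
    """
    delta = torch.zeros_like(images[:1])
    step_size = epsilon / 10  # smaller than epsilon so delta does not saturate in one step

    # Encode the text concepts once; no gradient is needed for them
    concepts = ["a photo", "an image", "a picture"]
    text_embeddings = []
    with torch.no_grad():
        for concept in concepts:
            tokens = tokenizer(concept).to(images.device)
            text_embeddings.append(clip_model.encode_text(tokens))
    text_emb = F.normalize(torch.cat(text_embeddings, dim=0), dim=-1)

    for _ in range(iterations):
        delta.requires_grad_(True)
        # Encode the perturbed images
        image_emb = F.normalize(model.encode_image(images + delta), dim=-1)
        # Loss is the negative mean image-text similarity
        loss = -(image_emb @ text_emb.T).mean()
        grad, = torch.autograd.grad(loss, delta)
        # Ascend the loss, i.e. push the images away from the generic concepts
        delta = (delta.detach() + step_size * grad.sign()).clamp(-epsilon, epsilon)

    return delta

Transferability and Defenses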
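Transferability Analysis
The transferability of UAPs stems from:
- Shared vulnerable directions: different models share similar adversarial directions
- Similar feature hierarchies: pretrained models extract similar low-level features
- Aligned decision boundaries: the boundary geometry is similar across models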
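Transferability is usually quantified by crafting a UAP on one model and measuring its fooling rate on the others. The sketch below is one way to tabulate this; `fooling_rate` and `transfer_matrix` are hypothetical helper names, and the models are assumed to accept the same input shape and return logits.

```python
import torch

def fooling_rate(model, images, uap):
    """Fraction of inputs whose predicted label changes when uap is added."""
    model.eval()
    with torch.no_grad():
        clean = model(images).argmax(dim=1)
        perturbed = model(images + uap).argmax(dim=1)
    return (clean != perturbed).float().mean().item()

def transfer_matrix(models, uaps, images):
    """rates[i][j] = fooling rate on models[j] of the UAP crafted on models[i]."""
    return [[fooling_rate(target, images, uap) for target in models] for uap in uaps]
```

The diagonal holds the white-box fooling rates; off-diagonal entries that stay close to the diagonal indicate a perturbation that transfers well.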
Defense Strategies
1. Adversarial training
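import torch
import torch.nn.functional as F

def uap_adversarial_training(model, dataloader, epsilon=10/255):
    """Adversarial training against UAPs."""
    device = next(model.parameters()).device
    optimizer = torch.optim.Adam(model.parameters())

    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)

        # Craft a UAP on the current batch (uap_attack is defined above)
        uap = uap_attack(model, [(images, labels)], epsilon=epsilon).detach()

        # Train on clean and perturbed inputs with equal weight
        model.train()
        optimizer.zero_grad()
        clean_loss = F.cross_entropy(model(images), labels)
        adv_loss = F.cross_entropy(model(images + uap), labels)
        loss = 0.5 * (clean_loss + adv_loss)
        loss.backward()
        optimizer.step()

2. Input-transformation defenses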
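| Transformation | Defense effectiveness | Compute cost |
|---|---|---|
| JPEG compression | Medium | Low |
| Random resizing | Medium | Low |
| Feature squeezing | Fairly high | Medium |
| Random padding | Medium | Low |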
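Feature squeezing, listed in the table, is commonly realized as bit-depth reduction: pixel values are quantized to a few levels so that small perturbations are rounded away. The sketch below assumes inputs scaled to [0, 1]; the function name `squeeze_bit_depth` and the default of 4 bits are illustrative choices, not values from the source.

```python
import torch

def squeeze_bit_depth(x, bits=4):
    """Quantize pixel values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return torch.round(x.clamp(0, 1) * levels) / levels
```

A perturbation smaller than half a quantization step leaves most pixels unchanged after squeezing, which is why lower bit depths remove more of a small-norm UAP at the cost of clean accuracy.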
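import torch
import torch.nn.functional as F

def random_transform(x, pad=8):
    """Randomized input-transformation defense for a batch x of shape (N, C, H, W) in [0, 1]."""
    _, _, h, w = x.shape
    # Random crop: reflect-pad, then cut a window of the original size at a random offset
    x = F.pad(x, (pad, pad, pad, pad), mode="reflect")
    top = torch.randint(0, 2 * pad + 1, (1,)).item()
    left = torch.randint(0, 2 * pad + 1, (1,)).item()
    x = x[:, :, top:top + h, left:left + w]
    # Random horizontal flip
    if torch.rand(1).item() > 0.5:
        x = torch.flip(x, dims=[3])
    # Additive Gaussian noise
    x = (x + torch.randn_like(x) * 0.05).clamp(0, 1)
    return x

3. Model ensembling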
import torch

def ensemble_prediction(models, x, weights=None):
    """Ensemble prediction: weighted sum of the logits of all models."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    logits = []
    for model, w in zip(models, weights):
        model.eval()
        with torch.no_grad():
            logits.append(model(x) * w)
    return torch.stack(logits).sum(dim=0)

Theoretical Analysis
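Existence of UAPs
Theorem (informal, after Moosavi-Dezfooli et al., 2017):
Assume the decision boundary of the network is Lipschitz-smooth. Then there exists a norm-bounded perturbation $v$ whose fooling rate over most samples can be made arbitrarily high.
Proof sketch:
- Define the boundary hyperplane for each pair of classes $(i, j)$
- The normal vectors of these hyperplanes span a low-dimensional subspace
- There is a direction that is simultaneously close to the normals of all boundaries
- That direction is the UAP candidate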
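The proof sketch can be probed numerically: collect approximate boundary normals at many samples, then check how much of their energy a few singular directions capture. The sketch below is an illustration under stated assumptions, not the construction from the paper: input gradients of the loss serve as a proxy for boundary normals, and `boundary_normals` and `dominant_direction` are hypothetical helper names.

```python
import torch
import torch.nn.functional as F

def boundary_normals(model, images):
    """Unit-norm input gradients of the loss, used as a proxy for boundary normals."""
    model.eval()
    x = images.clone().requires_grad_(True)
    logits = model(x)
    # Use the model's own predictions as labels
    loss = F.cross_entropy(logits, logits.argmax(dim=1))
    grad, = torch.autograd.grad(loss, x)
    flat = grad.flatten(1)
    return flat / (flat.norm(dim=1, keepdim=True) + 1e-12)

def dominant_direction(normals, k=1):
    """Top-k right singular vectors and the fraction of energy they capture."""
    _, s, vh = torch.linalg.svd(normals, full_matrices=False)
    energy = (s[:k] ** 2).sum() / (s ** 2).sum()
    return vh[:k], energy.item()
```

If a small `k` already captures most of the energy, the normals are concentrated in a low-dimensional subspace, and the leading singular vector is a natural starting direction for a universal perturbation.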
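Dimensional Analysis
How effective a UAP is depends on:
- Input-space dimension $d$: in high dimensions a random direction rarely aligns with the boundary normals, so an effective UAP has to lie in the low-dimensional subspace those normals span
- Number of classes $K$: more classes mean more decision boundaries, so it is harder for a single perturbation to cross all of them
- Model complexity: more complex models have more local boundaries
Physical-World UAPs
3D Universal Adversarial Perturbations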
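import torch
import torch.nn.functional as F

def universal_3d_patch(model, obj_3d_renderer, target_label, epsilon=0.3, num_views=16, steps=1000):
    """
    Generate a universal 3D adversarial patch.

    `obj_3d_renderer.render(patch, viewpoint)` is assumed to be differentiable and
    to return an image batch; `target_label` is the class the patch should induce.
    """
    device = next(model.parameters()).device
    model.eval()
    patch = torch.randn(1, 3, 128, 128, device=device).clamp_(-epsilon, epsilon)
    patch.requires_grad_(True)
    optimizer = torch.optim.Adam([patch])

    for _ in range(steps):
        optimizer.zero_grad()
        loss = 0
        # Render from multiple viewpoints and sum the targeted loss
        for viewpoint in range(num_views):
            rendered = obj_3d_renderer.render(patch, viewpoint)
            pred = model(rendered)
            target = torch.full((pred.size(0),), target_label, dtype=torch.long, device=device)
            loss = loss + F.cross_entropy(pred, target)
        loss.backward()
        optimizer.step()
        # Project back into the allowed value range
        with torch.no_grad():
            patch.clamp_(-epsilon, epsilon)

    return patch.detach()

Recent Research Progress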
Sparse UAP
A sparse UAP modifies only a small number of pixels: [3]
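import torch
import torch.nn.functional as F

def sparse_uap(model, images, epsilon=10/255, sparsity=0.1, iterations=50):
    """Sparse UAP: only a fixed random subset of coordinates is ever modified."""
    model.eval()
    delta = torch.zeros_like(images[:1])  # one perturbation shared by all images
    num_pixels = int(sparsity * delta.numel())
    step_size = epsilon / 10

    # Binary mask selecting the coordinates that are allowed to change
    mask = torch.zeros(delta.numel(), device=delta.device)
    mask[torch.randperm(delta.numel(), device=delta.device)[:num_pixels]] = 1.0
    mask = mask.view_as(delta)

    with torch.no_grad():
        clean_preds = model(images).argmax(dim=1)

    for _ in range(iterations):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(images + delta), clean_preds)
        grad, = torch.autograd.grad(loss, delta)
        # Update only the selected coordinates, then project onto the L-infinity ball
        delta = (delta.detach() + step_size * grad.sign() * mask).clamp(-epsilon, epsilon)

    return delta

Transferable UAPs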
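Methods for improving cross-model transferability: [4]
- Multi-model ensemble gradients
- Randomized momentum
- Data augmentation
- Adversarial example distillation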
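Of the methods listed above, multi-model ensemble gradients are the simplest to sketch: the UAP is updated with the gradient of a loss averaged over several surrogate models. The code below is a minimal illustration under stated assumptions; `ensemble_loss`, `ensemble_uap_step`, and the default `step_size` are hypothetical, and each model's clean predictions are used as labels.

```python
import torch
import torch.nn.functional as F

def ensemble_loss(models, images, uap):
    """Mean cross-entropy against each model's own clean predictions."""
    total = 0.0
    for m in models:
        m.eval()
        with torch.no_grad():
            clean = m(images).argmax(dim=1)
        total = total + F.cross_entropy(m(images + uap), clean)
    return total / len(models)

def ensemble_uap_step(models, images, uap, epsilon=10/255, step_size=1/255):
    """One signed-gradient ascent step on the ensemble loss, projected to the L-inf ball."""
    uap = uap.detach().clone().requires_grad_(True)
    grad, = torch.autograd.grad(ensemble_loss(models, images, uap), uap)
    return (uap.detach() + step_size * grad.sign()).clamp(-epsilon, epsilon)
```

Calling `ensemble_uap_step` repeatedly over batches yields a perturbation that has to raise the loss of every surrogate at once, which tends to favor directions shared across models.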
Related Topics
- adversarial-attack-methods: survey of adversarial attack methods
- adversarial-examples-phenomenology: phenomenology of adversarial examples
- adversarial-training-methods: adversarial training defenses
References
1. Moosavi-Dezfooli, S. M., et al. (2017). Universal Adversarial Perturbations. CVPR 2017. https://arxiv.org/abs/1610.08401
2. Chen, J., et al. (2025). Universal Adversarial Perturbation with Pseudo-semantic Prior. arXiv:2502.21048. https://arxiv.org/abs/2502.21048
3. Aghakhani, H., et al. (2025). Sparse and Transferable Universal Singular Vectors Attack. arXiv:2401.14031. https://arxiv.org/abs/2401.14031
4. Liu, Y., et al. (2025). Improving Generalization of Universal Adversarial Perturbation via Dynamic Maximin Optimization. arXiv:2503.12793. https://arxiv.org/abs/2503.12793