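Universal Adversarial Perturbations

Overview

A universal adversarial perturbation (UAP) is an input-agnostic adversarial perturbation: a single perturbation that causes misclassification on a large fraction of natural images. Unlike per-sample adversarial perturbations, a UAP can be computed once in advance and added to any input, which makes it a practical tool in real-world attack scenarios. [1]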

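Definition and Formalization

Mathematical Definition

Given a classifier $f$ and a data distribution $\mu$, a UAP $v$ satisfies:

$$\|v\|_p \le \epsilon \quad \text{and} \quad \mathbb{P}_{x \sim \mu}\big(f(x + v) \ne f(x)\big) \ge 1 - \delta$$

where $\epsilon$ bounds the norm of the perturbation and $1 - \delta$ is the target fooling-rate threshold.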
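
The definition can be checked empirically on a held-out sample. The sketch below estimates the fooling rate of a candidate perturbation and tests it against a norm budget and a target rate. The helper names `fooling_rate` and `satisfies_uap_definition`, and the choice of the L-infinity norm, are assumptions made for illustration.

```python
import torch


def fooling_rate(model, images, v):
    """Fraction of inputs whose predicted label changes when v is added."""
    model.eval()
    with torch.no_grad():
        clean = model(images).argmax(dim=1)
        perturbed = model(images + v).argmax(dim=1)
    return (clean != perturbed).float().mean().item()


def satisfies_uap_definition(model, images, v, epsilon, delta):
    """Check the L-infinity budget and the fooling-rate target 1 - delta on a sample."""
    within_budget = v.abs().max().item() <= epsilon + 1e-8
    return within_budget and fooling_rate(model, images, v) >= 1 - delta


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy linear classifier on 8x8 inputs (an assumption for illustration only).
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64, 3))
    images = torch.rand(32, 1, 8, 8)
    v = (10 / 255) * torch.sign(torch.randn(1, 1, 8, 8))
    print("fooling rate:", fooling_rate(model, images, v))
    print("meets target:", satisfies_uap_definition(model, images, v, 10 / 255, 0.2))
```

The fooling rate here is measured against the model's own clean predictions rather than ground-truth labels, which matches the definition above.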

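Geometric Interpretation

The existence of UAPs is attributed to particular geometric structure in the decision boundaries of deep networks:

  • Near different data points, the decision boundaries are approximately parallel in input space
  • A small number of directions are nearly orthogonal to the data manifold for almost all samples
  • These directions are the candidates for a UAP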
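
One way to probe this structure is to collect a proxy for the boundary normal at many data points and inspect the singular values of the stacked normals: if a few singular values dominate, the normals concentrate in a low-dimensional subspace. The sketch below uses the gradient of the gap between the top-2 logits as a first-order stand-in for the normal. That proxy and the name `normal_spectrum` are assumptions, not part of the cited work.

```python
import torch


def normal_spectrum(model, images):
    """Singular values and leading direction of stacked unit boundary-normal proxies."""
    model.eval()
    x = images.clone().requires_grad_(True)
    logits = model(x)
    top2 = logits.topk(2, dim=1).values
    # Gap between the predicted class and the runner-up. Its input gradient points,
    # to first order, along the normal of the nearest pairwise decision boundary.
    gap = (top2[:, 0] - top2[:, 1]).sum()
    grads, = torch.autograd.grad(gap, x)
    normals = grads.flatten(1)
    normals = normals / (normals.norm(dim=1, keepdim=True) + 1e-12)
    # Rows are unit vectors; the singular values show how many directions they span.
    _, s, vh = torch.linalg.svd(normals, full_matrices=False)
    return s, vh[0].view_as(images[0])
```

For a two-class linear model every normal is the same vector up to sign, so only the first singular value is non-negligible. For a deep network, a fast decay of the returned singular values would be consistent with the low-dimensional picture described above.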

Generation Methods

The UAP Algorithm (Moosavi-Dezfooli et al., 2017)

import torch


def uap_attack(model, dataloader, epsilon=10/255, num_iterations=10, delta=0.2,
               input_shape=(3, 224, 224)):
    """
    Universal Adversarial Perturbation (UAP), simplified.

    The original algorithm uses DeepFool for the per-sample step; this version
    substitutes a single normalized-gradient step of length epsilon.

    Args:
        model: Target classifier
        dataloader: Yields (images, labels) batches used to craft the UAP
        epsilon: Maximum L-infinity norm of the perturbation
        num_iterations: Update passes per batch
        delta: Crafting stops once the running fooling rate reaches 1 - delta
        input_shape: (C, H, W) of a single input
    """
    device = next(model.parameters()).device
    model.eval()

    # Initialize the UAP (named v so it does not shadow the delta argument)
    v = torch.zeros(1, *input_shape, device=device)

    # Running statistics
    total = 0
    fooled = 0

    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)

        with torch.no_grad():
            preds_clean = model(images).argmax(dim=1)

        # Iteratively update the UAP
        for _ in range(num_iterations):
            with torch.no_grad():
                preds_pert = model(images + v).argmax(dim=1)

            # Samples that are correctly classified and not yet fooled by v
            mask = (preds_clean == labels) & (preds_pert == preds_clean)
            if not mask.any():
                break

            for img, p in zip(images[mask], preds_clean[mask]):
                img = img.unsqueeze(0)
                v_var = v.clone().requires_grad_(True)
                logits = model(img + v_var)
                if logits.argmax(dim=1).item() != p.item():
                    continue  # an earlier update in this pass already fooled it

                # Gradient of the predicted-class logit w.r.t. the perturbation
                grad, = torch.autograd.grad(logits[0, p], v_var)

                # Step against the gradient to lower the predicted-class logit
                r = -epsilon * grad / (grad.norm() + 1e-10)

                # Keep the step only if it pushes the sample across the boundary
                with torch.no_grad():
                    if model(img + v + r).argmax(dim=1).item() != p.item():
                        v = torch.clamp(v + r, -epsilon, epsilon)

        total += images.size(0)
        with torch.no_grad():
            preds_after = model(images + v).argmax(dim=1)
        fooled += (preds_after != preds_clean).sum().item()

        print(f"Fooling rate: {fooled / total:.2%}")
        if fooled / total >= 1 - delta:
            break

    return v

Data-Driven UAP

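import torch
import torch.nn.functional as F


def data_driven_uap(model, images, epsilon=10/255, momentum=1.0, iterations=10):
    """
    UAP generation with momentum.

    Args:
        model: Target classifier
        images: Batch of inputs, shape (N, C, H, W)
        epsilon: Maximum L-infinity norm of the perturbation
        momentum: Decay factor applied to the accumulated gradient
        iterations: Number of update steps
    """
    model.eval()
    delta = torch.zeros_like(images[:1])
    velocity = torch.zeros_like(delta)  # accumulated gradient
    # Small steps, so the iterate does not saturate at +/- epsilon after one update
    step_size = epsilon / iterations

    # Use the clean predictions as the labels to move away from
    with torch.no_grad():
        clean_preds = model(images).argmax(dim=1)

    for _ in range(iterations):
        delta.requires_grad_(True)

        # Gradient of the batch loss w.r.t. the shared perturbation
        loss = F.cross_entropy(model(images + delta), clean_preds, reduction="sum")
        grad, = torch.autograd.grad(loss, delta)

        # Momentum update: decayed previous velocity plus the normalized new gradient
        velocity = momentum * velocity + grad / (grad.abs().mean() + 1e-12)
        delta = delta.detach() + step_size * velocity.sign()
        delta = torch.clamp(delta, -epsilon, epsilon)

    return delta.detach()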

CLIP-Guided UAP

Recent work reports that CLIP's text-image alignment can be exploited to generate stronger UAPs: [2]

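import torch
import torch.nn.functional as F


def clip_uap(model, clip_model, tokenizer, images, epsilon=10/255, steps=20):
    """
    CLIP-textual-concept guided UAP

    The perturbation is crafted on clip_model only; `model` (the target
    classifier) is unused here and kept so the signature stays unchanged.
    """
    delta = torch.zeros_like(images[:1])
    step_size = epsilon / 10

    # Encode the text concepts once; no gradients are needed for them
    concepts = ["a photo", "an image", "a picture"]
    with torch.no_grad():
        text_embeddings = torch.cat([
            clip_model.encode_text(tokenizer(concept).to(images.device))
            for concept in concepts
        ])
        text_embeddings = F.normalize(text_embeddings, dim=-1)

    for _ in range(steps):
        delta.requires_grad_(True)

        # Encode the perturbed images
        image_emb = F.normalize(clip_model.encode_image(images + delta), dim=-1)

        # Loss is the negative mean cosine similarity to the generic concepts;
        # ascending it pushes the image embeddings away from those concepts
        loss = -(image_emb @ text_embeddings.T).mean()
        grad, = torch.autograd.grad(loss, delta)

        delta = delta.detach() + step_size * grad.sign()
        delta = torch.clamp(delta, -epsilon, epsilon)

    return delta.detach()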

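Transferability and Defenses

Transferability Analysis

The transferability of UAPs is attributed to:

  1. Shared vulnerable directions: different models share similar adversarial directions
  2. Similar feature hierarchies: pretrained models extract similar low-level features
  3. Aligned decision boundaries: the boundary geometry is similar across models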
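
The first point can be probed directly by comparing the input gradients of two models on the same batch: a mean cosine similarity well above zero suggests that the models share loss-increasing directions. This is a rough diagnostic under the assumption that both models accept the same input format; `gradient_alignment` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F


def gradient_alignment(model_a, model_b, images, labels):
    """Mean cosine similarity between two models' input gradients of the loss."""

    def input_grad(model):
        model.eval()
        x = images.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x), labels)
        grad, = torch.autograd.grad(loss, x)
        return grad.flatten(1)  # one gradient vector per sample

    grad_a, grad_b = input_grad(model_a), input_grad(model_b)
    return F.cosine_similarity(grad_a, grad_b, dim=1).mean().item()
```

Comparing a model with itself gives a value of about 1. In high-dimensional input spaces, unrelated directions have cosine similarity close to 0, so even modest positive values can be informative.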

Defense Strategies

1. Adversarial Training

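import torch
import torch.nn.functional as F


def uap_adversarial_training(model, dataloader, epsilon=10/255):
    """Adversarial training against UAPs."""
    device = next(model.parameters()).device
    optimizer = torch.optim.Adam(model.parameters())

    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)

        # Craft a UAP on the current batch with the model in eval mode
        model.eval()
        uap = uap_attack(model, [(images, labels)], epsilon=epsilon).detach()

        # Train on an equal mix of clean and perturbed inputs
        model.train()
        optimizer.zero_grad()
        clean_loss = F.cross_entropy(model(images), labels)
        adv_loss = F.cross_entropy(model(images + uap), labels)
        loss = 0.5 * (clean_loss + adv_loss)
        loss.backward()
        optimizer.step()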

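2. Input Transformation Defenses

Transformation    | Defense effectiveness | Computational cost
JPEG compression  | Medium                |
Random resizing   | Medium                |
Feature squeezing | Relatively high       |
Random padding    | Medium                |
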
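import torch
import torch.nn.functional as F


def random_transform(x, out_size=224, pad=8):
    """Randomized transformation defense for a batch x of shape (N, C, H, W) in [0, 1]."""
    # Random crop: reflect-pad, then cut an out_size window at a random offset
    x = F.pad(x, (pad, pad, pad, pad), mode="reflect")
    top = torch.randint(0, x.size(2) - out_size + 1, (1,)).item()
    left = torch.randint(0, x.size(3) - out_size + 1, (1,)).item()
    x = x[:, :, top:top + out_size, left:left + out_size]

    # Random horizontal flip
    if torch.rand(1).item() > 0.5:
        x = torch.flip(x, dims=[3])

    # Additive Gaussian noise (a stand-in for color jitter)
    x = (x + torch.randn_like(x) * 0.05).clamp(0, 1)

    return x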
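
Of the transformations listed in the table, feature squeezing is the simplest to sketch. The version below implements bit-depth reduction, one common form of feature squeezing, for inputs scaled to [0, 1]. The function name `squeeze_bit_depth` and the default of 4 bits are illustrative assumptions.

```python
import torch


def squeeze_bit_depth(x, bits=4):
    """Quantize inputs in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return torch.round(x.clamp(0, 1) * levels) / levels
```

A typical use is `model(squeeze_bit_depth(x))` at inference time. Quantization can absorb part of a small-norm perturbation, but fewer bits also discard more of the clean signal, so the bit depth has to be tuned against clean accuracy.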

3. Model Ensembles

import torch


def ensemble_prediction(models, x, weights=None):
    """Ensemble prediction: weighted sum of the models' logits."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    
    logits = []
    for model, w in zip(models, weights):
        model.eval()
        with torch.no_grad():
            logit = model(x) * w
            logits.append(logit)
    
    return torch.stack(logits).sum(dim=0)

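Theoretical Analysis

Existence of UAPs

Theorem (informal; Moosavi-Dezfooli et al., 2017):

Assume the decision boundary of the network is Lipschitz smooth. Then there exists a norm-bounded perturbation $v$ that fools the classifier on most samples, with a fooling rate that can be made arbitrarily high.

Proof Sketch

  1. Define the boundary hyperplane for each pair of classes $(i, j)$
  2. The set of normal vectors of these hyperplanes spans a low-dimensional subspace
  3. There exists a direction that is simultaneously close to the normal vectors of all boundaries
  4. That direction is a UAP candidate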
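
To first order, steps 2 to 4 can be written out as follows. This is a linearized argument that assumes each boundary is locally flat near the sample. The symbols $n_i$, $\rho_i$, $N$, and $\sigma_1$ are introduced here for illustration and do not come from the source.

```latex
% n_i:    unit normal of the boundary nearest to sample x_i,
%         oriented from x_i toward the boundary
% \rho_i: distance from x_i to that boundary
\begin{align}
  &\text{linearized fooling condition for } x_i:
    && n_i^\top v \ge \rho_i \\
  &\text{best single direction for } N = [n_1, \dots, n_m]^\top:
    && \max_{\|v\|_2 = 1} \sum_{i=1}^{m} (n_i^\top v)^2
       = \max_{\|v\|_2 = 1} v^\top N^\top N v
       = \sigma_1(N)^2
\end{align}
```

The maximum is attained at the top right-singular vector of $N$, up to sign. Because the rows of $N$ are unit vectors, the squared singular values sum to $m$. When the normals concentrate in a low-dimensional subspace, $\sigma_1(N)^2$ is therefore a large share of $m$, and one direction has a sizeable component along most normals.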

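Dimensionality Analysis

The effectiveness of a UAP depends on:

  • Input dimension $d$: the higher the dimension, the harder it is for a UAP to exist
  • Number of classes $K$: more classes mean more boundaries, so it is harder for one perturbation to cross all of them
  • Model complexity: more complex models have more local boundaries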

Physical-World UAPs

3D Universal Adversarial Perturbations

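import torch
import torch.nn.functional as F


def universal_3d_patch(model, obj_3d_renderer, wrong_label, epsilon=0.3,
                       steps=1000, num_views=16):
    """
    Generate a 3D universal adversarial patch.

    obj_3d_renderer.render(patch, viewpoint) is assumed to be differentiable
    and to return a batch of images; wrong_label is the integer class index
    the patch should induce.
    """
    device = next(model.parameters()).device
    model.eval()
    patch = torch.randn(1, 3, 128, 128, device=device).clamp_(-epsilon, epsilon)
    patch.requires_grad_(True)
    optimizer = torch.optim.Adam([patch])

    for _ in range(steps):
        optimizer.zero_grad()
        loss = 0.0

        # Render the patch from multiple viewpoints
        for viewpoint in range(num_views):
            rendered = obj_3d_renderer.render(patch, viewpoint)
            logits = model(rendered)
            target = torch.full((logits.size(0),), wrong_label,
                                dtype=torch.long, device=logits.device)
            loss = loss + F.cross_entropy(logits, target)

        loss.backward()
        optimizer.step()
        # Project back into the allowed range
        with torch.no_grad():
            patch.clamp_(-epsilon, epsilon)

    return patch.detach()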

Recent Research

Sparse UAPs

A sparse UAP modifies only a small number of pixels: [3]

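import torch
import torch.nn.functional as F


def sparse_uap(model, images, wrong_labels, epsilon=10/255, sparsity=0.1, steps=50):
    """
    Sparse UAP: only a fixed random subset of entries may be nonzero.

    wrong_labels holds the target (incorrect) class index for each image.
    """
    model.eval()
    delta = torch.zeros_like(images[:1])
    step_size = epsilon / 10

    # Choose the entries that may be modified, once, so the result stays sparse
    num_active = int(sparsity * delta.numel())
    mask = torch.zeros(delta.numel(), device=images.device)
    mask[torch.randperm(delta.numel(), device=images.device)[:num_active]] = 1.0
    mask = mask.view_as(delta)

    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(images + delta), wrong_labels)
        grad, = torch.autograd.grad(loss, delta)

        # Descend on the loss toward wrong_labels, updating selected entries only
        delta = delta.detach() - step_size * grad.sign() * mask
        delta = torch.clamp(delta, -epsilon, epsilon)

    return delta.detach()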

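Transferable UAPs

Methods for improving cross-model transferability: [4]

  1. Multi-model ensemble gradients
  2. Randomized momentum
  3. Data augmentation
  4. Adversarial example distillation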
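
The first item, ensemble gradients, can be sketched as a single update step that averages the loss over several surrogate models before taking the sign of the gradient. This is a generic illustration rather than the method of the cited paper; `ensemble_uap_step` and its parameters are hypothetical.

```python
import torch
import torch.nn.functional as F


def ensemble_uap_step(models, images, labels, uap, epsilon=10/255, step_size=1/255):
    """One sign-gradient ascent step on the loss averaged over all surrogate models."""
    uap = uap.detach().clone().requires_grad_(True)
    # Averaging the losses yields a gradient that reflects every surrogate model.
    loss = sum(F.cross_entropy(m(images + uap), labels) for m in models) / len(models)
    grad, = torch.autograd.grad(loss, uap)
    # Ascend on the averaged loss, then project back into the epsilon ball.
    return (uap.detach() + step_size * grad.sign()).clamp(-epsilon, epsilon)
```

Calling this step repeatedly over batches, starting from a zero perturbation of shape `(1, C, H, W)`, gives a basic ensemble-based crafting loop. The other items in the list, such as data augmentation, would be applied to `images` before the forward pass.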


References

  1. Moosavi-Dezfooli, S. M., et al. (2017). Universal Adversarial Perturbations. CVPR 2017. https://arxiv.org/abs/1610.08401

  2. Chen, J., et al. (2025). Universal Adversarial Perturbation with Pseudo-semantic Prior. arXiv:2502.21048. https://arxiv.org/abs/2502.21048

  3. Aghakhani, H., et al. (2025). Sparse and Transferable Universal Singular Vectors Attack. arXiv:2401.14031. https://arxiv.org/abs/2401.14031

  4. Liu, Y., et al. (2025). Improving Generalization of Universal Adversarial Perturbation via Dynamic Maximin Optimization. arXiv:2503.12793. https://arxiv.org/abs/2503.12793