Certified Robustness

Introduction

Adversarial training improves a model's robustness, but it only provides **empirical** guarantees: we cannot be certain that the model is truly robust at a given input. **Certified robustness** aims to provide a **provable** lower bound on robustness.1

Definition of Certified Robustness

Mathematical Formalization

For an input $x$ and a perturbation radius $\epsilon$, suppose there exists a certified radius $R \ge \epsilon$ such that:

$$f(x + \delta) = f(x) \quad \text{for all } \|\delta\|_p \le R;$$

then the model is said to be $R$-robust at $x$.

Certified vs. Empirical Robustness

| Type | Description | Guarantee strength |
| --- | --- | --- |
| Empirical robustness | Success rate against specific attacks (e.g., PGD) | Weak: only as strong as the attacks tried |
| Certified robustness | A universal lower bound over all perturbations in the threat model | Strong: mathematically provable |

Randomized Smoothing

Core Idea

Randomized smoothing, proposed by Cohen et al. (2019), is a milestone method for certified robustness:2

The base classifier $f$ is turned into a smoothed classifier $g$ via random noise: $g(x) = \arg\max_c \, \mathbb{P}_{\varepsilon \sim \mathcal{N}(0, \sigma^2 I)}\big(f(x + \varepsilon) = c\big)$.

Certification Guarantee of Gaussian Smoothing

Core theorem: let $x$ be an input, $c_A = g(x)$ the predicted class of the smoothed classifier, $\underline{p_A}$ a lower bound on the probability of class $c_A$, and $\overline{p_B}$ an upper bound on the probability of the runner-up class. Then $g$ has certified $\ell_2$ robustness radius at $x$:

$$R = \frac{\sigma}{2}\left(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\right),$$

where $\Phi^{-1}$ is the inverse CDF of the standard normal distribution.
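The radius formula above can be evaluated directly; a minimal numerical sketch using SciPy (the function name is illustrative):

```python
from scipy.stats import norm

def cohen_radius(p_a_lower, p_b_upper, sigma):
    """Certified L2 radius R = sigma/2 * (Phi^-1(pA) - Phi^-1(pB))."""
    if p_a_lower <= p_b_upper:
        return 0.0  # cannot certify
    return 0.5 * sigma * (norm.ppf(p_a_lower) - norm.ppf(p_b_upper))

# e.g. pA >= 0.9, pB <= 0.1, sigma = 0.25 gives R ≈ 0.32
radius = cohen_radius(0.9, 0.1, 0.25)
```

Note the behavior at the boundary: once $\underline{p_A} \le \overline{p_B}$ the radius is 0 and the classifier abstains from certifying.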

Deriving the Certified Radius

In the binary case (class $c_A$ vs. $c_B$), we may take $\overline{p_B} = 1 - \underline{p_A}$. Since $\Phi^{-1}(1 - p) = -\Phi^{-1}(p)$, the certified radius reduces to:

$$R = \sigma \, \Phi^{-1}(\underline{p_A}).$$

Implementation

```python
import torch
from scipy.stats import beta, norm

def certified_accuracy(model, x, y, n_samples=1000, sigma=0.25, alpha=0.001):
    """
    Randomized-smoothing certification (a simplified, single-sample-set
    variant of Cohen et al.'s CERTIFY, which uses separate selection and
    estimation sample sets).

    Args:
        model: base classifier
        x: input image, shape (C, H, W), values in [0, 1]
        y: ground-truth label
        n_samples: number of Monte Carlo samples
        sigma: noise standard deviation
        alpha: failure probability of the confidence bound

    Returns:
        predicted_class, certified_radius, clean_acc
    """
    with torch.no_grad():
        # Monte Carlo sampling
        epsilon = torch.randn(n_samples, *x.shape) * sigma
        x_noisy = (x + epsilon).clamp(0, 1)

        # batched inference
        logits = model(x_noisy)
        counts = logits.argmax(dim=-1).bincount(minlength=logits.shape[-1])

        # top class and a one-sided Clopper-Pearson lower bound on its probability
        predicted_class = counts.argmax().item()
        k = counts[predicted_class].item()
        p_lower = beta.ppf(alpha, k, n_samples - k + 1) if k > 0 else 0.0

        # certified radius R = sigma * Phi^{-1}(p_lower); abstain if p_lower <= 1/2
        if p_lower > 0.5:
            certified_radius = sigma * norm.ppf(p_lower)
        else:
            certified_radius = 0.0

    # clean-sample accuracy
    clean_pred = model(x.unsqueeze(0)).argmax().item()
    clean_acc = (clean_pred == y)

    return predicted_class, certified_radius, clean_acc
```

Improvement by Abadia et al. (2022)

Abadia et al. proposed Medoid Smoothing, which replaces the majority vote with a medoid in feature space to improve efficiency:3

```python
def medoid_smoothing(model, x, n_samples=100, sigma=0.25):
    """Medoid smoothing: classify the noisy feature vector closest to all
    others (assumes the model exposes extract_features() and classify())."""
    with torch.no_grad():
        # sampling
        epsilon = torch.randn(n_samples, *x.shape) * sigma
        x_noisy = (x + epsilon).clamp(0, 1)

        # feature extraction
        features = model.extract_features(x_noisy)

        # medoid: the sample minimizing total distance to all other samples
        medoid_idx = torch.cdist(features, features).sum(dim=1).argmin().item()
        return model.classify(features[medoid_idx:medoid_idx + 1])
```

Certification Methods for Neural Networks

1. Layer-wise Bound Analysis (IBP)

Interval Bound Propagation (IBP) maintains element-wise lower and upper bounds on the activations and pushes them through the network layer by layer:4

For the $k$-th affine layer $z^{(k)} = W^{(k)} h^{(k-1)} + b^{(k)}$ with input bounds $\underline{h} \le h^{(k-1)} \le \overline{h}$, writing $\mu = (\underline{h} + \overline{h})/2$ and $r = (\overline{h} - \underline{h})/2$:

$$\underline{z}^{(k)} = W^{(k)}\mu + b^{(k)} - |W^{(k)}|\,r, \qquad \overline{z}^{(k)} = W^{(k)}\mu + b^{(k)} + |W^{(k)}|\,r.$$

Monotone activations such as ReLU are handled by applying them to both bounds.

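The affine-layer rule above can be checked numerically; a minimal sketch (function name is illustrative):

```python
import torch

def ibp_affine(lb, ub, W, b):
    """Interval bounds of z = W h + b given element-wise bounds on h."""
    mu, r = (lb + ub) / 2, (ub - lb) / 2   # center and radius of the box
    center = mu @ W.t() + b
    radius = r @ W.t().abs()               # radius_j = sum_i |W_ji| * r_i
    return center - radius, center + radius

torch.manual_seed(0)
W, b = torch.randn(3, 4), torch.randn(3)
x, eps = torch.randn(1, 4), 0.1
lb, ub = ibp_affine(x - eps, x + eps, W, b)
```

Any perturbed input inside the box must map inside the output bounds, which is easy to verify by sampling.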
The CROWN Certifier

The CROWN method of Zhang et al. uses linear relaxations of the activations:5

```python
def crown_bounds(model, x, epsilon, layers_to_check=None):
    """
    Simplified forward bound propagation. (Full CROWN additionally
    back-substitutes per-neuron linear relaxations of ReLU for tighter
    bounds; this forward pass gives valid but looser IBP-style bounds.)
    """
    lb = x - epsilon  # lower bound
    ub = x + epsilon  # upper bound

    layers = model.layers if layers_to_check is None else model.layers[:layers_to_check]
    for layer in layers:
        if isinstance(layer, torch.nn.Linear):
            # split the weight by sign so each bound uses the correct input bound
            W_pos = layer.weight.clamp(min=0)
            W_neg = layer.weight.clamp(max=0)
            lb, ub = (lb @ W_pos.t() + ub @ W_neg.t() + layer.bias,
                      ub @ W_pos.t() + lb @ W_neg.t() + layer.bias)
        elif isinstance(layer, torch.nn.ReLU):
            # ReLU is monotone: clamp both bounds
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)

    return lb, ub
```

Fast-Lin / Fast-Lip

Weng et al. proposed efficient methods based on Lipschitz-constant estimation:6

```python
def fast_lip_bound(model, x, epsilon):
    """
    Crude global Lipschitz bound: the product of the layers' operator norms.
    """
    lipschitz_product = 1.0

    for layer in model.layers:
        if isinstance(layer, torch.nn.Linear):
            # spectral norm of the weight matrix
            lipschitz_product *= torch.linalg.matrix_norm(layer.weight, ord=2).item()
        elif isinstance(layer, torch.nn.Conv2d):
            # spectral norm of the convolution operator (e.g., via power iteration)
            lipschitz_product *= compute_conv_spectral_norm(layer)
        elif isinstance(layer, torch.nn.ReLU):
            # ReLU is 1-Lipschitz
            pass

    return lipschitz_product
```
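One way to realize the conv-layer spectral norm (the `compute_conv_spectral_norm` step is otherwise left undefined above) is power iteration on the convolution operator itself; a sketch, assuming stride-1 convolutions and a fixed input size:

```python
import torch
import torch.nn.functional as F

def conv_spectral_norm(conv, input_shape, n_iters=50):
    """Estimate the operator 2-norm of a Conv2d layer by power iteration
    on v -> conv(v) (bias excluded), for a fixed input shape (C, H, W)."""
    v = torch.randn(1, *input_shape)
    for _ in range(n_iters):
        # apply the operator and its adjoint (transposed convolution)
        u = F.conv2d(v, conv.weight, stride=conv.stride, padding=conv.padding)
        v = F.conv_transpose2d(u, conv.weight, stride=conv.stride,
                               padding=conv.padding)
        v = v / v.norm()
    u = F.conv2d(v, conv.weight, stride=conv.stride, padding=conv.padding)
    return u.norm().item()
```

For a 1×1 convolution with a single weight $w$ the operator is $w \cdot I$, so the estimate should converge to $|w|$, which makes a handy sanity check.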

Provable Defense Methods

1. IBP Training

Use the IBP bounds as a regularization term:7

```python
import torch.nn.functional as F

def ibp_loss(model, x, y, epsilon, lambda_ibp=1.0):
    """
    IBP-regularized training loss: standard cross-entropy plus a
    worst-case cross-entropy built from bounds on the logits.
    """
    # standard cross-entropy
    ce_loss = F.cross_entropy(model(x), y)

    # bounds on the logits under any allowed perturbation
    lb, ub = crown_bounds(model, x, epsilon)

    # worst-case logits: upper bound everywhere, lower bound for the true class
    worst = ub.clone()
    worst[torch.arange(len(y)), y] = lb[torch.arange(len(y)), y]
    robust_loss = F.cross_entropy(worst, y)

    return ce_loss + lambda_ibp * robust_loss
```
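A self-contained sketch of the worst-case-logit construction that IBP training relies on, using a toy two-layer network (all names are illustrative):

```python
import torch
import torch.nn.functional as F

def ibp_forward(lb, ub, layers):
    """Propagate interval bounds through Linear/ReLU layers."""
    for layer in layers:
        if isinstance(layer, torch.nn.Linear):
            W_pos = layer.weight.clamp(min=0)
            W_neg = layer.weight.clamp(max=0)
            lb, ub = (lb @ W_pos.t() + ub @ W_neg.t() + layer.bias,
                      ub @ W_pos.t() + lb @ W_neg.t() + layer.bias)
        elif isinstance(layer, torch.nn.ReLU):
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
    return lb, ub

def ibp_robust_loss(layers, x, y, epsilon):
    lb, ub = ibp_forward(x - epsilon, x + epsilon, layers)
    # worst-case logits: upper bound everywhere, lower bound for the true class
    worst = ub.clone()
    worst[torch.arange(len(y)), y] = lb[torch.arange(len(y)), y]
    return F.cross_entropy(worst, y)

torch.manual_seed(0)
net = [torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3)]
x, y = torch.randn(2, 4), torch.tensor([0, 2])
loss = ibp_robust_loss(net, x, y, epsilon=0.1)
```

With `epsilon=0` the intervals collapse and the loss equals the clean cross-entropy; widening `epsilon` can only increase the worst-case loss.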

2. Smoothed Classifiers

A defense based on randomized smoothing:

```python
from scipy.stats import beta, norm

class SmoothedClassifier:
    def __init__(self, base_classifier, sigma=0.25, n_classes=10):
        self.base = base_classifier
        self.sigma = sigma
        self.n_classes = n_classes

    def _sample_counts(self, x, n_samples):
        """Class counts of the base classifier under Gaussian noise."""
        preds = []
        for _ in range(n_samples):
            noise = torch.randn_like(x) * self.sigma
            preds.append(self.base(x + noise).argmax())
        return torch.bincount(torch.stack(preds), minlength=self.n_classes)

    def predict(self, x, n_samples=100):
        """Return the smoothed prediction (majority class under noise)."""
        with torch.no_grad():
            return self._sample_counts(x, n_samples).argmax().item()

    def certify(self, x, n_samples=100, alpha=0.001):
        """Return the prediction and a certified L2 radius."""
        with torch.no_grad():
            counts = self._sample_counts(x, n_samples)
            top_class = counts.argmax().item()
            k = counts[top_class].item()
            # one-sided Clopper-Pearson lower bound on the top-class probability
            p_lower = beta.ppf(alpha, k, n_samples - k + 1) if k > 0 else 0.0

            radius = self.sigma * norm.ppf(p_lower) if p_lower > 0.5 else 0.0
            return top_class, radius
```

Smoothed-LP Certification

Salman et al. (2019) proposed applying LP certification on top of the smoothed classifier:8

```python
def smoothed_lp_certify(model, x, y, sigma=0.25, n_samples=10000):
    """
    Certification on a randomized-smoothing classifier.
    `monte_carlo_predict` (assumed helper) returns the smoothed prediction
    and a lower confidence bound on its probability.
    """
    pred, p_lower = monte_carlo_predict(model, x, sigma, n_samples)

    if pred != y:
        return pred, 0.0  # a wrong prediction cannot be certified

    certified_radius = sigma * norm.ppf(p_lower) if p_lower > 0.5 else 0.0
    return pred, certified_radius
```

Comparison of Certification Methods

| Method | Certified accuracy | Computational efficiency | Scope |
| --- | --- | --- | --- |
| Randomized smoothing | High (probabilistic) | Sampling overhead at inference | Any classifier |
| IBP | Low to medium | Very high | Full networks |
| CROWN | Medium to high | Medium | Full networks |
| Fast-Lin | Medium | High | Full networks |
| LP certification | High | Low | Small networks |

Recent Advances

PROSAC(AAAI 2025)

Feng et al. (2025) proposed PROSAC, which provides provably safe certification in a statistical sense:9

“We introduce the notion of (α,ζ)-safe ML model and propose hypothesis testing to derive statistical guarantees.”

```python
def prosac_certification(model, x, y, calibration_set,
                         alpha=0.05, zeta=0.05):
    """
    PROSAC safety certification (sketch; `evaluate_adversarial_risk` and
    `statistical_test` are assumed helpers).

    Returns:
        is_safe: whether the (alpha, zeta)-safety condition holds
    """
    # estimate the adversarial risk on a calibration set
    adversarial_risks = evaluate_adversarial_risk(
        model, calibration_set, alpha
    )

    # hypothesis test at confidence level 1 - zeta
    is_safe = statistical_test(
        adversarial_risks,
        threshold=alpha,
        confidence=1 - zeta
    )

    return is_safe
```
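One concrete way to instantiate such a statistical test is a one-sided binomial test on adversarial failures observed on the calibration set; a sketch using a Clopper-Pearson upper confidence bound (the function name is illustrative, not PROSAC's exact procedure):

```python
from scipy.stats import beta

def binomial_safety_test(n_failures, n_trials, alpha=0.05, zeta=0.05):
    """Declare the model (alpha, zeta)-safe if the one-sided (1 - zeta)
    Clopper-Pearson upper bound on the failure rate is at most alpha."""
    if n_failures >= n_trials:
        return False  # upper bound is 1
    upper = beta.ppf(1 - zeta, n_failures + 1, n_trials - n_failures)
    return bool(upper <= alpha)
```

For example, 0 failures out of 200 attacked calibration points passes at $\alpha = \zeta = 0.05$, while 30 failures out of 200 does not.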

Practical Recommendations

When to Use Certification Methods

  1. High-stakes safety: autonomous driving, medical diagnosis, and similar domains
  2. Formal verification requirements: scenarios that demand mathematical proof
  3. Pre-deployment validation: ensuring the model satisfies safety constraints

Computational Trade-offs

```python
def choose_certification_method(threat_model, compute_budget):
    """
    Pick a certification method by threat model and compute budget
    (RandomizedSmoothing, CROWN, IBP, FastLin are illustrative classes).
    """
    if threat_model == "random_noise":
        return RandomizedSmoothing()
    elif threat_model == "L_inf" and compute_budget == "high":
        return CROWN()
    elif threat_model == "L_inf" and compute_budget == "low":
        return IBP()
    else:
        return FastLin()
```

Chapter Summary

Certified robustness provides provable robustness guarantees:

  1. Randomized smoothing: certified radii via Monte Carlo sampling
  2. Neural network certification: IBP, CROWN, Fast-Lin, and related methods
  3. Certified training: provable defenses such as IBP regularization
  4. Practice: choose the method that fits the scenario
  5. Recent advances: statistical certification with PROSAC

References

Footnotes

  1. Wong, E., & Kolter, Z. (2018). Provable Defenses against Adversarial Examples. ICML 2018.

  2. Cohen, J. M., et al. (2019). Certified Adversarial Robustness via Randomized Smoothing. ICML 2019.

  3. Abadia, M., et al. (2022). Medoid Smoothing: Efficient Renewal of Randomized Smoothing. ICML 2022.

  4. Hein, M., & Andriushchenko, M. (2017). Formal Guarantees on the Robustness of Classifiers. NeurIPS 2017.

  5. Zhang, H., et al. (2018). Efficient Neural Network Robustness Certification. NeurIPS 2018.

  6. Weng, L., et al. (2018). Towards Fast Computation of Certified Robustness for ReLU Networks. ICML 2018.

  7. Mirman, M., et al. (2018). Differentiable Abstract Interpretation. ICML 2018.

  8. Salman, H., et al. (2019). A Convex Relaxation Barrier to Certified Robustness. NeurIPS 2019.

  9. Feng, C., et al. (2025). PROSAC: Provably Safe Certification for ML Models Under Adversarial Attacks. AAAI 2025.