Certified Robustness

Introduction

Adversarial training improves a model's robustness, but it only provides **empirical** guarantees: we cannot be certain that the model is truly robust at a given input. **Certified robustness** aims to provide a **provable** lower bound on robustness.1

Definition of Certified Robustness

Mathematical Formalization

For an input $x$ and a perturbation radius $\epsilon$, suppose there exists a certified radius $R \ge \epsilon$ such that:

$$f(x + \delta) = f(x) \quad \text{for all } \|\delta\|_p \le R;$$

then the model is said to be $R$-robust at $x$.

Certified vs. Empirical Robustness

| Type | Description | Guarantee strength |
| --- | --- | --- |
| Empirical robustness | Success rate against specific attacks (e.g., PGD) | Weak: only as strong as the attacks tried |
| Certified robustness | A universal lower bound over all perturbations in the threat model | Strong: mathematically provable |

Randomized Smoothing

Core Idea

Randomized smoothing, proposed by Cohen et al. (2019), is a milestone method for certified robustness:2

The base classifier $f$ is turned into a smoothed classifier $g$ via random noise: $g(x) = \arg\max_c \, \mathbb{P}_{\varepsilon \sim \mathcal{N}(0, \sigma^2 I)}\big(f(x + \varepsilon) = c\big)$.

Certification Guarantee of Gaussian Smoothing

Core theorem: let $x$ be an input, $c_A = g(x)$ the predicted class of the smoothed classifier, $\underline{p_A}$ a lower bound on the probability of class $c_A$, and $\overline{p_B}$ an upper bound on the probability of the runner-up class. Then $g$ has certified $\ell_2$ robustness radius at $x$:

$$R = \frac{\sigma}{2}\left(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\right),$$

where $\Phi^{-1}$ is the inverse CDF of the standard normal distribution.
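The radius formula above can be evaluated directly; a minimal numerical sketch using SciPy (the function name is illustrative):

```python
from scipy.stats import norm

def cohen_radius(p_a_lower, p_b_upper, sigma):
    """Certified L2 radius R = sigma/2 * (Phi^-1(pA) - Phi^-1(pB))."""
    if p_a_lower <= p_b_upper:
        return 0.0  # cannot certify
    return 0.5 * sigma * (norm.ppf(p_a_lower) - norm.ppf(p_b_upper))

# e.g. pA >= 0.9, pB <= 0.1, sigma = 0.25 gives R ≈ 0.32
radius = cohen_radius(0.9, 0.1, 0.25)
```

Note the behavior at the boundary: once $\underline{p_A} \le \overline{p_B}$ the radius is 0 and the classifier abstains from certifying.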

Deriving the Certified Radius

In the binary case (class $c_A$ vs. $c_B$), we may take $\overline{p_B} = 1 - \underline{p_A}$. Since $\Phi^{-1}(1 - p) = -\Phi^{-1}(p)$, the certified radius reduces to:

$$R = \sigma \, \Phi^{-1}(\underline{p_A}).$$

Implementation

```python
import torch
from scipy.stats import beta, norm

def certified_accuracy(model, x, y, n_samples=1000, sigma=0.25, alpha=0.001):
    """
    Randomized-smoothing certification (a simplified, single-sample-set
    variant of Cohen et al.'s CERTIFY, which uses separate selection and
    estimation sample sets).

    Args:
        model: base classifier
        x: input image, shape (C, H, W), values in [0, 1]
        y: ground-truth label
        n_samples: number of Monte Carlo samples
        sigma: noise standard deviation
        alpha: failure probability of the confidence bound

    Returns:
        predicted_class, certified_radius, clean_acc
    """
    with torch.no_grad():
        # Monte Carlo sampling
        epsilon = torch.randn(n_samples, *x.shape) * sigma
        x_noisy = (x + epsilon).clamp(0, 1)

        # batched inference
        logits = model(x_noisy)
        counts = logits.argmax(dim=-1).bincount(minlength=logits.shape[-1])

        # top class and a one-sided Clopper-Pearson lower bound on its probability
        predicted_class = counts.argmax().item()
        k = counts[predicted_class].item()
        p_lower = beta.ppf(alpha, k, n_samples - k + 1) if k > 0 else 0.0

        # certified radius R = sigma * Phi^{-1}(p_lower); abstain if p_lower <= 1/2
        if p_lower > 0.5:
            certified_radius = sigma * norm.ppf(p_lower)
        else:
            certified_radius = 0.0

    # clean-sample accuracy
    clean_pred = model(x.unsqueeze(0)).argmax().item()
    clean_acc = (clean_pred == y)

    return predicted_class, certified_radius, clean_acc
```

Improvement by Abadia et al. (2022)

Abadia et al. proposed Medoid Smoothing, which replaces the majority vote with a medoid in feature space to improve efficiency:3

```python
def medoid_smoothing(model, x, n_samples=100, sigma=0.25):
    """Medoid smoothing: classify the noisy feature vector closest to all
    others (assumes the model exposes extract_features() and classify())."""
    with torch.no_grad():
        # sampling
        epsilon = torch.randn(n_samples, *x.shape) * sigma
        x_noisy = (x + epsilon).clamp(0, 1)

        # feature extraction
        features = model.extract_features(x_noisy)

        # medoid: the sample minimizing total distance to all other samples
        medoid_idx = torch.cdist(features, features).sum(dim=1).argmin().item()
        return model.classify(features[medoid_idx:medoid_idx + 1])
```

Certification Methods for Neural Networks

1. Layer-wise Bound Analysis (IBP)

Interval Bound Propagation (IBP) maintains element-wise lower and upper bounds on the activations and pushes them through the network layer by layer:4

For the $k$-th affine layer $z^{(k)} = W^{(k)} h^{(k-1)} + b^{(k)}$ with input bounds $\underline{h} \le h^{(k-1)} \le \overline{h}$, writing $\mu = (\underline{h} + \overline{h})/2$ and $r = (\overline{h} - \underline{h})/2$:

$$\underline{z}^{(k)} = W^{(k)}\mu + b^{(k)} - |W^{(k)}|\,r, \qquad \overline{z}^{(k)} = W^{(k)}\mu + b^{(k)} + |W^{(k)}|\,r.$$

Monotone activations such as ReLU are handled by applying them to both bounds.

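The affine-layer rule above can be checked numerically; a minimal sketch (function name is illustrative):

```python
import torch

def ibp_affine(lb, ub, W, b):
    """Interval bounds of z = W h + b given element-wise bounds on h."""
    mu, r = (lb + ub) / 2, (ub - lb) / 2   # center and radius of the box
    center = mu @ W.t() + b
    radius = r @ W.t().abs()               # radius_j = sum_i |W_ji| * r_i
    return center - radius, center + radius

torch.manual_seed(0)
W, b = torch.randn(3, 4), torch.randn(3)
x, eps = torch.randn(1, 4), 0.1
lb, ub = ibp_affine(x - eps, x + eps, W, b)
```

Any perturbed input inside the box must map inside the output bounds, which is easy to verify by sampling.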
The CROWN Certifier

The CROWN method of Zhang et al. uses linear relaxations of the activations:5

```python
def crown_bounds(model, x, epsilon, layers_to_check=None):
    """
    Simplified forward bound propagation. (Full CROWN additionally
    back-substitutes per-neuron linear relaxations of ReLU for tighter
    bounds; this forward pass gives valid but looser IBP-style bounds.)
    """
    lb = x - epsilon  # lower bound
    ub = x + epsilon  # upper bound

    layers = model.layers if layers_to_check is None else model.layers[:layers_to_check]
    for layer in layers:
        if isinstance(layer, torch.nn.Linear):
            # split the weight by sign so each bound uses the correct input bound
            W_pos = layer.weight.clamp(min=0)
            W_neg = layer.weight.clamp(max=0)
            lb, ub = (lb @ W_pos.t() + ub @ W_neg.t() + layer.bias,
                      ub @ W_pos.t() + lb @ W_neg.t() + layer.bias)
        elif isinstance(layer, torch.nn.ReLU):
            # ReLU is monotone: clamp both bounds
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)

    return lb, ub
```

Fast-Lin / Fast-Lip

Weng et al. proposed efficient methods based on Lipschitz-constant estimation:6

```python
def fast_lip_bound(model, x, epsilon):
    """
    Crude global Lipschitz bound: the product of the layers' operator norms.
    """
    lipschitz_product = 1.0

    for layer in model.layers:
        if isinstance(layer, torch.nn.Linear):
            # spectral norm of the weight matrix
            lipschitz_product *= torch.linalg.matrix_norm(layer.weight, ord=2).item()
        elif isinstance(layer, torch.nn.Conv2d):
            # spectral norm of the convolution operator (e.g., via power iteration)
            lipschitz_product *= compute_conv_spectral_norm(layer)
        elif isinstance(layer, torch.nn.ReLU):
            # ReLU is 1-Lipschitz
            pass

    return lipschitz_product
```
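One way to realize the conv-layer spectral norm (the `compute_conv_spectral_norm` step is otherwise left undefined above) is power iteration on the convolution operator itself; a sketch, assuming stride-1 convolutions and a fixed input size:

```python
import torch
import torch.nn.functional as F

def conv_spectral_norm(conv, input_shape, n_iters=50):
    """Estimate the operator 2-norm of a Conv2d layer by power iteration
    on v -> conv(v) (bias excluded), for a fixed input shape (C, H, W)."""
    v = torch.randn(1, *input_shape)
    for _ in range(n_iters):
        # apply the operator and its adjoint (transposed convolution)
        u = F.conv2d(v, conv.weight, stride=conv.stride, padding=conv.padding)
        v = F.conv_transpose2d(u, conv.weight, stride=conv.stride,
                               padding=conv.padding)
        v = v / v.norm()
    u = F.conv2d(v, conv.weight, stride=conv.stride, padding=conv.padding)
    return u.norm().item()
```

For a 1×1 convolution with a single weight $w$ the operator is $w \cdot I$, so the estimate should converge to $|w|$, which makes a handy sanity check.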

Provable Defense Methods

1. IBP Training

Use the IBP bounds as a regularization term:7

```python
import torch.nn.functional as F

def ibp_loss(model, x, y, epsilon, lambda_ibp=1.0):
    """
    IBP-regularized training loss: standard cross-entropy plus a
    worst-case cross-entropy built from bounds on the logits.
    """
    # standard cross-entropy
    ce_loss = F.cross_entropy(model(x), y)

    # bounds on the logits under any allowed perturbation
    lb, ub = crown_bounds(model, x, epsilon)

    # worst-case logits: upper bound everywhere, lower bound for the true class
    worst = ub.clone()
    worst[torch.arange(len(y)), y] = lb[torch.arange(len(y)), y]
    robust_loss = F.cross_entropy(worst, y)

    return ce_loss + lambda_ibp * robust_loss
```
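A self-contained sketch of the worst-case-logit construction that IBP training relies on, using a toy two-layer network (all names are illustrative):

```python
import torch
import torch.nn.functional as F

def ibp_forward(lb, ub, layers):
    """Propagate interval bounds through Linear/ReLU layers."""
    for layer in layers:
        if isinstance(layer, torch.nn.Linear):
            W_pos = layer.weight.clamp(min=0)
            W_neg = layer.weight.clamp(max=0)
            lb, ub = (lb @ W_pos.t() + ub @ W_neg.t() + layer.bias,
                      ub @ W_pos.t() + lb @ W_neg.t() + layer.bias)
        elif isinstance(layer, torch.nn.ReLU):
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
    return lb, ub

def ibp_robust_loss(layers, x, y, epsilon):
    lb, ub = ibp_forward(x - epsilon, x + epsilon, layers)
    # worst-case logits: upper bound everywhere, lower bound for the true class
    worst = ub.clone()
    worst[torch.arange(len(y)), y] = lb[torch.arange(len(y)), y]
    return F.cross_entropy(worst, y)

torch.manual_seed(0)
net = [torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3)]
x, y = torch.randn(2, 4), torch.tensor([0, 2])
loss = ibp_robust_loss(net, x, y, epsilon=0.1)
```

With `epsilon=0` the intervals collapse and the loss equals the clean cross-entropy; widening `epsilon` can only increase the worst-case loss.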

2. Smoothed Classifiers

A defense based on randomized smoothing:

```python
from scipy.stats import beta, norm

class SmoothedClassifier:
    def __init__(self, base_classifier, sigma=0.25, n_classes=10):
        self.base = base_classifier
        self.sigma = sigma
        self.n_classes = n_classes

    def _sample_counts(self, x, n_samples):
        """Class counts of the base classifier under Gaussian noise."""
        preds = []
        for _ in range(n_samples):
            noise = torch.randn_like(x) * self.sigma
            preds.append(self.base(x + noise).argmax())
        return torch.bincount(torch.stack(preds), minlength=self.n_classes)

    def predict(self, x, n_samples=100):
        """Return the smoothed prediction (majority class under noise)."""
        with torch.no_grad():
            return self._sample_counts(x, n_samples).argmax().item()

    def certify(self, x, n_samples=100, alpha=0.001):
        """Return the prediction and a certified L2 radius."""
        with torch.no_grad():
            counts = self._sample_counts(x, n_samples)
            top_class = counts.argmax().item()
            k = counts[top_class].item()
            # one-sided Clopper-Pearson lower bound on the top-class probability
            p_lower = beta.ppf(alpha, k, n_samples - k + 1) if k > 0 else 0.0

            radius = self.sigma * norm.ppf(p_lower) if p_lower > 0.5 else 0.0
            return top_class, radius
```

Smoothed-LP Certification

Salman et al. (2019) proposed applying LP certification on top of the smoothed classifier:8

```python
def smoothed_lp_certify(model, x, y, sigma=0.25, n_samples=10000):
    """
    Certification on a randomized-smoothing classifier.
    `monte_carlo_predict` (assumed helper) returns the smoothed prediction
    and a lower confidence bound on its probability.
    """
    pred, p_lower = monte_carlo_predict(model, x, sigma, n_samples)

    if pred != y:
        return pred, 0.0  # a wrong prediction cannot be certified

    certified_radius = sigma * norm.ppf(p_lower) if p_lower > 0.5 else 0.0
    return pred, certified_radius
```

Comparison of Certification Methods

| Method | Certified accuracy | Computational efficiency | Scope |
| --- | --- | --- | --- |
| Randomized smoothing | High (probabilistic) | Sampling overhead at inference | Any classifier |
| IBP | Low to medium | Very high | Full networks |
| CROWN | Medium to high | Medium | Full networks |
| Fast-Lin | Medium | High | Full networks |
| LP certification | High | Low | Small networks |

Recent Advances

PROSAC(AAAI 2025)

Feng et al. (2025) proposed PROSAC, which provides provably safe certification in a statistical sense:9

“We introduce the notion of (α,ζ)-safe ML model and propose hypothesis testing to derive statistical guarantees.”

```python
def prosac_certification(model, x, y, calibration_set,
                         alpha=0.05, zeta=0.05):
    """
    PROSAC safety certification (sketch; `evaluate_adversarial_risk` and
    `statistical_test` are assumed helpers).

    Returns:
        is_safe: whether the (alpha, zeta)-safety condition holds
    """
    # estimate the adversarial risk on a calibration set
    adversarial_risks = evaluate_adversarial_risk(
        model, calibration_set, alpha
    )

    # hypothesis test at confidence level 1 - zeta
    is_safe = statistical_test(
        adversarial_risks,
        threshold=alpha,
        confidence=1 - zeta
    )

    return is_safe
```
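One concrete way to instantiate such a statistical test is a one-sided binomial test on adversarial failures observed on the calibration set; a sketch using a Clopper-Pearson upper confidence bound (the function name is illustrative, not PROSAC's exact procedure):

```python
from scipy.stats import beta

def binomial_safety_test(n_failures, n_trials, alpha=0.05, zeta=0.05):
    """Declare the model (alpha, zeta)-safe if the one-sided (1 - zeta)
    Clopper-Pearson upper bound on the failure rate is at most alpha."""
    if n_failures >= n_trials:
        return False  # upper bound is 1
    upper = beta.ppf(1 - zeta, n_failures + 1, n_trials - n_failures)
    return bool(upper <= alpha)
```

For example, 0 failures out of 200 attacked calibration points passes at $\alpha = \zeta = 0.05$, while 30 failures out of 200 does not.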

Practical Recommendations

When to Use Certification Methods

  1. High-stakes safety: autonomous driving, medical diagnosis, and similar domains
  2. Formal verification requirements: scenarios that demand mathematical proof
  3. Pre-deployment validation: ensuring the model satisfies safety constraints

Computational Trade-offs

```python
def choose_certification_method(threat_model, compute_budget):
    """
    Pick a certification method by threat model and compute budget
    (RandomizedSmoothing, CROWN, IBP, FastLin are illustrative classes).
    """
    if threat_model == "random_noise":
        return RandomizedSmoothing()
    elif threat_model == "L_inf" and compute_budget == "high":
        return CROWN()
    elif threat_model == "L_inf" and compute_budget == "low":
        return IBP()
    else:
        return FastLin()
```

Chapter Summary

Certified robustness provides provable robustness guarantees:

  1. Randomized smoothing: certified radii via Monte Carlo sampling
  2. Neural network certification: IBP, CROWN, Fast-Lin, and related methods
  3. Certified training: provable defenses such as IBP regularization
  4. Practice: choose the method that fits the scenario
  5. Recent advances: statistical certification with PROSAC

References

Footnotes

  1. Wong, E., & Kolter, Z. (2018). Provable Defenses against Adversarial Examples. ICML 2018.

  2. Cohen, J. M., et al. (2019). Certified Adversarial Robustness via Randomized Smoothing. ICML 2019.

  3. Abadia, M., et al. (2022). Medoid Smoothing: Efficient Renewal of Randomized Smoothing. ICML 2022.

  4. Hein, M., & Andriushchenko, M. (2017). Formal Guarantees on the Robustness of Classifiers. NeurIPS 2017.

  5. Zhang, H., et al. (2018). Efficient Neural Network Robustness Certification. NeurIPS 2018.

  6. Weng, L., et al. (2018). Towards Fast Computation of Certified Robustness for ReLU Networks. ICML 2018.

  7. Mirman, M., et al. (2018). Differentiable Abstract Interpretation. ICML 2018.

  8. Salman, H., et al. (2019). A Convex Relaxation Barrier to Certified Robustness. NeurIPS 2019.

  9. Feng, C., et al. (2025). PROSAC: Provably Safe Certification for ML Models Under Adversarial Attacks. AAAI 2025.