# Certified Robustness

## Introduction

Adversarial training improves model robustness, but it only provides **empirical** robustness guarantees: we cannot be certain that a model is truly robust on a given input. **Certified robustness** aims to provide a **provable** lower bound on robustness.[^1]
## Defining Certified Robustness

### Mathematical Formulation

For an input $x$ and a perturbation $\delta$, if there exists a certified radius $R$ such that

$$f(x + \delta) = f(x) \quad \text{for all } \|\delta\|_2 \le R,$$

then the model is said to be $R$-robust (certified) at $x$.
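This guarantee cannot be established by sampling alone, but a sampling check is a useful sanity test: a valid certificate must never be contradicted by it. A minimal sketch, assuming a toy classifier `f` that maps a feature vector to a label:

```python
import random
import math

def sample_check(f, x, radius, n_trials=1000, seed=0):
    """Empirically probe (not prove) R-robustness at x: sample
    perturbations with ||delta||_2 <= radius and check that the
    predicted label never changes."""
    rng = random.Random(seed)
    y = f(x)
    for _ in range(n_trials):
        # draw a random direction, then scale to a random norm <= radius
        d = [rng.gauss(0, 1) for _ in x]
        norm = math.sqrt(sum(v * v for v in d)) or 1.0
        scale = radius * rng.random() / norm
        x_pert = [xi + scale * di for xi, di in zip(x, d)]
        if f(x_pert) != y:
            return False  # counterexample found: not R-robust
    return True  # no counterexample found (still not a proof)

# toy linear classifier: label is the sign of the first coordinate
f = lambda v: int(v[0] > 0)
print(sample_check(f, [2.0, 0.0], radius=1.0))  # True: no flip within r=1
```

Finding no counterexample only means the certificate is *consistent* with sampling; the certified methods below prove the property for all perturbations in the ball.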
### Certified vs. Empirical Robustness

| Type | Description | Guarantee strength |
|---|---|---|
| Empirical robustness | Attack success rate against specific attacks (e.g. PGD) | Weak |
| Certified robustness | A universal lower bound holding against all perturbations within the radius | Strong |
## Randomized Smoothing

### Core Idea

Randomized smoothing, proposed by Cohen et al. (2019), is a milestone method for certified robustness:[^2]

A base classifier $f$ is converted into a smoothed classifier $g$ via random Gaussian noise:

$$g(x) = \arg\max_{c} \; \mathbb{P}_{\varepsilon \sim \mathcal{N}(0,\, \sigma^2 I)}\left[ f(x + \varepsilon) = c \right]$$

### Certified Guarantee of Gaussian Smoothing

**Core theorem**: Let $x$ be an input, $c_A = g(x)$ the smoothed prediction, $\underline{p_A}$ a lower bound on the probability of class $c_A$, and $\overline{p_B}$ an upper bound on the runner-up class probability. Then the model has certified robust radius at $x$:

$$R = \frac{\sigma}{2}\left( \Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B}) \right)$$

where $\Phi^{-1}$ is the inverse CDF of the standard normal distribution.
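A quick numeric check of the theorem, using example values $\sigma = 0.25$, $\underline{p_A} = 0.9$, $\overline{p_B} = 0.05$ (the values are illustrative, not from the source):

```python
from statistics import NormalDist

def certified_radius(sigma, p_a_lower, p_b_upper):
    """Cohen et al. radius: R = sigma/2 * (Phi^-1(pA) - Phi^-1(pB))."""
    inv = NormalDist().inv_cdf
    return sigma / 2 * (inv(p_a_lower) - inv(p_b_upper))

r = certified_radius(sigma=0.25, p_a_lower=0.9, p_b_upper=0.05)
print(round(r, 3))  # 0.366: certified against any L2 perturbation below this
```

Note that when $\overline{p_B} = 1 - \underline{p_A}$, the symmetry $\Phi^{-1}(1-p) = -\Phi^{-1}(p)$ collapses the formula to $\sigma\,\Phi^{-1}(\underline{p_A})$, which is the binary case derived next.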
### Deriving the Certified Radius

For the binary case (class $c_A$ vs. $c_B$, so $\overline{p_B} = 1 - \underline{p_A}$), the formula simplifies because $\Phi^{-1}(1-p) = -\Phi^{-1}(p)$:

If $\underline{p_A} > \tfrac{1}{2}$, the certified radius is:

$$R = \sigma\, \Phi^{-1}(\underline{p_A})$$
### Implementation

```python
import torch
from scipy.stats import norm

def certified_accuracy(model, x, y, n_samples=1000, sigma=0.25, alpha=0.001):
    """Certified-robustness evaluation via randomized smoothing.

    Args:
        model: base classifier
        x: input image
        y: ground-truth label
        n_samples: number of Monte Carlo samples
        sigma: noise standard deviation
        alpha: confidence parameter
    Returns:
        predicted_class, certified_radius, clean_acc
    """
    with torch.no_grad():
        # Monte Carlo sampling
        epsilon = torch.randn(n_samples, *x.shape) * sigma
        x_noisy = (x + epsilon).clamp(0, 1)
        # batched inference
        logits = model(x_noisy)
        counts = logits.argmax(dim=-1).bincount()
        # predicted class and a lower confidence bound on its probability
        predicted_class = counts.argmax().item()
        p_hat = counts[predicted_class].item() / n_samples
        # normal-approximation lower bound; Cohen et al. use the exact
        # Clopper-Pearson binomial bound instead
        p_lower = p_hat - norm.ppf(1 - alpha) * (p_hat * (1 - p_hat) / n_samples) ** 0.5
        # certified radius
        if p_lower > 0.5:
            certified_radius = sigma * norm.ppf(p_lower)
        else:
            certified_radius = 0.0
        # clean accuracy
        clean_pred = model(x.unsqueeze(0)).argmax().item()
        clean_acc = (clean_pred == y)
    return predicted_class, certified_radius, clean_acc
```

### The Improvement of Abadia et al. (2022)
Abadia et al. proposed Medoid Smoothing, which replaces majority voting with a medoid in feature space to improve efficiency:[^3]

```python
def medoid_smoothing(model, x, n_samples=100, sigma=0.25):
    """Medoid smoothing: classify the noisy sample whose features are
    closest to all the others (the medoid). Assumes the model exposes
    `extract_features`/`classify` and that a `medoid` helper exists."""
    with torch.no_grad():
        # sampling
        epsilon = torch.randn(n_samples, *x.shape) * sigma
        x_noisy = (x + epsilon).clamp(0, 1)
        # extract features
        features = model.extract_features(x_noisy)
        # pick the medoid sample
        medoid_idx = medoid(features)
        return model.classify(features[medoid_idx:medoid_idx + 1])
```

## Neural Network Certification Methods
### 1. Interval Bound Propagation (IBP)

Interval bound propagation pushes an interval $[\underline{z}^{(k)}, \overline{z}^{(k)}]$ through the network layer by layer:[^4]

For the $k$-th (linear) layer with weights $W^{(k)}$ and bias $b^{(k)}$, writing the incoming interval as center $\mu$ and radius $r$:

$$\mu = \frac{\overline{z}^{(k-1)} + \underline{z}^{(k-1)}}{2}, \qquad r = \frac{\overline{z}^{(k-1)} - \underline{z}^{(k-1)}}{2}$$

$$\underline{z}^{(k)} = W^{(k)} \mu - |W^{(k)}|\, r + b^{(k)}, \qquad \overline{z}^{(k)} = W^{(k)} \mu + |W^{(k)}|\, r + b^{(k)}$$
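The layer rule above can be checked on a tiny example; the one-layer network and weights here are hypothetical:

```python
def ibp_linear(lb, ub, W, b):
    """Propagate elementwise interval bounds through y = W x + b."""
    out_lb, out_ub = [], []
    for row, bias in zip(W, b):
        # center mu = (lb+ub)/2 maps through W exactly;
        # radius r = (ub-lb)/2 maps through |W|
        mid = sum(w * (l + u) / 2 for w, l, u in zip(row, lb, ub)) + bias
        dev = sum(abs(w) * (u - l) / 2 for w, l, u in zip(row, lb, ub))
        out_lb.append(mid - dev)
        out_ub.append(mid + dev)
    return out_lb, out_ub

# input box [0.9, 1.1] x [-0.1, 0.1], one output neuron y = x0 - x1
lb, ub = ibp_linear([0.9, -0.1], [1.1, 0.1], W=[[1.0, -1.0]], b=[0.0])
print([round(v, 6) for v in lb], [round(v, 6) for v in ub])  # [0.8] [1.2]
```

Both weights contribute their absolute value to the deviation, so the output interval ($[0.8, 1.2]$) is exactly the range of $x_0 - x_1$ over the input box; looseness only appears once intervals are propagated through several layers.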
### CROWN

The CROWN method of Zhang et al. certifies networks via linear relaxation of the activations:[^5]

```python
def crown_bounds(model, x, epsilon):
    """Layerwise bound propagation (simplified sketch).

    Full CROWN back-substitutes linear lower/upper relaxations of each
    unstable ReLU; here ReLUs are handled by simple interval clipping,
    which yields looser (IBP-style) bounds but the same structure.
    """
    lb = x - epsilon  # lower bound
    ub = x + epsilon  # upper bound
    # layer-by-layer propagation
    for layer in model.layers:
        if isinstance(layer, torch.nn.Linear):
            W, b = layer.weight, layer.bias
            center = (lb + ub) / 2
            radius = (ub - lb) / 2
            mid = center @ W.t() + b
            dev = radius @ W.abs().t()
            lb, ub = mid - dev, mid + dev
        elif isinstance(layer, torch.nn.ReLU):
            # ReLU is monotone, so bounds pass through elementwise
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
    return lb, ub
```

### Fast-Lin / Fast-Lip
Weng et al. proposed efficient certification via Lipschitz-constant estimation:[^6]

```python
def fast_lip_bound(model):
    """Fast Lipschitz-style bound: the product of per-layer spectral
    norms upper-bounds the network's Lipschitz constant L; a certified
    radius then follows as radius <= margin / L. Assumes a
    `conv_spectral_norm` helper for convolutional layers."""
    lipschitz_product = 1.0
    for layer in model.layers:
        if isinstance(layer, torch.nn.Linear):
            # spectral norm (largest singular value) of the weight matrix
            lipschitz_product *= torch.linalg.matrix_norm(layer.weight, 2).item()
        elif isinstance(layer, torch.nn.Conv2d):
            # spectral norm of the convolution operator
            lipschitz_product *= conv_spectral_norm(layer)
        elif isinstance(layer, torch.nn.ReLU):
            # ReLU is 1-Lipschitz
            pass
    return lipschitz_product
```

## Provable Defense Methods
### 1. IBP Training

Use the IBP bounds as a regularization term:[^7]

```python
import torch.nn.functional as F

def ibp_loss(model, x, y, epsilon, lambda_ibp=1.0):
    """IBP-based certified training loss (sketch)."""
    # standard cross-entropy
    ce_loss = F.cross_entropy(model(x), y)
    # propagate interval bounds all the way to the logits
    logit_lb, logit_ub = crown_bounds(model, x, epsilon)
    # worst-case logits: lower bound for the true class,
    # upper bound for every other class
    onehot = F.one_hot(y, logit_lb.size(-1)).bool()
    worst = torch.where(onehot, logit_lb, logit_ub)
    robust_loss = F.cross_entropy(worst, y)
    return ce_loss + lambda_ibp * robust_loss
```

### 2. Smoothed Classifiers
A defense based on randomized smoothing:

```python
class SmoothedClassifier:
    def __init__(self, base_classifier, sigma=0.25, n_classes=10):
        self.base = base_classifier
        self.sigma = sigma
        self.n_classes = n_classes

    def predict(self, x, n_samples=100):
        """Return the smoothed (majority-vote) prediction."""
        with torch.no_grad():
            counts = torch.zeros(self.n_classes)
            for _ in range(n_samples):
                noise = torch.randn_like(x) * self.sigma
                pred = self.base(x + noise).argmax()
                counts[pred] += 1
            return counts.argmax().item()

    def certify(self, x, n_samples=100, alpha=0.001):
        """Return the prediction and its certified radius."""
        with torch.no_grad():
            preds = []
            for _ in range(n_samples):
                noise = torch.randn_like(x) * self.sigma
                preds.append(self.base(x + noise).argmax().item())
            # vote counts
            counts = torch.bincount(torch.tensor(preds), minlength=self.n_classes)
            top_class = counts.argmax().item()
            p_hat = counts[top_class].item() / n_samples
            # normal-approximation lower confidence bound on p_hat
            p_lower = p_hat - norm.ppf(1 - alpha) * (p_hat * (1 - p_hat) / n_samples) ** 0.5
            radius = self.sigma * norm.ppf(p_lower) if p_lower > 0.5 else 0.0
            return top_class, radius
```

### Smoothed-LP Certification
Salman et al. (2019) proposed applying LP certification on top of smoothed classifiers:[^8]

```python
def smoothed_lp_certify(model, x, y, sigma=0.25, n_samples=10000):
    """LP certification on a randomized-smoothing classifier.
    Assumes a `monte_carlo_predict` helper returning the smoothed
    prediction and a lower confidence bound on its probability."""
    # smoothed prediction and probability lower bound
    pred, p_lower = monte_carlo_predict(model, x, sigma, n_samples)
    # certified radius
    if pred != y:
        return pred, 0.0  # a misclassified input cannot be certified
    certified_radius = sigma * norm.ppf(p_lower)
    return pred, certified_radius
```

## Comparison of Certification Methods
| Method | Certified accuracy | Computational efficiency | Applicability |
|---|---|---|---|
| Randomized smoothing | High | Medium | Any classifier |
| IBP | Low-medium | High | Full networks |
| CROWN | High | Low | Full networks |
| Fast-Lin | Medium | High | Full networks |
| LP certification | High | Low | Small networks |
## Recent Advances

### PROSAC (AAAI 2025)

Feng et al. (2025) proposed PROSAC, which provides statistically provable safety certification:[^9]

> "We introduce the notion of (α,ζ)-safe ML model and propose hypothesis testing to derive statistical guarantees."
```python
def prosac_certification(model, x, y, calibration_set,
                         alpha=0.05, zeta=0.05):
    """PROSAC safety certification (sketch; `evaluate_adversarial_risk`
    and `statistical_test` are assumed helpers).

    Returns:
        is_safe: whether the (alpha, zeta)-safety condition holds
    """
    # estimate adversarial risk on the calibration set
    adversarial_risks = evaluate_adversarial_risk(
        model, calibration_set, alpha
    )
    # hypothesis test at confidence level 1 - zeta
    is_safe = statistical_test(
        adversarial_risks,
        threshold=alpha,
        confidence=1 - zeta
    )
    return is_safe
```

## Practical Recommendations
### When to Use Certified Methods

- **High-safety settings**: autonomous driving, medical diagnosis, etc.
- **Formal-verification requirements**: scenarios that demand mathematical proof
- **Pre-deployment validation**: confirming that a model satisfies its safety constraints
### Computational Trade-offs

```python
def choose_certification_method(threat_model, compute_budget):
    """Pick a certification method from the threat model and compute
    budget (the method classes here are illustrative placeholders)."""
    if threat_model == "random_noise":
        return RandomizedSmoothing()
    elif threat_model == "L_inf" and compute_budget == "high":
        return CROWN()
    elif threat_model == "L_inf" and compute_budget == "low":
        return IBP()
    else:
        return FastLin()
```

## Chapter Summary
Certified robustness provides provable robustness guarantees:

- **Randomized smoothing**: certified radii via Monte Carlo sampling
- **Neural network certification**: IBP, CROWN, Fast-Lin, and related methods
- **Certified training**: provable defenses such as IBP regularization
- **Practice**: choose the method that fits the deployment scenario
- **Recent advances**: statistical certification such as PROSAC
## References

[^1]: Wong, E., & Kolter, Z. (2018). Provable Defenses against Adversarial Examples. ICML 2018.
[^2]: Cohen, J. M., et al. (2019). Certified Adversarial Robustness via Randomized Smoothing. ICML 2019.
[^3]: Abadia, M., et al. (2022). Medoid Smoothing: Efficient Renewal of Randomized Smoothing. ICML 2022.
[^4]: Hein, M., & Andriushchenko, M. (2017). Formal Guarantees on the Robustness of Classifiers. NeurIPS 2017.
[^5]: Zhang, H., et al. (2018). Efficient Neural Network Robustness Certification. NeurIPS 2018.
[^6]: Weng, L., et al. (2018). Towards Fast Computation of Certified Robustness. ICLR 2018.
[^7]: Mirman, M., et al. (2018). Differentiable Abstract Interpretation. ICML 2018.
[^8]: Salman, H., et al. (2019). A Convex Relaxation Barrier to Certified Robustness. NeurIPS 2019.
[^9]: Feng, C., et al. (2025). PROSAC: Provably Safe Certification for ML Models Under Adversarial Attacks. AAAI 2025.