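Uncertainty Quantification in Bayesian Neural Networks

Overview

Deep learning models deployed in real-world applications face a key challenge: how to reliably estimate how confident the model is in its predictions. Uncertainty quantification (UQ) aims to give deep learning models probabilistic predictions so that a model can recognize its own limitations, for example by "admitting it does not know" when it encounters out-of-distribution (OOD) data.¹

Uncertainty quantification is essential in safety-critical applications such as autonomous driving, medical diagnosis, and scientific prediction.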

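Types of Uncertainty

1. Aleatoric Uncertainty

Definition: randomness inherent in the data itself, which cannot be removed by collecting more data.

Characteristics:

Arises from observation noise in the data collection process
Inherent statistical fluctuation
Does not decrease as the training set grows

Examples:

Sensor measurement noise
Subjective disagreement between annotators
Intrinsic randomness in the data

Modeling approach: model the variance at the network's output layer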

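import torch

# Homoscedastic uncertainty: one learned log-variance shared by all inputs
log_var = torch.nn.Parameter(torch.zeros(1))  # trained jointly with the model
mean = model(x)
var = torch.exp(log_var)

# Heteroscedastic uncertainty: the variance depends on the input
mean, log_var = model(x)  # log_var is also a function of the input
var = torch.exp(log_var)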
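
A predicted log-variance is usually trained with a Gaussian negative log-likelihood. The sketch below shows that loss for a single scalar target in plain Python (no PyTorch dependency); the function name `gaussian_nll` and the example numbers are illustrative assumptions, not part of the original text.

```python
import math

def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var))."""
    var = math.exp(log_var)
    return 0.5 * (math.log(2 * math.pi) + log_var + (y - mean) ** 2 / var)

# Residual of 2 in both cases. For a fixed residual the loss is minimized
# when the predicted variance equals the squared residual (here 4).
overconfident = gaussian_nll(y=2.0, mean=0.0, log_var=0.0)        # variance 1
matched = gaussian_nll(y=2.0, mean=0.0, log_var=math.log(4.0))    # variance 4
print(overconfident > matched)
```

The log-variance term stops the model from lowering the loss by predicting an arbitrarily large variance everywhere.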

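2. Epistemic Uncertainty

Definition: uncertainty about the model parameters, arising from limited training data.

Characteristics:

Can be reduced by adding training data
Reflects how much the model "knows" about its prediction
Tends to grow when the model encounters OOD data

Modeling approach: place a distribution over the network parameters instead of a point estimate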
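
To make these properties concrete, the sketch below swaps the neural network for a one-parameter Bayesian linear model y = w * x + noise, where the posterior over w is available in closed form. The function name, the prior and noise precisions, and the data are all assumptions chosen for illustration.

```python
def epistemic_variance(xs, x_query, prior_precision=1.0, noise_precision=1.0):
    """Epistemic variance of w * x_query for the model y = w * x + noise.

    With a N(0, 1 / prior_precision) prior on w and Gaussian noise, the
    posterior over w is Gaussian with precision
    prior_precision + noise_precision * sum(x_i ** 2).
    """
    posterior_precision = prior_precision + noise_precision * sum(x * x for x in xs)
    return x_query ** 2 / posterior_precision

few = epistemic_variance(xs=[0.5, 1.0], x_query=1.0)          # little data
many = epistemic_variance(xs=[0.5, 1.0] * 50, x_query=1.0)    # 50x more data
far = epistemic_variance(xs=[0.5, 1.0] * 50, x_query=10.0)    # far from the data
print(few, many, far)
```

In this toy model the variance shrinks as observations accumulate and grows for queries far from the training inputs, which mirrors the characteristics listed above.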

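3. Decomposition of Total Uncertainty

The total predictive uncertainty can be decomposed as:

\underbrace{\mathbb{V}[y \mid x]}_{\text{total uncertainty}} = \underbrace{\mathbb{E}_{\theta}[\sigma^2(y \mid x, \theta)]}_{\text{aleatoric uncertainty}} + \underbrace{\mathbb{V}_{\theta}[\mathbb{E}[y \mid x, \theta]]}_{\text{epistemic uncertainty}}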
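
As a small worked example of this decomposition, suppose several sampled parameter settings each yield a predicted mean and a predicted noise variance for the same input. The sketch below (the function name and numbers are illustrative assumptions) averages the variances for the aleatoric term and takes the variance of the means for the epistemic term.

```python
from statistics import fmean, pvariance

def decompose_uncertainty(means, variances):
    """Split the predictive variance for a single input x.

    means[t], variances[t]: mean and noise variance predicted under the
    t-th sampled parameter setting.
    """
    aleatoric = fmean(variances)   # expected noise variance
    epistemic = pvariance(means)   # variance of the predicted means
    return aleatoric, epistemic, aleatoric + epistemic

aleatoric, epistemic, total = decompose_uncertainty(
    means=[1.0, 2.0, 3.0], variances=[0.5, 0.5, 0.5]
)
print(aleatoric, epistemic, total)
```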

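Bayesian Neural Networks (BNNs)

Formal Definition

A Bayesian neural network treats the network weights $w$ as random variables and infers their posterior given the data $D$:

p(w \mid D) = \frac{p(D \mid w)\, p(w)}{p(D)}

At prediction time, it integrates over all possible weights:

p(y \mid x, D) = \int p(y \mid x, w)\, p(w \mid D)\, dw

Predictive Mean and Variance

With network output $f(x; w)$ and observation-noise variance $\sigma^{2}$:

\mathbb{E}[y \mid x] = \int f(x; w)\, p(w \mid D)\, dw

\mathbb{V}[y \mid x] = \int \lVert f(x; w) - \mathbb{E}[y \mid x] \rVert^{2}\, p(w \mid D)\, dw + \sigma^{2}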
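
These integrals are rarely available in closed form. A common workaround, sketched here under the assumption that $T$ weight samples $w_t$ can be drawn from the posterior or from an approximation to it ($T$, $w_t$ and $\hat{\mu}$ are notation introduced for this sketch), is to replace each integral with a Monte Carlo average:

```latex
\hat{\mu}(x) = \frac{1}{T} \sum_{t=1}^{T} f(x; w_t), \qquad w_t \sim p(w \mid D)

\mathbb{V}[y \mid x] \approx \frac{1}{T} \sum_{t=1}^{T} \lVert f(x; w_t) - \hat{\mu}(x) \rVert^{2} + \sigma^{2}
```

The approximate inference methods that follow differ mainly in how they obtain the samples $w_t$.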

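Approximate Inference Methods

Exact Bayesian inference is intractable for high-dimensional neural networks, so approximate methods are needed.

1. MC Dropout

Proposed by Gal & Ghahramani (2016), MC Dropout is a simple and effective approximation to Bayesian inference.²

Theoretical Basis

Dropout can be interpreted as a form of variational inference. A dropout network minimizes the variational free energy:

F(\theta) \approx -\sum_{n=1}^{N} \log p(y^{(n)} \mid x^{(n)}, \theta) + \frac{1 - \pi}{\pi} \sum_{i} \theta_{i}^{2}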

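MC Dropout Inference

import torch

def mc_dropout_predict(model, x, num_samples=50, dropout_prob=0.5):
    """
    MC Dropout prediction.
    dropout_prob is unused here: the rate is set by the model's own Dropout layers.
    Returns (mean, variance, predictions).
    """
    # Keep only the dropout layers stochastic; BatchNorm etc. stay in eval mode
    model.eval()
    for module in model.modules():
        if module.__class__.__name__.startswith('Dropout'):
            module.train()

    predictions = []
    with torch.no_grad():
        for _ in range(num_samples):
            predictions.append(model(x))

    predictions = torch.stack(predictions)  # [num_samples, batch, output_dim]

    # Predictive mean
    mean = predictions.mean(dim=0)

    # Variance across stochastic forward passes (epistemic uncertainty)
    variance = predictions.var(dim=0)

    return mean, variance, predictions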

Uncertainty Estimation

def estimate_uncertainty(model, x, y_true=None, num_samples=50):
    mean, variance, predictions = mc_dropout_predict(model, x, num_samples)

    # Epistemic uncertainty
    epistemic = variance

    # Optionally compute the test log-likelihood
    if y_true is not None:
        # Gaussian approximation to the marginal predictive distribution;
        # clamp the std so a zero sample variance does not break Normal
        std = variance.sqrt().clamp_min(1e-6)
        log_likelihood = torch.distributions.Normal(mean, std).log_prob(y_true).mean()
        return {
            'mean': mean,
            'epistemic_uncertainty': epistemic,
            'test_log_likelihood': log_likelihood
        }

    return {
        'mean': mean,
        'epistemic_uncertainty': epistemic
    }

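Limitations of MC Dropout

Approximation quality: the dropout approximation to the posterior can be inaccurate
Train/inference coupling: the model must be trained with dropout, and dropout must stay active at test time
Variance estimates: tend to underestimate the true uncertainty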

2. Deep Ensembles

Lakshminarayanan et al. (2017) proposed estimating uncertainty with multiple independently trained models.³

Method

Train $M$ independent models, each with a different initialization:

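import copy
import torch
import torch.nn as nn

class EnsembleModel(nn.Module):
    def __init__(self, base_model, num_models=5):
        super().__init__()
        self.models = nn.ModuleList([
            copy.deepcopy(base_model) for _ in range(num_models)
        ])
        # deepcopy duplicates the weights, so re-initialize every member
        # to give each model a different starting point
        for member in self.models:
            for layer in member.modules():
                if hasattr(layer, 'reset_parameters'):
                    layer.reset_parameters()

    def forward(self, x):
        predictions = [model(x) for model in self.models]
        return torch.stack(predictions)

    def predict(self, x):
        preds = self.forward(x)  # [num_models, batch, output]

        mean = preds.mean(dim=0)
        variance = preds.var(dim=0)  # disagreement between members

        return mean, variance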

Uncertainty Decomposition

def ensemble_uncertainty(predictions, y_true=None, pred_vars=None):
    """
    Uncertainty quantification for a deep ensemble.
    predictions: [num_models, batch_size, output_dim] member means
    pred_vars: optional per-member predicted variances, same shape
    """
    # Predictive mean
    mean = predictions.mean(dim=0)

    # Epistemic uncertainty (variance across members)
    epistemic = predictions.var(dim=0)

    # Aleatoric uncertainty (average predicted variance); zero when the
    # members do not output a variance
    if pred_vars is None:
        aleatoric = torch.zeros_like(epistemic)
    else:
        aleatoric = pred_vars.mean(dim=0)

    # Total uncertainty
    total_uncertainty = aleatoric + epistemic

    result = {
        'mean': mean,
        'total_uncertainty': total_uncertainty,
        'epistemic': epistemic,
        'aleatoric': aleatoric
    }

    if y_true is not None:
        # Test NLL under a Gaussian with the total predictive variance
        std = torch.sqrt(total_uncertainty).clamp_min(1e-6)
        result['test_nll'] = -torch.distributions.Normal(mean, std).log_prob(y_true).mean()

    return result

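3. Diversity-Inducing Methods

The effectiveness of an ensemble depends on the diversity of its members.

Random Weight Initialization

def train_diverse_ensembles(model_class, train_loader, num_models=5):
    ensembles = []
    for i in range(num_models):
        # A different random seed per member
        seed = 42 + i * 17
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

        # The new seed gives each model a different initialization
        model = model_class()

        # Train (train_model is assumed to be defined elsewhere)
        train_model(model, train_loader, epochs=100)
        ensembles.append(model)

    return ensembles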

Data Augmentation Diversity

Each member uses a different data augmentation strategy:

class AugmentedEnsemble:
    def __init__(self, augmentations_list):
        self.augmentations = augmentations_list

    def train(self, model, x, y, model_idx):
        # Apply the augmentation assigned to this member
        aug = self.augmentations[model_idx]
        x_aug = aug(x)
        # Train...

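Uncertainty Evaluation Metrics

1. Spearman Rank Correlation Coefficient

Measures the correlation between uncertainty and error:

\rho = 1 - \frac{6 \sum_{i} d_{i}^{2}}{n (n^{2} - 1)}

where $d_{i}$ is the difference between the prediction-error rank and the uncertainty rank of the $i$-th sample, and $n$ is the number of samples.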
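
The formula can be evaluated directly once both quantities are ranked. The sketch below is a minimal plain-Python version that assumes no tied values (the formula is exact only in that case); the helper names and the sample numbers are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(errors, uncertainties):
    """Spearman correlation from squared rank differences."""
    n = len(errors)
    d_squared = sum((r_err - r_unc) ** 2
                    for r_err, r_unc in zip(ranks(errors), ranks(uncertainties)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

errors = [0.1, 0.4, 0.2, 0.9]
uncertainties = [0.05, 0.30, 0.20, 0.70]
rho = spearman_rho(errors, uncertainties)  # the two rankings agree exactly
print(rho)
```

A value near 1 means samples with larger errors also receive larger uncertainty, which is the behavior a useful uncertainty estimate should show.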

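2. Out-of-Distribution Detection

Use uncertainty as the OOD detection score:

def ood_detection(model, id_data, ood_data, method='softmax'):
    """
    Out-of-distribution detection from uncertainty scores.
    get_softmax_probs and compute_auroc are assumed to be defined elsewhere.
    """
    if method == 'softmax':
        # Based on the maximum softmax probability
        id_unc = 1 - get_softmax_probs(model, id_data).max(dim=-1)[0]
        ood_unc = 1 - get_softmax_probs(model, ood_data).max(dim=-1)[0]
    elif method == 'epistemic':
        # Based on epistemic uncertainty (mc_dropout_predict returns three values)
        _, id_unc, _ = mc_dropout_predict(model, id_data, num_samples=50)
        _, ood_unc, _ = mc_dropout_predict(model, ood_data, num_samples=50)
        id_unc = id_unc.mean(dim=-1)
        ood_unc = ood_unc.mean(dim=-1)
    else:
        raise ValueError(f"unknown method: {method}")

    # Compute AUROC, treating OOD samples as the positive class
    labels = torch.cat([torch.zeros(len(id_unc)), torch.ones(len(ood_unc))])
    scores = torch.cat([id_unc, ood_unc])

    auroc = compute_auroc(labels, scores)
    return auroc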
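
The `compute_auroc` helper is called above but not defined in this text. One possible stand-in, written in plain Python over lists (tensors would first need `.tolist()`), uses the fact that AUROC equals the probability that a randomly chosen positive scores higher than a randomly chosen negative. This is an illustrative sketch, not necessarily the helper the author used, and its pairwise loop is quadratic in the number of samples.

```python
def compute_auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as one half (the Mann-Whitney U statistic, normalized)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Two in-distribution samples (label 0) and two OOD samples (label 1)
auroc = compute_auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auroc)
```

For larger evaluation sets, a sort-based implementation or a library routine would be the more practical choice.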

3. Calibration Curve

Evaluates how well predicted probabilities agree with observed accuracy:

import numpy as np
import matplotlib.pyplot as plt
import torch

def plot_calibration_curve(model, data_loader, num_bins=10):
    """
    Plot a reliability diagram.
    """
    confidences = []
    accuracies = []

    with torch.no_grad():
        for x, y in data_loader:
            probs = torch.softmax(model(x), dim=-1)
            max_probs = probs.max(dim=-1)[0]
            preds = probs.argmax(dim=-1)

            confidences.extend(max_probs.cpu().numpy())
            accuracies.extend((preds == y).cpu().numpy())

    confidences = np.array(confidences)
    accuracies = np.array(accuracies)

    # Bin by confidence; clip so that confidence == 1.0 lands in the last bin
    bins = np.linspace(0, 1, num_bins + 1)
    bin_indices = np.clip(np.digitize(confidences, bins) - 1, 0, num_bins - 1)

    bin_confidences = []
    bin_accuracies = []

    for i in range(num_bins):
        mask = bin_indices == i
        if mask.sum() > 0:
            bin_confidences.append(confidences[mask].mean())
            bin_accuracies.append(accuracies[mask].mean())

    # Plot the calibration curve against the diagonal
    plt.figure(figsize=(8, 8))
    plt.plot([0, 1], [0, 1], 'k--', label='Perfect calibration')
    plt.plot(bin_confidences, bin_accuracies, 'o-', label='Model')
    plt.xlabel('Confidence')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.title('Calibration Curve')
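
A reliability diagram is often summarized by a single number, the expected calibration error (ECE): the bin-size-weighted average gap between accuracy and confidence. The plain-Python sketch below assumes equal-width bins like the plotting code above; the function name and sample values are illustrative.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Weighted average of |accuracy - confidence| over equal-width bins."""
    bins = [[] for _ in range(num_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, hit))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(h for _, h in members) / len(members)
            ece += len(members) / n * abs(accuracy - avg_conf)
    return ece

# Four predictions at 90% confidence, three of them correct
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
print(round(ece, 2))
```

Here the model claims 90% confidence but is right 75% of the time, so the gap of about 0.15 indicates overconfidence.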

Applying Uncertainty to Active Learning

Bayesian Active Learning

Use uncertainty to select the most valuable samples to label:

import numpy as np

def bayesian_active_learning(model, unlabeled_pool, batch_size=10, num_samples=50):
    """
    Uncertainty-based active learning
    """
    uncertainties = []

    for x in unlabeled_pool:
        # mc_dropout_predict returns (mean, variance, predictions)
        _, variance, _ = mc_dropout_predict(model, x.unsqueeze(0), num_samples)
        uncertainty = variance.mean()  # epistemic uncertainty from MC Dropout
        uncertainties.append(uncertainty.item())

    # Select the samples with the highest uncertainty
    selected_indices = np.argsort(uncertainties)[-batch_size:]

    return selected_indices, np.array(uncertainties)[selected_indices]

Bayesian Query By Committee

Use the level of agreement among multiple models (the committee) to select samples:

def query_by_committee(models, unlabeled_pool, batch_size=10):
    """
    Query By Committee (QBC) active learning
    """
    all_predictions = []

    with torch.no_grad():
        for model in models:
            # squeeze(0) drops the batch dimension added by unsqueeze(0)
            preds = torch.stack([torch.softmax(model(x.unsqueeze(0)), dim=-1).squeeze(0)
                                 for x in unlabeled_pool])
            all_predictions.append(preds)

    all_predictions = torch.stack(all_predictions)  # [num_models, pool_size, num_classes]

    # Disagreement: mean KL divergence from each member to the consensus
    mean_pred = all_predictions.mean(dim=0)
    kl_divs = torch.distributions.kl_divergence(
        torch.distributions.Categorical(probs=all_predictions),
        torch.distributions.Categorical(probs=mean_pred)
    ).mean(dim=0)  # average over members -> [pool_size]

    selected_indices = kl_divs.topk(batch_size)[1]

    return selected_indices, kl_divs[selected_indices]
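
To see what the disagreement score measures, the sketch below computes it for one sample in plain Python as the average KL divergence from each member's class distribution to the committee mean. The function name and the probabilities are illustrative assumptions.

```python
import math

def mean_kl_to_consensus(member_probs):
    """Average KL(member || consensus) for a single sample.

    member_probs: one class-probability list per committee member.
    """
    num_members = len(member_probs)
    num_classes = len(member_probs[0])
    consensus = [sum(p[c] for p in member_probs) / num_members
                 for c in range(num_classes)]
    total = 0.0
    for p in member_probs:
        # Terms with zero probability contribute nothing to the KL sum
        total += sum(pc * math.log(pc / qc)
                     for pc, qc in zip(p, consensus) if pc > 0)
    return total / num_members

agree = mean_kl_to_consensus([[0.9, 0.1], [0.9, 0.1]])
disagree = mean_kl_to_consensus([[0.9, 0.1], [0.1, 0.9]])
print(agree, disagree)
```

Identical members give a score of zero, while members that confidently contradict each other give a large score, so ranking the pool by this score favors samples the committee disputes.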

Uncertainty in Safety-Critical Applications

Autonomous Driving

class UncertaintyAwareDetector:
    def __init__(self, detector, uncertainty_threshold=0.3):
        self.detector = detector
        self.threshold = uncertainty_threshold

    def detect(self, image):
        # mc_dropout_predict returns (mean, variance, predictions)
        mean, uncertainty, _ = mc_dropout_predict(self.detector, image)

        # Trigger the safety policy when uncertainty is high
        if uncertainty.mean() > self.threshold:
            # Slow down or request human takeover
            return {
                'detections': mean,
                'uncertainty': uncertainty,
                'action': 'reduce_speed_or_takeover'
            }

        return {
            'detections': mean,
            'uncertainty': uncertainty,
            'action': 'continue'
        }

Medical Diagnosis

class BayesianMedicalClassifier:
    def __init__(self, model):
        self.model = model

    def diagnose(self, patient_data):
        # mc_dropout_predict returns (mean, variance, predictions)
        mean, variance, _ = mc_dropout_predict(self.model, patient_data)

        # Return the diagnosis with an approximate 95% interval on the
        # model outputs (assumed here to be logits)
        std = torch.sqrt(variance)

        return {
            'diagnosis': mean.argmax(dim=-1),
            'probability': torch.softmax(mean, dim=-1),
            'confidence_interval': (mean - 1.96*std, mean + 1.96*std),
            'high_uncertainty': variance.mean() > 0.1
        }

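Method Comparison

Method	Epistemic uncertainty	Aleatoric uncertainty	Compute cost	Implementation complexity
MC Dropout	✓	Requires modification	Low	Low
Deep ensembles	✓	Requires modification	Medium-high	Low
Bayesian inference	✓	✓	High	High
SWAG	✓	Requires modification	Medium	Medium
MC Dropout + heteroscedastic output	✓	✓	Low	Medium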
