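Overview

Deep learning models face a key challenge in real-world applications: how to reliably estimate the confidence of their predictions. Uncertainty quantification (UQ) aims to give deep learning models probabilistic predictions so that a model can recognize its own limitations, for example by "admitting that it does not know" when it encounters out-of-distribution (OOD) data. [1]

Uncertainty quantification is essential in safety-critical applications such as autonomous driving, medical diagnosis, and scientific prediction.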


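Types of uncertainty

1. Aleatoric uncertainty

Definition: randomness inherent in the data itself, which cannot be removed by collecting more data.

Characteristics

  • Arises from observation noise in the data collection process
  • Reflects intrinsic statistical fluctuation
  • Does not decrease as the training set grows

Examples

  • Sensor measurement noise
  • Subjective disagreement between annotators
  • Randomness in the underlying process itself

Modelling approach: predict a variance at the network's output layer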

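import torch
import torch.nn as nn

# Homoscedastic uncertainty: one variance shared by all inputs,
# held in a free parameter that does not depend on x
log_var = nn.Parameter(torch.zeros(1))
mean = mean_model(x)  # mean_model: a network with a single output head
var = torch.exp(log_var)

# Heteroscedastic uncertainty: the variance depends on the input,
# so log_var is a second output head of the network
mean, log_var = model(x)
var = torch.exp(log_var)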
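
To train a variance output like the one above, a common choice is the Gaussian negative log-likelihood written in terms of the log-variance. The sketch below uses plain Python so that it runs without PyTorch; the function name `gaussian_nll` and the sample values are illustrative and do not come from the original text.

```python
import math

def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var))."""
    return 0.5 * (math.log(2 * math.pi) + log_var + (y - mean) ** 2 / math.exp(log_var))

# The same error (y - mean = 2) is penalised far more when the model
# claims a small variance than when it admits a large one.
confident = gaussian_nll(y=2.0, mean=0.0, log_var=math.log(0.1))
cautious = gaussian_nll(y=2.0, mean=0.0, log_var=math.log(4.0))
print(confident > cautious)
```

Predicting the log-variance rather than the variance keeps the variance positive without an explicit constraint.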

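2. Epistemic uncertainty

Definition: uncertainty about the model parameters, caused by limited training data.

Characteristics

  • Can be reduced by adding training data
  • Reflects how much the model actually "knows" about its prediction
  • Tends to grow when the model encounters OOD data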
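
The contrast between the two kinds of uncertainty can be checked numerically. The following sketch is a toy setup, not taken from the original text: it repeatedly estimates a mean from noisy samples. The spread of the estimate (a stand-in for epistemic uncertainty) shrinks as the sample grows, while the noise variance (aleatoric) stays near its true value. All names and constants are illustrative assumptions.

```python
import random
import statistics

random.seed(0)
TRUE_MEAN, NOISE_STD = 3.0, 1.0

def fit_mean(n):
    """Estimate the mean and the noise variance from n noisy observations."""
    data = [random.gauss(TRUE_MEAN, NOISE_STD) for _ in range(n)]
    return statistics.fmean(data), statistics.pvariance(data)

def epistemic_and_aleatoric(n, repeats=200):
    fits = [fit_mean(n) for _ in range(repeats)]
    epistemic = statistics.pvariance([m for m, _ in fits])  # spread of the estimate
    aleatoric = statistics.fmean([v for _, v in fits])      # noise left in the data
    return epistemic, aleatoric

small = epistemic_and_aleatoric(n=10)
large = epistemic_and_aleatoric(n=1000)
print("n=10:   epistemic=%.4f aleatoric=%.4f" % small)
print("n=1000: epistemic=%.4f aleatoric=%.4f" % large)
```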

Modelling approach: place a distribution over the network parameters instead of using a point estimate

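3. Decomposing total uncertainty

The total predictive uncertainty can be decomposed as follows, where the outer expectation and variance are taken over the distribution of the parameters \theta:

\underbrace{\mathbb{V}[y \mid x]}_{\text{total uncertainty}} = \underbrace{\mathbb{E}_{\theta}[\sigma^2(y \mid x, \theta)]}_{\text{aleatoric uncertainty}} + \underbrace{\mathbb{V}_{\theta}[\mathbb{E}[y \mid x, \theta]]}_{\text{epistemic uncertainty}}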
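
As a small worked example of this decomposition, suppose four sampled networks each return a mean and a noise variance for the same input. The numbers below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import statistics

# Hypothetical outputs of 4 sampled networks for a single input x:
# each network predicts a mean and a noise variance.
means = [2.0, 2.4, 1.8, 2.2]
noise_vars = [0.30, 0.25, 0.35, 0.30]

aleatoric = statistics.fmean(noise_vars)  # E_theta[sigma^2(y | x, theta)]
epistemic = statistics.pvariance(means)   # V_theta[E[y | x, theta]]
total = aleatoric + epistemic

print(f"aleatoric={aleatoric:.3f} epistemic={epistemic:.3f} total={total:.3f}")
# prints: aleatoric=0.300 epistemic=0.050 total=0.350
```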

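Bayesian neural networks (BNN)

Formal definition

A Bayesian neural network treats the network weights \theta as random variables. Given a prior p(\theta) and training data \mathcal{D}, Bayes' rule gives the posterior:

p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}

At prediction time, all possible weights are integrated out:

p(y \mid x, \mathcal{D}) = \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta

Predictive mean and variance

With T weight samples \theta_t \sim p(\theta \mid \mathcal{D}) and f(x; \theta_t) = \mathbb{E}[y \mid x, \theta_t], the integral is approximated by Monte Carlo:

\mathbb{E}[y \mid x] \approx \bar{f}(x) = \frac{1}{T} \sum_{t=1}^{T} f(x; \theta_t)

\mathbb{V}[y \mid x] \approx \frac{1}{T} \sum_{t=1}^{T} \sigma^2(y \mid x, \theta_t) + \frac{1}{T} \sum_{t=1}^{T} \left( f(x; \theta_t) - \bar{f}(x) \right)^2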


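Approximate inference methods

Exact Bayesian inference is intractable for high-dimensional neural networks, so approximate methods are needed.

1. MC Dropout

Proposed by Gal & Ghahramani (2016), MC Dropout is a simple and effective approximation to Bayesian inference. [2]

Theoretical basis

Dropout can be interpreted as a form of variational inference. A dropout network minimizes the variational free energy: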
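
In generic form, the objective can be written as below. The notation is introduced here for illustration and is an assumption: q(\theta) denotes the approximate posterior induced by the random dropout masks, p(\theta) a prior over the weights, and \mathcal{D} the training data.

```latex
\mathcal{F}(q) = -\mathbb{E}_{q(\theta)}\left[ \log p(\mathcal{D} \mid \theta) \right]
               + \mathrm{KL}\left( q(\theta) \,\|\, p(\theta) \right)
```

The first term is the expected data-fit loss under random dropout masks. In Gal & Ghahramani's analysis the KL term corresponds approximately to L2 weight regularization, which is why ordinary dropout training with weight decay can be read as minimizing this objective.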

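MC Dropout inference

import torch
import torch.nn as nn

def mc_dropout_predict(model, x, num_samples=50):
    """
    MC Dropout prediction: run several stochastic forward passes
    with dropout active and summarize them.
    """
    # Keep BatchNorm and similar layers in eval mode; switch on only dropout.
    # The dropout rate is whatever the model's own layers were built with.
    model.eval()
    for module in model.modules():
        if isinstance(module, (nn.Dropout, nn.Dropout2d, nn.Dropout3d)):
            module.train()

    predictions = []
    with torch.no_grad():
        for _ in range(num_samples):
            predictions.append(model(x))

    predictions = torch.stack(predictions)  # [num_samples, batch, output_dim]

    # Predictive mean
    mean = predictions.mean(dim=0)

    # Predictive variance across passes (epistemic uncertainty)
    variance = predictions.var(dim=0)

    return mean, variance, predictions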

Uncertainty estimation

def estimate_uncertainty(model, x, y_true=None, num_samples=50):
    mean, variance, predictions = mc_dropout_predict(model, x, num_samples)

    # Epistemic uncertainty
    epistemic = variance

    result = {
        'mean': mean,
        'epistemic_uncertainty': epistemic
    }

    # Optionally compute the test log-likelihood
    if y_true is not None:
        # Gaussian approximation to the marginal predictive distribution;
        # the clamp keeps the standard deviation strictly positive
        std = variance.clamp(min=1e-8).sqrt()
        log_likelihood = torch.distributions.Normal(mean, std).log_prob(y_true).mean()
        result['test_log_likelihood'] = log_likelihood

    return result

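Limitations of MC Dropout

  1. Approximation quality: the dropout posterior can be a poor approximation to the true posterior
  2. Training-inference coupling: the model must be trained with dropout, and dropout must stay active at test time, so every prediction costs several forward passes
  3. Variance estimates: they often underestimate the true uncertainty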

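2. Deep Ensembles

Lakshminarayanan et al. (2017) proposed estimating uncertainty with several independently trained models. [3]

Method

Train M independent models (num_models in the code), each from a different initialization: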

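import copy

import torch
import torch.nn as nn

class EnsembleModel(nn.Module):
    def __init__(self, base_model, num_models=5):
        super().__init__()
        self.models = nn.ModuleList([
            copy.deepcopy(base_model) for _ in range(num_models)
        ])
        # deepcopy produces identical weights, so re-initialize every
        # member to give each one a different starting point
        for model in self.models:
            for module in model.modules():
                if hasattr(module, 'reset_parameters'):
                    module.reset_parameters()

    def forward(self, x):
        predictions = [model(x) for model in self.models]
        return torch.stack(predictions)  # [num_models, batch, output]

    def predict(self, x):
        with torch.no_grad():
            preds = self.forward(x)

        mean = preds.mean(dim=0)
        variance = preds.var(dim=0)  # disagreement between members

        return mean, variance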

Uncertainty decomposition

def ensemble_uncertainty(predictions, pred_vars, y_true=None):
    """
    Full uncertainty quantification for a deep ensemble.
    predictions: [num_models, batch_size, output_dim] predicted means
    pred_vars:   [num_models, batch_size, output_dim] predicted variances
                 (each member needs a variance output for this)
    """
    # Predictive mean
    mean = predictions.mean(dim=0)

    # Epistemic uncertainty: variance of the means across members
    epistemic = predictions.var(dim=0)

    # Aleatoric uncertainty: average of the members' predicted variances
    aleatoric = pred_vars.mean(dim=0)

    # Total uncertainty
    total_uncertainty = aleatoric + epistemic

    result = {
        'mean': mean,
        'total_uncertainty': total_uncertainty,
        'epistemic': epistemic,
        'aleatoric': aleatoric
    }

    if y_true is not None:
        # Test NLL under a Gaussian with the total variance
        dist = torch.distributions.Normal(mean, torch.sqrt(total_uncertainty))
        result['test_nll'] = -dist.log_prob(y_true).mean()

    return result

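3. Diversity-inducing methods

How well an ensemble works depends mainly on how diverse its members are.

Random weight initialization

def train_diverse_ensembles(model_class, train_loader, num_models=5):
    ensembles = []
    for i in range(num_models):
        # A different random seed for each member
        torch.manual_seed(42 + i * 17)
        torch.cuda.manual_seed(42 + i * 17)

        # The seed set above gives each member a different initialization
        model = model_class()

        # Train (train_model is a training loop defined elsewhere)
        train_model(model, train_loader, epochs=100)
        ensembles.append(model)

    return ensembles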

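Data augmentation diversity

Each ensemble member is trained with a different data augmentation strategy:

class AugmentedEnsemble:
    def __init__(self, augmentations_list):
        # One augmentation callable per ensemble member
        self.augmentations = augmentations_list

    def train(self, model, x, y, model_idx):
        # Apply the augmentation assigned to this member
        aug = self.augmentations[model_idx]
        x_aug = aug(x)
        # train ...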

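Uncertainty evaluation metrics

1. Spearman rank correlation coefficient

Measures how well the predicted uncertainty correlates with the actual error:

\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}

where d_i is the difference between the rank of the prediction error and the rank of the uncertainty for the i-th sample, and n is the number of samples.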
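
The rank-difference form of Spearman's coefficient, rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), can be computed directly. The sketch below assumes there are no tied values, and the error and uncertainty lists are made-up illustrative data.

```python
def ranks(values):
    """Rank values from 1 to n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(errors, uncertainties):
    n = len(errors)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(errors), ranks(uncertainties)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical per-sample absolute errors and predicted uncertainties
errors = [0.10, 0.50, 0.20, 0.90, 0.40]
uncertainties = [0.05, 0.30, 0.35, 0.80, 0.25]
rho = spearman(errors, uncertainties)
print(rho)
```

A value close to 1 means that samples with larger errors also receive larger uncertainty, which is the behaviour a useful uncertainty estimate should show.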

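2. Out-of-distribution detection

Uncertainty can serve as the score for detecting OOD inputs:

import torch
from sklearn.metrics import roc_auc_score

def ood_detection(model, id_data, ood_data, method='softmax'):
    """
    Out-of-distribution detection with an uncertainty score.
    Returns the AUROC for separating ID (label 0) from OOD (label 1).
    """
    if method == 'softmax':
        # One minus the maximum softmax probability
        model.eval()
        with torch.no_grad():
            id_unc = 1 - torch.softmax(model(id_data), dim=-1).max(dim=-1)[0]
            ood_unc = 1 - torch.softmax(model(ood_data), dim=-1).max(dim=-1)[0]
    elif method == 'epistemic':
        # Epistemic uncertainty from MC Dropout, averaged over outputs
        _, id_unc, _ = mc_dropout_predict(model, id_data, num_samples=50)
        _, ood_unc, _ = mc_dropout_predict(model, ood_data, num_samples=50)
        id_unc = id_unc.mean(dim=-1)
        ood_unc = ood_unc.mean(dim=-1)
    else:
        raise ValueError(f"unknown method: {method}")

    # AUROC: higher uncertainty should indicate OOD
    labels = torch.cat([torch.zeros(len(id_unc)), torch.ones(len(ood_unc))])
    scores = torch.cat([id_unc, ood_unc])

    return roc_auc_score(labels.numpy(), scores.cpu().numpy())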
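
AUROC has a simple interpretation here: it is the probability that a randomly chosen OOD sample receives a higher uncertainty score than a randomly chosen in-distribution sample. The following plain-Python sketch computes it by direct pairwise comparison; the function name and the scores are illustrative, and the quadratic loop is meant for explanation rather than for large datasets.

```python
def auroc(id_scores, ood_scores):
    """Probability that a random OOD sample scores higher than a random ID sample."""
    wins = 0.0
    for ood in ood_scores:
        for ind in id_scores:
            if ood > ind:
                wins += 1.0
            elif ood == ind:
                wins += 0.5  # ties count as half
    return wins / (len(id_scores) * len(ood_scores))

# Hypothetical uncertainty scores
id_unc = [0.05, 0.10, 0.20, 0.15]
ood_unc = [0.30, 0.12, 0.50]
print(auroc(id_unc, ood_unc))
```

A value of 0.5 means the score cannot separate the two groups, and 1.0 means perfect separation.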

3. Calibration curve

Checks how well the predicted probabilities agree with the observed accuracy:

import numpy as np
import torch
import matplotlib.pyplot as plt

def plot_calibration_curve(model, data_loader, num_bins=10):
    """
    Plot a reliability diagram.
    """
    confidences = []
    accuracies = []

    model.eval()
    with torch.no_grad():  # .numpy() fails on tensors that require grad
        for x, y in data_loader:
            probs = torch.softmax(model(x), dim=-1)
            max_probs, preds = probs.max(dim=-1)

            confidences.extend(max_probs.cpu().numpy())
            accuracies.extend((preds == y).cpu().numpy())

    confidences = np.array(confidences)
    accuracies = np.array(accuracies, dtype=float)

    # Binning: use only the interior edges so that a confidence of
    # exactly 1.0 falls into the last bin instead of being dropped
    bins = np.linspace(0, 1, num_bins + 1)
    bin_indices = np.digitize(confidences, bins[1:-1])

    bin_confidences = []
    bin_accuracies = []

    for i in range(num_bins):
        mask = bin_indices == i
        if mask.sum() > 0:
            bin_confidences.append(confidences[mask].mean())
            bin_accuracies.append(accuracies[mask].mean())

    # Plot the calibration curve
    plt.figure(figsize=(8, 8))
    plt.plot([0, 1], [0, 1], 'k--', label='Perfect calibration')
    plt.plot(bin_confidences, bin_accuracies, 'o-', label='Model')
    plt.xlabel('Confidence')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.title('Calibration Curve')
    plt.show()
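
A reliability diagram is often summarized by a single number. One common choice, not named in the text above and included here as an assumption about what a reader might want next, is the expected calibration error (ECE): the bin-size-weighted average gap between confidence and accuracy. A minimal plain-Python sketch with made-up data:

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Weighted average gap between confidence and accuracy over equal-width bins."""
    bins = [[] for _ in range(num_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * num_bins), num_bins - 1)  # conf == 1.0 goes to the last bin
        bins[idx].append((conf, hit))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(h for _, h in members) / len(members)
            ece += len(members) / n * abs(avg_conf - accuracy)
    return ece

# Hypothetical predictions: confidence and whether the prediction was correct
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 1, 0, 1, 0]
ece = expected_calibration_error(confidences, correct)
print(round(ece, 4))
```

In this toy data the model claims 95% confidence on four samples but gets only three right, which accounts for most of the calibration gap.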

Uncertainty in active learning

Bayesian Active Learning

Uncertainty is used to pick the samples whose labels would be most valuable:

import numpy as np

def bayesian_active_learning(model, unlabeled_pool, batch_size=10, num_samples=50):
    """
    Uncertainty-based active learning.
    """
    uncertainties = []

    for x in unlabeled_pool:
        _, variance, _ = mc_dropout_predict(model, x.unsqueeze(0), num_samples)
        uncertainty = variance.mean()  # epistemic uncertainty (MC Dropout variance)
        uncertainties.append(uncertainty.item())

    # Select the samples with the highest uncertainty
    selected_indices = np.argsort(uncertainties)[-batch_size:]

    return selected_indices, np.array(uncertainties)[selected_indices]
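
For classification, a widely used acquisition score is the mutual information between the label and the weights, often called BALD. This is an alternative to the variance-based score above, not the method the code above implements. The sketch assumes each stochastic forward pass yields a class-probability vector; names and numbers are illustrative.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def bald_score(sampled_probs):
    """Mutual information between the label and the weights.

    sampled_probs: list of class-probability vectors, one per stochastic pass.
    """
    num_passes = len(sampled_probs)
    num_classes = len(sampled_probs[0])
    mean_probs = [sum(s[c] for s in sampled_probs) / num_passes
                  for c in range(num_classes)]
    predictive_entropy = entropy(mean_probs)                             # total uncertainty
    expected_entropy = sum(entropy(s) for s in sampled_probs) / num_passes  # aleatoric part
    return predictive_entropy - expected_entropy                         # epistemic part

agree = [[0.5, 0.5], [0.5, 0.5]]         # every pass is unsure: a noisy input
disagree = [[0.99, 0.01], [0.01, 0.99]]  # passes contradict each other
print(bald_score(agree), bald_score(disagree))
```

Both inputs have the same averaged prediction, but only the second has a high score, because the disagreement between passes is what labeling can resolve.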

Bayesian Query By Committee

Disagreement among several models (the committee) is used to select samples:

def query_by_committee(models, unlabeled_pool, batch_size=10):
    """
    Query By Committee (QBC) active learning.
    """
    all_predictions = []

    with torch.no_grad():
        for model in models:
            # torch.cat drops the singleton batch dimension: [pool_size, num_classes]
            preds = torch.cat([torch.softmax(model(x.unsqueeze(0)), dim=-1)
                               for x in unlabeled_pool])
            all_predictions.append(preds)

    all_predictions = torch.stack(all_predictions)  # [num_models, pool_size, num_classes]

    # Disagreement: mean KL divergence from each member to the consensus
    mean_pred = all_predictions.mean(dim=0)
    kl_divs = torch.distributions.kl_divergence(
        torch.distributions.Categorical(probs=all_predictions),
        torch.distributions.Categorical(probs=mean_pred.expand_as(all_predictions))
    ).mean(dim=0)  # [pool_size]

    selected_indices = kl_divs.topk(batch_size)[1]

    return selected_indices, kl_divs[selected_indices]

Uncertainty in safety-critical applications

Autonomous driving

class UncertaintyAwareDetector:
    def __init__(self, detector, uncertainty_threshold=0.3):
        self.detector = detector
        self.threshold = uncertainty_threshold

    def detect(self, image):
        mean, uncertainty, _ = mc_dropout_predict(self.detector, image)

        # Trigger the safety policy when uncertainty is high
        if uncertainty.mean() > self.threshold:
            # Slow down or request a human takeover
            return {
                'detections': mean,
                'uncertainty': uncertainty,
                'action': 'reduce_speed_or_takeover'
            }

        return {
            'detections': mean,
            'uncertainty': uncertainty,
            'action': 'continue'
        }

Medical diagnosis

class BayesianMedicalClassifier:
    def __init__(self, model):
        self.model = model

    def diagnose(self, patient_data):
        mean, variance, _ = mc_dropout_predict(self.model, patient_data)

        # Return the diagnosis together with an approximate 95% interval.
        # The interval is on the raw model outputs (logits), not on probabilities.
        std = torch.sqrt(variance)

        return {
            'diagnosis': mean.argmax(dim=-1),
            'probability': torch.softmax(mean, dim=-1),
            'confidence_interval': (mean - 1.96 * std, mean + 1.96 * std),
            'high_uncertainty': variance.mean() > 0.1
        }

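Method comparison

Method | Epistemic uncertainty | Aleatoric uncertainty | Computational cost | Implementation complexity
MC Dropout | | requires modification | |
Deep Ensembles | | requires modification | medium-high |
Bayesian inference | | | |
SWAG | | requires modification | |
MC Dropout + heteroscedastic | | | |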

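Recent advances

1. SWAG (Stochastic Weight Averaging Gaussian)

Models uncertainty in weight space as a Gaussian distribution:

import torch

class SWAG:
    """
    Diagonal variant: keeps only a per-parameter variance.
    Full SWAG adds a low-rank covariance term built from weight deviations.
    """
    def __init__(self, model):
        # Running first and second moments of every tensor in the state dict
        self.mean = {k: v.detach().clone().float() for k, v in model.state_dict().items()}
        self.sq_mean = {k: v ** 2 for k, v in self.mean.items()}
        self.n = 1

    def collect_weights(self, model):
        # Update the running moments; call periodically during training
        self.n += 1
        for key, value in model.state_dict().items():
            value = value.detach().float()
            self.mean[key] += (value - self.mean[key]) / self.n
            self.sq_mean[key] += (value ** 2 - self.sq_mean[key]) / self.n

    def sample(self, scale=1.0):
        # Sample weights from the fitted diagonal Gaussian
        sampled = {}
        for key, mean in self.mean.items():
            var = (self.sq_mean[key] - mean ** 2).clamp(min=1e-30)
            noise = torch.randn_like(mean)
            sampled[key] = mean + scale * var.sqrt() * noise
        return sampled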

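2. LUQ (Learning Uncertainty Quantities)

Learns the uncertainty end to end:

class LUQ(nn.Module):
    def __init__(self, backbone, feature_dim, hidden_dim, output_dim):
        super().__init__()
        # Shared feature extractor
        self.backbone = backbone
        self.mean_head = nn.Linear(feature_dim, output_dim)
        self.var_head = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
            nn.Softplus()  # keeps the variance positive
        )

    def forward(self, x):
        features = self.backbone(x)
        return self.mean_head(features), self.var_head(features)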

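3. Evidential Deep Learning

Models uncertainty through an evidential (Dirichlet) distribution over the class probabilities:

import math

class EvidentialNetwork(nn.Module):
    def __init__(self, backbone, num_classes):
        super().__init__()
        # backbone must output num_classes + 1 values per sample
        self.backbone = backbone
        self.num_classes = num_classes

    def forward(self, x):
        outputs = self.backbone(x)

        # Dirichlet parameters (evidence): class proportions gamma
        # scaled by a total evidence exp(log_lambda)
        gamma = torch.softmax(outputs[:, :self.num_classes], dim=-1)
        log_lambda = outputs[:, self.num_classes:self.num_classes + 1]
        alpha = gamma * torch.exp(log_lambda) + 1

        # The predictive distribution is Dirichlet(alpha)
        return {'alpha': alpha}

def evidential_loss(y_true, y_pred, reg_weight=0.1):
    # Classification loss for a Dirichlet output: sum-of-squares Bayes risk
    # plus a KL regularizer, following Sensoy et al. (2018).
    # y_true is one-hot with shape [batch, num_classes].
    alpha = y_pred['alpha']
    strength = alpha.sum(dim=-1, keepdim=True)
    prob = alpha / strength

    # Data loss: expected squared error under the Dirichlet
    data_loss = torch.sum(
        (y_true - prob) ** 2 + prob * (1 - prob) / (strength + 1),
        dim=-1
    )

    # Regularization loss: KL divergence to the uniform Dirichlet after
    # removing the evidence assigned to the true class
    alpha_tilde = y_true + (1 - y_true) * alpha
    sum_tilde = alpha_tilde.sum(dim=-1, keepdim=True)
    reg_loss = (
        torch.lgamma(sum_tilde.squeeze(-1))
        - math.lgamma(alpha.shape[-1])
        - torch.lgamma(alpha_tilde).sum(dim=-1)
        + torch.sum(
            (alpha_tilde - 1) * (torch.digamma(alpha_tilde) - torch.digamma(sum_tilde)),
            dim=-1
        )
    )

    return (data_loss + reg_weight * reg_loss).mean()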
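
Once the Dirichlet parameters alpha are available, class probabilities and an uncertainty score follow in closed form. The sketch below assumes the common convention alpha = evidence + 1, which matches the "+ 1" in the network above, and uses the score u = K / S, where K is the number of classes and S the sum of alpha. The function name and inputs are illustrative.

```python
def dirichlet_summary(alpha):
    """Expected class probabilities and uncertainty u = K / S for Dirichlet(alpha)."""
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = len(alpha) / strength
    return probs, uncertainty

no_evidence = dirichlet_summary([1.0, 1.0, 1.0])       # nothing observed for any class
strong_evidence = dirichlet_summary([41.0, 1.0, 1.0])  # 40 units of evidence for class 0
print(no_evidence)
print(strong_evidence)
```

With no evidence the score reaches its maximum of 1, and it falls toward 0 as evidence accumulates, so a single forward pass yields an uncertainty estimate without sampling.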

Footnotes

  1. Kendall & Gal (2017). “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?” NeurIPS 2017.

  2. Gal & Ghahramani (2016). “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning”. ICML 2016.

  3. Lakshminarayanan et al. (2017). “Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles”. NeurIPS 2017.