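Uncertainty Quantification for LLMs

Overview

Large language models (LLMs) generate fluent text but are often overconfident: even when an answer is wrong or fabricated, it is delivered with high confidence. That is unacceptable in high-stakes domains such as medicine, law, and finance.

LLM uncertainty quantification aims to:

Identify what the model "does not know that it does not know"
Detect hallucinations
Enable reliable human-AI collaboration
Support active learning and uncertainty-aware decision making¹²

This article systematically covers the theoretical foundations, method taxonomy, recent advances, and practical guidance for LLM uncertainty quantification.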

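Background

The Overconfidence Problem in LLMs

Observation: models such as GPT-4 tend to deliver wrong answers in the same confident tone as correct ones.

Root causes:

Cause	Description
Training objective	Next-token prediction rewards fluent text, not expressions of uncertainty
Distribution shift	The pretraining corpus can differ substantially from the test distribution
Missing calibration	Maximum-likelihood (MLE) training does not by itself yield calibrated probabilities
Knowledge conflict	The model's internal knowledge can conflict with information in the context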

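Types of Uncertainty

Epistemic uncertainty:

Caused by a lack of knowledge
Can be reduced by collecting more data
Corresponds to "knowing that you do not know"

Aleatoric uncertainty:

Caused by noise inherent in the data
Cannot be reduced by collecting more data
Corresponds to the intrinsic ambiguity of the question

Out-of-distribution (OOD) detection:

Identifies whether an input lies within the training distribution
A special case of epistemic uncertainty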
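
One common way to make the epistemic/aleatoric split concrete is an entropy decomposition over an ensemble or over repeated stochastic forward passes: the entropy of the averaged prediction (total) splits into the average per-member entropy (aleatoric) and the remaining disagreement between members (epistemic). The article does not prescribe this formula; the sketch below is an illustration, and `decompose_uncertainty` and the toy distributions are hypothetical names and data.

```python
import math

def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def decompose_uncertainty(member_probs):
    """Split predictive uncertainty into (total, aleatoric, epistemic).

    member_probs: one probability distribution per ensemble member
    (or per stochastic forward pass), all over the same classes.
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean_p = [sum(p[i] for p in member_probs) / n for i in range(k)]
    total = entropy(mean_p)                               # entropy of the averaged prediction
    aleatoric = sum(entropy(p) for p in member_probs) / n  # average per-member entropy
    epistemic = total - aleatoric                          # disagreement between members
    return total, aleatoric, epistemic

# Members agree that the question is ambiguous: uncertainty is aleatoric.
print(decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]]))
# Members are individually certain but contradict each other: epistemic.
print(decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]]))
```

In the first case more data would not help, because every member already reports the same ambiguity; in the second, the members disagree, which is the kind of uncertainty that more data can reduce.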

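Taxonomy of Uncertainty Quantification Methods

Method Overview

LLM uncertainty quantification methods
├── Internal methods
│   ├── Token-level probability
│   ├── Hidden-state analysis
│   └── Attention patterns
├── External methods
│   ├── Prompt engineering
│   ├── Self-evaluation
│   └── Sampling methods
└── Bayesian methods
    ├── MC Dropout
    ├── Deep Ensembles
    └── LoRA + Laplace

Method Comparison

Method	Strengths	Weaknesses	Typical use
Token probability	Simple; no extra computation	Easy to circumvent	Fast screening
Self-evaluation	Leverages the model's own abilities	Easy to fool	Dialogue systems
MC Dropout	Strong theoretical grounding	Requires modifying inference	Reasoning tasks
Deep Ensembles	High accuracy	Computationally expensive	Offline analysis
Hybrid methods	Combine multiple signals	Complex	High-stakes scenarios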

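Internal Methods

Token-Level Probability

Core idea: use the token probabilities of the generated text as a confidence signal.

Perplexity:

$$\mathrm{PPL}(x) = \exp\left(-\frac{1}{T}\sum_{t=1}^{T} \log p(x_t \mid x_{<t})\right)$$

Low perplexity indicates high confidence.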
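
As a worked example of the perplexity formula, the sketch below computes it directly from a list of per-token log-probabilities, with no model involved. The `perplexity` helper and the two toy sequences are illustrative and not part of the article's code.

```python
import math

def perplexity(token_log_probs):
    """PPL = exp(-(1/T) * sum_t log p(x_t | x_<t)), using natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Same sequence length; the second contains one very unlikely token.
confident = [math.log(0.9), math.log(0.8), math.log(0.95)]
hesitant = [math.log(0.9), math.log(0.05), math.log(0.95)]

print(perplexity(confident))
print(perplexity(hesitant))
```

A single low-probability token raises the perplexity of the whole sequence, which is why perplexity works as a coarse screen but cannot say which token caused the problem.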

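import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class TokenProbabilityUncertainty:
    """
    Uncertainty quantification based on token probabilities.
    """
    def __init__(self, model_name="meta-llama/Llama-2-7b-hf"):
        # Any causal LM checkpoint in Transformers format can be used here.
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.model.eval()

    def compute_perplexity(self, text):
        """Compute the perplexity of a text."""
        inputs = self.tokenizer(text, return_tensors="pt")

        with torch.no_grad():
            # With labels set, the model returns the mean next-token cross-entropy.
            outputs = self.model(**inputs, labels=inputs["input_ids"])
            loss = outputs.loss.item()

        return math.exp(loss)

    def compute_token_uncertainties(self, text):
        """Return (token, probability) pairs for every token after the first."""
        inputs = self.tokenizer(text, return_tensors="pt")
        input_ids = inputs["input_ids"][0]

        with torch.no_grad():
            logits = self.model(**inputs).logits

        # Logits at position t predict token t+1, so drop the last position
        # and score each observed token under the prediction that precedes it.
        # (Taking the top-1 probability instead would score the model's
        # preferred token, not the token that actually appears in the text.)
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        next_ids = input_ids[1:].unsqueeze(-1)
        token_probs = log_probs.gather(-1, next_ids).squeeze(-1).exp()

        tokens = self.tokenizer.convert_ids_to_tokens(input_ids)
        return list(zip(tokens[1:], token_probs.tolist()))

    def detect_hallucination(self, text, threshold=0.2):
        """
        Flag possible hallucinations.

        Tokens with low probability may be hallucinated.
        """
        uncertainties = self.compute_token_uncertainties(text)
        if not uncertainties:
            raise ValueError("text must contain at least two tokens")

        low_confidence_tokens = [
            (token, prob)
            for token, prob in uncertainties
            if prob < threshold
        ]

        return {
            'low_confidence_tokens': low_confidence_tokens,
            'avg_confidence': sum(p for _, p in uncertainties) / len(uncertainties),
            # Heuristic: flag the text if more than 30% of its tokens are low-confidence
            'is_hallucination': len(low_confidence_tokens) / len(uncertainties) > 0.3
        }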

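Semantic Entropy

Core idea: measure uncertainty over meanings, not just over token sequences.

Proposed by Kuhn et al. (ICLR 2023):

$$H_{\text{semantic}}(s \mid x) = -\sum_{c \in C(s)} p(c \mid x) \log p(c \mid x)$$

where $C(s)$ is the set of semantic equivalence classes formed from the sampled responses $s$ (responses with the same meaning fall into the same class $c$), and $p(c \mid x)$ is the probability mass of class $c$.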

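import numpy as np
import torch

def semantic_entropy(text, model, tokenizer, num_samples=50):
    """
    Estimate semantic entropy.

    Samples several answers, clusters them by meaning, and computes the
    entropy of the empirical cluster distribution.
    """
    inputs = tokenizer(text, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    # Sample several answers. do_sample=True is required; greedy decoding
    # would return the same answer every time.
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,
            num_return_sequences=num_samples,
            max_new_tokens=50,
        )

    # Keep only the newly generated tokens, not the prompt.
    samples = [
        tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
        for seq in outputs
    ]

    # Semantic clustering. get_embeddings and cluster_texts are placeholders
    # the caller must supply (an embedding model plus a clustering routine).
    embeddings = get_embeddings(samples)

    # Simplified clustering by sentence similarity
    clusters = cluster_texts(samples, embeddings)

    # Entropy of the empirical distribution over clusters
    p_c = [len(cluster) / len(samples) for cluster in clusters]
    entropy = -sum(p * np.log(p) for p in p_c if p > 0)

    return entropy, clusters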
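
The clustering step is left open above (`cluster_texts` is not defined). A minimal sketch of one option follows: greedy clustering in which each sample joins the first cluster whose representative it is equivalent to. The original paper decides equivalence with a bidirectional entailment check from an NLI model; the `same_normalized_text` function here is only a toy stand-in, and all function names are hypothetical.

```python
import math

def cluster_by_equivalence(samples, equivalent):
    """Greedy clustering: compare each sample with the first member of every cluster."""
    clusters = []
    for sample in samples:
        for cluster in clusters:
            if equivalent(cluster[0], sample):
                cluster.append(sample)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([sample])
    return clusters

def cluster_entropy(clusters):
    """Entropy (nats) of the empirical distribution over clusters."""
    n = sum(len(c) for c in clusters)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

def same_normalized_text(a, b):
    """Toy equivalence test: equal after lowercasing and trimming punctuation."""
    def norm(t):
        return " ".join(t.lower().strip(" .!?").split())
    return norm(a) == norm(b)

samples = ["Paris", "paris.", "Paris", "Lyon"]
clusters = cluster_by_equivalence(samples, same_normalized_text)
print(clusters)
print(cluster_entropy(clusters))
```

Token-level entropy would treat "Paris" and "paris." as different outputs; grouping them first means only the genuine disagreement ("Lyon") contributes to the score.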

Attention Pattern Analysis

Core idea: anomalous attention patterns may indicate uncertainty.

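import math

import numpy as np
import torch

def analyze_attention_uncertainty(model, tokenizer, text, threshold=0.8):
    """
    Analyze attention patterns for uncertainty signals.

    Note: some attention backends do not return attention weights; loading the
    model with attn_implementation="eager" may be necessary.
    """
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)
    attentions = outputs.attentions  # one (batch, heads, seq, seq) tensor per layer

    seq_len = inputs["input_ids"].shape[1]
    max_entropy = math.log(seq_len) if seq_len > 1 else 1.0

    # Signal 1: how uniform the attention distribution is
    uniformities = []
    for layer_attn in attentions:
        # Attention from the last token to all tokens, for every head
        last_token_attn = layer_attn[0, :, -1, :]  # (heads, seq)

        # Uniformity = H(attn) / log(n), averaged over heads, in [0, 1]
        entropy = -(last_token_attn * torch.log(last_token_attn + 1e-10)).sum(dim=-1)
        uniformities.append(entropy.mean().item() / max_entropy)

    # High uniformity may indicate uncertainty. The threshold is illustrative
    # and should be tuned on validation data.
    return {
        'layer_uniformities': uniformities,
        'avg_uniformity': float(np.mean(uniformities)),
        'high_uniformity_layers': [i for i, u in enumerate(uniformities) if u > threshold]
    }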

External Methods

Self-Evaluation

Core idea: have the LLM assess the reliability of its own answer.

def self_evaluation(question, answer, model):
    """
    Ask the model to evaluate its own answer.

    `model.generate` is assumed to be a text-in, text-out wrapper, and
    `parse_evaluation` a helper that extracts the JSON object from the reply;
    neither is defined here.
    """
    prompt = f"""
    Question: {question}
    Answer: {answer}
    
    Please evaluate your answer:
    1. How confident are you that your answer is correct? (0-10)
    2. Are there any potential issues with your answer?
    3. What additional information would help verify this answer?
    
    Provide a JSON response.
    """
    
    response = model.generate(prompt)
    return parse_evaluation(response)
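
`parse_evaluation` is called above but never defined. A minimal sketch of what it might look like is given below. It assumes the reply contains one JSON object and that the confidence score, if present, is stored under a key named `confidence`; the prompt does not fix any key names, so that key and the added `confidence_normalized` field are assumptions.

```python
import json
import re

def parse_evaluation(response):
    """Extract the JSON object from a model reply.

    Returns a dict, or {} if no valid JSON object is found. If a numeric
    'confidence' field on the 0-10 scale is present, a rescaled
    'confidence_normalized' value in [0, 1] is added.
    """
    # Models often wrap JSON in extra prose, so search for the outermost braces.
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        return {}
    try:
        result = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}
    if not isinstance(result, dict):
        return {}

    confidence = result.get("confidence")
    if isinstance(confidence, (int, float)) and not isinstance(confidence, bool):
        result["confidence_normalized"] = min(max(confidence / 10.0, 0.0), 1.0)
    return result

reply = 'Sure. {"confidence": 7, "issues": "none found"} Hope this helps.'
parsed = parse_evaluation(reply)
print(parsed)
```

Returning an empty dict on malformed output keeps the caller simple: a missing confidence can be treated as "no signal" instead of crashing the pipeline.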

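CoT-UQ

Paper (2025): chain-of-thought-enhanced uncertainty quantification

Key contributions:

Chain-of-thought prompting improves the quality of uncertainty estimates
Answer consistency across samples is more reliable than a single-pass probability
Semantic-level calibration replaces token-level calibration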

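def cot_uq(question, model, num_reasoning_paths=8):
    """
    CoT-UQ: chain-of-thought-enhanced uncertainty quantification.

    `model.generate` is assumed to be a text-in, text-out wrapper that samples
    (so repeated calls can differ). extract_reasoning, extract_answer,
    calculate_answer_consistency and calculate_reasoning_diversity are
    placeholders that are not defined here.
    """
    reasoning_paths = []
    final_answers = []
    
    for _ in range(num_reasoning_paths):
        # Generate one reasoning path
        prompt = f"""
        Think step by step about: {question}
        Then provide your final answer.
        """
        
        response = model.generate(prompt)
        
        # Split the response into the reasoning and the final answer
        reasoning = extract_reasoning(response)
        answer = extract_answer(response)
        
        reasoning_paths.append(reasoning)
        final_answers.append(answer)
    
    # Agreement among the final answers, assumed to lie in [0, 1]
    consistency = calculate_answer_consistency(final_answers)
    
    # Diversity of the reasoning paths
    reasoning_diversity = calculate_reasoning_diversity(reasoning_paths)
    
    # Uncertainty = 1 - consistency
    uncertainty = 1 - consistency
    
    return {
        'uncertainty': uncertainty,
        'consistency': consistency,
        'reasoning_diversity': reasoning_diversity,
        'answers': final_answers,
        # The 0.7 cutoff is a heuristic and should be tuned per task
        'verdict': 'reliable' if consistency > 0.7 else 'unreliable'
    }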
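
`calculate_answer_consistency` is not defined above. One simple reading, sketched here as an assumption and not as the paper's definition, is the share of sampled answers that agree with the most frequent answer after light normalization. The `normalize_answer` helper is hypothetical.

```python
from collections import Counter

def normalize_answer(answer):
    """Lowercase, trim whitespace and a trailing period, collapse inner spaces."""
    return " ".join(answer.lower().strip().rstrip(".").split())

def calculate_answer_consistency(final_answers):
    """Fraction of answers that match the most common (normalized) answer."""
    if not final_answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize_answer(a) for a in final_answers)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(final_answers)

answers = ["42", "42", "42.", "41", " 42 ", "forty-two", "42", "42"]
consistency = calculate_answer_consistency(answers)
print(consistency)
```

String matching misses paraphrases ("forty-two" is counted as a different answer here), so for free-form answers a semantic equivalence check would be a better fit than exact match.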

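Semantic ECE

Problem: the standard expected calibration error (ECE) is defined for classification and does not apply directly to free-form answers.

Solution: a semantic-level ECE.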

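import numpy as np

def semantic_ece(questions, model_answers, ground_truth,
                 confidence_fn, semantic_similarity_fn, n_bins=10):
    """
    Compute a semantic expected calibration error.

    Confidence is compared against semantic similarity to the reference answer
    instead of exact-match accuracy. confidence_fn(question, answer) and
    semantic_similarity_fn(answer, truth) must both return values in [0, 1].
    """
    confidences = []
    semantic_accuracies = []

    for question, answer, truth in zip(questions, model_answers, ground_truth):
        # Predicted confidence (any estimator can be plugged in)
        confidences.append(confidence_fn(question, answer))

        # Semantic accuracy
        semantic_accuracies.append(semantic_similarity_fn(answer, truth))

    confidences = np.asarray(confidences, dtype=float)
    semantic_accuracies = np.asarray(semantic_accuracies, dtype=float)

    # Binned ECE
    bins = np.linspace(0, 1, n_bins + 1)
    ece = 0.0

    for i in range(n_bins):
        lower, upper = bins[i], bins[i + 1]
        if i == n_bins - 1:
            # Make the last bin right-inclusive so that confidence 1.0 is counted
            mask = (confidences >= lower) & (confidences <= upper)
        else:
            mask = (confidences >= lower) & (confidences < upper)
        if mask.any():
            bin_conf = confidences[mask].mean()
            bin_acc = semantic_accuracies[mask].mean()
            ece += mask.sum() * abs(bin_conf - bin_acc)

    return ece / len(confidences)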
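
The function above needs a `semantic_similarity_fn` that returns a score in [0, 1], which the article leaves unspecified. As a placeholder for experimentation, the sketch below uses token-overlap F1. This is a crude lexical proxy chosen here for illustration, not the article's method; an embedding- or entailment-based scorer would be closer to what "semantic" implies. The name `token_f1` is hypothetical.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 between two strings, in [0, 1]."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the capital is Paris", "Paris"))
print(token_f1("Lyon", "Paris"))
```

Because the score is graded, a verbose but correct answer is penalized for its extra tokens, which is one reason a lexical proxy can understate semantic accuracy.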

Bayesian Methods

MC Dropout for LLM

Keep dropout active at inference time:

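import math
from collections import Counter

import torch


class MCDropoutLLM:
    """
    MC Dropout for LLM uncertainty quantification.

    Some LLM checkpoints are configured with a dropout rate of 0, in which
    case MC Dropout has no effect.
    """
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self.model.eval()
        # Re-activate the dropout layers only; everything else stays in eval mode.
        self._enable_dropout()

    def _enable_dropout(self):
        """Switch every dropout layer to training mode so it stays stochastic."""
        for module in self.model.modules():
            if isinstance(module, torch.nn.Dropout):
                module.train()  # keeps the configured dropout rate

    def predict_with_uncertainty(self, prompt, num_samples=30, max_new_tokens=50):
        """
        Estimate uncertainty from repeated stochastic generations.
        """
        inputs = self.tokenizer(prompt, return_tensors="pt")
        prompt_len = inputs["input_ids"].shape[1]
        responses = []
        mean_log_probs = []

        for _ in range(num_samples):
            with torch.no_grad():
                output = self.model.generate(
                    **inputs,
                    do_sample=True,  # token sampling adds randomness on top of dropout
                    temperature=0.7,
                    max_new_tokens=max_new_tokens,
                    output_scores=True,
                    return_dict_in_generate=True,
                )
            new_tokens = output.sequences[0, prompt_len:]
            responses.append(
                self.tokenizer.decode(new_tokens, skip_special_tokens=True)
            )

            # Log-probability of each generated token under the sampling distribution
            log_probs = self.model.compute_transition_scores(
                output.sequences, output.scores, normalize_logits=True
            )[0]
            mean_log_probs.append(log_probs.mean().item())

        # Response diversity: entropy of the empirical distribution over
        # distinct responses (exact string match)
        counts = Counter(responses)
        response_entropy = -sum(
            (c / num_samples) * math.log(c / num_samples) for c in counts.values()
        )

        # Responses differ in length, so per-position values cannot be stacked;
        # report the average negative log-likelihood per token instead.
        token_uncertainty = -sum(mean_log_probs) / num_samples
        max_entropy = math.log(num_samples) if num_samples > 1 else 1.0

        return {
            'responses': responses,
            'response_entropy': response_entropy,
            'token_uncertainty': token_uncertainty,
            'consistency': 1 - response_entropy / max_entropy
        }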
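
In equation form, the sampling loop above can be read as a Monte Carlo estimate. The block below is a sketch in standard MC Dropout notation; the symbols $T$ (number of stochastic passes, `num_samples` in the code), $\hat\theta_t$ (the weights under the $t$-th dropout mask), and $\hat p(r)$ (the empirical frequency of response $r$ among the $T$ samples) are introduced here and are not defined elsewhere in the article.

```latex
% Predictive distribution approximated by averaging over dropout masks
p(y \mid x) \approx \frac{1}{T} \sum_{t=1}^{T} p\left(y \mid x, \hat\theta_t\right),
\qquad \hat\theta_t \sim q(\theta)

% Entropy of the empirical distribution over distinct responses
H_{\text{resp}} = -\sum_{r} \hat p(r) \log \hat p(r),
\qquad 0 \le H_{\text{resp}} \le \log T

% Consistency score returned by the code
\text{consistency} = 1 - \frac{H_{\text{resp}}}{\log T}
```

The upper bound $\log T$ is reached when all $T$ responses are distinct, which gives a consistency of 0; identical responses give an entropy of 0 and a consistency of 1.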

LoRA + Laplace Approximation

Strategy: fine-tune with LoRA, then apply a Laplace approximation to the LoRA parameters.

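class LoRALaplaceLLM:
    """
    Uncertainty quantification with LoRA + a Laplace approximation.

    Bayesian inference is applied to the LoRA parameters only, which greatly
    reduces the computational cost.

    inject_lora, find_map, compute_precision_diagonal, sample_from_laplace and
    compute_response_diversity are placeholders that are not defined here.
    """
    def __init__(self, base_model, lora_config):
        self.model = base_model
        self.lora_module = inject_lora(base_model, lora_config)
        
        # Freeze the base model, then re-enable gradients for the adapter only
        for param in self.model.parameters():
            param.requires_grad = False
        for param in self.lora_module.parameters():
            param.requires_grad = True
    
    def fit_posterior(self, calibration_data):
        """
        Estimate the Laplace posterior over the LoRA parameters.
        """
        # 1. Find the MAP estimate (assumed to be returned as a state dict)
        self.map_params = find_map(self.lora_module, calibration_data)
        
        # 2. Compute the precision matrix (diagonal approximation)
        self.precision = compute_precision_diagonal(
            self.lora_module, 
            self.map_params, 
            calibration_data
        )
    
    def predict_with_uncertainty(self, prompt, num_samples=50):
        """
        Bayesian prediction.
        """
        predictions = []
        
        for _ in range(num_samples):
            # Sample LoRA weights from the Laplace posterior
            sampled_params = sample_from_laplace(
                self.map_params, 
                self.precision
            )
            
            # Load the sampled weights into the adapter
            self.lora_module.load_state_dict(sampled_params)
            
            # Generate a response (text-in, text-out wrapper assumed)
            response = self.model.generate(prompt)
            predictions.append(response)
        
        # Restore the MAP weights so later calls start from the point estimate
        self.lora_module.load_state_dict(self.map_params)
        
        # Diversity of the responses, assumed to lie in [0, 1]
        uncertainty = compute_response_diversity(predictions)
        
        return {
            'predictions': predictions,
            'uncertainty': uncertainty,
            'consistency': 1 - uncertainty
        }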
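
The sampling step `sample_from_laplace` is not defined above. Under a diagonal Laplace approximation the posterior is a Gaussian centred on the MAP estimate whose per-parameter variance is the inverse of the precision, so a draw is the MAP value plus scaled standard normal noise. The sketch below shows this with plain Python lists in place of tensors; the dictionary layout and the optional `rng` argument are assumptions made for illustration.

```python
import math
import random

def sample_from_laplace(map_params, precision, rng=random):
    """Draw one sample from N(map, diag(1 / precision)).

    map_params, precision: dicts mapping a parameter name to a list of floats
    of the same length. Returns a dict with the same layout.
    """
    sampled = {}
    for name, mean in map_params.items():
        prec = precision[name]
        # Standard deviation is 1 / sqrt(precision) for each coordinate.
        sampled[name] = [
            m + rng.gauss(0.0, 1.0) / math.sqrt(p) for m, p in zip(mean, prec)
        ]
    return sampled

rng = random.Random(0)
map_params = {"lora_A": [0.5, -1.0]}
precision = {"lora_A": [1e12, 4.0]}  # first weight well determined, second loosely
draws = [sample_from_laplace(map_params, precision, rng) for _ in range(2000)]
print(draws[0])
```

A high precision pins a weight close to its MAP value, while a low precision lets it vary between draws; that variation is what turns into response diversity at prediction time.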

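Hallucination Detection

Framework

Hallucination detection pipeline
├── Input processing
│   ├── Fact-checking request
│   └── Entity extraction
├── Uncertainty assessment
│   ├── Token probability analysis
│   ├── Semantic entropy computation
│   └── Knowledge provenance tracing
├── Hallucination decision
│   ├── Rule-based methods
│   ├── Classifier-based methods
│   └── LLM self-evaluation
└── Output annotation
    ├── Confidence score
    └── Flagged suspicious spans

A Practical Implementation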

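class HallucinationDetector:
    """
    Combined hallucination detection system.

    The helper methods _token_confidence, _semantic_entropy,
    _check_knowledge_base, _self_evaluation and _aggregate_signals are not
    shown here and must be implemented before the class can run.
    """
    def __init__(self, llm, knowledge_base=None):
        self.llm = llm
        self.knowledge_base = knowledge_base
    
    def detect(self, response, context=None):
        """
        Detect hallucinations in a response.
        """
        signals = {}
        
        # 1. Token-level signal
        signals['token_confidence'] = self._token_confidence(response)
        
        # 2. Semantic inconsistency
        signals['semantic_entropy'] = self._semantic_entropy(response)
        
        # 3. Knowledge conflict check
        if self.knowledge_base:
            signals['knowledge_conflict'] = self._check_knowledge_base(response)
        
        # 4. Self-evaluation
        signals['self_eval'] = self._self_evaluation(response)
        
        # Combined decision: a risk score in [0, 1], higher means more suspicious
        final_signal = self._aggregate_signals(signals)
        
        return {
            'is_hallucination': final_signal > 0.7,
            'confidence': 1 - final_signal,
            'signals': signals,
            'flagged_segments': self._flag_suspicious_parts(response, signals)
        }
    
    def _flag_suspicious_parts(self, response, signals):
        """
        Flag suspicious spans.

        Assumes signals['token_confidence'] maps the index of each
        whitespace-separated word to a probability. Model tokenizers split
        text differently, so subword scores must be mapped to words first.
        """
        tokens = response.split()
        flags = []
        
        for i, token in enumerate(tokens):
            # Words without a score are treated as fully confident
            token_prob = signals['token_confidence'].get(i, 1.0)
            
            if token_prob < 0.3:
                flags.append({
                    'token': token,
                    'position': i,
                    'reason': 'low_probability',
                    'severity': 'high' if token_prob < 0.1 else 'medium'
                })
        
        return flags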
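
The aggregation step (`_aggregate_signals`) is left undefined above. One plain option, sketched here as an assumption, is a weighted mean of per-signal risk scores that have already been mapped to [0, 1] with higher meaning more suspicious. The signal names, the weights, and the convention that `None` marks an unavailable signal are all illustrative choices.

```python
def aggregate_signals(signals, weights=None):
    """Weighted mean of risk scores in [0, 1]; higher means more likely hallucinated.

    Signals whose value is None (for example, no knowledge base configured)
    are skipped, and the remaining weights are renormalized.
    """
    if weights is None:
        weights = {name: 1.0 for name in signals}
    total, weight_sum = 0.0, 0.0
    for name, value in signals.items():
        w = weights.get(name, 0.0)
        if value is None or w <= 0:
            continue
        # Clamp to [0, 1] so one badly scaled signal cannot dominate.
        total += w * min(max(float(value), 0.0), 1.0)
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("no usable signals")
    return total / weight_sum

signals = {
    "token_risk": 0.9,
    "semantic_entropy": 0.8,
    "self_eval_risk": 0.4,
    "knowledge_conflict": None,  # knowledge base unavailable
}
weights = {
    "token_risk": 1.0,
    "semantic_entropy": 2.0,
    "self_eval_risk": 1.0,
    "knowledge_conflict": 2.0,
}
score = aggregate_signals(signals, weights)
print(score, score > 0.7)
```

Raw signals such as token confidence or unnormalized entropy need to be converted to this common risk scale before they are combined; a learned classifier over the signals is an alternative when labeled examples are available.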

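Practical Guide

Method Selection Decision Tree

Problem type
├── Simple classification tasks
│   └── Token probability + temperature scaling
├── Open-ended generation tasks
│   ├── Real-time applications → approximate semantic entropy
│   └── Offline analysis → MC Dropout + CoT
├── High-stakes scenarios
│   └── Hybrid methods + knowledge verification
└── Continuous monitoring
    └── Semantic ECE + drift detection

Calibration Best Practices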

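class UncertaintyCalibrator:
    """
    Uncertainty calibrator.

    compute_calibration_curve and platt_scale are placeholders that are not
    defined here.
    """
    def __init__(self, uncertainty_estimator):
        self.estimator = uncertainty_estimator
        self.calibration_curve = None
    
    def calibrate(self, validation_data):
        """
        Calibrate the uncertainty estimates.
        
        The validation data should include cases the model tends to get wrong,
        so that both correct and incorrect answers are represented.
        """
        uncertainties = []
        accuracies = []
        
        for question, answer, is_correct in validation_data:
            # Raw uncertainty estimate for this question/answer pair
            unc = self.estimator.estimate(question, answer)
            uncertainties.append(unc)
            accuracies.append(1 if is_correct else 0)
        
        # Compute the calibration curve
        self.calibration_curve = compute_calibration_curve(
            uncertainties, accuracies, n_bins=10
        )
        
        return self.calibration_curve
    
    def apply_calibration(self, raw_uncertainty):
        """
        Apply the calibration mapping.
        """
        if self.calibration_curve is None:
            raise RuntimeError("call calibrate() before apply_calibration()")
        # Use Platt scaling or isotonic regression
        return platt_scale(raw_uncertainty, self.calibration_curve)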
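
Platt scaling, mentioned in the comment above, fits a logistic curve that maps a raw score to a probability of being correct. The sketch below fits the two parameters with plain gradient descent on the log loss. Its signature differs from the `platt_scale(raw_uncertainty, self.calibration_curve)` call above, which the article does not define; the learning rate, step count, and toy data are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(scores, labels, lr=0.5, steps=5000):
    """Fit p(correct) = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            grad_a += (p - y) * s  # d(log loss)/da for this example
            grad_b += (p - y)      # d(log loss)/db for this example
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def platt_scale(score, a, b):
    """Map a raw score to a calibrated probability with fitted parameters."""
    return sigmoid(a * score + b)

# Raw confidences and whether the corresponding answer was correct (toy data).
scores = [0.9, 0.8, 0.95, 0.85, 0.9, 0.6, 0.7, 0.65]
labels = [1, 0, 1, 1, 0, 0, 1, 0]

a, b = fit_platt(scores, labels)
print(platt_scale(0.95, a, b), platt_scale(0.6, a, b))
```

The mapping is monotonic, so it preserves the ranking of answers by confidence and only changes how the scores should be read as probabilities. Isotonic regression is the usual alternative when a single sigmoid is too rigid.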

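Open-Source Tools

Tool	Description	Link
Laplace-Torch	Laplace approximations for neural networks	GitHub
DeepEnds	Deep ensemble methods	GitHub
Evidential Deep Learning	Evidential deep learning	GitHub
BayesPrompts	Probability calibration for LLMs	GitHub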
