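Multi-Agent Causal Discovery Framework

Overview

Multi-agent methods offer distinct advantages for causal discovery: multiple LLM agents can collaborate, debate, and cross-validate one another, which can improve the accuracy and reliability of the discovered causal graph. The Multi-Agent Causal Discovery (MAC) framework [1] is a representative work in this direction and is cited below as an IJCAI 2025 paper.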


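Background: Why Multiple Agents?

Limitations of a Single Agent

| Problem | Description | Impact |
|---|---|---|
| Knowledge bias | LLM pretraining knowledge is incomplete | Important causal edges are missed |
| Reasoning errors | A single reasoning pass can go wrong | Edge directions are judged incorrectly |
| Insufficient coverage | One agent cannot master many domains at once | Poor performance in specialized domains |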

Advantages of Multiple Agents

Agent A ──┐
          ├──→ Collaborative discovery ──→ Causal graph
Agent B ──┤
          │
Agent C ──┘
    │
    └──→ Debate and verification → Error correction

MAC Framework in Detail

Overall Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                            MAC Framework                             │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐    │
│  │                 Debate-Coding Module (DCM)                   │    │
│  │            ┌─────────┐  ┌─────────┐  ┌─────────────────┐     │    │
│  │            │ Agent_A │  │ Agent_B │  │ ... (N agents)  │     │    │
│  │            └────┬────┘  └────┬────┘  └────────┬────────┘     │    │
│  │                 │            │                │              │    │
│  │                 └────────────┼────────────────┘              │    │
│  │                              ▼                               │    │
│  │                   Debate and coding phase                    │    │
│  │                              │                               │    │
│  │                              ▼                               │    │
│  │             Statistical causal discovery (SCD)               │    │
│  │                              │                               │    │
│  │                              ▼                               │    │
│  │                    Initial causal graph                      │    │
│  └──────────────────────────────────────────────────────────────┘    │
│                                 │                                    │
│                                 ▼                                    │
│  ┌──────────────────────────────────────────────────────────────┐    │
│  │                  Meta-Debate Module (MDM)                    │    │
│  │                                                              │    │
│  │ Meta-fusion → Causal metadata → Debate → Final causal graph  │    │
│  │                                                              │    │
│  └──────────────────────────────────────────────────────────────┘    │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

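DCM: The Debate-Coding Module

Agent Design

Each agent focuses on a different perspective:

class CausalAgent:
    """One debating agent; the *_reasoning methods are left abstract in this sketch."""

    def __init__(self, role, domain_expertise):
        self.role = role  # e.g., "temporal", "mechanism", "statistical", "domain"
        self.domain = domain_expertise

    def propose(self, data, metadata):
        """Propose causal hypotheses with a role-specific reasoning strategy."""
        if self.role == "temporal":
            # Temporal priority: X occurs before Y, so X → Y is a candidate
            # (precedence rules out Y → X but does not prove causation)
            return self.temporal_reasoning(data, metadata)
        elif self.role in ("mechanism", "domain"):
            # Mechanism priority: infer causal mechanisms from domain knowledge.
            # Treating the "domain" role like "mechanism" is an assumption.
            return self.mechanism_reasoning(metadata)
        elif self.role == "statistical":
            # Statistical priority: conditional independence tests
            return self.statistical_reasoning(data)
        # Fail loudly instead of silently returning None for an unknown role
        raise ValueError(f"unknown agent role: {self.role!r}")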
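
To make the "statistical" perspective concrete, here is a minimal sketch of one conditional independence check, a partial-correlation test on synthetic data. The chain X → Z → Y, the sample size, and the helper names `pearson` and `partial_correlation` are assumptions chosen for illustration; they are not taken from the MAC paper.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic chain X -> Z -> Y with Gaussian noise (fixed seed for repeatability)
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]

marginal = pearson(x, y)                    # X and Y are dependent
conditional = partial_correlation(x, y, z)  # close to zero given Z
print(f"corr(X, Y) = {marginal:.3f}, corr(X, Y | Z) = {conditional:.3f}")
```

On this chain the marginal correlation is clearly non-zero while the partial correlation given Z is close to zero, which is the pattern a statistical agent would read as "X and Y are independent given Z", so no direct X → Y edge is needed.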

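Debate Mechanism

def debate(agents, edge_candidates, max_rounds=3):
    """Multi-agent debate over each candidate edge.

    opinion_on, respond_to_opponents, check_consensus, and summarize are
    assumed helpers that this sketch does not define.
    """
    debates = []

    for edge in edge_candidates:
        # Each agent states an opinion. Key by role and domain, because
        # several agents may share a role (e.g. two "mechanism" agents).
        opinions = {}
        for agent in agents:
            opinions[f"{agent.role}:{agent.domain}"] = agent.opinion_on(edge)

        debate_record = {
            "edge": edge,
            "initial_opinions": dict(opinions),  # snapshot before the debate
            "arguments": [],
            "final_consensus": None
        }

        # Multi-round debate (round_idx avoids shadowing the built-in round)
        for round_idx in range(max_rounds):
            for agent in agents:
                # Adjust the argument in light of the other agents' opinions.
                # respond_to_opponents is assumed to revise the agent's entry
                # in opinions; otherwise the consensus check cannot change.
                argument = agent.respond_to_opponents(opinions)
                debate_record["arguments"].append({
                    "agent": f"{agent.role}:{agent.domain}",
                    "round": round_idx,
                    "argument": argument
                })

            # Stop once consensus is reached
            if check_consensus(opinions):
                debate_record["final_consensus"] = summarize(opinions)
                break

        debates.append(debate_record)

    return debates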
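
The debate loop calls `check_consensus` and `summarize` without defining them. The sketch below shows one possible shape for the two helpers, assuming each opinion is a dict with a `verdict` string and a `confidence` in [0, 1], and that consensus means a supermajority of agents share a verdict. The opinion format and the 0.8 threshold are assumptions, not part of the described framework.

```python
from collections import Counter


def check_consensus(opinions, threshold=0.8):
    """True when at least `threshold` of the agents share the same verdict."""
    if not opinions:
        return False
    verdicts = Counter(op["verdict"] for op in opinions.values())
    _, top_count = verdicts.most_common(1)[0]
    return top_count / len(opinions) >= threshold


def summarize(opinions):
    """Majority verdict plus the mean confidence of the agents that hold it."""
    verdicts = Counter(op["verdict"] for op in opinions.values())
    verdict, _ = verdicts.most_common(1)[0]
    supporters = [op["confidence"] for op in opinions.values()
                  if op["verdict"] == verdict]
    return {"verdict": verdict, "confidence": sum(supporters) / len(supporters)}


# Hypothetical opinions on a single candidate edge
opinions = {
    "statistical": {"verdict": "X->Y", "confidence": 0.9},
    "temporal": {"verdict": "X->Y", "confidence": 0.8},
    "mechanism": {"verdict": "none", "confidence": 0.6},
}

print(check_consensus(opinions))                 # two of three agree: below 0.8
print(check_consensus(opinions, threshold=0.6))  # a looser threshold is met
print(summarize(opinions))
```

A summary of this form also yields a per-edge confidence value, which a later fusion step can consume.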

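Integrating Statistical Causal Discovery

Debate results are combined with a statistical method:

def integrate_with_scd(debates, data):
    """Fuse debate confidence with SCD edge scores (both assumed to lie in [0, 1])."""
    # 1. Debate confidence per edge. debate() above does not set "confidence",
    #    so a record without one is treated as neutral (0.5).
    debate_confidence = {d["edge"]: d.get("confidence", 0.5)
                         for d in debates}

    # 2. Edge scores from a statistical method (PC, GES, etc.);
    #    run_scd_algorithm is assumed to return {edge: score}.
    scd_results = run_scd_algorithm(data)

    # 3. Weighted fusion over every edge seen by either source
    all_edges = set(debate_confidence) | set(scd_results)
    final_scores = {}
    for edge in all_edges:
        d_score = debate_confidence.get(edge, 0.5)
        s_score = scd_results.get(edge, 0.5)

        # Adjust the score according to how strongly the two sources agree
        if d_score > 0.7 and s_score > 0.7:
            final_scores[edge] = 0.9  # both support the edge: high confidence
        elif d_score < 0.3 and s_score < 0.3:
            final_scores[edge] = 0.1  # both reject the edge
        else:
            final_scores[edge] = 0.5 * d_score + 0.5 * s_score

    return final_scores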
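
The fusion rule can be checked on a few hand-picked scores. The sketch below isolates the rule as a pure function; the name `fuse_scores`, the example edges, and all score values are made up for illustration.

```python
def fuse_scores(d_score, s_score, high=0.7, low=0.3):
    """Fusion rule from the text: snap to 0.9 or 0.1 on agreement, else average."""
    if d_score > high and s_score > high:
        return 0.9
    if d_score < low and s_score < low:
        return 0.1
    return 0.5 * d_score + 0.5 * s_score


# Hypothetical debate confidences and SCD scores for three directed edges
debate_confidence = {
    ("Smoking", "HeartDisease"): 0.85,
    ("HeartDisease", "Smoking"): 0.10,
    ("Age", "HeartDisease"): 0.80,
}
scd_scores = {
    ("Smoking", "HeartDisease"): 0.90,
    ("HeartDisease", "Smoking"): 0.20,
    ("Age", "HeartDisease"): 0.40,
}

fused = {edge: fuse_scores(debate_confidence[edge], scd_scores[edge])
         for edge in debate_confidence}
for edge, score in fused.items():
    print(edge, round(score, 2))
```

The first edge is supported by both sources and snaps to 0.9, the second is rejected by both and snaps to 0.1, and the third falls back to the average of 0.8 and 0.4. One property worth noting is that the agreement branch caps the score at 0.9, so two scores of 0.95 fuse to a value below their mean.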

MDM: The Meta-Debate Module

Causal Metadata Fusion

class MetaFusion:
    """Convert a causal graph into causal metadata."""

    def __init__(self):
        self.metadata_types = [
            "temporal_order",      # temporal ordering
            "domain_knowledge",    # domain knowledge
            "strength_estimate",   # estimated causal strength
            "confidence_level",    # confidence level
            "data_availability"    # data availability
        ]

    def graph_to_metadata(self, causal_graph):
        """Causal graph → causal metadata (one record per edge)."""
        metadata = {}

        for edge in causal_graph.edges:
            metadata[edge] = {
                "temporal": self.extract_temporal_info(edge),
                "domain": self.extract_domain_info(edge),
                "strength": self.estimate_strength(edge),
                "confidence": self.assess_confidence(edge),
                "data": self.check_data_quality(edge)
            }

        return metadata

    def extract_temporal_info(self, edge):
        """Extract temporal information for an edge X → Y."""
        # Determine the ordering from the variables' time attributes
        return {
            "X_before_Y": self.is_before(edge.X, edge.Y),
            "X_after_Y": self.is_after(edge.X, edge.Y),
            "simultaneous": self.is_concurrent(edge.X, edge.Y)
        }
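
The temporal part of the metadata can be illustrated with a standalone function. This is a sketch under the assumption that each variable carries a single comparable measurement time; the `measured_at` mapping and its values are hypothetical.

```python
def extract_temporal_info(x, y, measured_at):
    """Compare the measurement times of two variables on an edge x -> y."""
    tx, ty = measured_at[x], measured_at[y]
    return {
        "X_before_Y": tx < ty,
        "X_after_Y": tx > ty,
        "simultaneous": tx == ty,
    }


# Hypothetical measurement years for two variables
measured_at = {"Smoking": 2001, "HeartDisease": 2015}

info = extract_temporal_info("Smoking", "HeartDisease", measured_at)
print(info)
```

Exactly one of the three flags is true for any pair, so a downstream agent can use `X_after_Y` as evidence against the direction X → Y.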

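Multi-Agent Meta-Debate

def meta_debate(metadata, agents, initial_graph=None, meta_rounds=3):
    """Meta-debate: refine the causal structure.

    analyze_metadata, identify_disagreements, resolve_through_debate,
    apply_resolution, and is_converged are assumed helpers.
    """
    current_graph = initial_graph

    for round_idx in range(meta_rounds):
        # Each agent analyzes the metadata
        analyses = {}
        for agent in agents:
            analysis = agent.analyze_metadata(
                metadata,
                current_graph,
                role_specific=True
            )
            # Key by role and domain so agents sharing a role do not collide
            analyses[f"{agent.role}:{agent.domain}"] = analysis

        # Identify disagreements
        disagreements = identify_disagreements(analyses)

        # Resolve each disagreement and update the graph structure
        for disagreement in disagreements:
            resolution = resolve_through_debate(
                disagreement,
                agents,
                metadata
            )
            current_graph = apply_resolution(
                current_graph,
                resolution
            )

        # Stop when nothing is left to resolve or the graph has stabilized
        if not disagreements or is_converged(current_graph):
            break

    return current_graph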
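
The convergence check is not specified in the text. One simple reading is that the graph has converged when a round of refinement leaves the edge set unchanged. The sketch below implements that reading on a toy refinement rule; unlike the one-argument `is_converged` used above, this version takes the previous and current edge sets explicitly, and `drop_reverse_duplicates` is a made-up stand-in for debate-based resolution.

```python
def is_converged(previous_edges, current_edges):
    """Converged when a refinement round leaves the edge set unchanged."""
    return previous_edges is not None and set(previous_edges) == set(current_edges)


def refine_until_stable(initial_edges, refine_step, max_rounds=5):
    """Apply refine_step until the edge set stops changing or rounds run out."""
    previous, current = None, set(initial_edges)
    for round_idx in range(max_rounds):
        if is_converged(previous, current):
            return current, round_idx
        previous, current = current, set(refine_step(current))
    return current, max_rounds


def drop_reverse_duplicates(edges):
    """Toy rule: if both X->Y and Y->X exist, keep the lexicographically smaller."""
    return {(x, y) for (x, y) in edges
            if (y, x) not in edges or (x, y) < (y, x)}


initial = {("A", "B"), ("B", "A"), ("B", "C")}
final_edges, rounds_used = refine_until_stable(initial, drop_reverse_duplicates)
print(sorted(final_edges), rounds_used)
```

Here the first round removes the conflicting edge B → A, the second round changes nothing, and the loop stops once it sees two identical edge sets in a row.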

Algorithm Flow

Complete Pseudocode

def MAC_pipeline(data, metadata, variable_descriptions):
    """
    Main MAC multi-agent causal discovery pipeline.

    Args:
        data: observational data (DataFrame)
        metadata: variable metadata (dict); must contain a "domain" key
        variable_descriptions: text descriptions of the variables

    Returns:
        causal_graph: the inferred causal graph
    """

    # ============ Stage 1: DCM initialization ============
    print("Stage 1: initializing debate agents...")

    # Create a diverse set of agents
    agents = [
        CausalAgent("statistical", "data analysis"),
        CausalAgent("temporal", "temporal reasoning"),
        CausalAgent("mechanism", "biomedicine"),
        CausalAgent("mechanism", "engineering systems"),
        CausalAgent("domain", metadata["domain"])
    ]

    # ============ Stage 2: debate-coding ============
    print("Stage 2: debate-coding...")

    # 2.1 Generate candidate edges
    edge_candidates = generate_edge_candidates(
        data.columns,
        variable_descriptions
    )

    # 2.2 Agent debate
    debate_results = debate(agents, edge_candidates)

    # 2.3 Select the most suitable SCD method
    best_scd = select_scd_method(debate_results)

    # 2.4 Run SCD
    scd_graph = run_scd(data, best_scd)

    # 2.5 Initial causal graph (fuse_debate_and_scd fills the role of
    #     integrate_with_scd above, applied to a precomputed SCD graph)
    initial_graph = fuse_debate_and_scd(debate_results, scd_graph)

    # ============ Stage 3: meta-fusion ============
    print("Stage 3: meta-fusion...")

    # 3.1 Graph → metadata
    causal_metadata = MetaFusion().graph_to_metadata(initial_graph)

    # 3.2 Meta-debate
    refined_graph = meta_debate(causal_metadata, agents)

    # ============ Stage 4: output ============
    return refined_graph

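Time Complexity Analysis

| Stage | Time complexity | Dominant factors |
|---|---|---|
| DCM debate | O(N × R × A) | N edges × R rounds × A agents |
| SCD execution | O(V × I) | V variables × I independence tests |
| MDM meta-debate | O(M × R × A) | M meta-edges × R rounds × A agents |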
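
As a rough worked example of the debate cost, the sketch below counts agent responses for the N × R × A term. It assumes that every ordered pair of variables is a candidate edge and that every debate runs for the full number of rounds, so the figure is an upper bound; the function names and the chosen sizes are illustrative.

```python
def candidate_edges(num_variables):
    """Number of ordered variable pairs, i.e. possible directed edges."""
    return num_variables * (num_variables - 1)


def debate_llm_calls(num_edges, num_rounds, num_agents):
    """Upper bound on agent responses: one per edge, round, and agent."""
    return num_edges * num_rounds * num_agents


n_edges = candidate_edges(8)                                   # 8 variables
calls = debate_llm_calls(n_edges, num_rounds=3, num_agents=5)
print(n_edges, calls)  # 56 candidate edges, 840 agent responses
```

Because the candidate count grows quadratically with the number of variables, pruning candidates before the debate is the most direct way to reduce this term.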

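Experimental Results

Dataset Comparison

| Dataset | MAC | GranDAG | GES | PC |
|---|---|---|---|---|
| Asia | 89.2% | 76.5% | 68.2% | 71.4% |
| Alarm | 82.7% | 78.3% | 71.2% | 74.6% |
| Hepar2 | 78.4% | 72.1% | 65.8% | 68.9% |
| Sachs | 84.6% | 79.2% | 73.5% | 76.1% |

The metric is edge accuracy; MAC achieves the best score on all four datasets.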
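
The text does not define the edge-level metric precisely. One common way to score a predicted graph against a reference is precision and recall over directed edges, sketched below. This is an assumed definition for illustration and may differ from the metric behind the table; the two edge sets are made up.

```python
def edge_metrics(true_edges, predicted_edges):
    """Precision and recall over directed edges (a reversed edge counts as wrong)."""
    true_set, pred_set = set(true_edges), set(predicted_edges)
    true_positives = len(true_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


# Hypothetical reference graph and prediction
true_edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
predicted_edges = {("A", "B"), ("B", "C"), ("D", "C"), ("A", "D")}

precision, recall = edge_metrics(true_edges, predicted_edges)
print(precision, recall)  # 0.5 0.5
```

In this example two of the four predicted edges are correct, and the reversed edge D → C is counted as an error rather than a partial match.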

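Ablation Study

| Variant | Asia | Alarm | Average |
|---|---|---|---|
| MAC (full) | 89.2% | 82.7% | 85.95% |
| w/o DCM debate | 81.3% | 75.4% | 78.35% |
| w/o MDM meta-debate | 83.7% | 78.9% | 81.30% |
| w/o metadata fusion | 84.1% | 79.2% | 81.65% |
| SCD only | 72.4% | 68.5% | 70.45% |

Key finding: DCM and MDM make complementary contributions. Removing either one lowers accuracy on both datasets, and the full model outperforms every ablated variant.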
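
The size of each component's contribution can be read off the ablation table with a little arithmetic. The sketch below recomputes the averages and the drop relative to the full model; the numbers are copied from the table, and only the variable names are new.

```python
# (Asia, Alarm) accuracy in percent, copied from the ablation table
ablation = {
    "MAC (full)": (89.2, 82.7),
    "w/o DCM debate": (81.3, 75.4),
    "w/o MDM meta-debate": (83.7, 78.9),
    "w/o metadata fusion": (84.1, 79.2),
    "SCD only": (72.4, 68.5),
}


def mean(values):
    return sum(values) / len(values)


full_average = mean(ablation["MAC (full)"])
drops = {
    name: round(full_average - mean(scores), 2)
    for name, scores in ablation.items()
    if name != "MAC (full)"
}
for name, drop in drops.items():
    print(f"{name}: -{drop:.2f} points")
```

Removing the DCM debate costs 7.60 points on average, removing the MDM meta-debate 4.65, removing metadata fusion 4.30, and falling back to SCD alone 15.50, so the DCM debate is the largest single contributor in this table.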

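Inference Efficiency

| Method | Inference time | Relative time (MAC = 1.0x) |
|---|---|---|
| MAC | ~5 min | 1.0x |
| GranDAG | ~12 min | 2.4x |
| GES | ~8 min | 1.6x |
| RL-based | ~15 min | 3.0x |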

Practical Applications

Medical Diagnosis

# Medical causal discovery example
medical_data = load_medical_records()
medical_metadata = {
    "domain": "cardiology",
    "variables": {
        "Age": {"type": "demographic"},
        "Smoking": {"type": "behavior"},
        "Cholesterol": {"type": "biomarker"},
        "BloodPressure": {"type": "vital"},
        "HeartDisease": {"type": "outcome"}
    }
}

# Run MAC
causal_graph = MAC_pipeline(
    data=medical_data,
    metadata=medical_metadata,
    variable_descriptions=load_medical_descriptions()
)

# Analyze the key causal pathways
key_pathways = extract_pathways(causal_graph, "Smoking", "HeartDisease")
# Example output (illustrative):
#       Smoking → Cholesterol → HeartDisease
#       Smoking → BloodPressure → HeartDisease
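
`extract_pathways` is not defined in the text. The sketch below shows one way such a helper could work: a depth-first enumeration of all simple directed paths between two variables. The function name `enumerate_pathways`, the edge-list input, and the hard-coded edges are assumptions that mirror the illustrative output above; they are not the result of running MAC.

```python
def enumerate_pathways(edges, source, target):
    """Enumerate all simple directed paths from source to target with DFS."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    paths = []

    def walk(node, path):
        if node == target:
            paths.append(path)
            return
        for nxt in children.get(node, []):
            if nxt not in path:  # skip nodes already on the path (guards against cycles)
                walk(nxt, path + [nxt])

    walk(source, [source])
    return paths


# Hand-written edge list for illustration only
edges = [
    ("Smoking", "Cholesterol"),
    ("Smoking", "BloodPressure"),
    ("Cholesterol", "HeartDisease"),
    ("BloodPressure", "HeartDisease"),
    ("Age", "BloodPressure"),
]

paths = enumerate_pathways(edges, "Smoking", "HeartDisease")
for path in paths:
    print(" → ".join(path))
```

The number of simple paths can grow exponentially in dense graphs, so a length cap would be a sensible addition for larger networks.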

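Financial Risk Control

# Financial causal discovery
financial_data = load_market_data()
financial_metadata = {
    "domain": "finance",
    "variables": {
        "InterestRate": {"type": "economic"},
        "Inflation": {"type": "economic"},
        "MarketIndex": {"type": "market"},
        "Unemployment": {"type": "labor"}
    }
}

causal_graph = MAC_pipeline(
    data=financial_data,
    metadata=financial_metadata,
    # MAC_pipeline requires variable_descriptions; load_market_descriptions
    # is a hypothetical helper mirroring load_medical_descriptions above
    variable_descriptions=load_market_descriptions()
)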

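Extension Directions

Dynamic Multi-Agent Composition

class DynamicMAC:
    """Dynamically adjust the set of agents."""

    def adapt_agents(self, task, current_performance):
        """Adjust the agents based on the task (a dict) and current performance."""

        if current_performance < 0.7:
            # Performance is too low: increase agent diversity
            self.add_agent(role="critical", domain="...")

        if "temporal" in task:
            # More temporal reasoning is needed
            self.add_agent(role="temporal", domain="...")

        if "mechanism" in task:
            # A domain expert is needed
            self.add_agent(role="mechanism", domain=task["domain"])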

Human-AI Collaboration

class HumanInLoopMAC:
    """Multi-agent discovery with a human expert in the loop."""

    def run_with_human_feedback(self, graph, expert):
        """Incorporate feedback from a human expert."""
        # Expert review
        expert_annotations = expert.review(graph)

        # Integrate the expert's knowledge
        enhanced_metadata = integrate_expert_knowledge(
            self.metadata,
            expert_annotations
        )

        # Continue refining
        return self.refine_with_metadata(enhanced_metadata)

References

  1. arXiv:2407.15073, “Multi-Agent Causal Discovery Using Large Language Models” (IJCAI 2025)