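LLM-Guided Neural Architecture Search

1. Overview

LLM-guided NAS (neural architecture search guided by a large language model) is a frontier direction in NAS research [1]. Large language models carry knowledge of neural network design, and drawing on it can substantially speed up architecture search and can even enable architecture design without any training.

Key advantages:

  • Draws on the architecture knowledge of a pretrained LLM
  • Avoids evaluating large numbers of candidate architectures
  • Can incorporate human design experience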

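2. The ZeroLM Method

2.1 Background

Paper: Chen et al., “ZeroLM: Data-Free Transformer Architecture Search for Language Models” (arXiv 2503.18646)

Core problem: conventional NAS spends substantial compute on evaluating architectures. ZeroLM proposes searching Transformer architectures without any data.

2.2 How It Works

ZeroLM uses an LLM to generate and evaluate Transformer architectures. The core framework: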

LLM (GPT-4/Claude)
    ↓ Prompt engineering
Architecture description (JSON/dict)
    ↓ LLM generation
Candidate architecture set
    ↓ Zero-cost evaluation
Performance ranking
    ↓ Feedback loop
Improved search strategy

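2.3 Search Space Definition

ZeroLM defines its search space around the Transformer architecture:

Component | Search options
Attention type | Multi-head, Linear, Sparse, Flash
Feed-forward network | FFN, SwiGLU, GLU, Feedforward
Positional encoding | RoPE, ALiBi, Sinusoidal, None
Layer configuration | Hidden dimension, number of layers, number of attention heads
Normalization | LayerNorm, RMSNorm, Pre-Norm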
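
One way to make the table above concrete is a plain dictionary plus a random sampler. This is a minimal sketch, not ZeroLM's actual encoding: the key names, the numeric choices for the layer configuration, and the helper `sample_architecture` are assumptions made for illustration.

```python
import random

# Categorical options are copied from the table; the numeric grids are
# illustrative assumptions, since the table gives no concrete values.
SEARCH_SPACE = {
    "attention_type": ["multi_head", "linear", "sparse", "flash"],
    "ffn_type": ["ffn", "swiglu", "glu", "feedforward"],
    "position_encoding": ["rope", "alibi", "sinusoidal", "none"],
    "norm": ["layernorm", "rmsnorm", "pre_norm"],
    "hidden_size": [256, 512, 768, 1024, 2048],
    "num_layers": [4, 8, 12, 16, 24, 32],
    "num_heads": [4, 8, 12, 16, 32],
}

def sample_architecture(rng):
    """Draw one configuration uniformly, retrying until heads divide hidden_size."""
    while True:
        arch = {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}
        if arch["hidden_size"] % arch["num_heads"] == 0:
            return arch

rng = random.Random(0)
candidates = [sample_architecture(rng) for _ in range(5)]
```

A sampler like this is mainly useful as a random-search baseline against which LLM-generated candidates can be compared.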

2.4 Prompt Design

Architecture generation prompt:

architecture_generation_prompt = """
You are an expert neural network architect specialized in Transformers.
 
Generate {num_architectures} Transformer architecture configurations 
for language modeling. Each configuration should specify:
 
1. num_layers: Number of transformer layers (4-32)
2. hidden_size: Hidden dimension (256-2048)
3. num_heads: Number of attention heads (4-32)
4. intermediate_size: FFN intermediate size (hidden_size * 2-8)
5. attention_type: Type of attention mechanism
6. position_encoding: Type of positional encoding
7. activation: Activation function type
 
Consider:
- Computational efficiency vs accuracy trade-offs
- Recent advances in Transformer design
- Hardware compatibility
 
Return as JSON list of dictionaries.
"""

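Because the prompt asks for a JSON list with explicit numeric ranges, the reply can be checked mechanically before any scoring. The sketch below is one possible validator, assuming the model returns bare JSON; the function name `validate_architectures` and the head-divisibility check are additions for illustration and are not part of the prompt.

```python
import json

# Ranges taken from the prompt above.
LIMITS = {"num_layers": (4, 32), "hidden_size": (256, 2048), "num_heads": (4, 32)}

def validate_architectures(raw_json):
    """Parse the LLM reply and keep only configurations inside the prompt's ranges."""
    try:
        configs = json.loads(raw_json)
    except json.JSONDecodeError:
        return []
    if not isinstance(configs, list):
        return []
    valid = []
    for cfg in configs:
        if not isinstance(cfg, dict):
            continue
        ints_ok = all(
            isinstance(cfg.get(key), int) and low <= cfg[key] <= high
            for key, (low, high) in LIMITS.items()
        )
        inter = cfg.get("intermediate_size")
        if not ints_ok or not isinstance(inter, int):
            continue
        # intermediate_size must be 2x to 8x hidden_size, as the prompt states.
        ratio_ok = 2 * cfg["hidden_size"] <= inter <= 8 * cfg["hidden_size"]
        # Assumption (not in the prompt): heads must divide hidden_size evenly.
        heads_ok = cfg["hidden_size"] % cfg["num_heads"] == 0
        if ratio_ok and heads_ok:
            valid.append(cfg)
    return valid

reply = (
    '[{"num_layers": 12, "hidden_size": 768, "num_heads": 12, "intermediate_size": 3072},'
    ' {"num_layers": 64, "hidden_size": 768, "num_heads": 12, "intermediate_size": 3072}]'
)
kept = validate_architectures(reply)
```

Here the second configuration is dropped because `num_layers` is outside the 4-32 range.
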
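2.5 Evaluation Mechanism

ZeroLM scores each architecture with a composite of several zero-cost proxies. Each sub-score is computed from the architecture's parameters alone, so no actual training is required.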
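
As an example of a quantity that can be computed from the configuration alone, the sketch below estimates the parameter count of a standard Transformer stack. It is not claimed to be one of ZeroLM's sub-scores; the function name and the simplifications (no biases, no normalization parameters) are assumptions.

```python
def estimate_parameters(num_layers, hidden_size, intermediate_size, vocab_size=0):
    """Rough parameter count for a standard Transformer stack (biases and norms ignored)."""
    attention = 4 * hidden_size * hidden_size            # Q, K, V and output projections
    feed_forward = 2 * hidden_size * intermediate_size   # up and down projections
    embeddings = vocab_size * hidden_size                # token embedding table, if any
    return num_layers * (attention + feed_forward) + embeddings

params = estimate_parameters(num_layers=12, hidden_size=768, intermediate_size=3072)
print(params)  # 84934656, about 85M before embeddings
```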


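3. The RZ-NAS Method

3.1 Background

Paper: Ji et al., “RZ-NAS: Enhancing LLM-guided NAS via Reflective Zero-Cost Strategy” (ICML 2025)

Core innovation: a reflective zero-cost strategy that makes LLM-guided NAS more effective.

3.2 Challenges in Existing LLM-Based NAS

Challenge | Description
Search space limitations | LLMs struggle to understand complex search space definitions
Search efficiency | Blind generation lacks direction
Hallucination | The LLM may generate unreasonable architectures
Missing evaluation | No reliable assessment of architecture performance

3.3 RZ-NAS Core Framework

Three-stage pipeline: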

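class RZ_NAS:
    """Reflective search: generate with an LLM, score with zero-cost proxies, reflect."""

    def __init__(self, llm, zero_cost_proxies):
        self.llm = llm
        self.proxies = zero_cost_proxies
        self.history = []

    def reflective_search(self, num_iterations=10):
        # _build_prompt, evaluate_zero_cost, reflect and update_strategy are not
        # shown here; an implementation has to supply them.
        for iteration in range(num_iterations):
            # Stage 1: LLM generation
            candidates = self.llm.generate(
                prompt=self._build_prompt(iteration),
                num_samples=5
            )

            # Stage 2: zero-cost evaluation
            scores = [self.evaluate_zero_cost(arch) for arch in candidates]

            # Stage 3: reflection and improvement
            reflection = self.reflect(scores, candidates)
            self.history.append({
                'candidates': candidates,
                'scores': scores,
                'reflection': reflection
            })

            # Update the prompt strategy for the next iteration
            self.update_strategy(reflection)

        return self.get_best_architecture()

    def get_best_architecture(self):
        # One straightforward reading: the highest-scoring candidate across all iterations
        pairs = [
            (score, arch)
            for record in self.history
            for score, arch in zip(record['scores'], record['candidates'])
        ]
        return max(pairs, key=lambda pair: pair[0])[1]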
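
To see the three stages run end to end without an API key, the loop can be exercised with stand-ins. Everything below is hypothetical scaffolding: `StubLLM` returns random configurations instead of calling a model, `toy_proxy` is a made-up score, and the "reflection" is a string template where RZ-NAS would use an LLM call.

```python
import random

class StubLLM:
    """Hypothetical stand-in for a real LLM client: returns random configurations."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def generate(self, prompt, num_samples):
        return [
            {"num_layers": self.rng.choice([4, 8, 12]),
             "hidden_size": self.rng.choice([256, 512, 768])}
            for _ in range(num_samples)
        ]

def toy_proxy(arch):
    """Made-up score: prefers configurations near 8 layers and 512 hidden units."""
    return -abs(arch["num_layers"] - 8) - abs(arch["hidden_size"] - 512) / 256

def reflective_search(llm, num_iterations=3, num_samples=5):
    history, guidance = [], ""
    for _ in range(num_iterations):
        # Stage 1: generation, with the previous reflection appended to the prompt
        prompt = "Generate Transformer configs. " + guidance
        candidates = llm.generate(prompt=prompt, num_samples=num_samples)
        # Stage 2: zero-cost scoring
        scores = [toy_proxy(arch) for arch in candidates]
        # Stage 3: reflection (templated here; an LLM call in a real system)
        best = candidates[scores.index(max(scores))]
        guidance = f"The best configuration in the last round was {best}."
        history.append({"candidates": candidates, "scores": scores, "reflection": guidance})
    # Return the highest-scoring architecture across all rounds.
    pairs = [(s, a) for h in history for s, a in zip(h["scores"], h["candidates"])]
    return max(pairs, key=lambda pair: pair[0])[1]

best_arch = reflective_search(StubLLM())
```

Since the stub ignores its prompt, the guidance has no effect on generation here; the sketch only shows where the reflection enters the loop.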

3.4 The Reflection Mechanism in Detail

Reflection prompt design:

reflection_prompt = """
Based on the previous generation results:
 
Previous architectures: {history}
Scores: {scores}
Feedback: {feedback}
 
Analyze:
1. Why did certain architectures receive higher scores?
2. What patterns are associated with good performance?
3. What mistakes were made in low-scoring architectures?
4. How can we improve the generation strategy?
 
Provide specific guidance for the next generation.
"""

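The three placeholders in the prompt above can be filled from the search history. The sketch below is one way to do it, under assumptions: the template is a shortened copy of the one above, `build_reflection_prompt` is a hypothetical helper, and truncating to the top-k candidates is a choice made here to keep the prompt short.

```python
import json

# Shortened copy of the template above; same three placeholders.
REFLECTION_TEMPLATE = (
    "Previous architectures: {history}\n"
    "Scores: {scores}\n"
    "Feedback: {feedback}\n"
    "Provide specific guidance for the next generation."
)

def build_reflection_prompt(candidates, scores, top_k=3):
    """Fill the template with the top_k best candidates, highest score first."""
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)[:top_k]
    feedback = f"Best score {ranked[0][0]:.3f}, lowest kept score {ranked[-1][0]:.3f}."
    return REFLECTION_TEMPLATE.format(
        history=json.dumps([arch for _, arch in ranked]),
        scores=[round(score, 3) for score, _ in ranked],
        feedback=feedback,
    )

prompt = build_reflection_prompt(
    candidates=[{"num_layers": 4}, {"num_layers": 12}, {"num_layers": 8}],
    scores=[0.2, 0.9, 0.5],
    top_k=2,
)
print(prompt)
```
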
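3.5 Zero-Cost Proxy Ensemble

RZ-NAS evaluates candidates with an ensemble of zero-cost proxies:

Proxy | Weight | Use case
Synflow | 0.3 | Data-free evaluation
NTK | 0.25 | Expressivity evaluation
Spectrum | 0.2 | Weight distribution
Parameter efficiency | 0.25 | Computational efficiency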
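
A weighted ensemble with these weights could be combined as follows. This is a sketch under assumptions: the source does not say how raw proxy values are brought onto a common scale, so min-max normalization across the candidate batch is used here, and every proxy is treated as higher-is-better.

```python
# Weights from the table above; they sum to 1.0.
WEIGHTS = {"synflow": 0.3, "ntk": 0.25, "spectrum": 0.2, "param_efficiency": 0.25}

def min_max(values):
    """Scale a list to [0, 1]; a constant list maps to 0.5 everywhere."""
    low, high = min(values), max(values)
    if high == low:
        return [0.5] * len(values)
    return [(v - low) / (high - low) for v in values]

def ensemble_scores(raw_scores):
    """raw_scores maps proxy name -> list of raw values, one per candidate."""
    num_candidates = len(next(iter(raw_scores.values())))
    combined = [0.0] * num_candidates
    for name, weight in WEIGHTS.items():
        for i, value in enumerate(min_max(raw_scores[name])):
            combined[i] += weight * value
    return combined

raw = {
    "synflow": [10.0, 30.0, 20.0],
    "ntk": [0.1, 0.3, 0.2],
    "spectrum": [5.0, 5.0, 5.0],
    "param_efficiency": [1.0, 0.5, 0.0],
}
scores = ensemble_scores(raw)
best_index = scores.index(max(scores))
```

Normalizing per batch keeps a proxy with a large numeric range, such as Synflow, from dominating the sum.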

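4. The LLM-NAS Method

4.1 Hardware-Aware Extension

Paper: Zhu et al., “LLM-NAS: LLM-driven Hardware-Aware Neural Architecture Search” (arXiv 2510.01472)

Contribution: brings hardware constraints into the LLM-guided NAS framework.

4.2 Multi-Objective Optimization

LLM-NAS jointly optimizes:

  1. Accuracy - predicted by zero-cost proxies
  2. Latency - predicted by a hardware model
  3. Energy - estimated by a power model
  4. Memory - computed from the parameter count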
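
With four objectives, candidates can be compared by Pareto dominance rather than a single score. The source does not state how LLM-NAS combines its objectives, so the following is a generic Pareto-front filter offered as an illustration; the `Candidate` fields, units, and example numbers are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float    # predicted, higher is better
    latency_ms: float  # lower is better
    energy_mj: float   # lower is better
    memory_mb: float   # lower is better

def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    no_worse = (a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
                and a.energy_mj <= b.energy_mj and a.memory_mb <= b.memory_mb)
    better = (a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
              or a.energy_mj < b.energy_mj or a.memory_mb < b.memory_mb)
    return no_worse and better

def pareto_front(candidates):
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

pool = [
    Candidate("small", 0.70, 5.0, 10.0, 50.0),
    Candidate("medium", 0.75, 9.0, 18.0, 120.0),
    Candidate("wasteful", 0.72, 12.0, 25.0, 200.0),
]
front = pareto_front(pool)
```

In this example "wasteful" is removed because "medium" is better on all four objectives, while "small" and "medium" both survive as different trade-offs.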

4.3 LLM-NAS Prompt Design

hw_aware_prompt = """
Generate neural network architectures with the following hardware constraints:
 
Target Device: {device_type}
Latency Budget: {latency_ms}ms
Memory Budget: {memory_mb}MB
Power Budget: {power_w}W
 
For {device_type}, consider:
- {device_specific_constraints}
 
Balance accuracy with hardware efficiency.
Return architectures ranked by the trade-off score.
"""

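The placeholders in the prompt above can be filled from a per-device profile. The sketch below assumes a shortened copy of the template; the `DEVICE_PROFILES` table and its budget numbers are invented placeholders, not measurements or values from the paper.

```python
# Shortened copy of the template above with the same placeholders.
HW_PROMPT = (
    "Target Device: {device_type}\n"
    "Latency Budget: {latency_ms}ms\n"
    "Memory Budget: {memory_mb}MB\n"
    "Power Budget: {power_w}W\n"
    "For {device_type}, consider:\n"
    "- {device_specific_constraints}\n"
)

# Hypothetical device profiles; the numbers are placeholders only.
DEVICE_PROFILES = {
    "mobile_cpu": {
        "latency_ms": 50, "memory_mb": 512, "power_w": 3,
        "device_specific_constraints": "prefer small hidden sizes and few layers",
    },
    "edge_gpu": {
        "latency_ms": 20, "memory_mb": 4096, "power_w": 15,
        "device_specific_constraints": "prefer dimensions that are multiples of 8",
    },
}

def build_hw_prompt(device_type):
    """Fill the hardware-aware template from the profile of one device."""
    profile = DEVICE_PROFILES[device_type]
    return HW_PROMPT.format(device_type=device_type, **profile)

print(build_hw_prompt("mobile_cpu"))
```
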
5. The PEL-NAS Method

5.1 Prompt Co-Evolution

Core idea: the search optimizes architectures and prompts at the same time, so that the two co-evolve.

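class PEL_NAS:
    # The constructor arguments are assumptions; the original sketch did not show them.
    def __init__(self, llm, num_generations=10):
        self.llm = llm
        self.num_generations = num_generations
        self.fitness = {}  # (arch, prompt) -> score; both keys must be hashable

    def coevolution(self):
        # Initialize both populations
        arch_population = self.llm.generate_initial()
        prompt_population = self.design_initial_prompts()

        for generation in range(self.num_generations):
            # Evaluate every (architecture, prompt) pair
            for arch in arch_population:
                for prompt in prompt_population:
                    score = self.evaluate(arch, prompt)
                    self.fitness[arch, prompt] = score

            # Selection
            arch_population = self.select_architectures(arch_population)
            prompt_population = self.select_prompts(prompt_population)

            # Mutation
            arch_population = self.mutate_architectures(arch_population)
            prompt_population = self.mutate_prompts(prompt_population)

        # The evaluate, select_* and mutate_* helpers are left to the implementation
        return arch_population, prompt_population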
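
The evaluate, select, mutate cycle can be tried out on a toy problem. In the sketch below an "architecture" is just a layer count and a "prompt" is a string; the `fitness` function, the mutation rules, and the elitist selection are all invented for illustration and say nothing about how PEL-NAS actually scores or mutates.

```python
import random

def fitness(arch, prompt):
    """Toy joint score: depth near 12 is good, and prompts mentioning 'efficient' help."""
    return -abs(arch - 12) + (1.0 if "efficient" in prompt else 0.0)

def coevolve(archs, prompts, num_generations=5, keep=2, seed=0):
    rng = random.Random(seed)
    for _ in range(num_generations):
        archs = list(dict.fromkeys(archs))      # drop duplicates, keep order
        prompts = list(dict.fromkeys(prompts))
        # Evaluation: score every (architecture, prompt) pair.
        table = {(a, p): fitness(a, p) for a in archs for p in prompts}
        # Selection: rank each population by its best pairing and keep the top few.
        archs = sorted(archs, key=lambda a: max(table[a, p] for p in prompts),
                       reverse=True)[:keep]
        prompts = sorted(prompts, key=lambda p: max(table[a, p] for a in archs),
                         reverse=True)[:keep]
        # Mutation: survivors stay, and each one contributes a perturbed copy.
        archs = archs + [max(1, a + rng.choice([-2, 2])) for a in archs]
        prompts = prompts + [p + " Keep it efficient." for p in prompts
                             if "efficient" not in p]
    return archs[0], prompts[0]

best_arch, best_prompt = coevolve(
    archs=[4, 8, 20],
    prompts=["Generate a Transformer.", "Generate a deep Transformer."],
)
```

Because survivors are carried over unchanged, the best pairing found so far is never lost between generations.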

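5.2 Effect of Co-Evolution

Experiments show that co-evolution outperforms optimizing the architecture or the prompt alone:

Method | Top-1 accuracy | Search time
LLM-only | 72.3% | 2h
Prompt-only | 68.5% | 1h
Sequential optimization | 74.1% | 4h
Co-evolution | 76.8% | 3h

6. Technical Details of LLM-Guided NAS

6.1 Architecture Representations

Code representation: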

architecture_template = """
class {ModelName}(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.layers = nn.ModuleList([
            TransformerLayer(
                hidden_size={hidden_size},
                num_heads={num_heads},
                intermediate_size={intermediate_size},
                attention_type='{attention_type}',
                activation='{activation}'
            )
            for _ in range({num_layers})
        ])
        self.norm = {norm_type}()
    
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.norm(x)
"""

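Since the template above contains no literal braces other than its placeholders, it can be filled with `str.format`. The sketch below uses a compact copy of the template and a hypothetical helper, `render_architecture`, that also checks the rendered source with Python's built-in `compile`, which parses the code without executing it.

```python
# Compact copy of the template above (same placeholder style, fewer fields).
ARCHITECTURE_TEMPLATE = '''
class {ModelName}(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.layers = nn.ModuleList([
            TransformerLayer(hidden_size={hidden_size}, num_heads={num_heads})
            for _ in range({num_layers})
        ])
        self.norm = {norm_type}()
'''

def render_architecture(config):
    """Fill the template and check that the result is syntactically valid Python."""
    source = ARCHITECTURE_TEMPLATE.format(**config)
    compile(source, "<generated>", "exec")  # raises SyntaxError on malformed output
    return source

source = render_architecture({
    "ModelName": "CandidateNet",
    "hidden_size": 768,
    "num_heads": 12,
    "num_layers": 12,
    "norm_type": "RMSNorm",
})
```

A syntax check like this catches malformed LLM output early; it does not verify that names such as `TransformerLayer` exist or that the model runs.
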
JSON representation:

{
  "model_name": "ZeroLM_Transformer",
  "architecture": {
    "num_layers": 12,
    "hidden_size": 768,
    "num_heads": 12,
    "intermediate_size": 3072,
    "attention_type": "multi_head",
    "position_encoding": "RoPE",
    "activation": "silu",
    "norm": "RMSNorm",
    "dropout": 0.1
  },
  "estimated_metrics": {
    "params": "125M",
    "flops": "2.5e11",
    "latency_ms": 15.2
  }
}
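
A typed container makes the JSON form easier to check once it is loaded. The sketch below is one possible mapping, assuming the key names shown in the example; `ArchitectureSpec`, `head_dim`, and `load_spec` are names introduced here for illustration.

```python
import json
from dataclasses import dataclass

@dataclass
class ArchitectureSpec:
    num_layers: int
    hidden_size: int
    num_heads: int
    intermediate_size: int
    attention_type: str
    position_encoding: str
    activation: str
    norm: str
    dropout: float

    def head_dim(self):
        """Per-head dimension; fails if the heads do not divide hidden_size."""
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

def load_spec(text):
    """Build a spec from the "architecture" object; unknown or missing keys raise TypeError."""
    return ArchitectureSpec(**json.loads(text)["architecture"])

document = '''{"model_name": "ZeroLM_Transformer", "architecture": {
  "num_layers": 12, "hidden_size": 768, "num_heads": 12, "intermediate_size": 3072,
  "attention_type": "multi_head", "position_encoding": "RoPE", "activation": "silu",
  "norm": "RMSNorm", "dropout": 0.1}}'''
spec = load_spec(document)
print(spec.head_dim())  # 64
```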

6.2 Feedback Mechanism Design

Multi-round dialogue flow:

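def multi_round_dialogue(llm, initial_architecture, num_rounds=5):
    # parse_architecture, evaluate_zero_cost and get_feedback are assumed to be
    # defined elsewhere; llm.chat takes the message list and returns the reply text.
    # The default of 5 rounds is an arbitrary choice.
    messages = [
        {"role": "system", "content": "You are an expert NAS assistant."},
        {"role": "user", "content": f"Generate architectures: {initial_architecture}"}
    ]
    best_arch, best_score = None, float("-inf")

    for _ in range(num_rounds):
        response = llm.chat(messages)
        arch = parse_architecture(response)

        # Evaluate the proposal and remember the best one seen so far
        score = evaluate_zero_cost(arch)
        if score > best_score:
            best_arch, best_score = arch, score

        # Feed the score back so the next round can improve on it
        messages.append({"role": "assistant", "content": response})
        messages.append({
            "role": "user",
            "content": f"Score: {score}. Improve based on: {get_feedback(score)}"
        })

    return best_arch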

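6.3 Choosing an LLM

LLM | Strengths | Weaknesses | Recommended use
GPT-4 | Rich architecture knowledge | High cost | Precise search
Claude | Strong reasoning | More restrictions | Reflection mechanism
Llama-3 | Open source, can be deployed locally | Limited knowledge | Fast iteration
Gemini | Multimodal | Limited architecture knowledge | Joint design

7. Evaluation and Comparison

7.1 Evaluation Metrics

Metric | Description | Measurement
Search efficiency | Speed of generating valid architectures | Time (s)
Top-K accuracy | Fraction of cases in which the top-K candidates include the best architecture | Experimental statistics
Zero-shot performance | Performance when used directly | Validation accuracy
Post-fine-tuning performance | Performance after light fine-tuning | Fine-tuning experiments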

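7.2 Method Comparison

ZeroLM, RZ-NAS, LLM-NAS, and PEL-NAS are compared on five capabilities: data-free search, hardware awareness, reflection mechanism, multi-round dialogue, and code generation. The per-cell marks of the comparison table are not preserved. The only surviving entries are "partial", once for RZ-NAS and twice each for LLM-NAS and PEL-NAS, and the capabilities they refer to cannot be recovered.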

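7.3 Experimental Results

Typical results on CIFAR-10 and ImageNet:

Method | CIFAR-10 accuracy | ImageNet Top-1 | Search time
Random Search | 92.1% | 68.5% | 48h
DARTS | 94.3% | 73.1% | 24h
Zero-shot NAS | 93.8% | 72.4% | 2h
ZeroLM | 94.6% | 74.2% | 1.5h
RZ-NAS | 95.1% | 75.8% | 3h

8. Practical Guide

8.1 Quick Implementation Template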

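import json

import anthropic


class SimpleLLMNAS:
    # The default model name is kept from the original template; replace it with
    # a model ID that your account can access.
    def __init__(self, api_key, model="claude-3-5-sonnet"):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = model

    def search(self, num_architectures=10):
        # Generate candidate architectures
        response = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": f"Generate {num_architectures} Transformer architectures..."
            }]
        )

        # response.content is a list of content blocks; the reply text is in the first
        architectures = self.parse_response(response.content[0].text)

        # Zero-cost evaluation
        scores = [self.zero_cost_eval(arch) for arch in architectures]

        # Return the best candidate together with its score
        best_idx = max(range(len(scores)), key=scores.__getitem__)
        return architectures[best_idx], scores[best_idx]

    def parse_response(self, text):
        # Assumes the prompt asked for a bare JSON list of configurations
        return json.loads(text)

    def zero_cost_eval(self, arch):
        # Plug in a zero-cost proxy here
        raise NotImplementedError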

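8.2 Best Practices

  1. Prompt engineering: give detailed architecture descriptions and constraints
  2. Multi-round iteration: let the LLM improve based on feedback
  3. Ensemble evaluation: use multiple zero-cost proxies
  4. Candidate diversity: generate a diverse set of candidate architectures
  5. Post-hoc validation: filter out unreasonable architectures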
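
Candidate diversity (practice 4) can be checked mechanically. The sketch below removes exact duplicates and reports how varied one field is across a batch; both helper names and the diversity measure are assumptions chosen for simplicity, not a standard metric.

```python
import json

def deduplicate(architectures):
    """Drop repeated configurations, keeping first occurrences in order."""
    seen, unique = set(), []
    for arch in architectures:
        key = json.dumps(arch, sort_keys=True)  # canonical form, independent of key order
        if key not in seen:
            seen.add(key)
            unique.append(arch)
    return unique

def diversity(architectures, field):
    """Fraction of distinct values for one field; 1.0 means every candidate differs."""
    if not architectures:
        return 0.0
    return len({arch[field] for arch in architectures}) / len(architectures)

batch = [
    {"num_layers": 12, "hidden_size": 768},
    {"hidden_size": 768, "num_layers": 12},   # same configuration, different key order
    {"num_layers": 8, "hidden_size": 768},
]
unique = deduplicate(batch)
print(len(unique), diversity(unique, "num_layers"), diversity(unique, "hidden_size"))
```

A low diversity value on a field is a cue to adjust the prompt, for example by asking explicitly for a spread of values.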



References

  1. Chen, Z. S., et al. (2025). ZeroLM: Data-Free Transformer Architecture Search for Language Models. arXiv:2503.18646.