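LLM-Guided Neural Architecture Search

1. Overview

LLM-guided NAS (neural architecture search guided by a large language model) is an active frontier of NAS research¹. By drawing on the neural network design knowledge embedded in a large language model, it can substantially speed up architecture search and, in some cases, design architectures without training any candidate.

Core advantages:

Leverages the architecture knowledge of a pretrained LLM
Avoids evaluating large numbers of candidate architectures
Can incorporate human design experience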

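2. ZeroLM

2.1 Background

Paper: Chen et al., “ZeroLM: Data-Free Transformer Architecture Search for Language Models” (arXiv 2503.18646)

Core problem: Conventional NAS spends large amounts of compute evaluating architectures; ZeroLM proposes searching Transformer architectures without any data.

2.2 How It Works

ZeroLM uses an LLM to generate and evaluate Transformer architectures. The core framework:

LLM (GPT-4/Claude)
    ↓ Prompt engineering
Architecture description (JSON/dict)
    ↓ LLM generation
Candidate architecture set
    ↓ Zero-cost evaluation
Performance ranking
    ↓ Feedback loop
Improved search strategy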

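2.3 Search Space Definition

ZeroLM defines its search space over Transformer components:

Component	Search options
Attention type	Multi-head, Linear, Sparse, Flash
Feed-forward network	FFN, SwiGLU, GLU, Feedforward
Positional encoding	RoPE, ALiBi, Sinusoidal, None
Layer configuration	Hidden dimension, number of layers, number of attention heads
Normalization	LayerNorm, RMSNorm, Pre-Norm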
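A search space like the one in the table can be written down as a plain dictionary and sampled from. The sketch below is only an illustration: the option strings mirror the table, while the numeric grids for hidden size, layer count, and head count are assumed values, and the divisibility check is the usual multi-head attention requirement rather than something stated in the source.

```python
import random

# Categorical options mirror the table; the numeric grids are assumed values.
SEARCH_SPACE = {
    "attention_type": ["multi_head", "linear", "sparse", "flash"],
    "ffn_type": ["ffn", "swiglu", "glu", "feedforward"],
    "position_encoding": ["rope", "alibi", "sinusoidal", "none"],
    "norm_type": ["layernorm", "rmsnorm", "pre_norm"],
    "hidden_size": [256, 512, 768, 1024, 2048],
    "num_layers": [4, 8, 12, 16, 24, 32],
    "num_heads": [4, 8, 12, 16, 32],
}


def is_valid(config):
    """Multi-head attention needs hidden_size to split evenly across heads."""
    return config["hidden_size"] % config["num_heads"] == 0


def sample_architecture(rng):
    """Draw uniformly from the space, rejecting invalid combinations."""
    while True:
        config = {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}
        if is_valid(config):
            return config


def space_size():
    """Number of raw combinations before the validity filter."""
    size = 1
    for options in SEARCH_SPACE.values():
        size *= len(options)
    return size
```

Calling `sample_architecture(random.Random(0))` returns one valid configuration. Even this small grid has 28,800 raw combinations, which is why exhaustive evaluation is usually avoided.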

2.4 Prompt Design

Architecture generation prompt:

architecture_generation_prompt = """
You are an expert neural network architect specialized in Transformers.
 
Generate {num_architectures} Transformer architecture configurations 
for language modeling. Each configuration should specify:
 
1. num_layers: Number of transformer layers (4-32)
2. hidden_size: Hidden dimension (256-2048)
3. num_heads: Number of attention heads (4-32)
4. intermediate_size: FFN intermediate size (hidden_size * 2-8)
5. attention_type: Type of attention mechanism
6. position_encoding: Type of positional encoding
7. activation: Activation function type
 
Consider:
- Computational efficiency vs accuracy trade-offs
- Recent advances in Transformer design
- Hardware compatibility
 
Return as JSON list of dictionaries.
"""

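The numeric ranges stated in the prompt can double as a check on whatever the LLM returns, since a model may ignore them. The following is a minimal sketch, assuming the reply is a bare JSON list of dictionaries; `check_config` and `parse_and_filter` are hypothetical helper names, and the head-divisibility rule is an added assumption that the prompt does not state.

```python
import json

# Ranges copied from the generation prompt.
LIMITS = {"num_layers": (4, 32), "hidden_size": (256, 2048), "num_heads": (4, 32)}


def check_config(config):
    """Return a list of problems; an empty list means the config passes."""
    problems = []
    for key, (low, high) in LIMITS.items():
        value = config.get(key)
        if not isinstance(value, int) or not low <= value <= high:
            problems.append(f"{key}={value!r} outside [{low}, {high}]")
    if problems:
        return problems  # skip the ratio checks when basic fields are bad
    hidden, heads = config["hidden_size"], config["num_heads"]
    if hidden % heads != 0:
        # Assumption: standard multi-head attention, not stated in the prompt.
        problems.append("hidden_size is not divisible by num_heads")
    inter = config.get("intermediate_size")
    if not isinstance(inter, int) or not 2 * hidden <= inter <= 8 * hidden:
        problems.append("intermediate_size outside 2x-8x hidden_size")
    return problems


def parse_and_filter(reply_text):
    """Parse a JSON list reply and keep only configs that pass every check."""
    configs = json.loads(reply_text)
    return [c for c in configs if not check_config(c)]
```

Rejected configurations, together with their problem lists, could also be fed back to the LLM in a later round instead of being silently dropped.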
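2.5 Evaluation Mechanism

ZeroLM scores each candidate with a weighted combination of several zero-cost proxies:

$$ \mathrm{Score}_{\mathrm{ZeroLM}}(\alpha) = w_1 \cdot S_{\mathrm{flops}} + w_2 \cdot S_{\mathrm{params}} + w_3 \cdot S_{\mathrm{latency}} $$

Here \alpha is a candidate architecture and w_1, w_2, w_3 are weights. Each sub-score is computed from the architecture's parameters alone, with no actual training.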
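The source does not say how the three sub-scores are scaled, so the sketch below makes its own assumptions: each raw cost (FLOPs, parameter count, latency) is min-max normalized across the candidate pool and inverted so that cheaper is better, and the weights (0.4, 0.3, 0.3) are placeholders. It shows only the weighted-sum structure of the formula.

```python
def min_max_inverted(values):
    """Map raw costs to [0, 1] so that the cheapest candidate scores 1."""
    low, high = min(values), max(values)
    if high == low:
        return [1.0 for _ in values]
    return [(high - v) / (high - low) for v in values]


def zerolm_style_scores(candidates, weights=(0.4, 0.3, 0.3)):
    """Weighted sum w1*S_flops + w2*S_params + w3*S_latency per candidate."""
    w1, w2, w3 = weights  # assumed weights, not taken from the paper
    s_flops = min_max_inverted([c["flops"] for c in candidates])
    s_params = min_max_inverted([c["params"] for c in candidates])
    s_latency = min_max_inverted([c["latency_ms"] for c in candidates])
    return [w1 * f + w2 * p + w3 * l for f, p, l in zip(s_flops, s_params, s_latency)]
```

A score built only from cost terms will always favor the smallest model, so in practice it would need to be paired with some capacity or expressivity term.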

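3. RZ-NAS

3.1 Background

Paper: Ji et al., “RZ-NAS: Enhancing LLM-guided NAS via Reflective Zero-Cost Strategy” (ICML 2025)

Core innovation: A reflective zero-cost strategy that makes LLM-guided NAS more effective.

3.2 Challenges for Existing LLM-Guided NAS

Challenge	Description
Search space limits	LLMs struggle to understand complex search space definitions
Search efficiency	Blind generation lacks direction
Hallucination	The LLM may generate unreasonable architectures
Missing evaluation	No reliable estimate of architecture performance

3.3 RZ-NAS Core Framework

Three-stage loop: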

class RZ_NAS:
    """Sketch of the reflective search loop.

    _build_prompt, evaluate_zero_cost, reflect, update_strategy and
    get_best_architecture are not defined here and must be supplied.
    """

    def __init__(self, llm, zero_cost_proxies):
        self.llm = llm
        self.proxies = zero_cost_proxies
        self.history = []

    def reflective_search(self, num_iterations=10):
        for iteration in range(num_iterations):
            # Stage 1: the LLM proposes candidate architectures
            candidates = self.llm.generate(
                prompt=self._build_prompt(iteration),
                num_samples=5
            )

            # Stage 2: score every candidate with zero-cost proxies
            scores = [self.evaluate_zero_cost(arch) for arch in candidates]

            # Stage 3: reflect on the results and record them
            reflection = self.reflect(scores, candidates)
            self.history.append({
                'candidates': candidates,
                'scores': scores,
                'reflection': reflection
            })

            # Update the prompt strategy for the next iteration
            self.update_strategy(reflection)

        return self.get_best_architecture()

3.4 The Reflection Mechanism

Reflection prompt design:

reflection_prompt = """
Based on the previous generation results:
 
Previous architectures: {history}
Scores: {scores}
Feedback: {feedback}
 
Analyze:
1. Why did certain architectures receive higher scores?
2. What patterns are associated with good performance?
3. What mistakes were made in low-scoring architectures?
4. How can we improve the generation strategy?
 
Provide specific guidance for the next generation.
"""

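3.5 Zero-Cost Proxy Ensemble

RZ-NAS evaluates candidates with a weighted ensemble of zero-cost proxies:

Proxy	Weight	Use case
Synflow	0.3	Data-free evaluation
NTK	0.25	Expressivity
Spectrum	0.2	Weight distribution
Parameter efficiency	0.25	Compute efficiency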
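Raw proxy values live on very different scales (Synflow scores can span many orders of magnitude), so they need a common scale before the weights in the table mean anything. One option, sketched below as an assumption rather than as the paper's procedure, is to replace each proxy value by its rank within the candidate pool and then take the weighted sum. The dictionary keys are hypothetical names, and every proxy is assumed to be oriented so that higher is better.

```python
# Weights copied from the table; the key names are hypothetical.
PROXY_WEIGHTS = {"synflow": 0.3, "ntk": 0.25, "spectrum": 0.2, "param_efficiency": 0.25}


def rank_normalize(values):
    """Replace each value by its rank scaled to [0, 1]; the largest value gets 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position / (len(values) - 1) if len(values) > 1 else 1.0
    return ranks


def ensemble_scores(proxy_values):
    """proxy_values maps proxy name -> list of raw scores, one per candidate."""
    num_candidates = len(next(iter(proxy_values.values())))
    totals = [0.0] * num_candidates
    for name, weight in PROXY_WEIGHTS.items():
        for i, rank in enumerate(rank_normalize(proxy_values[name])):
            totals[i] += weight * rank
    return totals
```

Ties are broken by list position here. A proxy for which lower is better should be negated before it is passed in.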

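4. LLM-NAS

4.1 Hardware-Aware Extension

Paper: Zhu et al., “LLM-NAS: LLM-driven Hardware-Aware Neural Architecture Search” (arXiv 2510.01472)

Key contribution: Brings hardware constraints into the LLM-guided NAS framework.

4.2 Multi-Objective Optimization

LLM-NAS optimizes several objectives at once:

Accuracy - predicted by zero-cost proxies
Latency - predicted by a hardware model
Energy - estimated by a power model
Memory - computed from the parameter count

$$
\max_{\alpha} \; \mathrm{Acc}(\alpha) \quad \text{s.t.} \quad
\begin{cases}
\mathrm{Latency}(\alpha) \leq L_{\max} \\
\mathrm{Energy}(\alpha) \leq E_{\max} \\
\mathrm{Memory}(\alpha) \leq M_{\max}
\end{cases}
$$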
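Over a finite candidate set, the constrained problem above reduces to filtering by the budgets and taking the most accurate survivor. The sketch below assumes each candidate is a dictionary that already carries predicted values; the field names (`pred_acc`, `latency_ms`, `energy_mj`, `memory_mb`) are hypothetical.

```python
def feasible(candidate, budget):
    """True when the candidate meets every hardware budget."""
    return (
        candidate["latency_ms"] <= budget["latency_ms"]
        and candidate["energy_mj"] <= budget["energy_mj"]
        and candidate["memory_mb"] <= budget["memory_mb"]
    )


def best_feasible(candidates, budget):
    """Return the feasible candidate with the highest predicted accuracy, or None."""
    pool = [c for c in candidates if feasible(c, budget)]
    return max(pool, key=lambda c: c["pred_acc"]) if pool else None
```

Returning `None` when nothing fits makes the infeasible case explicit, which is a natural signal to ask the LLM for smaller architectures in the next round.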

4.3 LLM-NAS Prompt Design

hw_aware_prompt = """
Generate neural network architectures with the following hardware constraints:
 
Target Device: {device_type}
Latency Budget: {latency_ms}ms
Memory Budget: {memory_mb}MB
Power Budget: {power_w}W
 
For {device_type}, consider:
- {device_specific_constraints}
 
Balance accuracy with hardware efficiency.
Return architectures ranked by the trade-off score.
"""

5. PEL-NAS

5.1 Prompt Co-Evolution

Core idea: The search optimizes architectures and prompts at the same time, so the two co-evolve.

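class PEL_NAS:
    def coevolution(self, num_generations=10):
        # num_generations was undefined in the original; the default is an assumption
        # Initialize both populations
        arch_population = self.llm.generate_initial()
        prompt_population = self.design_initial_prompts()
        self.fitness = {}  # keyed by (arch, prompt), so both must be hashable

        for generation in range(num_generations):
            # Score every (architecture, prompt) pair
            for arch in arch_population:
                for prompt in prompt_population:
                    score = self.evaluate(arch, prompt)
                    self.fitness[arch, prompt] = score

            # Selection
            arch_population = self.select_architectures(arch_population)
            prompt_population = self.select_prompts(prompt_population)

            # Mutation
            arch_population = self.mutate_architectures(arch_population)
            prompt_population = self.mutate_prompts(prompt_population)

        # Return the final populations so the caller can use the result
        return arch_population, prompt_population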
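The loop stores one fitness value per (architecture, prompt) pair but does not say how selection turns those pairwise values into a ranking of each population. One plausible reading, offered here purely as an assumption, is to rank each member by its average fitness over all partners and keep the top few (truncation selection). `marginal_fitness` and `select_top` are hypothetical helpers.

```python
def marginal_fitness(fitness, population, axis):
    """Average pairwise fitness over the partner population.

    fitness maps (arch, prompt) -> score; axis 0 ranks architectures,
    axis 1 ranks prompts.
    """
    totals = {member: [] for member in population}
    for pair, score in fitness.items():
        if pair[axis] in totals:
            totals[pair[axis]].append(score)
    return {m: sum(s) / len(s) for m, s in totals.items() if s}


def select_top(fitness, population, axis, keep):
    """Truncation selection: keep the `keep` members with the best marginal fitness."""
    scores = marginal_fitness(fitness, population, axis)
    return sorted(scores, key=scores.get, reverse=True)[:keep]
```

With this choice an architecture is rewarded for doing well across many prompts rather than for one lucky pairing. Using the maximum instead of the mean would favor specialists.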

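5.2 Effect of Co-Evolution

Experiments show that co-evolution outperforms optimizing the architecture or the prompt alone:

Method	Top-1 accuracy	Search time
LLM-only	72.3%	2h
Prompt-only	68.5%	1h
Sequential optimization	74.1%	4h
Co-evolution	76.8%	3h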

6. Technical Details of LLM-Guided NAS

6.1 Architecture Representations

Code representation:

architecture_template = """
class {ModelName}(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.layers = nn.ModuleList([
            TransformerLayer(
                hidden_size={hidden_size},
                num_heads={num_heads},
                intermediate_size={intermediate_size},
                attention_type='{attention_type}',
                activation='{activation}'
            )
            for _ in range({num_layers})
        ])
        self.norm = {norm_type}()
    
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.norm(x)
"""

JSON representation:

{
  "model_name": "ZeroLM_Transformer",
  "architecture": {
    "num_layers": 12,
    "hidden_size": 768,
    "num_heads": 12,
    "intermediate_size": 3072,
    "attention_type": "multi_head",
    "position_encoding": "RoPE",
    "activation": "silu",
    "norm": "RMSNorm",
    "dropout": 0.1
  },
  "estimated_metrics": {
    "params": "125M",
    "flops": "2.5e11",
    "latency_ms": 15.2
  }
}
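The `estimated_metrics` in a description like this can be sanity-checked with simple arithmetic. The sketch below counts only the large weight matrices of a decoder-only Transformer and ignores biases and normalization parameters. It assumes a plain two-matrix feed-forward block and a tied embedding with a vocabulary of 50,257 tokens; neither assumption comes from the JSON.

```python
def estimate_params(num_layers, hidden_size, intermediate_size, vocab_size):
    """Rough weight count for a decoder-only Transformer, ignoring biases and norms."""
    attention = 4 * hidden_size * hidden_size           # Q, K, V and output projections
    feed_forward = 2 * hidden_size * intermediate_size  # up and down projections
    embeddings = vocab_size * hidden_size               # token embedding, tied with the output head
    return num_layers * (attention + feed_forward) + embeddings


# vocab_size is an assumed value; the JSON does not specify one.
total = estimate_params(num_layers=12, hidden_size=768, intermediate_size=3072, vocab_size=50257)
```

For the configuration shown, this gives 123,532,032 weights, about 124M, which is in line with the "125M" listed under `estimated_metrics`.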

6.2 Feedback Mechanism Design

Multi-round dialogue flow:

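def multi_round_dialogue(llm, initial_architecture, num_rounds=5):
    # parse_architecture, evaluate_zero_cost, get_feedback and
    # extract_best_architecture are assumed to be defined elsewhere.
    # num_rounds was undefined in the original; the default is an assumption.
    messages = [
        {"role": "system", "content": "You are an expert NAS assistant."},
        {"role": "user", "content": f"Generate architectures: {initial_architecture}"}
    ]

    for _ in range(num_rounds):
        response = llm.chat(messages)
        arch = parse_architecture(response)

        # Evaluate the proposal without training
        score = evaluate_zero_cost(arch)

        # Feed the score back so the next round can improve on it
        messages.append({"role": "assistant", "content": response})
        messages.append({
            "role": "user",
            "content": f"Score: {score}. Improve based on: {get_feedback(score)}"
        })

    return extract_best_architecture(messages)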

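6.3 Choosing an LLM

LLM	Strengths	Weaknesses	Recommended use
GPT-4	Rich architecture knowledge	High cost	Precise search
Claude	Strong reasoning	More restrictions	Reflection mechanism
Llama-3	Open source, can be deployed locally	Limited knowledge	Fast iteration
Gemini	Multimodal	Limited architecture knowledge	Joint design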

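7. Evaluation and Comparison

7.1 Evaluation Metrics

Metric	Description	How measured
Search efficiency	Speed of producing valid architectures	Wall-clock time (s)
Top-K accuracy	Fraction of runs in which the top-K candidates contain the best architecture	Empirical statistics
Zero-shot performance	Performance when used directly	Validation accuracy
Post-fine-tuning performance	Performance after light fine-tuning	Fine-tuning experiments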
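The Top-K metric in the table can be computed as follows. This is a minimal sketch that assumes each run supplies two parallel lists: the proxy scores used for ranking and the true accuracies obtained by actually training each candidate. The function names are hypothetical.

```python
def top_k_hit(proxy_scores, true_accuracies, k):
    """True if the truly best architecture is among the k highest proxy scores."""
    best = max(range(len(true_accuracies)), key=true_accuracies.__getitem__)
    ranked = sorted(range(len(proxy_scores)), key=proxy_scores.__getitem__, reverse=True)
    return best in ranked[:k]


def top_k_accuracy(runs, k):
    """Fraction of runs, each a (proxy_scores, true_accuracies) pair, with a top-k hit."""
    hits = sum(top_k_hit(proxy, truth, k) for proxy, truth in runs)
    return hits / len(runs)
```

Because the metric needs ground-truth accuracies, it is typically measured on a benchmark where candidates have already been trained, not during the search itself.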

7.2 Method Comparison

Method	Data-free	Hardware-aware	Reflection	Multi-round dialogue	Code generation
ZeroLM	✓	✗	✗	✗	✓
RZ-NAS	✓	Partial	✓	✓	✓
LLM-NAS	✓	✓	Partial	Partial	✓
PEL-NAS	✓	Partial	Partial	✓	✓

7.3 Experimental Results

Typical results on CIFAR-10 and ImageNet:

Method	CIFAR-10 accuracy	ImageNet Top-1	Search time
Random Search	92.1%	68.5%	48h
DARTS	94.3%	73.1%	24h
Zero-shot NAS	93.8%	72.4%	2h
ZeroLM	94.6%	74.2%	1.5h
RZ-NAS	95.1%	75.8%	3h

8. Practical Guide

8.1 Quick Implementation Template

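import json

import anthropic


class SimpleLLMNAS:
    def __init__(self, api_key, model="claude-3-5-sonnet"):
        # Replace `model` with a model identifier available to your account
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = model

    def search(self, num_architectures=10):
        # Generate candidate architectures
        response = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": f"Generate {num_architectures} Transformer architectures..."
            }]
        )

        # response.content is a list of content blocks; the first holds the text
        architectures = self.parse_response(response.content[0].text)

        # Zero-cost evaluation
        scores = [self.zero_cost_eval(arch) for arch in architectures]

        # Return the best candidate and its score
        best_idx = max(range(len(scores)), key=scores.__getitem__)
        return architectures[best_idx], scores[best_idx]

    def parse_response(self, text):
        # Assumes the reply is a bare JSON list of architecture dicts
        return json.loads(text)

    def zero_cost_eval(self, arch):
        # Plug in a zero-cost proxy here
        raise NotImplementedError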

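8.2 Best Practices

Prompt engineering: give detailed architecture descriptions and constraints
Multi-round iteration: let the LLM improve based on feedback
Ensemble evaluation: combine several zero-cost proxies
Candidate diversity: generate a diverse set of candidate architectures
Post-hoc validation: filter out unreasonable architectures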
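Candidate diversity can be monitored with very little code. The sketch below is one possible approach and not something prescribed by the methods above: it drops exact duplicates by serializing each configuration with sorted keys, then reports the average fraction of fields on which two configurations differ. All function names are hypothetical, and configurations are assumed to be non-empty, JSON-serializable dictionaries.

```python
import json


def canonical_key(config):
    """Serialize with sorted keys so equal configs map to the same string."""
    return json.dumps(config, sort_keys=True)


def deduplicate(configs):
    """Drop exact duplicates while keeping first-seen order."""
    seen, unique = set(), []
    for config in configs:
        key = canonical_key(config)
        if key not in seen:
            seen.add(key)
            unique.append(config)
    return unique


def mean_pairwise_distance(configs):
    """Average fraction of fields that differ between two configs (0 = identical)."""
    pairs, total = 0, 0.0
    for i in range(len(configs)):
        for j in range(i + 1, len(configs)):
            keys = set(configs[i]) | set(configs[j])
            differing = sum(configs[i].get(k) != configs[j].get(k) for k in keys)
            total += differing / len(keys)
            pairs += 1
    return total / pairs if pairs else 0.0
```

A low distance after deduplication suggests the LLM is proposing near-copies, which may be a cue to raise the sampling temperature or to ask for more varied designs in the prompt.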

9. Related Topics

References

1. Chen, Z. S., et al. (2025). ZeroLM: Data-Free Transformer Architecture Search for Language Models. arXiv:2503.18646.
