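LLM-Guided Neural Architecture Search
1. Overview
LLM-guided NAS (Large Language Model guided NAS) is a frontier direction in NAS research [1]. By drawing on the neural network design knowledge embedded in large language models, it can substantially accelerate architecture search and, in some cases, enable training-free architecture design.
Core advantages:
- Draws on the architectural knowledge of pretrained LLMs
- No need to evaluate large numbers of candidate architectures
- Can incorporate human design experience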
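2. The ZeroLM Method
2.1 Background
Paper: Chen et al., "ZeroLM: Data-Free Transformer Architecture Search for Language Models" (arXiv 2503.18646)
Core problem: Conventional NAS needs substantial compute to evaluate architectures; ZeroLM proposes searching Transformer architectures without any data.
2.2 How It Works
ZeroLM uses an LLM to generate and evaluate Transformer architectures. The core framework: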
LLM (GPT-4/Claude)
↓ Prompt engineering
Architecture description (JSON/dict)
↓ LLM generation
Candidate architecture set
↓ Zero-cost evaluation
Performance ranking
↓ Feedback loop
Improved search strategy
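2.3 Search Space Definition
ZeroLM defines a search space tailored to Transformer architectures:
| Component | Search options |
|---|---|
| Attention type | Multi-head, Linear, Sparse, Flash |
| Feed-forward network | FFN, SwiGLU, GLU, Feedforward |
| Positional encoding | RoPE, ALiBi, Sinusoidal, None |
| Layer configuration | Hidden dimension, number of layers, number of attention heads |
| Normalization | LayerNorm, RMSNorm, Pre-Norm |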
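One way to make the table above concrete is to encode it as a dictionary and sample from it. The sketch below is illustrative only: the key names, the discrete numeric choices, and the helper functions `is_valid` and `sample_architecture` are assumptions, not part of ZeroLM.

```python
import random

# Search space mirroring the table above.
# The key names and the numeric choices are illustrative assumptions.
SEARCH_SPACE = {
    "attention_type": ["multi_head", "linear", "sparse", "flash"],
    "ffn_type": ["ffn", "swiglu", "glu", "feedforward"],
    "position_encoding": ["rope", "alibi", "sinusoidal", "none"],
    "norm": ["layernorm", "rmsnorm", "pre_norm"],
    "num_layers": [4, 8, 12, 16, 24, 32],
    "hidden_size": [256, 512, 768, 1024, 2048],
    "num_heads": [4, 8, 12, 16, 32],
}


def is_valid(config):
    """The hidden size must split evenly across the attention heads."""
    return config["hidden_size"] % config["num_heads"] == 0


def sample_architecture(rng, max_tries=100):
    """Draw one random configuration, rejecting invalid head/width pairs."""
    for _ in range(max_tries):
        config = {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}
        if is_valid(config):
            return config
    raise RuntimeError("no valid configuration found")


rng = random.Random(0)  # fixed seed so the sample is reproducible
config = sample_architecture(rng)
print(config)
```

Rejection sampling keeps the example short; a real search would more likely constrain `num_heads` given `hidden_size` up front.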
2.4 Prompt Design
Architecture generation prompt:
```python
architecture_generation_prompt = """
You are an expert neural network architect specialized in Transformers.
Generate {num_architectures} Transformer architecture configurations
for language modeling. Each configuration should specify:
1. num_layers: Number of transformer layers (4-32)
2. hidden_size: Hidden dimension (256-2048)
3. num_heads: Number of attention heads (4-32)
4. intermediate_size: FFN intermediate size (2-8x hidden_size)
5. attention_type: Type of attention mechanism
6. position_encoding: Type of positional encoding
7. activation: Activation function type
Consider:
- Computational efficiency vs accuracy trade-offs
- Recent advances in Transformer design
- Hardware compatibility
Return as JSON list of dictionaries.
"""
```
2.5 Evaluation Mechanism
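ZeroLM ranks candidates with a composite score that combines several zero-cost proxies. Each sub-score is computed from the architecture's parameters alone, so no actual training is needed.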
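As a simple illustration of a quantity that can be computed from a configuration without any training, the sketch below estimates the parameter count of a decoder-only Transformer. It is not ZeroLM's proxy. The function name `estimate_params`, the default vocabulary size (GPT-2's 50257), and the choice to ignore biases, normalization weights, and positional embeddings are all assumptions.

```python
def estimate_params(num_layers, hidden_size, intermediate_size,
                    vocab_size=50257, gated_ffn=False):
    """Rough parameter count for a decoder-only Transformer.

    Ignores biases, normalization weights, and positional embeddings.
    """
    # Q, K, V, and output projections
    attention = 4 * hidden_size * hidden_size
    # Gated FFNs (e.g. SwiGLU) use three weight matrices instead of two
    ffn_matrices = 3 if gated_ffn else 2
    ffn = ffn_matrices * hidden_size * intermediate_size
    # Token embedding table (output head assumed to share these weights)
    embeddings = vocab_size * hidden_size
    return num_layers * (attention + ffn) + embeddings


params = estimate_params(num_layers=12, hidden_size=768, intermediate_size=3072)
print(f"{params / 1e6:.1f}M parameters")  # prints "123.5M parameters"
```

A count like this can feed a parameter-efficiency sub-score or a memory estimate, but on its own it says nothing about trained accuracy.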
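3. The RZ-NAS Method
3.1 Background
Paper: Ji et al., "RZ-NAS: Enhancing LLM-guided NAS via Reflective Zero-Cost Strategy" (ICML 2025)
Core innovation: a reflective zero-cost strategy that strengthens LLM-guided NAS.
3.2 Challenges of Existing LLM-Based NAS
| Challenge | Description |
|---|---|
| Search space limits | LLMs struggle to understand complex search space definitions |
| Search efficiency | Blind generation lacks direction |
| Hallucination | The LLM may generate implausible architectures |
| Missing evaluation | No reliable estimate of architecture performance |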
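3.3 RZ-NAS Core Framework
Three-stage pipeline:
```python
class RZ_NAS:
    """Reflective search loop: generate, score with zero-cost proxies, reflect."""

    def __init__(self, llm, zero_cost_proxies):
        self.llm = llm                    # assumed to expose generate(prompt, num_samples) -> list
        self.proxies = zero_cost_proxies  # callables mapping an architecture to a float
        self.history = []
        self.guidance = ""                # latest reflection text, reused in the next prompt

    def reflective_search(self, num_iterations=10):
        for iteration in range(num_iterations):
            # Stage 1: LLM generation
            candidates = self.llm.generate(
                prompt=self._build_prompt(iteration),
                num_samples=5,
            )
            # Stage 2: zero-cost evaluation
            scores = [self.evaluate_zero_cost(arch) for arch in candidates]
            # Stage 3: reflection and improvement
            reflection = self.reflect(scores, candidates)
            self.history.append({
                'candidates': candidates,
                'scores': scores,
                'reflection': reflection,
            })
            # Update the prompt strategy
            self.update_strategy(reflection)
        return self.get_best_architecture()

    # The helpers below are illustrative placeholders, not the paper's implementation.
    def _build_prompt(self, iteration):
        return f"Round {iteration}: propose Transformer architectures. Guidance: {self.guidance}"

    def evaluate_zero_cost(self, arch):
        # Unweighted mean of all proxies (an assumption)
        return sum(proxy(arch) for proxy in self.proxies) / len(self.proxies)

    def reflect(self, scores, candidates):
        prompt = f"Explain why these architectures scored as they did: {list(zip(candidates, scores))}"
        return self.llm.generate(prompt=prompt, num_samples=1)[0]

    def update_strategy(self, reflection):
        self.guidance = reflection

    def get_best_architecture(self):
        # Highest-scoring candidate across all recorded iterations
        pairs = [(s, a) for h in self.history for s, a in zip(h['scores'], h['candidates'])]
        return max(pairs, key=lambda pair: pair[0])[1]
```
3.4 Reflection Mechanism in Detail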
Reflection prompt design:
```python
reflection_prompt = """
Based on the previous generation results:
Previous architectures: {history}
Scores: {scores}
Feedback: {feedback}
Analyze:
1. Why did certain architectures receive higher scores?
2. What patterns are associated with good performance?
3. What mistakes were made in low-scoring architectures?
4. How can we improve the generation strategy?
Provide specific guidance for the next generation.
"""
```
3.5 Zero-Cost Proxy Ensemble
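RZ-NAS evaluates candidates with a weighted ensemble of zero-cost proxies:
| Proxy | Weight | Use case |
|---|---|---|
| Synflow | 0.3 | Data-free evaluation |
| NTK | 0.25 | Expressivity |
| Spectrum | 0.2 | Weight distribution |
| Parameter efficiency | 0.25 | Compute efficiency |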
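The weighted ensemble can be sketched as follows. The weights come from the table above; everything else is an assumption made for illustration: the dictionary keys, the min-max normalization across candidates, the convention that a higher value is better for every proxy, and the made-up raw values.

```python
# Weights copied from the table above; the keys are shortened proxy names.
PROXY_WEIGHTS = {"synflow": 0.30, "ntk": 0.25, "spectrum": 0.20, "param_efficiency": 0.25}


def min_max_normalize(values):
    """Rescale raw scores to [0, 1]; a constant list maps to all 0.5."""
    low, high = min(values), max(values)
    if high == low:
        return [0.5 for _ in values]
    return [(v - low) / (high - low) for v in values]


def ensemble_scores(raw_scores):
    """Combine per-proxy raw scores into one weighted score per candidate.

    raw_scores maps proxy name -> list of raw values, one per candidate.
    Assumes a higher raw value is better for every proxy.
    """
    num_candidates = len(next(iter(raw_scores.values())))
    combined = [0.0] * num_candidates
    for name, weight in PROXY_WEIGHTS.items():
        for i, value in enumerate(min_max_normalize(raw_scores[name])):
            combined[i] += weight * value
    return combined


# Made-up raw proxy values for three hypothetical candidates.
raw = {
    "synflow": [1.0e4, 5.0e4, 3.0e4],
    "ntk": [0.2, 0.9, 0.4],
    "spectrum": [0.5, 0.5, 0.5],
    "param_efficiency": [0.8, 0.3, 0.6],
}
scores = ensemble_scores(raw)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Normalizing each proxy before weighting matters because raw proxy values can differ by orders of magnitude; without it, the largest-scale proxy would dominate regardless of its weight.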
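4. The LLM-NAS Method
4.1 Hardware-Aware Extension
Paper: Zhu et al., "LLM-NAS: LLM-driven Hardware-Aware Neural Architecture Search" (arXiv 2510.01472)
Innovation: incorporates hardware constraints into the LLM-guided NAS framework.
4.2 Multi-Objective Optimization
LLM-NAS optimizes the following objectives simultaneously:
- Accuracy: predicted by zero-cost proxies
- Latency: predicted by a hardware model
- Energy: estimated by a power model
- Memory: computed from the parameter count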
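With several objectives, there is usually no single best architecture, so a common way to compare candidates is Pareto dominance. The sketch below filters a candidate list down to its Pareto front over the four objectives listed above. It is a generic illustration rather than the paper's procedure; the field names, units, and example values are assumptions.

```python
COSTS = ("latency_ms", "energy_mj", "memory_mb")  # lower is better


def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one.

    Each candidate is a dict with 'accuracy' (higher is better) plus the COSTS fields.
    """
    no_worse = a["accuracy"] >= b["accuracy"] and all(a[c] <= b[c] for c in COSTS)
    strictly_better = a["accuracy"] > b["accuracy"] or any(a[c] < b[c] for c in COSTS)
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


# Made-up predictions for three hypothetical candidates.
candidates = [
    {"name": "A", "accuracy": 0.75, "latency_ms": 12, "energy_mj": 30, "memory_mb": 400},
    {"name": "B", "accuracy": 0.72, "latency_ms": 8, "energy_mj": 20, "memory_mb": 300},
    {"name": "C", "accuracy": 0.70, "latency_ms": 13, "energy_mj": 35, "memory_mb": 450},
]
front = pareto_front(candidates)
print([c["name"] for c in front])  # prints ['A', 'B']
```

Candidate C is dropped because A beats it on every objective, while A and B both survive because each wins on something the other loses.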
4.3 LLM-NAS Prompt Design
```python
hw_aware_prompt = """
Generate neural network architectures with the following hardware constraints:
Target Device: {device_type}
Latency Budget: {latency_ms}ms
Memory Budget: {memory_mb}MB
Power Budget: {power_w}W
For {device_type}, consider:
- {device_specific_constraints}
Balance accuracy with hardware efficiency.
Return architectures ranked by the trade-off score.
"""
```
5. The PEL-NAS Method
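5.1 Prompt Co-Evolution
Core idea: the search optimizes architectures and prompts at the same time, so the two co-evolve.
```python
class PEL_NAS:
    """Co-evolution loop; the generate, evaluate, select, and mutate hooks are left to the implementer."""

    def __init__(self, llm):
        self.llm = llm
        self.fitness = {}  # (architecture index, prompt index) -> score

    def coevolution(self, num_generations):
        # Initialization
        arch_population = self.llm.generate_initial()
        prompt_population = self.design_initial_prompts()
        for generation in range(num_generations):
            # Evaluate every architecture-prompt pair
            self.fitness.clear()
            for i, arch in enumerate(arch_population):
                for j, prompt in enumerate(prompt_population):
                    # Index keys avoid requiring hashable architectures
                    self.fitness[i, j] = self.evaluate(arch, prompt)
            # Selection
            arch_population = self.select_architectures(arch_population)
            prompt_population = self.select_prompts(prompt_population)
            # Mutation
            arch_population = self.mutate_architectures(arch_population)
            prompt_population = self.mutate_prompts(prompt_population)
        return arch_population, prompt_population
```
5.2 Effect of Co-Evolution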
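Experiments indicate that co-evolution outperforms optimizing architectures or prompts alone:
| Method | Top-1 accuracy | Search time |
|---|---|---|
| LLM-only | 72.3% | 2h |
| Prompt-only | 68.5% | 1h |
| Sequential optimization | 74.1% | 4h |
| Co-evolution | 76.8% | 3h |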
6. Technical Details of LLM-NAS
6.1 Architecture Representation
Code representation:
```python
architecture_template = """
import torch.nn as nn

# TransformerLayer is assumed to be defined elsewhere.
class {ModelName}(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.layers = nn.ModuleList([
            TransformerLayer(
                hidden_size={hidden_size},
                num_heads={num_heads},
                intermediate_size={intermediate_size},
                attention_type='{attention_type}',
                activation='{activation}'
            )
            for _ in range({num_layers})
        ])
        # Normalization layers such as nn.LayerNorm need the feature size
        self.norm = {norm_type}({hidden_size})

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.norm(x)
"""
```
JSON representation:
```json
{
  "model_name": "ZeroLM_Transformer",
  "architecture": {
    "num_layers": 12,
    "hidden_size": 768,
    "num_heads": 12,
    "intermediate_size": 3072,
    "attention_type": "multi_head",
    "position_encoding": "RoPE",
    "activation": "silu",
    "norm": "RMSNorm",
    "dropout": 0.1
  },
  "estimated_metrics": {
    "params": "125M",
    "flops": "2.5e11",
    "latency_ms": 15.2
  }
}
```
6.2 Feedback Mechanism Design
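Multi-round dialogue flow:
```python
def multi_round_dialogue(llm, initial_architecture, num_rounds):
    # parse_architecture, evaluate_zero_cost, and get_feedback are assumed to be defined elsewhere.
    messages = [
        {"role": "system", "content": "You are an expert NAS assistant."},
        {"role": "user", "content": f"Generate architectures: {initial_architecture}"}
    ]
    best_arch, best_score = None, float("-inf")
    for _ in range(num_rounds):
        response = llm.chat(messages)
        arch = parse_architecture(response)
        # Evaluate
        score = evaluate_zero_cost(arch)
        if score > best_score:
            best_arch, best_score = arch, score
        # Feed the score back to the LLM
        messages.append({"role": "assistant", "content": response})
        messages.append({
            "role": "user",
            "content": f"Score: {score}. Improve based on: {get_feedback(score)}"
        })
    # Return the highest-scoring architecture seen in any round
    return best_arch
```
6.3 LLM Selection Guidance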
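| LLM | Strengths | Weaknesses | Recommended use |
|---|---|---|---|
| GPT-4 | Rich architecture knowledge | High cost | Precise search |
| Claude | Strong reasoning | More restrictions | Reflection mechanism |
| Llama-3 | Open source, deployable locally | Limited knowledge | Fast iteration |
| Gemini | Multimodal | Limited architecture knowledge | Joint design |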
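7. Evaluation and Comparison
7.1 Evaluation Metrics
| Metric | Description | How it is measured |
|---|---|---|
| Search efficiency | Speed of generating valid architectures | Wall-clock time (s) |
| Top-K accuracy | Fraction of cases in which the top-K candidates include the best architecture | Experimental statistics |
| Zero-shot performance | Performance when used directly | Validation accuracy |
| Fine-tuned performance | Performance after a small amount of fine-tuning | Fine-tuning experiments |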
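The Top-K metric in the table can be computed as below, under one plausible reading: for each search run, check whether the architecture with the best measured accuracy appears among the K candidates ranked highest by the proxy, then average over runs. The function names and the example numbers are assumptions made for illustration.

```python
def top_k_hit(proxy_scores, true_accuracies, k):
    """True if the architecture with the best true accuracy is in the proxy's top k."""
    order = sorted(range(len(proxy_scores)), key=lambda i: proxy_scores[i], reverse=True)
    best_true = max(range(len(true_accuracies)), key=lambda i: true_accuracies[i])
    return best_true in order[:k]


def top_k_hit_rate(runs, k):
    """Fraction of runs whose proxy top-k contains the truly best architecture.

    runs is a list of (proxy_scores, true_accuracies) pairs.
    """
    return sum(top_k_hit(p, t, k) for p, t in runs) / len(runs)


# Two made-up runs with three candidates each.
runs = [
    ([0.9, 0.5, 0.7], [0.71, 0.69, 0.74]),
    ([0.2, 0.8, 0.4], [0.60, 0.75, 0.70]),
]
print(top_k_hit_rate(runs, k=1))  # prints 0.5
print(top_k_hit_rate(runs, k=2))  # prints 1.0
```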
7.2 Method Comparison
| Method | Data-free | Hardware-aware | Reflection | Multi-round dialogue | Code generation |
|---|---|---|---|---|---|
| ZeroLM | ✓ | ✗ | ✗ | ✗ | ✓ |
| RZ-NAS | ✓ | Partial | ✓ | ✓ | ✓ |
| LLM-NAS | ✓ | ✓ | Partial | Partial | ✓ |
| PEL-NAS | ✓ | Partial | Partial | ✓ | ✓ |
7.3 Experimental Results
Typical results on CIFAR-10 and ImageNet:
| Method | CIFAR-10 accuracy | ImageNet Top-1 | Search time |
|---|---|---|---|
| Random Search | 92.1% | 68.5% | 48h |
| DARTS | 94.3% | 73.1% | 24h |
| Zero-shot NAS | 93.8% | 72.4% | 2h |
| ZeroLM | 94.6% | 74.2% | 1.5h |
| RZ-NAS | 95.1% | 75.8% | 3h |
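8. Practical Guide
8.1 Quick Implementation Template
```python
import json

import anthropic


class SimpleLLMNAS:
    def __init__(self, api_key, model="claude-3-5-sonnet"):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = model  # use a model ID that is available to your account

    def search(self, num_architectures=10):
        # Generate candidate architectures
        response = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": f"Generate {num_architectures} Transformer architectures..."
            }]
        )
        # response.content is a list of content blocks; the first holds the reply text
        architectures = self.parse_response(response.content[0].text)
        # Zero-cost evaluation
        scores = [self.zero_cost_eval(arch) for arch in architectures]
        # Return the best candidate
        best_idx = max(range(len(scores)), key=scores.__getitem__)
        return architectures[best_idx], scores[best_idx]

    def parse_response(self, text):
        # Assumes the reply is a bare JSON list of architecture dictionaries
        return json.loads(text)

    def zero_cost_eval(self, arch):
        # Implement a zero-cost evaluation here
        raise NotImplementedError
```
8.2 Best Practices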
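- Prompt engineering: give detailed architecture descriptions and constraints
- Multi-round iteration: let the LLM improve based on feedback
- Ensemble evaluation: combine several zero-cost proxies
- Candidate diversity: generate a diverse set of candidate architectures
- Post-hoc validation: filter out implausible architectures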
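The post-hoc validation step could look like the sketch below, which parses an LLM reply and drops entries that fall outside the ranges stated in the generation prompt of Section 2.4. The function names and the extra rule that `hidden_size` must be divisible by `num_heads` are assumptions, not something any of the cited methods prescribes.

```python
import json

REQUIRED_KEYS = {"num_layers", "hidden_size", "num_heads", "intermediate_size"}


def is_plausible(arch):
    """Reject configurations that violate the ranges from the generation prompt."""
    if not isinstance(arch, dict) or not REQUIRED_KEYS.issubset(arch):
        return False
    if not all(isinstance(arch[key], int) for key in REQUIRED_KEYS):
        return False
    return (
        4 <= arch["num_layers"] <= 32
        and 256 <= arch["hidden_size"] <= 2048
        and 4 <= arch["num_heads"] <= 32
        and arch["hidden_size"] % arch["num_heads"] == 0  # heads must split the width evenly
        and 2 * arch["hidden_size"] <= arch["intermediate_size"] <= 8 * arch["hidden_size"]
    )


def filter_architectures(llm_text):
    """Parse a JSON list from the LLM reply and keep only plausible entries."""
    try:
        parsed = json.loads(llm_text)
    except json.JSONDecodeError:
        return []  # unparseable replies yield no candidates
    if not isinstance(parsed, list):
        return []
    return [arch for arch in parsed if is_plausible(arch)]


# Hypothetical reply: one valid entry, one with a bad head count, one too deep.
reply = json.dumps([
    {"num_layers": 12, "hidden_size": 768, "num_heads": 12, "intermediate_size": 3072},
    {"num_layers": 12, "hidden_size": 768, "num_heads": 10, "intermediate_size": 3072},
    {"num_layers": 64, "hidden_size": 768, "num_heads": 12, "intermediate_size": 3072},
])
valid = filter_architectures(reply)
print(len(valid))  # prints 1
```

Filtering before zero-cost evaluation keeps hallucinated or malformed configurations from ever reaching the scoring stage.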
References
1. Chen, Z. S., et al. (2025). ZeroLM: Data-Free Transformer Architecture Search for Language Models. arXiv:2503.18646.