Metaphor

Tag: llm-training

8 notes with this tag.

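June 20, 2026
Reinforcement Learning Post-Training: Topic Index
June 20, 2026
Comparative Analysis of the PPO, GRPO, and DAPO Algorithms
May 17, 2026
Group Policy Gradient: Simple and Effective Reinforcement Learning for LLM Reasoning
May 14, 2026
Training Reasoning Ability with RLVR
May 12, 2026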
DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization)
May 12, 2026
GRPO (Group Relative Policy Optimization)
May 12, 2026
ORPO (Odds Ratio Preference Optimization)
May 8, 2026
UltraLong-8B: Training from 128K to 4M Context
