Score-based Sampling Theory

1 Introduction

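Score-based models are an important branch of modern generative modeling. They generate samples by learning the **score function**, the gradient field of the log-density of the data distribution.¹

Core idea:

Do not model the probability density $p(x)$ directly
Learn the gradient of the log-density, $\nabla_{x} \log p(x)$
Use that gradient to draw samples

This chapter gives a systematic introduction to score-based sampling theory, covering score matching, Langevin dynamics, and applications in diffusion models.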

2 Score Matching Basics

2.1 Definition of the Score Function

The score function is the gradient of the log-density:

$$ s_{\theta}(x) = \nabla_{x} \log p_{\theta}(x) $$

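Properties:

It points in the direction in which the log-density increases fastest
It does not depend on the normalizing constant $Z$, because $\nabla_{x} \log Z = 0$
It is closely tied to energy-based models: for $p_{\theta}(x) = e^{-E_{\theta}(x)} / Z_{\theta}$, $s_{\theta}(x) = -\nabla_{x} E_{\theta}(x)$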
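
As a quick numerical check of the second property, the sketch below compares a finite-difference gradient of an unnormalized Gaussian log-density with the analytic score. The Gaussian parameters and all helper names are illustrative assumptions, and NumPy is used so that no trained network is needed.

```python
import numpy as np

def unnormalized_log_density(x, mu=1.0, sigma=2.0):
    """log of exp(-(x - mu)^2 / (2 sigma^2)); the constant -log Z is left out."""
    return -0.5 * ((x - mu) / sigma) ** 2

def analytic_score(x, mu=1.0, sigma=2.0):
    """Score of N(mu, sigma^2): d/dx log p(x) = -(x - mu) / sigma^2."""
    return -(x - mu) / sigma**2

def finite_difference_score(log_density, x, h=1e-5):
    """Central-difference approximation of d/dx log_density(x)."""
    return (log_density(x + h) - log_density(x - h)) / (2.0 * h)

x = np.linspace(-3.0, 5.0, 9)
fd_score = finite_difference_score(unnormalized_log_density, x)

# Adding any constant (such as -log Z) leaves the gradient unchanged.
shifted_score = finite_difference_score(
    lambda u: unnormalized_log_density(u) - 123.4, x
)
```

Both `fd_score` and `shifted_score` agree with `analytic_score(x)` up to finite-difference error, which is why score-based methods never need to compute $Z$.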

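2.2 The Score Matching Objective

Problem: how can the score function be learned when the data score $\nabla_{x} \log p_{data}(x)$ is unknown?

Score matching loss (Hyvärinen, 2005):

$$ L_{SM}(\theta) = \mathbb{E}_{p_{data}(x)} \left[ \frac{1}{2} \| s_{\theta}(x) \|^{2} + \mathrm{tr}(\nabla_{x} s_{\theta}(x)) \right] $$

Intuition: up to a constant that does not depend on $\theta$, this loss equals the Fisher divergence $\frac{1}{2} \mathbb{E}_{p_{data}(x)} \| s_{\theta}(x) - \nabla_{x} \log p_{data}(x) \|^{2}$, so minimizing it matches the model score to the data score without ever evaluating the data score.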
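
A small worked example may make the objective concrete. Assume a one-dimensional model whose score is $s_{\theta}(x) = -\theta x$ (a zero-mean Gaussian with precision $\theta$). The loss then reduces to $\frac{1}{2} \theta^{2} \mathbb{E}[x^{2}] - \theta$, which is minimized at $\theta = 1 / \mathbb{E}[x^{2}]$. The data distribution, sample size, and grid below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=2.0, size=50_000)  # true precision is 1 / 4

def score_matching_loss(theta, x):
    """Score matching loss for the model score s_theta(x) = -theta * x."""
    score = -theta * x
    divergence = -theta  # d s_theta / dx, the trace term in one dimension
    return np.mean(0.5 * score**2 + divergence)

# Grid search over theta, compared with the closed-form minimizer.
thetas = np.linspace(0.05, 1.0, 96)  # spacing 0.01
losses = np.array([score_matching_loss(t, data) for t in thetas])
theta_grid = thetas[np.argmin(losses)]
theta_closed_form = 1.0 / np.mean(data**2)
```

The loss never touches the true density of the data, yet its minimizer recovers the precision of the distribution that generated the samples.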

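2.3 Simplified Score Matching

Computing $\mathrm{tr}(\nabla_{x} s_{\theta}(x))$ directly is expensive in high dimensions, because it takes one backward pass per input dimension.

Sliced score matching (SSM):

$$ L_{SSM}(\theta) = \mathbb{E}_{p_{data}(x)} \mathbb{E}_{p_{v}(v)} \left[ v^{T} \nabla_{x} s_{\theta}(x) v + \frac{1}{2} \left( v^{T} s_{\theta}(x) \right)^{2} \right] $$

where $v \sim N(0, I)$ is a random projection direction.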
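
The sliced objective works because $\mathbb{E}_{v}[v^{T} A v] = \mathrm{tr}(A)$ for $v \sim N(0, I)$. The sketch below checks this identity numerically; the random matrix is only a stand-in (an assumption) for the Jacobian $\nabla_{x} s_{\theta}(x)$ at a single point.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5
jacobian = rng.normal(size=(dim, dim))  # stand-in for the Jacobian of s_theta at one x

def sliced_trace_estimate(matrix, n_projections, rng):
    """Monte Carlo estimate of tr(matrix) from random quadratic forms v^T A v."""
    v = rng.normal(size=(n_projections, matrix.shape[0]))  # v ~ N(0, I)
    quad_forms = np.einsum("ni,ij,nj->n", v, matrix, v)     # one v^T A v per row
    return quad_forms.mean()

exact_trace = np.trace(jacobian)
estimate = sliced_trace_estimate(jacobian, n_projections=200_000, rng=rng)
```

In training only a handful of projections per sample are used, so each individual estimate is noisy; the average over the batch and over training steps is what makes the estimator useful.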

2.4 Code Implementation

import torch
import torch.nn as nn


class ScoreNetwork(nn.Module):
    """MLP that maps x to an estimate of the score, with the same shape as x."""

    def __init__(self, dim, hidden_dims=(128, 256, 128)):
        super().__init__()

        layers = []
        prev_dim = dim
        for h_dim in hidden_dims:
            layers.extend([
                nn.Linear(prev_dim, h_dim),
                nn.LeakyReLU(0.2),
            ])
            prev_dim = h_dim
        layers.append(nn.Linear(prev_dim, dim))

        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)


def sliced_score_matching_loss(score_net, x, n_projections=10):
    """
    Sliced score matching loss, averaged over the batch and over projections.
    """
    # Work on a copy so the caller's tensor is not modified in place.
    x = x.detach().clone().requires_grad_(True)
    score = score_net(x)

    loss = 0.0
    for _ in range(n_projections):
        v = torch.randn_like(x)  # v ~ N(0, I), one direction per sample

        # v^T s_theta(x) for each sample in the batch
        v_dot_score = (v * score).sum(dim=-1)

        # The gradient of v^T s_theta(x) w.r.t. x is J^T v; contracting it with v
        # gives v^T J v without building the full Jacobian J.
        grad_v_dot_score = torch.autograd.grad(
            outputs=v_dot_score.sum(),
            inputs=x,
            create_graph=True,
        )[0]
        quad_form = (v * grad_v_dot_score).sum(dim=-1)

        loss = loss + (quad_form + 0.5 * v_dot_score.pow(2)).mean()

    return loss / n_projections

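3 Langevin Dynamics Sampling

3.1 Continuous-Time Langevin Dynamics

Langevin dynamics comes from physics, where it models Brownian motion in a potential. For sampling, the drift follows the gradient of the log-density (the negative gradient of the potential energy):

$$ dX_{t} = \nabla_{x} \log p(X_{t}) \, dt + \sqrt{2} \, dW_{t} $$

where $W_{t}$ is a Wiener process.

Stationary distribution: under mild regularity conditions, the distribution of $X_{t}$ converges to $p(x)$ as $t \to \infty$.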

3.2 Discrete-Time Iteration

Euler-Maruyama discretization:

$$ x_{t + \Delta t} = x_{t} + \Delta t \cdot \nabla_{x} \log p(x_{t}) + \sqrt{2 \Delta t} \cdot \epsilon_{t} $$

where $\epsilon_{t} \sim N(0, I)$.
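
The iteration can be tried on a target whose score is known exactly. The sketch below runs the update $x \leftarrow x + \Delta t \cdot \nabla_{x} \log p(x) + \sqrt{2 \Delta t} \cdot \epsilon$ on a one-dimensional Gaussian; the target parameters, step size, and chain count are assumptions picked for illustration.

```python
import numpy as np

def gaussian_score(x, mu=2.0, sigma=0.5):
    """Exact score of the target N(mu, sigma^2)."""
    return -(x - mu) / sigma**2

def langevin_chain(score_fn, x0, step_size, n_steps, rng):
    """Run n_steps of the discretized Langevin update on every entry of x0."""
    x = x0.copy()
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x + step_size * score_fn(x) + np.sqrt(2.0 * step_size) * noise
    return x

rng = np.random.default_rng(0)
x_init = rng.normal(size=5000)  # 5000 independent chains started from N(0, 1)
x_final = langevin_chain(gaussian_score, x_init, step_size=0.01, n_steps=1000, rng=rng)
sample_mean, sample_std = x_final.mean(), x_final.std()
```

After burn-in the chains have mean close to 2.0 and standard deviation close to 0.5, with a small residual bias that comes from the finite step size.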

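3.3 Connection Between Langevin Dynamics and Score Matching

The true score is unknown in practice, so Langevin sampling substitutes the learned score function:

$$ x_{t + \Delta t} = x_{t} + \Delta t \cdot s_{\theta}(x_{t}) + \sqrt{2 \Delta t} \cdot \epsilon_{t} $$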

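3.4 Convergence Analysis

The convergence speed of Langevin dynamics depends on:

Step size $\Delta t$: smaller steps reduce discretization bias but need more iterations
Properties of the target distribution: strongly log-concave targets (strongly convex potentials) converge quickly
Initial point: starting far from the modes needs more iterations

KL divergence bound (schematic form, with time rescaled so that the contraction rate is 1; bounds of this type hold, for example, when $p$ satisfies a log-Sobolev inequality):

$$ KL(p_{t} \| p) \leq e^{-t} \cdot KL(p_{0} \| p) + (1 - e^{-t}) \cdot C $$

where $p_{t}$ is the distribution of $x_{t}$ and $C$ is a constant that depends on the step size and shrinks as $\Delta t \to 0$.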
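
For a Gaussian target the step-size-dependent error can be worked out exactly, which gives a feel for the role of $C$. With target $N(0, \tau)$ and the discretized update $x \leftarrow x + \Delta t \cdot \nabla_{x} \log p(x) + \sqrt{2 \Delta t} \cdot \epsilon$, the variance obeys $v \leftarrow (1 - \Delta t / \tau)^{2} v + 2 \Delta t$, whose fixed point is $\tau / (1 - \Delta t / (2 \tau))$ rather than $\tau$. The sketch below is a toy calculation under that Gaussian assumption; the function names are illustrative.

```python
def ula_variance_after(n_steps, step_size, target_var, initial_var):
    """Variance of x_n for the Langevin update on N(0, target_var)."""
    shrink = 1.0 - step_size / target_var
    var = initial_var
    for _ in range(n_steps):
        var = shrink**2 * var + 2.0 * step_size
    return var

def ula_stationary_variance(step_size, target_var):
    """Fixed point of the variance recursion above."""
    return target_var / (1.0 - step_size / (2.0 * target_var))

target_var = 1.0
# Excess variance left over at stationarity for three step sizes.
biases = {
    h: ula_stationary_variance(h, target_var) - target_var
    for h in (0.1, 0.01, 0.001)
}
```

The residual bias shrinks roughly in proportion to the step size, while the number of iterations needed to reach stationarity grows correspondingly.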

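3.5 Code Implementation

import math

import torch


def langevin_sampling(score_net, n_samples, dim, n_steps=1000,
                      step_size=0.01, noise_scale=1.0):
    """
    Langevin dynamics sampling with a learned score network.

    Returns a tensor of shape (n_kept_steps, n_samples, dim) holding the
    states recorded after burn-in.
    """
    # Initialize from the prior (usually a standard Gaussian)
    x = torch.randn(n_samples, dim)

    samples = []
    noise_std = noise_scale * math.sqrt(2 * step_size)

    # No gradients are needed at sampling time; this also keeps the autograd
    # graph from growing across iterations.
    with torch.no_grad():
        for step in range(n_steps):
            score = score_net(x)

            # Langevin update: move up the score, then add Gaussian noise
            noise = torch.randn_like(x)
            x = x + step_size * score + noise_std * noise

            # Record samples after burn-in (second half of the chain)
            if step >= n_steps // 2:
                samples.append(x.clone())

    return torch.stack(samples)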

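4 ODE Samplers

4.1 Probability Flow ODE

Every forward SDE $dX_{t} = f(X_{t}, t) \, dt + g(t) \, dW_{t}$ used by a score-based model has an associated deterministic ODE, the probability flow ODE, whose solutions have the same marginal distributions:

$$ \frac{dx(t)}{dt} = f(x(t), t) - \frac{1}{2} g(t)^{2} \nabla_{x} \log \tilde{p}_{t}(x(t)) $$

where $\tilde{p}_{t}$ is the marginal distribution at time $t$. With $f = 0$ and $g(t)^{2} = 2$ this reduces to $\frac{dx(t)}{dt} = -\nabla_{x} \log \tilde{p}_{t}(x(t))$.

Key properties:

Trajectories are deterministic
The map between $x(0)$ and $x(T)$ is invertible
High-accuracy ODE solvers can be used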

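4.2 From Noise to Data

Forward process (data → noise), written here in the DDPM/VP parameterization with $\epsilon \sim N(0, I)$:

$$ x(t) = \sqrt{\bar{\alpha}_{t}} \cdot x_{0} + \sqrt{1 - \bar{\alpha}_{t}} \cdot \epsilon $$

Reverse process (solve the ODE from $t = T$ down to $t = 0$), written here for a variance-exploding process with noise level $\sigma(t)$:

$$ \frac{dx}{dt} = -\frac{1}{2} \frac{d\sigma^{2}(t)}{dt} \nabla_{x} \log p_{t}(x(t)) $$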
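
To see the reverse ODE at work without a trained network, one can use a toy model in which the score is known in closed form. The sketch below assumes data distributed as $N(0, 0.5^{2})$ and $\sigma^{2}(t) = t$, so that $p_{t} = N(0, 0.25 + t)$; these choices and all names are assumptions made for illustration.

```python
import numpy as np

DATA_STD = 0.5  # assumed toy data distribution N(0, DATA_STD^2)
T = 4.0         # final time; sigma^2(t) = t, so p_T = N(0, DATA_STD^2 + T)

def score(x, t):
    """Exact score of p_t = N(0, DATA_STD^2 + t)."""
    return -x / (DATA_STD**2 + t)

def ode_drift(x, t):
    """dx/dt = -0.5 * d(sigma^2)/dt * score, with d(sigma^2)/dt = 1."""
    return -0.5 * score(x, t)

def integrate(x, t_start, t_end, n_steps):
    """Explicit Euler from t_start to t_end (dt is negative when going backwards)."""
    dt = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        x = x + dt * ode_drift(x, t)
        t = t + dt
    return x

rng = np.random.default_rng(0)
x_T = rng.normal(scale=np.sqrt(DATA_STD**2 + T), size=20_000)  # samples from p_T
x_0 = integrate(x_T, T, 0.0, n_steps=2000)        # noise -> data
x_T_again = integrate(x_0, 0.0, T, n_steps=2000)  # data -> noise
```

The recovered `x_0` has standard deviation close to 0.5, and integrating forward again returns to `x_T` up to discretization error, which illustrates that the trajectories are deterministic and invertible.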

4.3 ODE Solvers

Euler method (first order):

def euler_ode_sampler(score_net, xT, n_steps=100, dt=0.01):
    """Euler sampler for dx/dt = -score(x), integrated backwards in time."""
    x = xT

    with torch.no_grad():
        for _ in range(n_steps):
            score = score_net(x)
            # One step from t to t - dt: x - dt * (dx/dt) = x + dt * score
            x = x + dt * score

    return x

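Heun method (second order; more accurate than Euler for the same step size, at the cost of two score evaluations per step):

def heun_ode_sampler(score_net, xT, n_steps=100, dt=0.01):
    """Heun (second-order) sampler for dx/dt = -score(x), integrated backwards in time."""
    x = xT

    with torch.no_grad():
        for _ in range(n_steps):
            k1 = -score_net(x)            # slope dx/dt at the current point
            x_pred = x - dt * k1          # Euler predictor for the state at t - dt
            k2 = -score_net(x_pred)       # slope at the predicted point

            x = x - (dt / 2) * (k1 + k2)  # step back using the averaged slope

    return x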
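
The difference in order can be measured on a problem with a known solution. The sketch below uses an assumed toy model, $p_{t} = N(0, 0.25 + t)$ with $\sigma^{2}(t) = t$, for which the probability flow ODE is $dx/dt = x / (2 (0.25 + t))$ and has a closed-form solution. It is a self-contained comparison written for this purpose, not the network-based samplers above, and Heun is given half as many steps so that both solvers make the same number of drift evaluations.

```python
import numpy as np

DATA_VAR = 0.25  # assumed toy data variance; p_t = N(0, DATA_VAR + t)

def drift(x, t):
    """Probability flow ODE drift for the toy model."""
    return x / (2.0 * (DATA_VAR + t))

def exact_solution(x_start, t_start, t_end):
    """Closed-form solution of the toy ODE between two times."""
    return x_start * np.sqrt((DATA_VAR + t_end) / (DATA_VAR + t_start))

def euler_solve(x, t_start, t_end, n_steps):
    dt = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        x = x + dt * drift(x, t)
        t += dt
    return x

def heun_solve(x, t_start, t_end, n_steps):
    dt = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        k1 = drift(x, t)
        k2 = drift(x + dt * k1, t + dt)  # slope at the Euler-predicted endpoint
        x = x + 0.5 * dt * (k1 + k2)
        t += dt
    return x

x_T, T = 1.0, 4.0
reference = exact_solution(x_T, T, 0.0)
euler_error = abs(euler_solve(x_T, T, 0.0, n_steps=100) - reference)
heun_error = abs(heun_solve(x_T, T, 0.0, n_steps=50) - reference)
```

On this problem `heun_error` comes out several times smaller than `euler_error` for the same evaluation budget; how large the gain is on a learned score depends on how smooth that score is.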

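4.4 Adaptive Step Size

An adaptive step size improves efficiency by spending steps only where the solution changes quickly:

import torch
from scipy.integrate import solve_ivp


def adaptive_ode_sampler(score_net, xT, T=1.0, tol=1e-5):
    """Adaptive-step ODE sampler for a single sample xT of shape (1, dim)."""

    def ode_system(t, x):
        # solve_ivp passes a flat float64 NumPy array; convert it for the network.
        x = torch.tensor(x, dtype=torch.float32).reshape(1, -1)
        with torch.no_grad():
            score = score_net(x)
        return -score.reshape(-1).numpy()

    # Integrate dx/dt = -score(x) backwards from t = T to t = 0 with SciPy's RK45
    sol = solve_ivp(ode_system, [T, 0], xT.detach().reshape(-1).numpy(),
                    method='RK45', rtol=tol, atol=tol)

    return torch.tensor(sol.y[:, -1], dtype=xT.dtype).reshape(xT.shape)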

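4.5 Relationship to the SDE

The probability flow ODE is the deterministic counterpart of the SDE: both share the same marginal distributions $p_{t}$.

$$ \text{SDE}: \; dx = f(x, t) \, dt + g(t) \, dW $$

$$ \text{ODE}: \; dx = \left[ f(x, t) - \frac{1}{2} g(t)^{2} \nabla_{x} \log p_{t}(x) \right] dt $$

The ODE is therefore not obtained by simply letting $g(t) \to 0$ in the SDE. The score term compensates for the removed noise so that the marginals are preserved.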

5 SDE Samplers

5.1 Forward and Reverse SDEs

Forward SDE (data → noise):

$$ dX_{t} = f(X_{t}, t) \, dt + g(t) \, dW_{t} $$

Reverse SDE (noise → data):

$$ dX_{t} = \left[ f(X_{t}, t) - g(t)^{2} \nabla_{x} \log p_{t}(X_{t}) \right] dt + g(t) \, d\bar{W}_{t} $$

where $\bar{W}_{t}$ is a Wiener process running backwards in time.
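
The reverse SDE can be simulated with Euler-Maruyama steps taken backwards in time: $x \leftarrow x - [f - g^{2} \nabla_{x} \log p_{t}(x)] \Delta t + g \sqrt{\Delta t} \, z$. The sketch below does this for an assumed toy model with $f = 0$, $g = 1$, and data $N(0, 0.5^{2})$, so that $p_{t} = N(0, 0.25 + t)$ and the score is exact; the model and all names are illustrative assumptions.

```python
import numpy as np

DATA_STD = 0.5  # assumed toy data distribution N(0, DATA_STD^2)
T = 4.0         # forward SDE dX = dW, so p_t = N(0, DATA_STD^2 + t)

def score(x, t):
    """Exact score of p_t = N(0, DATA_STD^2 + t)."""
    return -x / (DATA_STD**2 + t)

def reverse_sde_sample(x, n_steps, rng):
    """Euler-Maruyama on the reverse SDE, from t = T down to t = 0."""
    dt = T / n_steps
    for i in range(n_steps, 0, -1):
        t = i * dt
        g = 1.0                                   # diffusion coefficient g(t)
        reverse_drift = 0.0 - g**2 * score(x, t)  # f - g^2 * score, with f = 0
        noise = rng.normal(size=x.shape)
        x = x - reverse_drift * dt + g * np.sqrt(dt) * noise
    return x

rng = np.random.default_rng(0)
x_T = rng.normal(scale=np.sqrt(DATA_STD**2 + T), size=10_000)  # samples from p_T
x_0 = reverse_sde_sample(x_T, n_steps=1000, rng=rng)
```

Even though fresh noise is injected at every step, the final samples have standard deviation close to 0.5, because the score term in the drift contracts the distribution faster than the noise spreads it.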

5.2 Common SDE Forms

Variance Preserving (VP):

$$ dX_{t} = -\frac{1}{2} \beta(t) X_{t} \, dt + \sqrt{\beta(t)} \, dW_{t} $$

Variance Exploding (VE):

$$ dX_{t} = \sqrt{\frac{d\sigma^{2}(t)}{dt}} \, dW_{t} $$

sub-VP (in between the two):

$$ dX_{t} = -\frac{1}{2} \beta(t) X_{t} \, dt + \sqrt{\beta(t) \left( 1 - e^{-2 \int_{0}^{t} \beta(s) \, ds} \right)} \, dW_{t} $$
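
Each of these SDEs has a Gaussian transition kernel $x_{t} \mid x_{0} \sim N(m(t) x_{0}, s(t)^{2} I)$, which is what makes training by noise perturbation practical. The sketch below evaluates $m(t)$ and $s(t)$ in closed form. The linear $\beta(t)$ schedule, the geometric $\sigma(t)$ schedule, and their constants are assumptions chosen for illustration.

```python
import math

BETA_MIN, BETA_MAX = 0.1, 20.0  # assumed linear schedule beta(t) on [0, 1]

def integrated_beta(t):
    """Integral of beta(s) ds from 0 to t for the linear schedule."""
    return BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2

def vp_kernel(t):
    """(mean coefficient, std) of x_t given x_0 under the VP SDE."""
    decay = math.exp(-integrated_beta(t))
    return math.sqrt(decay), math.sqrt(1.0 - decay)

def sub_vp_kernel(t):
    """(mean coefficient, std) under the sub-VP SDE; the variance is (1 - decay)^2."""
    decay = math.exp(-integrated_beta(t))
    return math.sqrt(decay), 1.0 - decay

def ve_kernel(t, sigma_min=0.01, sigma_max=50.0):
    """(mean coefficient, std) under a VE SDE with geometric sigma(t)."""
    sigma_t = sigma_min * (sigma_max / sigma_min) ** t
    return 1.0, math.sqrt(sigma_t**2 - sigma_min**2)

vp_mean, vp_std = vp_kernel(0.5)
sub_vp_mean, sub_vp_std = sub_vp_kernel(0.5)
ve_mean, ve_std = ve_kernel(0.5)
```

For the VP kernel `vp_mean**2 + vp_std**2` equals 1, so unit-variance data keeps unit variance; the sub-VP standard deviation is never larger than the VP one, and the VE standard deviation grows without the mean shrinking.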

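5.3 SDE Solvers

Euler-Maruyama method (the linear `beta_schedule` below is an assumed example schedule):

import math

import torch


def beta_schedule(t, beta_min=0.1, beta_max=20.0):
    """Linear beta(t) on [0, 1]; an assumed schedule, replace it with your own."""
    return beta_min + t * (beta_max - beta_min)


def sde_sampler(score_net, xT, T=1.0, n_steps=1000):
    """Reverse-time SDE sampler (VP SDE) for a time-conditioned score_net(x, t)."""
    dt = T / n_steps
    x = xT
    trajectory = [x]

    with torch.no_grad():
        for i in reversed(range(n_steps)):
            t = (i + 1) * dt  # current time, running from T down to dt

            # Forward drift and diffusion coefficients (VP SDE as the example)
            beta_t = beta_schedule(t)
            drift = -0.5 * beta_t * x
            diffusion = math.sqrt(beta_t)

            # Reverse drift f - g^2 * score, stepped backwards in time by dt
            score = score_net(x, t)
            reverse_drift = drift - diffusion ** 2 * score
            dw = torch.randn_like(x) * math.sqrt(dt)
            x = x - reverse_drift * dt + diffusion * dw

            trajectory.append(x)

    return trajectory[-1], trajectory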

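5.4 ODE vs SDE

Aspect	ODE sampler	SDE sampler
Trajectory	Deterministic	Stochastic
Compute cost	Lower	Higher
Sample diversity	Lower	Higher
Invertibility	Fully invertible	Not invertible
Accuracy	High (with high-accuracy solvers)	Depends on step size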

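6 Accelerated Sampling Techniques

6.1 Accelerating ODE Sampling

Higher-order solvers:

Method	Order	Cost per iteration
Euler	1	1× score evaluation
Heun	2	2× score evaluations
RK4	4	4× score evaluations
Adams	Variable	Variable

Adaptive-resolution ODE:

Take larger steps where the score is small and smaller steps where it is large:

$$ \Delta t \propto \frac{1}{\| \nabla_{x} \log p(x) \|} $$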
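
One possible way to turn this proportionality into a usable rule is to divide a base step by the score norm and clamp the result. The sketch below is only an illustration; `base_step`, `min_step`, and `max_step` are hypothetical tuning parameters, not values from any particular method.

```python
import numpy as np

def score_scaled_step(score, base_step=0.05, min_step=1e-4, max_step=0.1, eps=1e-12):
    """Step size proportional to 1 / ||score||, clamped to [min_step, max_step]."""
    norm = float(np.linalg.norm(score))
    return float(np.clip(base_step / (norm + eps), min_step, max_step))

small_score_step = score_scaled_step(np.array([0.1, 0.0]))    # weak gradient
large_score_step = score_scaled_step(np.array([30.0, 40.0]))  # strong gradient, norm 50
```

The clamp keeps the step finite where the score is nearly zero and prevents it from collapsing where the score is very large.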

6.2 Accelerating SDE Sampling

Variance reduction:

$$ x_{t + \Delta t} = x_{t} + \text{drift} \cdot \Delta t + \sqrt{\Delta t} \cdot \sigma \cdot (\epsilon + c) $$

where $c$ is a control variate.

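Non-uniform step sizes across noise levels:

Reduce the step size as $t \to 0$, where the score changes most sharply. Near $t \to T$ the marginal is close to the Gaussian prior, so larger steps usually suffice.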

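6.3 Distillation Methods

Consistency Models:

A consistency model learns a function $f_{\theta}(x_{t}, t)$ that gives the same output for any two points on the same probability flow ODE trajectory:

$$ f_{\theta}(x_{t}, t) = f_{\theta}(x_{t + \Delta}, t + \Delta) $$

Together with the boundary condition that $f_{\theta}$ is the identity at the smallest time, this makes the common output the data endpoint of the trajectory.

Sampling:

$$ \hat{x}_{0} = f_{\theta}(x_{T}, T), \quad x_{T} \sim p_{T} $$

Sampling needs only 1-2 steps.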
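
The self-consistency property can be checked on a toy model where the function is known exactly. Assume $p_{t} = N(0, 0.25 + t)$, for which the probability flow ODE trajectory through $x_{0}$ is $x_{t} = x_{0} \sqrt{(0.25 + t) / 0.25}$. The exact consistency function below is specific to this assumed model; a real consistency model would learn it with a network.

```python
import numpy as np

DATA_VAR = 0.25  # assumed toy data variance; p_t = N(0, DATA_VAR + t)
T = 4.0

def trajectory_point(x0, t):
    """Point at time t on the probability flow ODE trajectory that starts at x0."""
    return x0 * np.sqrt((DATA_VAR + t) / DATA_VAR)

def consistency_fn(x_t, t):
    """Exact consistency function for the toy model: maps (x_t, t) to x_0."""
    return x_t * np.sqrt(DATA_VAR / (DATA_VAR + t))

# Self-consistency: every point on one trajectory maps to the same output.
x0 = 0.7
times = np.linspace(0.0, T, 9)
outputs = np.array([consistency_fn(trajectory_point(x0, t), t) for t in times])

# One-step sampling: draw from the prior p_T and apply the function once.
rng = np.random.default_rng(0)
x_T = rng.normal(scale=np.sqrt(DATA_VAR + T), size=20_000)
one_step_samples = consistency_fn(x_T, T)
```

All entries of `outputs` equal 0.7, and `one_step_samples` has standard deviation close to 0.5, the data standard deviation, after a single function evaluation.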

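6.4 Adversarial Sampling

Adversarial divergence:

A discriminator guides the sampling process:

class AdversarialSampler:
    """Adversarial sampler: refines a sample using discriminator feedback."""

    def __init__(self, generator, discriminator, n_steps=10):
        # generator must expose a step(x, score) method; discriminator is callable
        self.generator = generator
        self.discriminator = discriminator
        self.n_steps = n_steps

    def sample(self, z):
        """Run n_steps rounds of discriminator feedback starting from noise z."""
        x = z
        for _ in range(self.n_steps):
            # Discriminator feedback on the current sample
            score = self.discriminator(x)

            # Generator update using that feedback
            x = self.generator.step(x, score)

        return x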

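6.5 Pseudocode: Full Accelerated Sampling Pipeline

def accelerated_sampling(score_net, config):
    """
    Dispatch to one of the samplers above based on config['method'].
    """
    method = config.get('method', 'ode')
    n_steps = config.get('n_steps', 100)
    use_heun = config.get('use_heun', True)

    # Initialize from the Gaussian prior
    x = torch.randn(config['batch_size'], config['dim'])

    if method == 'ode':
        if use_heun:
            return heun_ode_sampler(score_net, x, n_steps=n_steps)
        return euler_ode_sampler(score_net, x, n_steps=n_steps)

    elif method == 'sde':
        # sde_sampler expects a time-conditioned score_net(x, t)
        x_final, _ = sde_sampler(score_net, x, n_steps=n_steps)
        return x_final

    elif method == 'consistency':
        # Placeholder: consistency_sampling is not defined in this chapter
        return consistency_sampling(score_net, x, n_steps)

    elif method == 'adversarial':
        # Placeholder: adversarial_sampling is not defined in this chapter
        return adversarial_sampling(score_net, x, n_steps)

    raise ValueError(f"unknown sampling method: {method}")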

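7 Connection to Diffusion Models

7.1 DDPM from the Sampling Perspective

The DDPM reverse process is essentially a discretized SDE:

$$ x_{t-1} = \frac{1}{\sqrt{\alpha_{t}}} \left( x_{t} - \frac{1 - \alpha_{t}}{\sqrt{1 - \bar{\alpha}_{t}}} \epsilon_{\theta}(x_{t}, t) \right) + \sigma_{t} z_{t} $$

Connection to score matching:

$$ \nabla_{x_{t}} \log p_{t}(x_{t}) \approx -\frac{1}{\sqrt{1 - \bar{\alpha}_{t}}} \epsilon_{\theta}(x_{t}, t) $$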
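
The conversion between noise prediction and score follows from the Gaussian perturbation kernel: if $x_{t} = \sqrt{\bar{\alpha}_{t}} x_{0} + \sqrt{1 - \bar{\alpha}_{t}} \epsilon$, then $\nabla_{x_{t}} \log p(x_{t} \mid x_{0}) = -\epsilon / \sqrt{1 - \bar{\alpha}_{t}}$. The sketch below verifies this with the true noise standing in for a perfect noise predictor; the value of `alpha_bar_t` and the array shapes are arbitrary assumptions.

```python
import numpy as np

def eps_to_score(eps_pred, alpha_bar_t):
    """Convert a noise prediction into a score estimate."""
    return -eps_pred / np.sqrt(1.0 - alpha_bar_t)

rng = np.random.default_rng(0)
alpha_bar_t = 0.3
x0 = rng.normal(size=(4, 2))
eps = rng.normal(size=(4, 2))
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# Score of the Gaussian kernel N(sqrt(alpha_bar) x0, (1 - alpha_bar) I) at x_t.
conditional_score = -(x_t - np.sqrt(alpha_bar_t) * x0) / (1.0 - alpha_bar_t)
score_from_eps = eps_to_score(eps, alpha_bar_t)
```

The two arrays agree, so a trained $\epsilon_{\theta}$ can be plugged into any score-based sampler after this rescaling.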

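7.2 Score-based Models as Implicit Generative Models

A score-based model sampled with the probability flow ODE can be viewed as an implicit generative model: the ODE defines an invertible map $G$ from noise $z$ to data $x$, so the change-of-variables formula applies:

$$ x = G(z, \theta), \quad p(x) = p_{z}(G^{-1}(x)) \left| \det \frac{\partial G^{-1}}{\partial x} \right| $$

See score-matching-sde and diffusion-model.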

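7.3 Flow Matching

Flow matching is a closely related continuous-time alternative to score-based models that learns the velocity field of an ODE directly:

$$ \frac{dx(t)}{dt} = v_{\theta}(x(t), t) $$

where $v_{\theta}$ is the learned vector field.

See diffusion-flow-matching and rectified-flows-optimal-transport.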

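8 Practical Guide

8.1 Choosing a Method

Scenario	Recommended method	Rationale
Highest quality	SDE + fine step size	Best sample quality
Fast generation	Consistency model	1-2 steps
Balance of quality and speed	ODE + Heun	Moderate quality and speed
Deterministic generation	ODE	Reproducible
Sample diversity	SDE	Stochasticity

8.2 Choosing a Step Size

Step size	Quality	Speed
$\Delta t = 0.1$	Low	Fast
$\Delta t = 0.01$	Medium	Medium
$\Delta t = 0.001$	High	Slow
Adaptive	Depends on solver	Typically best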

8.3 Quality-Efficiency Trade-off

Quality ↑
      │
      │     ● SDE (N=1000)
      │   ●
      │  ● ODE (Heun, N=100)
      │ ●
      │● ODE (Euler, N=50)
      │●
      └──────────────────→ Compute cost

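9 Related Topics

9.1 Diffusion Models

Full theory of diffusion models:

diffusion-model: diffusion model basics
score-matching-sde: score matching and SDEs
diffusion-sampling-acceleration: sampling acceleration techniques

9.2 Generative Models

How score-based models relate to other generative models:

9.3 Sampling Theory

Basic sampling theory:

sampling-theory-deep: sampling theory in depth
langevin-dynamics: Langevin dynamics
mcmc-methods: MCMC methods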

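10 Summary

This chapter gave a systematic introduction to score-based sampling theory:

Score matching basics: theory and methods for learning the score function
Langevin dynamics: sampling based on a stochastic differential equation
ODE samplers: deterministic sampling, Heun and other higher-order methods
SDE samplers: stochastic sampling and its relationship to the ODE
Accelerated sampling techniques: distillation, consistency models, adversarial sampling

Score-based sampling is a core technique of modern generative modeling and provides the theoretical foundation for diffusion models and consistency models.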

References

1. Song, Y., et al. (2021). Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations.
