Flow Matching: Optimal Transport Theory

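1. Introduction

Flow Matching is a recent generative modeling framework that generates samples by learning a probability path from a source distribution to a target distribution. Where diffusion models are usually formulated through a stochastic differential equation, Flow Matching directly learns a deterministic velocity field, which often makes training and sampling more efficient.

This document is advanced material building on Diffusion Model Theory and assumes the reader is already familiar with the basic theory of diffusion models.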

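2. Flow Matching Basics

2.1 Problem Definition

The goal of Flow Matching is to learn a time-dependent velocity field $v : \mathbb{R}^{d} \times [0, 1] \to \mathbb{R}^{d}$ such that:

$$\frac{d x_{t}}{d t} = v(x_{t}, t), \quad x_{0} \sim p_{0}, \quad x_{1} \sim p_{1}$$

where $p_{0}$ is the prior distribution (for example a standard Gaussian) and $p_{1}$ is the data distribution.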
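
As a minimal sketch of how the ODE above is used at sampling time, the following integrates $dx_t/dt = v(x_t, t)$ from $t = 0$ to $t = 1$ with fixed-step explicit Euler. The function name `euler_sample` and the toy constant velocity field are illustrative assumptions, not part of any particular library.

```python
import torch

def euler_sample(velocity, x0, n_steps=100):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with explicit Euler.

    `velocity` is any callable (x, t) -> dx/dt, where x has shape (batch, dim)
    and t has shape (batch,). Hypothetical helper for illustration only.
    """
    x = x0.clone()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full((x.shape[0],), k * dt, dtype=x.dtype, device=x.device)
        x = x + dt * velocity(x, t)
    return x

# Toy check: a constant velocity field shifts every sample by that constant.
shift = torch.tensor([2.0, -1.0])
x0 = torch.randn(8, 2)
x1 = euler_sample(lambda x, t: shift.expand_as(x), x0, n_steps=50)
```

In practice `velocity` would be a trained network, and a higher-order solver may be preferable when few steps are used.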

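2.2 Conditional Velocity Field

Conditional Flow Matching simplifies learning by expressing the marginal velocity field as an average of conditional velocity fields:

$$v(x_{t}, t) = \int v(x_{t}, t \mid x_{1}) \, q(x_{1} \mid x_{t}, t) \, d x_{1}$$

where $v(x_{t}, t \mid x_{1})$ is the velocity of the path conditioned on the endpoint $x_{1}$ and $q(x_{1} \mid x_{t}, t)$ is the posterior distribution of $x_{1}$ given $x_{t}$.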

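2.3 Training Objective

The Flow Matching loss is defined as:

$$L_{FM} = \mathbb{E}_{t, x_{0}, x_{1}} \left[ \| v_{\theta}(x_{t}, t) - v^{*}(x_{t}, t) \|^{2} \right]$$

where $v^{*}$ is the target velocity field and $x_{t} = \phi_{t}(x_{0}, x_{1})$ is the interpolation path.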
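
For concreteness, here is the worked case of the linear interpolation path, assuming $x_{0}$ and $x_{1}$ are sampled as a pair and held fixed. Differentiating the path in $t$ gives the conditional target, and regressing onto it is the conditional Flow Matching loss of Lipman et al., which has the same gradient in $\theta$ as the marginal loss. The symbol $L_{CFM}$ is introduced here only for illustration.

```latex
x_t = (1 - t)\,x_0 + t\,x_1
\quad\Longrightarrow\quad
\frac{d x_t}{d t} = x_1 - x_0,
\qquad
L_{CFM} = \mathbb{E}_{t, x_0, x_1}\left[ \| v_\theta(x_t, t) - (x_1 - x_0) \|^2 \right].
```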

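3. The Optimal Transport Perspective

3.1 Optimal Transport Basics

Optimal transport (OT) theory studies how to “transport” one distribution onto another at minimal cost. The 2-Wasserstein distance is defined as:

$$W_{2}(p_{0}, p_{1}) = \left( \inf_{\gamma \in \Gamma(p_{0}, p_{1})} \int \| x - y \|^{2} \, d\gamma(x, y) \right)^{1/2}$$

where $\Gamma(p_{0}, p_{1})$ is the set of all couplings of $p_{0}$ and $p_{1}$, that is, joint distributions whose marginals are $p_{0}$ and $p_{1}$.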
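
In one dimension the infimum above has a closed form for empirical samples of equal size: the optimal coupling pairs the sorted samples. The sketch below uses that fact; the function name `w2_empirical_1d` is an illustrative assumption.

```python
import torch

def w2_empirical_1d(x, y):
    """2-Wasserstein distance between two equal-size 1D samples.

    In one dimension the optimal coupling matches sorted samples, so no
    optimization is needed. Assumes x and y are 1D tensors of equal length.
    """
    xs, _ = torch.sort(x)
    ys, _ = torch.sort(y)
    return torch.sqrt(torch.mean((xs - ys) ** 2))

x = torch.randn(1000)
d = w2_empirical_1d(x, x + 3.0)  # a pure shift by 3 gives a distance of 3
```

In higher dimensions no such shortcut exists, which is why entropic solvers such as Sinkhorn are commonly used instead.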

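3.2 Optimal Transport Flow Matching

When the interpolation path follows the optimal transport map, we obtain optimal transport Flow Matching¹:

Theorem 1: If $v^{*}$ is the velocity field induced by the optimal transport map, then the corresponding Flow Matching solution is:

$$\phi_{t}(x_{0}) = (1 - t) x_{0} + t \, T(x_{0})$$

where $T$ is the Monge optimal transport map.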
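
A case where $T$ is known in closed form is a pair of 1D Gaussians, where the Monge map for the squared cost is affine: $T(x) = m_1 + (s_1 / s_0)(x - m_0)$. The sketch below evaluates the path $\phi_t$ for that case; the function names and the chosen means and standard deviations are illustrative assumptions.

```python
import torch

def gaussian_ot_map(x, m0, s0, m1, s1):
    """Monge map from N(m0, s0^2) to N(m1, s1^2) in one dimension."""
    return m1 + (s1 / s0) * (x - m0)

def displacement_interpolation(x0, t, m0, s0, m1, s1):
    """phi_t(x0) = (1 - t) * x0 + t * T(x0) for the Gaussian map above."""
    return (1 - t) * x0 + t * gaussian_ot_map(x0, m0, s0, m1, s1)

torch.manual_seed(0)
x0 = 1.0 + 2.0 * torch.randn(100_000)            # samples from N(1, 2^2)
x_half = displacement_interpolation(x0, 0.5, 1.0, 2.0, 5.0, 0.5)
x_one = displacement_interpolation(x0, 1.0, 1.0, 2.0, 5.0, 0.5)
```

Along this path the standard deviation moves linearly from $s_0$ to $s_1$, so the intermediate distribution at $t = 0.5$ has standard deviation close to 1.25 in this example.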

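3.3 Interpolation Path Design

Different interpolation paths correspond to different Flow Matching variants:

Path type	Formula	Properties
Linear interpolation	$x_{t} = (1 - t) x_{0} + t x_{1}$	Simple, but not optimal in general
Energy interpolation	$x_{t} = \sigma_{t} x_{0} + \bar{\sigma}_{t} x_{1}$	Relies on a Gaussian assumption
Optimal transport	$x_{t} = (1 - t) x_{0} + t \, T(x_{0})$	Optimal, but requires the map $T$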

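4. Rectified Flows and Optimal Transport

4.1 Definition of Rectified Flows

Rectified Flows are a particular Flow Matching method that “rectifies” a stochastic process to obtain better properties². In Liu et al., the basic construction regresses onto the straight-line path $x_{t} = (1 - t) x_{0} + t x_{1}$ and then re-pairs samples with a “reflow” step.

Rectification process:

$$x_{t}^{\text{rectified}} = (1 - t) x_{0} + t x_{1} + \frac{1}{2} t (1 - t) \left( \nabla \log p_{t}(x_{t}) - \nabla \log p_{0}(x_{t}) \right)$$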

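4.2 Connection to Optimal Transport

Hertrich et al. (2025) examine the relation between Rectified Flows and optimal transport³:

Theorem 2: For certain distributions, the velocity field of the Rectified Flow coincides with the velocity field of the optimal transport map.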
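
A minimal sketch of the reflow idea from Liu et al., under simplifying assumptions: a current velocity field is integrated with Euler steps to produce new pairs, and those pairs are then used in the straight-line regression loss. The names `reflow_pairs`, `rectified_flow_loss`, and the hand-written `toy_velocity` are hypothetical and stand in for a trained model.

```python
import torch

@torch.no_grad()
def reflow_pairs(velocity, x0, n_steps=100):
    """Return (x0, x1_hat), where x1_hat solves dx/dt = velocity(x, t) from x0."""
    x = x0.clone()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full((x.shape[0],), k * dt, dtype=x.dtype, device=x.device)
        x = x + dt * velocity(x, t)
    return x0, x

def rectified_flow_loss(model, x0, x1):
    """Regress model(x_t, t) onto the straight-line direction x1 - x0."""
    t = torch.rand(x0.shape[0], device=x0.device)
    xt = (1 - t.view(-1, 1)) * x0 + t.view(-1, 1) * x1
    return ((model(xt, t) - (x1 - x0)) ** 2).mean()

# Toy usage: a fixed linear field plays the role of the previous-round model.
toy_velocity = lambda x, t: -x
z0, z1 = reflow_pairs(toy_velocity, torch.randn(16, 2))
loss = rectified_flow_loss(toy_velocity, z0, z1)
```

Whether repeating this procedure approaches the optimal transport map depends on the distributions involved, which is the question the theorem above addresses.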

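4.3 Unified Framework

A unified framework places Diffusion Bridge, Flow Matching, and Rectified Flows within a single theory⁴:

graph TD
    A[Diffusion Models] --> B[Flow Matching]
    A --> C[Diffusion Bridge]
    B --> D[Optimal Transport Flow Matching]
    C --> D
    D --> E[Unified Theoretical Framework]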

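5. Convergence Theory

5.1 Dimension-Improved KL Bounds

Gentiloni Silveri et al. (2026) propose improved convergence guarantees for Flow Matching⁵:

Theorem 3 (dimension-improved KL bound): For Flow Matching based on Brownian motion:

$$\mathrm{KL}(p_{1}^{\text{learned}} \,\|\, p_{1}^{\text{true}}) \leq O\left( \frac{d}{N} + N \epsilon^{2} \right)$$

where:

$d$ is the data dimension
$N$ is the number of sampling steps
$\epsilon$ is the approximation error of the velocity field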
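
To see how the two terms of the bound trade off, treat the constants hidden in $O(\cdot)$ as 1 (an assumption made only for illustration) and minimize over the number of steps $N$:

```latex
f(N) = \frac{d}{N} + N \epsilon^{2},
\qquad
f'(N) = -\frac{d}{N^{2}} + \epsilon^{2} = 0
\;\Longrightarrow\;
N^{\star} = \frac{\sqrt{d}}{\epsilon},
\qquad
f(N^{\star}) = 2 \sqrt{d}\, \epsilon .
```

For example, with $d = 100$ and $\epsilon = 0.01$ this gives $N^{\star} = 1000$ and $f(N^{\star}) = 0.1 + 0.1 = 0.2$. Taking more steps than $N^{\star}$ makes the $N \epsilon^{2}$ term dominate, so under this bound a more accurate velocity field is needed before additional steps help.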

5.2 Wasserstein Guarantees

Theorem 4 (Wasserstein convergence):

$$W_{2}(p_{1}^{\text{learned}}, p_{1}^{\text{true}}) \leq C \cdot \epsilon + O\left( \frac{1}{N} \right)$$

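5.3 Comparison with Diffusion Models

Aspect	Diffusion Flow Matching	Standard diffusion models
Training target	Deterministic velocity field	Stochastic score
Sampling	ODE (fixed step size)	SDE or ODE
Theoretical guarantees	Still being established	Relatively mature
Practical efficiency	Usually faster	Depends on the schedule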

6. Flow Matching with Neural Networks

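6.1 Theoretical Framework

He et al. (2026) develop a rigorous theory of Flow Matching parameterized by neural networks⁶:

Setup: the conditional velocity field is parameterized by a two-layer ReLU network:

$$v_{\theta}(x, t) = \frac{1}{m} \sum_{r=1}^{m} a_{r} \, \sigma(w_{r}^{T} x + b_{r}) + c^{T} x + d$$

where $\sigma$ is the ReLU activation and $m$ is the network width.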
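
The parameterization above can be sketched in PyTorch as follows. Several choices here are assumptions made for illustration and are not taken from He et al.: the time $t$ is appended to the input, the output weights $a_{r}$ are vectors so that the output has the same dimension as $x$, and the linear term is a full affine map. The class name `TwoLayerReLUVelocity` is hypothetical.

```python
import torch
import torch.nn as nn

class TwoLayerReLUVelocity(nn.Module):
    """(1/m) * sum_r a_r * relu(w_r^T z + b_r) + C z + d, with z = [x, t]."""

    def __init__(self, dim, width):
        super().__init__()
        self.width = width
        self.hidden = nn.Linear(dim + 1, width)        # rows are w_r, bias holds b_r
        self.out = nn.Linear(width, dim, bias=False)   # columns are a_r
        self.skip = nn.Linear(dim + 1, dim)            # affine term C z + d

    def forward(self, x, t):
        z = torch.cat([x, t.view(-1, 1)], dim=-1)
        features = torch.relu(self.hidden(z))
        # The 1/m factor matches the averaging in the formula above.
        return self.out(features) / self.width + self.skip(z)

net = TwoLayerReLUVelocity(dim=2, width=256)
v = net(torch.randn(5, 2), torch.rand(5))
```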

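6.2 Convergence Guarantees

Theorem 5 (convergence of gradient descent in the over-parameterized regime):

Assume the network width satisfies $m \geq \tilde{\Omega}(d)$. Then stochastic gradient descent finds an $\varepsilon$-approximate solution within $O(\log(1/\varepsilon))$ iterations.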

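6.3 Generalization Bound

Theorem 6 (generalization bound): For conditional velocity field estimation:

$$\mathbb{E}\left[ \| v_{\theta} - v^{*} \|^{2} \right] \leq \tilde{O}\left( \frac{R^{2}}{n} + \lambda_{\min}^{-1} \epsilon_{\text{opt}} \right)$$

where $n$ is the number of samples and $R$ is the range of the network parameters.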

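7. Entropy-Controlled Flow Matching

7.1 Motivation

The standard Flow Matching objective does not directly control the information geometry of the trajectory, which can lead to low-entropy bottlenecks⁷:

Some semantic modes may temporarily disappear
The diversity of generated samples is limited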

7.2 Entropy Control Method

Entropy-Controlled Flow Matching introduces an entropy regularizer:

$$L_{ent} = L_{FM} + \lambda \cdot H(\{p_{t}\})$$

where $H(\{p_{t}\}) = \int_{0}^{1} H(p_{t}) \, d t$ is the trajectory entropy.
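
As a small numerical illustration of the trajectory entropy, the sketch below evaluates it for a toy case that is an assumption of this example: $x_{0}$ and $x_{1}$ are independent 1D standard Gaussians joined by the linear path, so $p_{t}$ is Gaussian with variance $(1 - t)^{2} + t^{2}$. All function names are hypothetical.

```python
import math

def gaussian_entropy_1d(var):
    """Differential entropy of a 1D Gaussian with variance `var`, in nats."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

def path_variance(t, var0=1.0, var1=1.0):
    """Variance of x_t = (1 - t) x0 + t x1 for independent Gaussians x0, x1."""
    return (1 - t) ** 2 * var0 + t ** 2 * var1

def trajectory_entropy(n_grid=1001):
    """Trapezoidal approximation of the integral of H(p_t) over t in [0, 1]."""
    ts = [i / (n_grid - 1) for i in range(n_grid)]
    hs = [gaussian_entropy_1d(path_variance(t)) for t in ts]
    dt = 1.0 / (n_grid - 1)
    return sum(0.5 * (hs[i] + hs[i + 1]) * dt for i in range(n_grid - 1))

h_start = gaussian_entropy_1d(path_variance(0.0))
h_mid = gaussian_entropy_1d(path_variance(0.5))
h_total = trajectory_entropy()
```

In this toy case the variance drops to 0.5 at $t = 0.5$, so the entropy dips in the middle of the path even though both endpoints have the same entropy, a simple instance of the kind of bottleneck the regularizer is meant to address.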

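7.3 Geometric Interpretation

Entropy-controlled Flow Matching is designed to ensure that:

Trajectories do not pass through low-density regions
Semantic modes are preserved throughout the generation process
The generated distribution has appropriate variance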

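8. Multi-Marginal Flow Matching

8.1 Problem Extension

Multi-marginal Flow Matching handles the case where marginals are also observed at intermediate times $0 < t_{1} < \dots < t_{k} < 1$⁸:

$$x_{0} \sim p_{0}, \quad x_{t_{1}} \sim p_{t_{1}}, \quad \dots, \quad x_{t_{k}} \sim p_{t_{k}}, \quad x_{1} \sim p_{1}$$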

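8.2 Optimal Transport Potentials

Optimal transport potentials are used to learn the multi-marginal coupling:

$$L_{MM} = \mathbb{E}\left[ \sum_{i=1}^{k} \| v_{\theta}(x_{t_{i}}, t_{i}) - v_{i}^{*}(x_{t_{i}}) \|^{2} \right]$$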
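
The following sketch shows one simple way a multi-marginal loss of this shape could be assembled. It swaps in piecewise-linear interpolation between consecutive marginals as the regression target; the OT-potential-based targets $v_{i}^{*}$ of the cited work are not reproduced here. The function name and the assumption that samples are already paired row-wise across marginals are both hypothetical.

```python
import torch

def multimarginal_fm_loss(model, xs, times):
    """Sum of per-segment flow matching losses (piecewise-linear stand-in).

    xs:    list of tensors of shape (batch, dim), already paired row-wise,
           so row j of every tensor belongs to the same trajectory.
    times: list of increasing floats, times[i] being the time of xs[i].
    """
    total = 0.0
    for i in range(len(xs) - 1):
        t0, t1 = times[i], times[i + 1]
        u = torch.rand(xs[i].shape[0], 1, device=xs[i].device)  # position inside segment
        xt = (1 - u) * xs[i] + u * xs[i + 1]
        t = (t0 + u * (t1 - t0)).squeeze(-1)
        v_target = (xs[i + 1] - xs[i]) / (t1 - t0)               # constant segment velocity
        total = total + ((model(xt, t) - v_target) ** 2).mean()
    return total

# Toy usage with three marginals at t = 0, 0.5, 1 and a model that outputs zero.
xs = [torch.zeros(4, 2), torch.ones(4, 2), 3 * torch.ones(4, 2)]
loss = multimarginal_fm_loss(lambda x, t: torch.zeros_like(x), xs, [0.0, 0.5, 1.0])
```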

8.3 Application Scenarios

Multi-step image-to-image translation
Time-series data generation
Cross-domain alignment

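9. Dynamic Unbalanced Optimal Transport

9.1 The WFR-FM Framework

WFR-FM (Wasserstein-Fisher-Rao Flow Matching) handles unbalanced mass transport⁹:

Core idea: allow mass to be created or destroyed during transport:

$$\frac{d x_{t}}{d t} = v(x_{t}, t) + s(t) \cdot x_{t}$$

where $s(t)$ is the source/sink term.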

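9.2 Scientific Applications

The method is reported to perform well in scientific applications such as aligning single-cell RNA sequencing data:

It handles cell populations of different sizes
It preserves local structure while aligning the global distributions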

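10. The Wasserstein Gradient Flow Perspective

10.1 Reinterpreting Diffusion Models

Vuong et al. (2026) propose a reinterpretation of diffusion models as Wasserstein gradient flows¹⁰:

Key finding: the score matching objective of diffusion models is equivalent to performing a gradient flow in Wasserstein space.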

10.2 Unified View

graph LR
    A[Score Matching] --> B[Wasserstein Gradient Flow]
    C[Flow Matching] --> B
    D[Rectified Flows] --> B
    B --> E[Unified Generative Framework]

10.3 Theoretical Advantages

A shared theoretical foundation
A unified optimality analysis
Better error analysis

11. Practical Implementation

11.1 Basic Implementation

import torch
import torch.nn as nn
 
class FlowMatchingNet(nn.Module):
    def __init__(self, dim, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden),  # +1 for time
            nn.SiLU(),
            nn.Linear(hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, dim)
        )
    
    def forward(self, x, t):
        # Append t as a single extra feature so the input width is dim + 1,
        # matching the first Linear layer.
        t_emb = t.view(-1, 1)
        h = torch.cat([x, t_emb], dim=-1)
        return self.net(h)
 
def flow_matching_loss(model, x0, x1, t=None):
    if t is None:
        t = torch.rand(x0.shape[0], device=x0.device)
    # Interpolate
    xt = t.view(-1, 1) * x1 + (1 - t.view(-1, 1)) * x0
    # Target velocity
    v_target = x1 - x0
    # Predicted velocity
    v_pred = model(xt, t)
    return ((v_pred - v_target) ** 2).mean()
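
A short training and sampling loop in the same spirit, written as a self-contained sketch. The toy target distribution, the small stand-in network, and names such as `sample_data` are assumptions for illustration; the loss is the same linear-path regression as in `flow_matching_loss` above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 2
# Small stand-in velocity network whose input is [x, t].
net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
model = lambda x, t: net(torch.cat([x, t.view(-1, 1)], dim=-1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def sample_data(n):
    """Toy target distribution: a Gaussian blob centered at (4, 4)."""
    return 0.5 * torch.randn(n, dim) + 4.0

losses = []
for step in range(300):
    x0 = torch.randn(256, dim)                   # source samples from p_0
    x1 = sample_data(256)                        # target samples from p_1
    t = torch.rand(256)
    xt = (1 - t.view(-1, 1)) * x0 + t.view(-1, 1) * x1
    loss = ((model(xt, t) - (x1 - x0)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())

# Sampling: integrate the learned ODE from t=0 to t=1 with Euler steps.
with torch.no_grad():
    x = torch.randn(512, dim)
    n_steps = 50
    for k in range(n_steps):
        t = torch.full((x.shape[0],), k / n_steps)
        x = x + model(x, t) / n_steps
```

The loss does not go to zero even for a perfect model, because the regression target $x_{1} - x_{0}$ is random given $x_{t}$; what matters is that the network approaches the conditional mean.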

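11.2 Optimal Transport Flow Matching

import math

def sinkhorn(C, reg=0.01, n_iters=100):
    """Entropic OT plan for uniform marginals, computed in the log domain."""
    n, m = C.shape
    log_a = torch.full((n,), -math.log(n), dtype=C.dtype, device=C.device)
    log_b = torch.full((m,), -math.log(m), dtype=C.dtype, device=C.device)
    f = torch.zeros_like(log_a)
    for _ in range(n_iters):
        # Alternate dual updates; rows are normalized last so each sums to 1/n.
        g = reg * (log_b - torch.logsumexp((f[:, None] - C) / reg, dim=0))
        f = reg * (log_a - torch.logsumexp((g[None, :] - C) / reg, dim=1))
    return torch.exp((f[:, None] + g[None, :] - C) / reg)

def ot_flow_matching_loss(model, x0, x1, reg=0.01):
    """Minibatch OT Flow Matching: pair x0 with x1 through a Sinkhorn plan."""
    with torch.no_grad():
        C = torch.cdist(x0, x1) ** 2                 # pairwise squared Euclidean cost
        C = C / C.max().clamp_min(1e-12)             # rescale so reg is unit-free
        P = sinkhorn(C, reg=reg)                     # (n, m) transport plan
        # For each x0[i], draw a partner j with probability proportional to P[i, j]
        idx = torch.multinomial(P, 1).squeeze(-1)
    x1_matched = x1[idx]

    # Standard flow matching with matched pairs
    t = torch.rand(x0.shape[0], device=x0.device)
    xt = (1 - t.view(-1, 1)) * x0 + t.view(-1, 1) * x1_matched
    v_target = x1_matched - x0

    return ((model(xt, t) - v_target) ** 2).mean()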

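12. Relation to Existing Wiki Content

Together with the following documents, this document forms a complete body of knowledge on generative models:

Diffusion Model Theory - basic theory of diffusion models
Diffusion vs Flow Matching - comparative analysis
Flow Map Framework - unified generative modeling
Rectified Flows - the optimal transport variant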

References

¹ Lipman et al. (2022). “Flow Matching for Generative Modeling.” ICLR 2023.
² Liu et al. (2022). “Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow.”
³ Hertrich et al. (2025). “On the Relation between Rectified Flows and Optimal Transport.” arXiv:2505.19712.
⁴ Zhu et al. (2025). “Diffusion Bridge or Flow Matching? A Unifying Framework and Comparative Analysis.” arXiv:2509.24531.
⁵ Gentiloni Silveri et al. (2026). “Diffusion Flow Matching: Dimension-Improved KL Bounds and Wasserstein Guarantees.” arXiv:2606.16610.
⁶ He et al. (2026). “A Theory on Flow Matching with Neural Networks.” arXiv:2606.10089.
⁷ Maduabuchi (2026). “Entropy-Controlled Flow Matching.” arXiv:2602.22265.
⁸ Kansal et al. (2026). “Multimarginal flow matching with optimal transport potentials.” arXiv:2606.05327.
⁹ Peng et al. (2026). “WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport.” arXiv:2601.06810.
¹⁰ Vuong et al. (2026). “Are We Really Learning the Score Function? Reinterpreting Diffusion Models Through Wasserstein Gradient Flow Matching.” OpenReview.
