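DreamFusion and Score Distillation

Overview

DreamFusion, proposed in 2022 by Google Research and UC Berkeley, is a pioneering text-to-3D method that generates 3D content directly from a text description without any 3D training data. It uses a pretrained 2D diffusion model (Imagen) as the supervision signal for 3D generation and introduced the "Score Distillation Sampling" (SDS) paradigm.

The core ideas of SDS:

2D prior: use a diffusion model pretrained on large-scale image-text data
Differentiable rendering: render the 3D representation into 2D images with a differentiable renderer
Knowledge distillation: transfer the knowledge of the 2D diffusion model to 3D generation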

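Principles of Score Distillation Sampling

Background: diffusion models

A diffusion model learns the data distribution by gradually adding noise to data and learning to remove it. Training fits a denoising network $\hat{ϵ}_{φ} (x_{t}, t, c)$ with parameters $φ$, where:

$x_{t}$: the noisy image at timestep $t$
$c$: the conditioning input (the text description)
$\hat{ϵ}_{φ}$: the predicted noise

At inference time, an image is generated by starting from pure noise and denoising step by step.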
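To make the noising step concrete, here is a minimal sketch of the forward process $x_{t} = α_{t} x + σ_{t} ϵ$ on a flat list of pixel values. The cosine schedule and the function names `cosine_schedule` and `add_noise` are assumptions chosen for illustration; real models use their own schedules.

```python
import math

def cosine_schedule(t):
    """Return (alpha_t, sigma_t) for t in [0, 1], with alpha_t^2 + sigma_t^2 = 1.

    Assumed schedule for illustration only.
    """
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

def add_noise(pixels, noise, t):
    """Form x_t = alpha_t * x + sigma_t * eps elementwise."""
    alpha, sigma = cosine_schedule(t)
    return [alpha * x + sigma * e for x, e in zip(pixels, noise)]
```

At `t = 0` the image is returned unchanged, and as `t` approaches 1 the result approaches pure noise.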

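The SDS objective

SDS renders a 2D image $x = g (θ)$ from the 3D representation with parameters $θ$, then updates $θ$ using the score function of the diffusion model:

$$\nabla_{θ} L_{SDS} (θ) = E_{t, ϵ} \left[ w (t) \, (\hat{ϵ}_{φ} (x_{t}, t, c) - ϵ) \, \frac{\partial x}{\partial θ} \right]$$

where:

$x_{t} = α_{t} x + σ_{t} ϵ$: the noised image, with $ϵ \sim N (0, I)$
$\hat{ϵ}_{φ}$: the noise prediction of the pretrained diffusion model, whose parameters $φ$ are frozen and distinct from the 3D parameters $θ$
$w (t)$: a timestep-dependent weight
$\frac{\partial x}{\partial θ}$: the Jacobian of the renderer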

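Simplified derivation

The gradient of the diffusion training loss with respect to $θ$ also contains the Jacobian of the denoising network, $\partial \hat{ϵ}_{φ} / \partial x_{t}$, which is expensive to compute. SDS omits this term. Here $\hat{ϵ}_{φ}$ is the noise prediction of the frozen diffusion model. Writing the remaining timestep weight as $\frac{σ_{t}}{α_{t}} w (t)$ gives:

$$\nabla_{θ} L_{SDS} \approx E_{t, ϵ} \left[ \frac{σ_{t}}{α_{t}} w (t) \, (\hat{ϵ}_{φ} (x_{t}, t, c) - ϵ) \, \frac{\partial x}{\partial θ} \right]$$

Choosing the weight $w (t) = \frac{α_{t}}{σ_{t}}$ cancels the prefactor, and the gradient reduces to:

$$\nabla_{θ} L_{SDS} \approx E_{t, ϵ} \left[ (\hat{ϵ}_{φ} (x_{t}, t, c) - ϵ) \, \frac{\partial x}{\partial θ} \right]$$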

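Intuition

SDS can be understood as searching the neighborhood of the rendered image for an image that "better matches the text description", then updating the 3D representation along the corresponding gradient direction:

Render the current 3D representation to obtain an image $x$
Add noise to the image and pass it to the diffusion model to obtain a gradient direction
The gradient points toward images that better match the text
Backpropagate through the renderer to update the 3D parameters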
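The four steps above can be tried end to end on a one-dimensional toy problem. In this sketch the "3D representation" is a single number `theta`, the "renderer" is `x = scale * theta`, and the pretrained network is replaced by the analytic optimal denoiser for Gaussian data. All names, the cosine noise schedule, and the unit weight $w (t) = 1$ are assumptions made for illustration, not DreamFusion's actual components.

```python
import math
import random

def analytic_denoiser(x_t, alpha, sigma, mean, std):
    """Optimal noise prediction E[eps | x_t] when clean data is N(mean, std^2)."""
    return sigma * (x_t - alpha * mean) / (alpha**2 * std**2 + sigma**2)

def toy_sds(mean=3.0, std=0.5, scale=2.0, lr=0.005, steps=4000, seed=0):
    """Optimize theta so that the toy render x = scale * theta moves toward the data mode."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        x = scale * theta                        # 1. render: x = g(theta)
        t = rng.uniform(0.02, 0.98)              # 2. pick a random noise level
        alpha = math.cos(0.5 * math.pi * t)
        sigma = math.sin(0.5 * math.pi * t)
        eps = rng.gauss(0.0, 1.0)
        x_t = alpha * x + sigma * eps            #    noised render
        eps_hat = analytic_denoiser(x_t, alpha, sigma, mean, std)
        grad_x = eps_hat - eps                   # 3. SDS direction in image space, w(t) = 1
        grad_theta = grad_x * scale              # 4. chain rule with dx/dtheta = scale
        theta -= lr * grad_theta
    return theta
```

In expectation the update is proportional to `scale * theta - mean`, so `theta` drifts toward `mean / scale`. This also illustrates the mode-seeking character of SDS: the render is pulled to the most likely image rather than sampled from the distribution.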

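DreamFusion architecture

Overall pipeline

Text input → initialize 3D representation → iterative optimization (SDS)
                                                    ↓
                                            Renderer → 2D image
                                                    ↓
                                            Diffusion model → SDS gradient
                                                    ↓
                                            Backpropagation → update 3D representation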

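3D representation

DreamFusion uses a neural radiance field (NeRF) as its 3D representation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DreamFusionNeRF(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        # position (3) + view direction (3) -> RGB (3) + density (1)
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 4),
        )

    def forward(self, position, direction):
        """
        position: (N, 3) - sample point positions
        direction: (N, 3) - viewing directions
        returns: rgb (N, 3) in [0, 1] and density (N,) >= 0
        """
        out = self.mlp(torch.cat([position, direction], dim=-1))
        return torch.sigmoid(out[..., :3]), F.softplus(out[..., 3])

    def render(self, rays, num_samples=64):
        """Volume rendering; sample_along_rays and volume_render are placeholder helpers."""
        # points: (R, S, 3) samples along R rays; rays.directions: (R, 3)
        points, weights = sample_along_rays(rays, num_samples)
        directions = rays.directions[:, None, :].expand_as(points)
        rgbs, densities = self.forward(points.reshape(-1, 3), directions.reshape(-1, 3))
        rgbs = rgbs.reshape(points.shape)
        densities = densities.reshape(points.shape[:-1])
        return volume_render(rgbs, densities, weights)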
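The `volume_render` helper in the class above is not defined in this article. As a hedged illustration of what such a helper typically computes, the sketch below composites color along a single ray with the standard NeRF quadrature: each sample gets opacity `1 - exp(-density * delta)` and is attenuated by the transmittance of the samples in front of it. The name `composite_ray` and its list-based interface are assumptions for this example.

```python
import math

def composite_ray(rgbs, densities, deltas):
    """Alpha-composite samples along one ray, front to back.

    rgbs: list of [r, g, b] per sample
    densities: non-negative volume density per sample
    deltas: distance between consecutive samples
    Returns the accumulated color and the per-sample weights.
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    color = [0.0, 0.0, 0.0]
    weights = []
    for rgb, density, delta in zip(rgbs, densities, deltas):
        alpha = 1.0 - math.exp(-density * delta)   # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for k in range(3):
            color[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return color, weights
```

The weights sum to at most 1. The remainder is the light that passes through the whole ray, which a renderer would typically fill with a background color.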

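Training procedure

import torch

def dreamfusion_train(text_prompt, num_steps=500):
    # Initialize the NeRF
    nerf = DreamFusionNeRF().cuda()
    optimizer = torch.optim.Adam(nerf.parameters(), lr=5e-4)

    # Frozen pretrained diffusion model (Imagen in the paper);
    # load_pretrained_diffusion is a placeholder loader
    diffusion = load_pretrained_diffusion()

    for step in range(num_steps):
        optimizer.zero_grad()

        # 1. Sample a random camera viewpoint (placeholder helper)
        camera = sample_random_camera()

        # 2. Render an image from that viewpoint
        image = nerf.render(camera)

        # 3. Compute the SDS gradient with respect to the image
        sds_grad = compute_sds_gradient(diffusion, image, text_prompt)

        # 4. Backpropagate the gradient through the renderer and update the NeRF
        image.backward(gradient=sds_grad)
        optimizer.step()

    return nerf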

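Implementing the SDS gradient

import torch

def compute_sds_gradient(diffusion, image, text_prompt, guidance_scale=100):
    """Compute the SDS gradient with respect to the rendered image.

    `diffusion` is a placeholder wrapper: encode_text, unet, scheduler and
    num_timesteps are assumed attributes, not a specific library API.
    """
    B = image.shape[0]

    # Text embeddings for the prompt and for the empty (unconditional) prompt
    cond = diffusion.encode_text([text_prompt] * B)
    uncond = diffusion.encode_text([""] * B)

    # Random timestep per image
    t = torch.randint(0, diffusion.num_timesteps, (B,), device=image.device)

    # Add noise to the (detached) render
    noise = torch.randn_like(image)
    image_t = diffusion.scheduler.add_noise(image.detach(), noise, t)

    # Noise predictions; the diffusion model stays frozen
    with torch.no_grad():
        eps_cond = diffusion.unet(image_t, t, encoder_hidden_states=cond)
        eps_uncond = diffusion.unet(image_t, t, encoder_hidden_states=uncond)

    # Classifier-free guidance: guidance_scale weights the conditional direction
    noise_pred = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # SDS gradient with w(t) = 1
    grad = noise_pred - noise

    return grad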
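The `guidance_scale` parameter above is normally a classifier-free guidance weight (DreamFusion reports using a large value of about 100), which mixes an unconditional and a text-conditioned noise prediction before the sampled noise is subtracted. The sketch below shows that arithmetic on plain lists of numbers. The function names are hypothetical and the two predictions are passed in directly instead of coming from a network.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Extrapolate from the unconditional toward the conditional prediction."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def sds_image_gradient(eps_uncond, eps_cond, noise, guidance_scale=100.0, weight=1.0):
    """Per-pixel SDS gradient w(t) * (eps_hat - eps) using guided predictions."""
    eps_hat = classifier_free_guidance(eps_uncond, eps_cond, guidance_scale)
    return [weight * (e - n) for e, n in zip(eps_hat, noise)]
```

With `guidance_scale = 1` the guided prediction equals the conditional one, and with `guidance_scale = 0` it equals the unconditional one. Larger values push the result further along the text-conditioned direction.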

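Technical challenges and improvements

The multi-face Janus problem

Problem: DreamFusion often produces a "Janus" artifact, in which the front of the object (for example, a face) appears on several sides at once.

Causes:

The 2D diffusion model supervises each view independently
There is no cross-view consistency constraint

Solutions:

Multi-view conditioning: introduce a multi-view consistency loss
View sampling strategy: learn different views in stages
3D consistency regularization: encourage renders from different views to agree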
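One simple way to give the diffusion model information about the viewpoint, used in DreamFusion itself, is view-dependent prompting: a short view description is appended to the text prompt according to the sampled camera. The sketch below is a simplified version under stated assumptions: the 60-degree overhead threshold follows the paper, while the hard azimuth bins and the function name `view_dependent_prompt` are choices made for this example.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg, overhead_threshold_deg=60.0):
    """Append a view description to the prompt based on the camera angles.

    Azimuth 0 is assumed to face the front of the object.
    """
    if elevation_deg > overhead_threshold_deg:
        return f"{prompt}, overhead view"
    azimuth = azimuth_deg % 360.0
    if azimuth < 45.0 or azimuth >= 315.0:
        view = "front view"
    elif 135.0 <= azimuth < 225.0:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"
```

For example, `view_dependent_prompt("a corgi", 180, 10)` returns `"a corgi, back view"`, so the diffusion model is not asked to place a face on the back of the object.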

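Training instability

Problem: SDS training is prone to:

Oversaturation and color distortion
Cotton-like, fluffy artifacts
Divergence

Solutions: see the improved SDS variants that follow.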

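Improved SDS variants

VSD (Variational Score Distillation):

Variational Score Distillation, proposed in ProlificDreamer, treats the 3D parameters $θ$ as a random variable rather than a single point estimate. It optimizes the distribution of $θ$ so that the distribution of rendered images matches the distribution that the pretrained diffusion model defines for the text prompt:

$$\nabla_{θ} L_{VSD} = E_{t, ϵ} \left[ w (t) \, (\hat{ϵ}_{φ} (x_{t}, t, c) - \hat{ϵ}_{LoRA} (x_{t}, t, c)) \, \frac{\partial x}{\partial θ} \right]$$

Here $\hat{ϵ}_{φ}$ is the frozen pretrained model, and $\hat{ϵ}_{LoRA}$ is a LoRA fine-tuned copy trained on the current renders, which estimates the score of the rendered-image distribution:

def vsd_gradient(diffusion, image, text, lora_weights, beta=1.0):
    """VSD gradient with respect to the rendered image (pseudocode sketch).

    sample_timestep, sample_noise and add_noise are placeholder helpers;
    lora_weights are LoRA parameters fine-tuned on the current renders.
    """
    # Add noise to the rendered image
    t = sample_timestep()
    noise = sample_noise()
    image_noisy = add_noise(image, noise, t)

    # Two noise predictions: frozen pretrained model and its LoRA fine-tuned copy
    noise_pred_pretrained = diffusion(image_noisy, t, text)
    noise_pred_lora = diffusion(image_noisy, t, text, lora=lora_weights)

    # VSD gradient; beta = 1 gives the difference in the formula above
    grad = noise_pred_pretrained - beta * noise_pred_lora

    return grad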

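Poor geometry quality

Problem: the original SDS tends to produce "2.5D" results that lack true 3D geometry.

Solutions:

Geometric priors: use monocular normal estimation
Depth regularization: encourage consistency of depth maps
Multi-stage optimization: optimize geometry first, then texture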

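Score Jacobian Chaining

Overview

Score Jacobian Chaining (SJC) gives a more theoretically grounded formulation of score distillation.

Core idea

SJC views score distillation as an application of the chain rule to the rendered image $x = g (θ)$:

$$\nabla_{θ} \log p (x) = \frac{\partial \log p (x)}{\partial x} \cdot \frac{\partial x}{\partial θ}$$

where $\frac{\partial \log p (x)}{\partial x}$ is the score, that is, the gradient of the log-probability given by the diffusion model.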
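The chain rule above can be checked numerically on a toy problem. In this sketch the image distribution is an isotropic Gaussian with a known score, and the "renderer" maps one parameter to a two-pixel image. Both are stand-ins chosen for illustration, and all function names are hypothetical.

```python
def log_prob(x, mean, std):
    """Log-density of an isotropic Gaussian, up to an additive constant."""
    return -sum((xi - mi) ** 2 for xi, mi in zip(x, mean)) / (2.0 * std**2)

def score(x, mean, std):
    """Gradient of log_prob with respect to x (the score)."""
    return [-(xi - mi) / std**2 for xi, mi in zip(x, mean)]

def render(theta):
    """Toy differentiable renderer: one parameter to a 2-pixel image."""
    return [theta, theta**2]

def render_jacobian(theta):
    """dx/dtheta for the toy renderer."""
    return [1.0, 2.0 * theta]

def chained_gradient(theta, mean, std):
    """Score of the rendered image chained with the renderer Jacobian."""
    s = score(render(theta), mean, std)
    j = render_jacobian(theta)
    return sum(si * ji for si, ji in zip(s, j))
```

Comparing `chained_gradient` with a finite-difference estimate of `log_prob(render(theta), ...)` confirms that the two agree. In practice the score comes from the diffusion model and the Jacobian product is computed by backpropagation through the renderer.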

SJC vs SDS

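Aspect	SDS	SJC
Theoretical basis	Noise prediction	Log-probability (score)
Gradient direction	Noise space	Image space
Numerical stability	Moderate	Better
Applicable scenarios	General	Specific scenarios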

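Comparison of improved methods

Summary of methods

Method	Year	Main contribution	Strengths	Weaknesses
DreamFusion	2022	Introduced the SDS paradigm	Pioneering	Limited quality
Magic3D	2022	High-resolution improvement	Higher quality	Slow per-prompt optimization
ProlificDreamer	2023	VSD framework	Markedly higher quality	Complex training
SJC	2023	Theoretical framework	Theoretically rigorous	Complex implementation
Fantasia3D	2023	Geometry-texture disentanglement	Strong controllability	Requires additional data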

Evolution of SDS variants

DreamFusion (2022)
    ↓
Magic3D (2022): coarse-to-fine, high resolution
    ↓
ProlificDreamer (2023): VSD, LoRA prior
    ↓
Score Jacobian Chaining (2023): theoretical refinement
    ↓
Mainstream methods (2024+): combinations of multiple approaches

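Implementation details and tips

Renderer configuration

class ImprovedRenderer:
    def __init__(self):
        self.render_resolution = 64  # initial low resolution
        self.num_samples = 128       # samples per ray
        self.background_color = (1.0, 1.0, 1.0)  # white background

    def render_with_augmentation(self, camera):
        """Render with data augmentation.

        self.render, random_unit_vector, color_jitter and random_crop are
        placeholder helpers that are not defined here.
        """
        # Random light direction
        light_direction = random_unit_vector()

        # Render
        image = self.render(camera, light_direction)

        # Augment
        image = color_jitter(image)
        image = random_crop(image, crop_size=64)

        return image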

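Training strategy

Curriculum learning

def curriculum_training(prompt, model):
    # optimize_geometry, optimize_detail and optimize_texture are placeholder helpers
    # Stage 1: coarse geometry
    optimize_geometry(model, prompt, epochs=100, lr=1e-3)

    # Stage 2: detail optimization
    optimize_detail(model, prompt, epochs=200, lr=5e-4)

    # Stage 3: texture refinement
    optimize_texture(model, prompt, epochs=100, lr=1e-4)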

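Camera sampling

import math
import torch

def diverse_camera_sampling(num_views=8):
    """Sample a diverse set of cameras; create_camera is a placeholder helper."""
    # Azimuths evenly spaced over [0, 2*pi), excluding the duplicate endpoint
    azimuths = torch.linspace(0, 2 * math.pi, num_views + 1)[:-1]

    # Elevations (radians) concentrated near the horizon
    elevations = torch.normal(mean=0.0, std=0.3, size=(num_views,))

    cameras = []
    for az, el in zip(azimuths, elevations):
        cameras.append(create_camera(azimuth=az.item(), elevation=el.item()))

    return cameras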
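The `create_camera` helper in the sampling code is left undefined. As a hedged sketch of its geometric core, the functions below convert an azimuth and elevation (in radians) into a camera position on a sphere around the origin and a unit viewing direction that looks at the origin. The y-up convention, the default radius, and the function names are assumptions for this example.

```python
import math

def camera_position(azimuth, elevation, radius=1.5):
    """Spherical to Cartesian coordinates, y-up; azimuth 0 lies on the +z axis."""
    x = radius * math.cos(elevation) * math.sin(azimuth)
    y = radius * math.sin(elevation)
    z = radius * math.cos(elevation) * math.cos(azimuth)
    return (x, y, z)

def look_at_origin_direction(position):
    """Unit vector pointing from the camera position toward the origin."""
    norm = math.sqrt(sum(p * p for p in position))
    return tuple(-p / norm for p in position)
```

A full camera would also need an up vector and intrinsics such as the field of view, which are omitted here.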
