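Overview

Hyperbolic Neural Networks (HNNs) generalize the basic operations of neural networks to hyperbolic space. Compared with Euclidean networks, they represent hierarchical data more efficiently. Hyperbolic volume grows exponentially with radius, so tree-like structures can be embedded with low distortion in few dimensions.

Core idea: maintain a hierarchical inductive bias by replacing conventional operations (linear transformations, activations, normalization) with their counterparts on a Riemannian manifold.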


Hyperbolic Linear Layer

Basic Principle

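The core of a hyperbolic linear layer is Möbius matrix-vector multiplication. It operates on the Poincaré ball $\{x \in \mathbb{R}^n : c\|x\|^2 < 1\}$, which has curvature $-c$ with $c > 0$:

$$W \otimes_c x = \frac{1}{\sqrt{c}}\tanh\!\left(\frac{\|Wx\|}{\|x\|}\tanh^{-1}\!\big(\sqrt{c}\,\|x\|\big)\right)\frac{Wx}{\|Wx\|}$$

where $W$ is the learned parameter and $x$ is the input.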
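The Möbius product can also be computed in three steps: map $x$ to the tangent space at the origin, apply $W$ there, then map back, i.e. $W \otimes_c x = \exp_0^c(W \log_0^c(x))$. The minimal sketch below computes it both ways and checks that they agree. It assumes vectors are plain Python lists and $c > 0$. The helper names (`exp0`, `log0`, `mobius_matvec`, `mobius_matvec_closed_form`) are hypothetical and written only for this illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def exp0(v, c=1.0):
    """Exponential map at the origin: tangent vector -> point in the Poincaré ball."""
    n = norm(v)
    if n == 0.0:
        return [0.0] * len(v)
    s = math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [s * a for a in v]

def log0(x, c=1.0):
    """Logarithmic map at the origin: point in the ball -> tangent vector."""
    n = norm(x)
    if n == 0.0:
        return [0.0] * len(x)
    s = math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [s * a for a in x]

def matvec(W, x):
    return [dot(row, x) for row in W]

def mobius_matvec(W, x, c=1.0):
    """W ⊗_c x computed as exp0(W log0(x))."""
    return exp0(matvec(W, log0(x, c)), c)

def mobius_matvec_closed_form(W, x, c=1.0):
    """(1/sqrt(c)) * tanh(||Wx||/||x|| * artanh(sqrt(c)||x||)) * Wx/||Wx||."""
    wx = matvec(W, x)
    nx, nwx = norm(x), norm(wx)
    if nx == 0.0 or nwx == 0.0:
        return [0.0] * len(wx)
    sc = math.sqrt(c)
    s = math.tanh(nwx / nx * math.atanh(sc * nx)) / (sc * nwx)
    return [s * a for a in wx]

W = [[1.5, 0.0], [0.5, -1.0]]
x = [0.3, -0.2]
y = mobius_matvec(W, x, c=1.0)
print(y, mobius_matvec_closed_form(W, x, c=1.0))  # the two results agree
```

The tangent-space form is usually easier to implement. It also shows why an identity matrix leaves $x$ unchanged.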

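Complete forward pass

$$h = \sigma^{\otimes_c}\big((W \otimes_c x) \oplus_c b\big)$$

Here $\sigma^{\otimes_c}$ is the Möbius generalization of the nonlinear activation $\sigma$, and $b$ is a bias point in the ball. $\oplus_c$ denotes Möbius addition:

$$x \oplus_c y = \frac{(1 + 2c\langle x, y\rangle + c\|y\|^2)\,x + (1 - c\|x\|^2)\,y}{1 + 2c\langle x, y\rangle + c^2\|x\|^2\|y\|^2}$$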
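The sketch below runs a forward pass for one hyperbolic linear layer in plain Python. It is a minimal sketch under several assumptions: vectors are lists of floats, the bias `b` is given directly as a point in the ball, and the activation is applied element-wise in the tangent space at the origin. All function names are hypothetical helpers written for this illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def exp0(v, c=1.0):
    n = norm(v)
    if n == 0.0:
        return [0.0] * len(v)
    return [math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n) * a for a in v]

def log0(x, c=1.0):
    n = norm(x)
    if n == 0.0:
        return [0.0] * len(x)
    return [math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n) * a for a in x]

def mobius_add(x, y, c=1.0):
    """Möbius addition x ⊕_c y."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / den for a, b in zip(x, y)]

def mobius_matvec(W, x, c=1.0):
    """W ⊗_c x = exp0(W log0(x))."""
    v = log0(x, c)
    return exp0([dot(row, v) for row in W], c)

def hyperbolic_linear(W, b, x, act=lambda t: t, c=1.0):
    """h = act^{⊗c}((W ⊗_c x) ⊕_c b); the bias b is itself a point in the ball."""
    z = mobius_add(mobius_matvec(W, x, c), b, c)
    # Möbius version of the activation: apply act element-wise in the tangent space at 0
    return exp0([act(t) for t in log0(z, c)], c)

W = [[0.8, -0.3], [0.2, 1.1]]
b = [0.05, -0.1]
x = [0.4, 0.1]
h = hyperbolic_linear(W, b, x, act=math.tanh)
print(h)  # a point strictly inside the unit ball (c = 1)
```

With an identity $W$, a zero bias, and an identity activation, the layer returns its input. This is a quick sanity check for any implementation.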

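Nonlinear Activation

In hyperbolic space, a nonlinear activation must follow the definition of a Möbius nonlinearity:

$$f^{\otimes_c}(x) = \exp_0^c\big(f(\log_0^c(x))\big)$$

$$\exp_0^c(v) = \tanh\!\big(\sqrt{c}\,\|v\|\big)\frac{v}{\sqrt{c}\,\|v\|},\qquad \log_0^c(y) = \tanh^{-1}\!\big(\sqrt{c}\,\|y\|\big)\frac{y}{\sqrt{c}\,\|y\|}$$

Here $f$ is a nonlinearity in Euclidean space, such as ReLU or Sigmoid. $\exp_0^c$ and $\log_0^c$ are the exponential and logarithmic maps at the origin.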

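Möbius generalizations of common activations

| Euclidean activation | Möbius generalization |
|---|---|
| Identity | Identity: $\exp_0^c(\log_0^c(x)) = x$ |
| Sigmoid | $\exp_0^c\big(\mathrm{sigmoid}(\log_0^c(x))\big)$ |
| ReLU | $\exp_0^c\big(\mathrm{ReLU}(\log_0^c(x))\big)$ |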
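The table rows all follow one pattern, so they can be sketched as a single wrapper that takes any element-wise Euclidean function. This is a minimal plain-Python sketch. `mobius_fn`, `relu` and `sigmoid` are hypothetical helper names, not a library API.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def exp0(v, c=1.0):
    n = norm(v)
    if n == 0.0:
        return [0.0] * len(v)
    return [math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n) * a for a in v]

def log0(x, c=1.0):
    n = norm(x)
    if n == 0.0:
        return [0.0] * len(x)
    return [math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n) * a for a in x]

def mobius_fn(f, x, c=1.0):
    """Möbius version of an element-wise activation: exp0(f(log0(x)))."""
    return exp0([f(t) for t in log0(x, c)], c)

def relu(t):
    return max(t, 0.0)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

x = [0.5, -0.3]
print(mobius_fn(relu, x))     # the negative coordinate becomes 0.0, the result stays in the ball
print(mobius_fn(sigmoid, x))  # all coordinates positive, still inside the ball
```

Because $\log_0^c$ and $\exp_0^c$ only rescale a vector by a positive factor, the Möbius ReLU zeroes exactly the coordinates the Euclidean ReLU would.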

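Numerical Stability

Möbius operations are prone to NaNs in high-dimensional or deep networks. The main cause is that terms such as $1 - c\|x\|^2$ and $\tanh^{-1}(\sqrt{c}\|x\|)$ degenerate as points approach the boundary of the ball. Common strategies:

  1. Projection onto a bounded region: keep $\|x\| \le (1-\epsilon)/\sqrt{c}$
  2. Using the Lorentz model: numerically more stable
  3. Gradient clipping: limit the gradient norm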
import torch

def project_to_ball(x, c, eps=1e-5):
    """Rescale points so that ||x|| <= (1 - eps) / sqrt(c)."""
    max_norm = (1 - eps) / c ** 0.5
    norm = x.norm(dim=-1, keepdim=True).clamp_min(1e-15)
    return torch.where(norm > max_norm, x / norm * max_norm, x)

def mobius_add_safe(x, v, c, eps=1e-8):
    """Numerically safer Möbius addition x ⊕_c v on the Poincaré ball."""
    # Project both inputs strictly inside the ball first
    x, v = project_to_ball(x, c), project_to_ball(v, c)

    xv = torch.sum(x * v, dim=-1, keepdim=True)
    x2 = torch.sum(x * x, dim=-1, keepdim=True)
    v2 = torch.sum(v * v, dim=-1, keepdim=True)

    num = (1 + 2 * c * xv + c * v2) * x + (1 - c * x2) * v
    denom = 1 + 2 * c * xv + c ** 2 * x2 * v2

    # Keep the denominator away from zero, then re-project the result
    return project_to_ball(num / denom.clamp_min(eps), c)
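The second strategy can be made concrete. One reason the Lorentz (hyperboloid) model is often preferred is that its distance is an $\operatorname{arccosh}$ of an inner product. It avoids the $1 - c\|x\|^2$ denominators of the Poincaré ball, which lose precision near the boundary. The sketch below maps points between the two models, both with curvature $-c$, and checks that they give the same distance. It is a plain-Python sketch, and all function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mobius_add(x, y, c=1.0):
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / den for a, b in zip(x, y)]

def poincare_dist(p, q, c=1.0):
    """d_c(p, q) = (2/sqrt(c)) * artanh(sqrt(c) * ||(-p) ⊕_c q||)."""
    diff = mobius_add([-a for a in p], q, c)
    return 2 / math.sqrt(c) * math.atanh(math.sqrt(c) * math.sqrt(dot(diff, diff)))

def to_lorentz(p, c=1.0):
    """Poincaré ball point -> hyperboloid point with <x, x>_L = -1/c."""
    r2 = dot(p, p)
    x0 = (1 + c * r2) / (math.sqrt(c) * (1 - c * r2))
    return [x0] + [2 * a / (1 - c * r2) for a in p]

def to_poincare(x, c=1.0):
    """Hyperboloid point -> Poincaré ball point."""
    return [a / (1 + math.sqrt(c) * x[0]) for a in x[1:]]

def lorentz_inner(x, y):
    return -x[0] * y[0] + dot(x[1:], y[1:])

def lorentz_dist(x, y, c=1.0):
    # max(..., 1.0) guards against rounding pushing the arccosh argument below 1
    return math.acosh(max(-c * lorentz_inner(x, y), 1.0)) / math.sqrt(c)

p, q, c = [0.3, 0.1], [-0.2, 0.5], 0.5
xp, xq = to_lorentz(p, c), to_lorentz(q, c)
print(poincare_dist(p, q, c), lorentz_dist(xp, xq, c))  # the two distances agree
```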

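Hyperbolic Attention

Background

Attention computes similarities among queries, keys, and values. In hyperbolic space, this is done by projecting to the tangent space.

Computing hyperbolic attention

  1. Project to the tangent space at the origin:
     $$\tilde q_i = \log_0^c(q_i),\quad \tilde k_j = \log_0^c(k_j),\quad \tilde v_j = \log_0^c(v_j)$$

  2. Compute the attention weights (in the tangent space):
     $$\alpha_{ij} = \operatorname{softmax}_j\!\left(\frac{\langle \tilde q_i, \tilde k_j\rangle}{\sqrt{d}}\right)$$

  3. Weighted aggregation, mapped back into hyperbolic space:
     $$h_i = \exp_0^c\!\Big(\sum_j \alpha_{ij}\,\tilde v_j\Big)$$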
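The three steps can be sketched as single-head attention in plain Python. The sketch assumes queries, keys and values are already points in the ball. It omits the learned Q/K/V projections, and the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def exp0(v, c=1.0):
    n = norm(v)
    if n == 0.0:
        return [0.0] * len(v)
    return [math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n) * a for a in v]

def log0(x, c=1.0):
    n = norm(x)
    if n == 0.0:
        return [0.0] * len(x)
    return [math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n) * a for a in x]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hyperbolic_attention(queries, keys, values, c=1.0):
    d = len(queries[0])
    # Step 1: project queries, keys and values to the tangent space at the origin
    q_t = [log0(q, c) for q in queries]
    k_t = [log0(k, c) for k in keys]
    v_t = [log0(v, c) for v in values]
    outputs, weights = [], []
    for q in q_t:
        # Step 2: scaled dot-product weights computed in the tangent space
        alpha = softmax([dot(q, k) / math.sqrt(d) for k in k_t])
        # Step 3: weighted sum of tangent vectors, mapped back into the ball
        agg = [sum(a * v[i] for a, v in zip(alpha, v_t)) for i in range(len(v_t[0]))]
        outputs.append(exp0(agg, c))
        weights.append(alpha)
    return outputs, weights

Q = [[0.1, 0.2], [-0.3, 0.4]]
K = [[0.2, 0.1], [0.0, -0.5], [0.6, 0.3]]
V = [[0.5, 0.0], [0.0, 0.5], [-0.4, 0.4]]
out, attn = hyperbolic_attention(Q, K, V)
```

Aggregating in the tangent space keeps the weighted sum well defined. The final $\exp_0^c$ guarantees that every output lies inside the ball.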

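Hyperbolic multi-head attention

$$\operatorname{MultiHead}(X) = \exp_0^c\!\Big(W^O\big[\log_0^{c_1}(\mathrm{head}_1);\ \dots;\ \log_0^{c_H}(\mathrm{head}_H)\big]\Big)$$

Each head runs in its own Poincaré ball with curvature $-c_m$:

$$\mathrm{head}_m = \operatorname{HypAttn}^{c_m}\big(Q_m, K_m, V_m\big),\quad m = 1, \dots, H$$

Here $Q_m, K_m, V_m$ are the head-specific projections of the inputs.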
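The sketch below shows per-head curvatures in plain Python. Several simplifications are assumptions made for illustration: the Q/K/V projections and $W^O$ are identities, features are split evenly across heads, and each head does self-attention. All names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def exp0(v, c):
    n = norm(v)
    if n == 0.0:
        return [0.0] * len(v)
    return [math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n) * a for a in v]

def log0(x, c):
    n = norm(x)
    if n == 0.0:
        return [0.0] * len(x)
    return [math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n) * a for a in x]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def head_attention(points, c):
    """Self-attention for one head inside its own ball (curvature -c); projections are identities."""
    t = [log0(p, c) for p in points]
    out = []
    for q in t:
        alpha = softmax([dot(q, k) / math.sqrt(len(q)) for k in t])
        agg = [sum(a * v[i] for a, v in zip(alpha, t)) for i in range(len(q))]
        out.append(exp0(agg, c))
    return out

def multi_head(X, head_curvatures, c_out=1.0):
    """Split features across heads, run each head in its own ball, concatenate in the tangent space."""
    H = len(head_curvatures)
    dh = len(X[0]) // H
    tangent = [log0(x, c_out) for x in X]                               # leave the input ball
    heads = []
    for m, c_m in enumerate(head_curvatures):
        chunk = [exp0(v[m * dh:(m + 1) * dh], c_m) for v in tangent]    # enter head m's ball
        heads.append([log0(h, c_m) for h in head_attention(chunk, c_m)])
    # Concatenate per-head tangent vectors (W^O taken as the identity) and map back
    return [exp0(sum((heads[m][i] for m in range(H)), []), c_out) for i in range(len(X))]

X = [[0.1, 0.3, -0.2, 0.4], [0.2, -0.1, 0.3, 0.0], [-0.3, 0.2, 0.1, 0.1]]
Y = multi_head(X, head_curvatures=[0.5, 2.0], c_out=1.0)
```

Concatenating in the tangent space sidesteps the problem that a concatenation of points from different balls is not itself a point in any one ball.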

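Lorentz attention

A more stable implementation uses the inner product of the Lorentz (hyperboloid) model, whose points satisfy $\langle x, x\rangle_{\mathcal L} = -1/c$:

$$\langle x, y\rangle_{\mathcal L} = -x_0 y_0 + \sum_{i=1}^{d} x_i y_i$$

The Lorentz attention score is based on the squared Lorentzian distance $\|x - y\|_{\mathcal L}^2 = -2/c - 2\langle x, y\rangle_{\mathcal L}$:

$$\alpha_{ij} = \operatorname{softmax}_j\!\left(-\frac{\|q_i - k_j\|_{\mathcal L}^2}{\sqrt{d}}\right)$$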
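Here is a plain-Python sketch of Lorentz attention scores. Two choices are assumptions for illustration, not necessarily the construction intended here. Euclidean vectors are lifted onto the hyperboloid by solving $x_0 = \sqrt{1/c + \|u\|^2}$. Values are aggregated with a weighted sum rescaled back onto the hyperboloid (a Lorentzian centroid). Function names are hypothetical.

```python
import math

def lorentz_inner(x, y):
    """<x, y>_L = -x0*y0 + sum_i x_i*y_i."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift(u, c=1.0):
    """Place a Euclidean vector u on the hyperboloid <x, x>_L = -1/c by solving for x0."""
    return [math.sqrt(1.0 / c + sum(a * a for a in u))] + list(u)

def sq_lorentz_dist(x, y, c=1.0):
    """||x - y||_L^2 = -2/c - 2<x, y>_L, which is non-negative on the hyperboloid."""
    return -2.0 / c - 2.0 * lorentz_inner(x, y)

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def lorentz_attention(queries, keys, values, c=1.0):
    d = len(queries[0]) - 1  # spatial dimension
    outputs, weights = [], []
    for q in queries:
        alpha = softmax([-sq_lorentz_dist(q, k, c) / math.sqrt(d) for k in keys])
        # Lorentzian centroid: weighted sum, rescaled back onto the hyperboloid
        s = [sum(a * v[i] for a, v in zip(alpha, values)) for i in range(len(values[0]))]
        scale = 1.0 / (math.sqrt(c) * math.sqrt(-lorentz_inner(s, s)))
        outputs.append([scale * a for a in s])
        weights.append(alpha)
    return outputs, weights

c = 1.0
keys = [lift(u, c) for u in ([0.2, 0.1], [1.0, -0.5], [-0.7, 0.9])]
queries = [keys[0], lift([0.5, 0.5], c)]
out, attn = lorentz_attention(queries, keys, keys, c)
```

A key identical to the query has distance 0 and therefore the highest score. No $1 - c\|x\|^2$ denominators appear anywhere in the computation.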


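Hyperbolic Normalization

Riemannian Batch Normalization (Riemannian BatchNorm)

Batch normalization in hyperbolic space requires the Fréchet mean (Riemannian centroid) of the data. This is the point that minimizes the mean squared geodesic distance $d_c$ to the batch:

$$\mu = \arg\min_{y} \frac{1}{N}\sum_{i=1}^{N} d_c^2(y, x_i)$$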
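The Fréchet mean has no closed form in general. A standard way to compute it is Riemannian gradient descent, often called Karcher flow: $y \leftarrow \exp_y\big(\eta \cdot \tfrac{1}{N}\sum_i \log_y(x_i)\big)$. The plain-Python sketch below uses this iteration on the Poincaré ball. The step size `lr=0.5` and the tolerance are assumptions, and the helper names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def mobius_add(x, y, c=1.0):
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / den for a, b in zip(x, y)]

def conformal(y, c):
    """λ_y = 2 / (1 - c||y||^2)."""
    return 2.0 / (1.0 - c * dot(y, y))

def exp_map(y, v, c=1.0):
    """exp_y(v) = y ⊕_c tanh(sqrt(c) λ_y ||v|| / 2) v / (sqrt(c) ||v||)."""
    n = norm(v)
    if n == 0.0:
        return list(y)
    sc = math.sqrt(c)
    s = math.tanh(sc * conformal(y, c) * n / 2) / (sc * n)
    return mobius_add(y, [s * a for a in v], c)

def log_map(y, x, c=1.0):
    """log_y(x) = 2/(sqrt(c) λ_y) * artanh(sqrt(c)||w||) * w/||w||, with w = (-y) ⊕_c x."""
    w = mobius_add([-a for a in y], x, c)
    n = norm(w)
    if n == 0.0:
        return [0.0] * len(x)
    sc = math.sqrt(c)
    s = 2.0 / (sc * conformal(y, c)) * math.atanh(sc * n) / n
    return [s * a for a in w]

def dist(x, y, c=1.0):
    return 2 / math.sqrt(c) * math.atanh(math.sqrt(c) * norm(mobius_add([-a for a in x], y, c)))

def frechet_mean(points, c=1.0, lr=0.5, max_iter=200, tol=1e-12):
    y = [0.0] * len(points[0])  # start at the origin
    for _ in range(max_iter):
        # The mean of log_y(x_i) is minus half the Riemannian gradient of (1/N) sum_i d^2(y, x_i)
        g = [sum(col) / len(points) for col in zip(*(log_map(y, x, c) for x in points))]
        if norm(g) < tol:
            break
        y = exp_map(y, [lr * a for a in g], c)
    return y

a, b = [0.5, 0.1], [-0.2, 0.6]
m = frechet_mean([a, b])
print(dist(m, a), dist(m, b))  # equal: for two points the Fréchet mean is the geodesic midpoint
```

Starting from the origin is a convenient choice. In a batch-norm layer, the Euclidean mean of the batch (projected into the ball) is a closer starting point.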

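Riemannian BatchNorm algorithm

  1. Compute the Riemannian centroid (Fréchet mean)
  2. Compute the Riemannian (co)variance in the tangent space
  3. Apply the whitening (normalization) transform
  4. Rescale and shift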
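import torch
import torch.nn as nn

class RiemannianBatchNorm(nn.Module):
    """Riemannian batch normalization on the Poincaré ball (curvature -c).

    Simplified version: center the batch with a Möbius translation by its
    Fréchet mean, normalize in the tangent space at the origin, then move the
    result to a learnable bias point.
    """

    def __init__(self, dim, c=1.0, momentum=0.1, eps=1e-5):
        super().__init__()
        self.c = c
        self.momentum = momentum
        self.eps = eps

        # Learnable parameters: per-dimension scale, and the bias as a tangent vector at the origin
        self.weight = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

        # Running statistics (buffers follow .to(device) and are saved in state_dict)
        self.register_buffer("running_mean", torch.zeros(dim))
        self.register_buffer("running_var", torch.ones(dim))

    def forward(self, x):
        # x: (batch, dim), points inside the Poincaré ball
        if self.training:
            # 1. Riemannian centroid (Fréchet mean) of the batch
            mean = self._riemannian_mean(x)

            # 2. Variance in the tangent space after centering;
            #    (-mean) ⊕ x is an isometry that sends mean to the origin
            v = self._log_map(self._mobius_add(-mean, x))
            var = torch.var(v, dim=0, unbiased=False)

            # Update running statistics (linear interpolation stays inside the convex ball)
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        else:
            mean, var = self.running_mean, self.running_var
            v = self._log_map(self._mobius_add(-mean, x))

        # 3. Normalize and rescale in the tangent space at the origin
        v = self.weight * v / torch.sqrt(var + self.eps)

        # 4. Map back to the ball and translate to the learned bias point
        return self._project(self._mobius_add(self._exp_map(self.bias), self._exp_map(v)))

    def _riemannian_mean(self, x, lr=0.1, max_iter=100, tol=1e-6):
        """Fréchet mean via Riemannian gradient descent (Karcher flow)."""
        y = self._project(x.mean(dim=0))  # Euclidean mean as the starting point
        for _ in range(max_iter):
            # Mean of log_y(x_i) is the negative Riemannian gradient of the Fréchet loss (up to a factor 2)
            grad = self._log_map(x, base=y).mean(dim=0)
            y = self._project(self._exp_map(lr * grad, base=y))
            if grad.norm() < tol:
                break
        return y

    def _lambda(self, y):
        """Conformal factor λ_y = 2 / (1 - c||y||²)."""
        return 2 / (1 - self.c * torch.sum(y * y, dim=-1, keepdim=True)).clamp_min(1e-10)

    def _exp_map(self, v, base=None):
        """Exponential map: tangent vector v at base (default: origin) -> point on the ball."""
        sc = self.c ** 0.5
        v_norm = torch.norm(v, dim=-1, keepdim=True).clamp_min(1e-10)
        if base is None:
            return torch.tanh(sc * v_norm) * v / (sc * v_norm)
        second = torch.tanh(sc * self._lambda(base) * v_norm / 2) * v / (sc * v_norm)
        return self._mobius_add(base, second)

    def _log_map(self, x, base=None):
        """Logarithmic map: point x on the ball -> tangent vector at base (default: origin)."""
        sc = self.c ** 0.5
        diff = x if base is None else self._mobius_add(-base, x)
        lam = 2.0 if base is None else self._lambda(base)
        d_norm = torch.norm(diff, dim=-1, keepdim=True).clamp_min(1e-10)
        artanh = torch.atanh((sc * d_norm).clamp(max=1 - 1e-5))
        return 2 / (sc * lam) * artanh * diff / d_norm

    def _mobius_add(self, u, v):
        """Möbius addition u ⊕_c v."""
        c = self.c
        uv = torch.sum(u * v, dim=-1, keepdim=True)
        u2 = torch.sum(u * u, dim=-1, keepdim=True)
        v2 = torch.sum(v * v, dim=-1, keepdim=True)
        num = (1 + 2 * c * uv + c * v2) * u + (1 - c * u2) * v
        return num / (1 + 2 * c * uv + c ** 2 * u2 * v2).clamp_min(1e-10)

    def _project(self, x, eps=1e-5):
        """Rescale points outside radius (1 - eps) / sqrt(c) back into the Poincaré ball."""
        max_norm = (1 - eps) / self.c ** 0.5
        norm = torch.norm(x, dim=-1, keepdim=True).clamp_min(1e-10)
        return torch.where(norm > max_norm, x / norm * max_norm, x)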

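Hyperbolic Residual Connections

Residual connections are a key component of deep networks. In hyperbolic space, a Möbius residual block is defined as:

$$y = x \oplus_c F(x)$$

where $F$ is a hyperbolic linear transformation.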
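The sketch below shows the residual connection in plain Python, together with one property worth knowing: Möbius addition is not commutative. $x \oplus_c F(x)$ and $F(x) \oplus_c x$ have the same norm but generally point in different directions, so the order of the operands is a design choice. The branch `F` (a fixed Möbius matrix-vector product) and all function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def exp0(v, c=1.0):
    n = norm(v)
    if n == 0.0:
        return [0.0] * len(v)
    return [math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n) * a for a in v]

def log0(x, c=1.0):
    n = norm(x)
    if n == 0.0:
        return [0.0] * len(x)
    return [math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n) * a for a in x]

def mobius_add(x, y, c=1.0):
    """x ⊕_c y (not commutative in general)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / den for a, b in zip(x, y)]

def F(x, c=1.0):
    """Hypothetical branch: a Möbius matrix-vector product with a fixed 2x2 matrix."""
    W = [[0.5, -0.2], [0.3, 0.4]]
    v = log0(x, c)
    return exp0([dot(row, v) for row in W], c)

def residual(x, branch, c=1.0):
    """Möbius residual connection y = x ⊕_c F(x)."""
    return mobius_add(x, branch(x, c), c)

x = [0.3, 0.1]
y = residual(x, F)

u, w = [0.3, 0.1], [-0.1, 0.4]
print(mobius_add(u, w), mobius_add(w, u))  # different vectors with the same norm
```

A branch that outputs the origin leaves $x$ unchanged, which is the hyperbolic analogue of an identity shortcut.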

Möbius residual connection

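import torch.nn as nn

# HyperbolicLinear, HyperbolicActivation and hyperbolic_add are helpers assumed to be defined elsewhere
class HyperbolicResidualBlock(nn.Module):
    """Hyperbolic residual block: y = x ⊕_c F(x)"""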
    
    def __init__(self, dim, c=1.0):
        super().__init__()
        self.c = c
        self.lin1 = HyperbolicLinear(dim, dim, c)
        self.lin2 = HyperbolicLinear(dim, dim, c)
        self.act = HyperbolicActivation()
    
    def forward(self, x):
        residual = x
        out = self.act(self.lin1(x))
        out = self.lin2(out)
        return hyperbolic_add(residual, out, self.c)

Complete Hyperbolic MLP

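import torch.nn as nn

# HyperbolicLinear, HyperbolicToEuclidean, exp_map_0 and hyperbolic_activation are assumed to be defined elsewhere
class HyperbolicMLP(nn.Module):
    """Hyperbolic multilayer perceptron"""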
    
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers, c=1.0):
        super().__init__()
        self.c = c
        
        # Embedding layer: Euclidean → hyperbolic
        self.embedding = nn.Linear(input_dim, hidden_dim)
        
        # Hyperbolic hidden layers
        self.layers = nn.ModuleList([
            HyperbolicLinear(hidden_dim, hidden_dim, c)
            for _ in range(num_layers - 1)
        ])
        
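        # Output: hyperbolic → Euclidean (readout assumed to keep hidden_dim, e.g. a log map at the origin),
        # followed by a linear head to output_dim
        self.readout = HyperbolicToEuclidean(c)
        self.head = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Embed, then map into hyperbolic space
        x = self.embedding(x)
        x = exp_map_0(x, self.c)

        # Hyperbolic layers
        for layer in self.layers:
            x = hyperbolic_activation(layer(x), self.c)

        # Map back to Euclidean space for classification/regression
        return self.head(self.readout(x))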

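Comparison with Euclidean Networks

Expressiveness

| Property | Euclidean network | Hyperbolic network |
|---|---|---|
| Parameter efficiency on trees | Volume grows polynomially with radius, so low-distortion tree embeddings need many dimensions | Volume grows exponentially with radius, so trees embed with low distortion even in low dimensions |
| Hierarchy representation | Must be encoded explicitly | Embedded naturally |
| Gradient flow | Same scale everywhere | Position-dependent: the Riemannian gradient is scaled by $(1 - c\lVert x\rVert^2)^2/4$, so it is largest near the origin (root) and shrinks toward the boundary |
| Computational cost | Lower | Higher (extra exp/log maps and Möbius operations in every layer) |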
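The polynomial-versus-exponential claim can be checked numerically. In the hyperbolic plane with curvature $-1$, a circle of radius $r$ has circumference $2\pi\sinh(r)$, compared with $2\pi r$ in the Euclidean plane. The room available at distance $r$ therefore grows like $e^r$, which matches the $2^k$ nodes at depth $k$ of a binary tree. A small sketch:

```python
import math

def euclidean_circumference(r):
    return 2 * math.pi * r

def hyperbolic_circumference(r):
    """Circle of radius r in the hyperbolic plane with curvature -1."""
    return 2 * math.pi * math.sinh(r)

for r in [1, 2, 5, 10]:
    ratio = hyperbolic_circumference(r) / euclidean_circumference(r)
    print(f"r={r:>2}  euclidean={euclidean_circumference(r):10.1f}  "
          f"hyperbolic={hyperbolic_circumference(r):12.1f}  ratio={ratio:8.1f}")
```

Each additional unit of radius multiplies the hyperbolic circumference by roughly $e$. The Euclidean circumference only grows additively.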

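When to Use Hyperbolic Networks

Suitable scenarios

  • The data has a clear tree-like hierarchical structure
  • The hierarchy is deep (e.g., knowledge graphs with deep taxonomies)
  • Hierarchical relations must be embedded efficiently

Unsuitable scenarios

  • The data has no clear hierarchical structure
  • The dataset is small (the parameter-efficiency advantage of hyperbolic networks is less pronounced)
  • The task relies on Euclidean geometric assumptions (e.g., Euclidean distances carry the semantics)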

Hybrid Architectures

In practice, hybrid hyperbolic-Euclidean architectures are common:

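import torch.nn as nn

# HyperbolicLinear, exp_map_0, log_map_0 and hyperbolic_activation are assumed to be defined elsewhere
class HybridNet(nn.Module):
    """
    Hybrid hyperbolic-Euclidean network:
    early layers work in hyperbolic space to capture hierarchy,
    later layers work in Euclidean space for classification
    """
    def __init__(self, dim, num_classes, c=1.0):
        super().__init__()
        self.c = c
        self.hyper_layers = nn.ModuleList([
            HyperbolicLinear(dim, dim, c) for _ in range(3)
        ])
        self.euclidean_layers = nn.ModuleList([
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.ReLU()
        ])
        self.classifier = nn.Linear(dim, num_classes)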
    
    def forward(self, x):
        # Hyperbolic stage
        x = exp_map_0(x, c=self.c)
        for layer in self.hyper_layers:
            x = hyperbolic_activation(layer(x), c=self.c)
        
        # Map to Euclidean space
        x = log_map_0(x, c=self.c)
        
        # Euclidean stage
        for layer in self.euclidean_layers:
            x = layer(x)
        
        return self.classifier(x)

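Optimization Strategies

Riemannian Adaptive Optimizers

Standard Adam/SGD must be adapted to Riemannian geometry in two ways. First, the Euclidean gradient is rescaled by the inverse metric; on the Poincaré ball, this factor is $(1 - c\|x\|^2)^2/4$. Second, parameters move along geodesics via the exponential map (or a retraction) and are then projected back into the ball: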

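import torch

def _mobius_add(x, y, c):
    xy = torch.sum(x * y, dim=-1, keepdim=True)
    x2 = torch.sum(x * x, dim=-1, keepdim=True)
    y2 = torch.sum(y * y, dim=-1, keepdim=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c ** 2 * x2 * y2).clamp_min(1e-15)

def _expmap(x, v, c):
    """Exponential map at x on the Poincaré ball: exp_x(v)."""
    sc = c ** 0.5
    v_norm = torch.norm(v, dim=-1, keepdim=True).clamp_min(1e-15)
    lam = 2 / (1 - c * torch.sum(x * x, dim=-1, keepdim=True)).clamp_min(1e-15)
    return _mobius_add(x, torch.tanh(sc * lam * v_norm / 2) * v / (sc * v_norm), c)

def project_to_ball(x, c, eps=1e-5):
    """Rescale points so that ||x|| <= (1 - eps) / sqrt(c)."""
    max_norm = (1 - eps) / c ** 0.5
    norm = torch.norm(x, dim=-1, keepdim=True).clamp_min(1e-15)
    return torch.where(norm > max_norm, x / norm * max_norm, x)

class RiemannianAdam(torch.optim.Optimizer):
    """Simplified Riemannian Adam for parameters that live on the Poincaré ball.

    Simplifications: momentum is not parallel-transported, and the second
    moment is kept per coordinate. Use a standard optimizer for Euclidean parameters.
    """

    def __init__(self, params, lr=1e-3, c=1.0, beta1=0.9, beta2=0.999):
        defaults = dict(lr=lr, c=c, beta1=beta1, beta2=beta2)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            c = group['c']
            beta1, beta2 = group['beta1'], group['beta2']

            for p in group['params']:
                if p.grad is None:
                    continue

                state = self.state[p]
                if len(state) == 0:
                    state['exp_avg'] = torch.zeros_like(p)
                    state['exp_avg_sq'] = torch.zeros_like(p)
                    state['step'] = 0

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                state['step'] += 1

                # Riemannian gradient: rescale the Euclidean gradient by the inverse metric (1 - c||p||²)² / 4
                scale = (1 - c * torch.sum(p * p, dim=-1, keepdim=True)) ** 2 / 4
                riemannian_grad = scale * p.grad

                exp_avg.mul_(beta1).add_(riemannian_grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(riemannian_grad, riemannian_grad, value=1 - beta2)

                # Bias correction
                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']
                step_size = group['lr'] / bias_correction1

                # Move along the geodesic with the exponential map, then project back into the ball
                update = exp_avg / (torch.sqrt(exp_avg_sq / bias_correction2) + 1e-8)
                p.copy_(project_to_ball(_expmap(p, -step_size * update, c), c))

        return loss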
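The two ingredients of the optimizer are the inverse-metric rescaling and the exponential-map update. They can be checked in isolation with plain Riemannian SGD on a toy problem. The sketch below minimizes the Euclidean loss $\|x - t\|^2$ for a target $t$ inside the unit ball. The toy loss, learning rate and iteration count are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def mobius_add(x, y, c=1.0):
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / den for a, b in zip(x, y)]

def exp_map(x, v, c=1.0):
    """exp_x(v) = x ⊕_c tanh(sqrt(c) λ_x ||v|| / 2) v / (sqrt(c) ||v||), λ_x = 2 / (1 - c||x||^2)."""
    n = norm(v)
    if n == 0.0:
        return list(x)
    sc = math.sqrt(c)
    lam = 2.0 / (1.0 - c * dot(x, x))
    s = math.tanh(sc * lam * n / 2) / (sc * n)
    return mobius_add(x, [s * a for a in v], c)

def rsgd_step(x, egrad, lr, c=1.0):
    """One Riemannian SGD step: rescale the Euclidean gradient, then follow the geodesic."""
    scale = (1 - c * dot(x, x)) ** 2 / 4  # 1 / λ_x^2, the inverse of the Poincaré metric
    return exp_map(x, [-lr * scale * g for g in egrad], c)

t = [0.3, 0.4]
x = [0.0, 0.0]
for _ in range(300):
    egrad = [2 * (a - b) for a, b in zip(x, t)]  # Euclidean gradient of ||x - t||^2
    x = rsgd_step(x, egrad, lr=1.0)
print(x)  # close to t
```

The rescaling shrinks steps as points approach the boundary, and the exponential map keeps every iterate inside the ball without an explicit projection in this example.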

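Application Examples

Hierarchical Text Classification

Input: "The dog is eating food"
         ↓
    [Hyperbolic embedding layer]
         ↓
    ┌──────────────────────────────────────────────────┐
    │ Hyperbolic attention                             │
    │   animal ←──┬──── food  (hierarchical attention) │
    │   dog ←─────┘           (semantic clustering)    │
    └──────────────────────────────────────────────────┘
         ↓
    [Euclidean classifier]
         ↓
    Class: "Living Things" > "Animals" > "Dogs"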

Knowledge Graph Embedding

Hyperbolic space naturally represents "is-a" hierarchies:

  • Dog ⊂ Mammal ⊂ Animal ⊂ LivingThing
  • Each level up, the distance is approximately
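A toy layout illustrates how a ball of finite Euclidean size can host a deep "is-a" chain. The sketch below places the hypothetical chain LivingThing → Animal → Mammal → Dog on one ray of the unit Poincaré ball ($c = 1$), with level $k$ at hyperbolic distance $k$ from the origin. The layout is invented for illustration, not learned. Consecutive levels end up exactly one hyperbolic unit apart, while their Euclidean gaps shrink toward the boundary.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mobius_add(x, y, c=1.0):
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / den for a, b in zip(x, y)]

def poincare_dist(p, q, c=1.0):
    diff = mobius_add([-a for a in p], q, c)
    return 2 / math.sqrt(c) * math.atanh(math.sqrt(c) * math.sqrt(dot(diff, diff)))

# Hypothetical layout (c = 1): level k sits at hyperbolic distance k from the origin on one ray,
# i.e. at Euclidean norm tanh(k / 2), because d(0, x) = 2 * artanh(||x||)
levels = ["LivingThing", "Animal", "Mammal", "Dog"]
points = {name: [math.tanh(k / 2), 0.0] for k, name in enumerate(levels)}

for parent, child in zip(levels, levels[1:]):
    p, q = points[parent], points[child]
    print(f"{parent} -> {child}: euclidean gap {q[0] - p[0]:.3f}, "
          f"hyperbolic distance {poincare_dist(p, q):.3f}")
```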
