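Overview
Hyperbolic neural networks (HNNs) generalize the basic operations of neural networks to hyperbolic space. Because the volume of hyperbolic space grows exponentially with radius, hyperbolic networks can typically embed hierarchical (tree-like) data with lower distortion than Euclidean networks of the same dimension.
Core idea: preserve the network's hierarchical inductive bias and replace the standard operations (linear maps, activations, normalization) with their counterparts on a Riemannian manifold.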
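Hyperbolic linear layer
Basic principle
Throughout, the Poincaré ball of curvature $-c$ ($c > 0$) is $\mathbb{B}_c^n = \{x \in \mathbb{R}^n : c\|x\|^2 < 1\}$. The core of the hyperbolic linear layer is Möbius matrix-vector multiplication:
$$W \otimes_c x = \frac{1}{\sqrt{c}} \tanh\!\left(\frac{\|Wx\|}{\|x\|} \tanh^{-1}\!\left(\sqrt{c}\,\|x\|\right)\right) \frac{Wx}{\|Wx\|}$$
where $W$ is the learnable weight matrix and $x \in \mathbb{B}_c^n$ is the input (with $W \otimes_c x = 0$ when $Wx = 0$).
The full forward pass is:
$$y = \sigma^{\otimes_c}\!\left((W \otimes_c x) \oplus_c b\right)$$
where $b$ is a bias point in the ball, $\oplus_c$ is Möbius addition, and $\sigma^{\otimes_c}$ is the generalization of the nonlinear activation function in the Möbius sense.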
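As a rough illustration of the hyperbolic linear layer described above, the following PyTorch sketch implements Möbius matrix-vector multiplication and a linear layer built on it. It assumes the curvature $-c$ convention (ball radius $1/\sqrt{c}$). The names `mobius_add`, `mobius_matvec`, and `MobiusLinearSketch` are illustrative and not taken from any particular library, and the bias is stored as a plain parameter that a full implementation would have to keep inside the ball.

```python
import torch
import torch.nn as nn


def mobius_add(x, y, c):
    """Möbius addition x ⊕_c y on the Poincaré ball of curvature -c."""
    xy = (x * y).sum(dim=-1, keepdim=True)
    x2 = (x * x).sum(dim=-1, keepdim=True)
    y2 = (y * y).sum(dim=-1, keepdim=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den.clamp_min(1e-15)


def mobius_matvec(W, x, c, eps=1e-15):
    """W ⊗_c x for x of shape (batch, in_dim) and W of shape (out_dim, in_dim)."""
    sqrt_c = c ** 0.5
    x_norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    Wx = x @ W.t()
    Wx_norm = Wx.norm(dim=-1, keepdim=True).clamp_min(eps)
    # Keep the atanh argument strictly below 1 to avoid infinities
    arg = (sqrt_c * x_norm).clamp(max=1 - 1e-5)
    scale = torch.tanh(Wx_norm / x_norm * torch.atanh(arg)) / (sqrt_c * Wx_norm)
    return scale * Wx  # equals 0 wherever Wx == 0


class MobiusLinearSketch(nn.Module):
    """Hypothetical minimal layer: (W ⊗_c x) ⊕_c b, without the activation."""

    def __init__(self, in_dim, out_dim, c=1.0):
        super().__init__()
        self.c = c
        self.weight = nn.Parameter(torch.empty(out_dim, in_dim))
        nn.init.xavier_uniform_(self.weight)
        # Assumption: the bias is a ball point initialised at the origin
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        return mobius_add(mobius_matvec(self.weight, x, self.c), self.bias, self.c)


torch.manual_seed(0)
layer = MobiusLinearSketch(in_dim=4, out_dim=3, c=1.0)
x = 0.1 * torch.randn(5, 4)  # small norms, so the points lie inside the unit ball
y = layer(x)
print(y.shape, bool((y.norm(dim=-1) < 1).all()))
```

With the identity matrix as $W$, `mobius_matvec` returns its input unchanged, which is a convenient sanity check.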
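Nonlinear activation
In hyperbolic space, a nonlinear activation is defined as a Möbius nonlinearity:
$$\sigma^{\otimes_c}(x) = \exp_0^c\!\left(\sigma\!\left(\log_0^c(x)\right)\right)$$
where $\sigma$ is a nonlinear function in Euclidean space (e.g. ReLU, sigmoid), and $\exp_0^c$ and $\log_0^c$ are the exponential and logarithmic maps at the origin.
Möbius generalizations of common activations:
| Euclidean activation | Möbius generalization | Formula |
|---|---|---|
| Identity | Identity | $x$ |
| Sigmoid | - | $\exp_0^c(\mathrm{sigmoid}(\log_0^c(x)))$ |
| ReLU | - | $\exp_0^c(\mathrm{ReLU}(\log_0^c(x)))$ |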
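A minimal sketch of a Möbius nonlinearity, assuming the curvature $-c$ convention and the usual closed forms of the exponential and logarithmic maps at the origin. The function names `exp_map_0`, `log_map_0`, and `mobius_activation` are chosen here for illustration only.

```python
import torch


def exp_map_0(v, c, eps=1e-15):
    """Exponential map at the origin: tangent vector -> point on the ball."""
    sqrt_c = c ** 0.5
    v_norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * v_norm) * v / (sqrt_c * v_norm)


def log_map_0(x, c, eps=1e-15):
    """Logarithmic map at the origin: point on the ball -> tangent vector."""
    sqrt_c = c ** 0.5
    x_norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    arg = (sqrt_c * x_norm).clamp(max=1 - 1e-5)  # stay inside atanh's domain
    return torch.atanh(arg) * x / (sqrt_c * x_norm)


def mobius_activation(x, c, fn=torch.relu):
    """Apply a Euclidean nonlinearity in the tangent space at the origin."""
    return exp_map_0(fn(log_map_0(x, c)), c)


x = torch.tensor([[0.3, -0.4], [-0.1, 0.2]])
y = mobius_activation(x, c=1.0)                       # Möbius ReLU
same = mobius_activation(x, c=1.0, fn=lambda t: t)    # identity recovers x
print(y)
```

Because both maps at the origin only rescale a vector along its own direction, the Möbius ReLU zeroes exactly the coordinates that a Euclidean ReLU would zero.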
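Numerical stability
Möbius operations are prone to NaNs in high-dimensional or deep networks, because points drift toward the boundary of the ball, where $\tanh^{-1}$ and the conformal factor diverge. Common strategies:
- Project onto a bounded region: $x \leftarrow x \cdot \min\!\left(1, \frac{(1 - \epsilon)/\sqrt{c}}{\|x\|}\right)$
- Use the Lorentz model: it is numerically more stable
- Gradient clipping: bound the gradient norm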
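The projection and gradient-clipping strategies in the list above can be sketched as follows. This assumes the curvature $-c$ convention (ball radius $1/\sqrt{c}$); `project_to_ball` is an illustrative helper written for this example, while `torch.nn.utils.clip_grad_norm_` is the standard PyTorch utility. The toy loss is arbitrary and only serves to produce a large gradient.

```python
import torch


def project_to_ball(x, c, eps=1e-5):
    """Rescale points so that ||x|| <= (1 - eps) / sqrt(c)."""
    max_norm = (1 - eps) / c ** 0.5
    norm = x.norm(dim=-1, keepdim=True).clamp_min(1e-15)
    # Points already inside the region get a factor of 1 and are left untouched
    return x * torch.clamp(max_norm / norm, max=1.0)


# The first row lies outside the unit ball, the second row inside it
w = torch.nn.Parameter(torch.tensor([[3.0, 4.0], [0.1, 0.2]]))
projected = project_to_ball(w, c=1.0)

# Arbitrary toy loss with a deliberately large gradient
loss = 1e3 * (projected ** 2).sum()
loss.backward()

# Rescale the gradient so that its total norm is at most 1.0
norm_before = torch.nn.utils.clip_grad_norm_([w], max_norm=1.0)
print(float(norm_before), float(w.grad.norm()))
```

Projection is usually applied after every operation that can push a point toward the boundary, whereas clipping is applied once per optimizer step.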
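import torch

def mobius_add_safe(x, v, c, eps=1e-8):
    """Numerically safe Möbius addition x ⊕_c v on the Poincaré ball of curvature -c."""
    # Project both operands into the safe region ||.|| <= (1 - 1e-5) / sqrt(c)
    max_norm = (1 - 1e-5) / c ** 0.5
    x = x * torch.clamp(max_norm / torch.norm(x, dim=-1, keepdim=True).clamp(min=eps), max=1.0)
    v = v * torch.clamp(max_norm / torch.norm(v, dim=-1, keepdim=True).clamp(min=eps), max=1.0)
    # Inner products used by the closed form
    xv = torch.sum(x * v, dim=-1, keepdim=True)
    x2 = torch.sum(x * x, dim=-1, keepdim=True)
    v2 = torch.sum(v * v, dim=-1, keepdim=True)
    # x ⊕_c v = ((1 + 2c<x,v> + c|v|^2) x + (1 - c|x|^2) v) / (1 + 2c<x,v> + c^2 |x|^2 |v|^2)
    num = (1 + 2 * c * xv + c * v2) * x + (1 - c * x2) * v
    denom = 1 + 2 * c * xv + c ** 2 * x2 * v2
    return num / denom.clamp(min=eps)
Hyperbolic attention mechanism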
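Background
An attention mechanism compares queries with keys and uses the resulting weights to aggregate values. In hyperbolic space, this is done by projecting onto a tangent space.
Hyperbolic attention computation
- Project onto the tangent space, for example at the origin: $\tilde{q}_i = \log_0^c(q_i)$, $\tilde{k}_j = \log_0^c(k_j)$, $\tilde{v}_j = \log_0^c(v_j)$
- Compute the attention weights (in the tangent space), for example $\alpha_{ij} = \mathrm{softmax}_j\!\left(\langle \tilde{q}_i, \tilde{k}_j \rangle / \sqrt{d}\right)$
- Take the weighted sum (in hyperbolic space), for example $o_i = \exp_0^c\!\left(\sum_j \alpha_{ij}\, \tilde{v}_j\right)$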
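The three steps above can be sketched in PyTorch as follows. This is one possible reading rather than a definitive implementation: it assumes the tangent space at the origin, scaled dot-product scores, and an aggregation that sums in the tangent space and maps the result back to the ball with the exponential map. All function names are illustrative.

```python
import math
import torch


def exp_map_0(v, c, eps=1e-15):
    """Exponential map at the origin of the Poincaré ball (curvature -c)."""
    sqrt_c = c ** 0.5
    v_norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * v_norm) * v / (sqrt_c * v_norm)


def log_map_0(x, c, eps=1e-15):
    """Logarithmic map at the origin of the Poincaré ball (curvature -c)."""
    sqrt_c = c ** 0.5
    x_norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    arg = (sqrt_c * x_norm).clamp(max=1 - 1e-5)
    return torch.atanh(arg) * x / (sqrt_c * x_norm)


def tangent_space_attention(q, k, v, c):
    """q, k, v: ball points of shape (seq, dim). Returns (outputs, weights)."""
    # Step 1: project queries, keys and values onto the tangent space
    q_t, k_t, v_t = log_map_0(q, c), log_map_0(k, c), log_map_0(v, c)
    # Step 2: attention weights from scaled dot products in the tangent space
    scores = q_t @ k_t.transpose(-1, -2) / math.sqrt(q.shape[-1])
    weights = torch.softmax(scores, dim=-1)
    # Step 3 (assumption): weighted sum in the tangent space, mapped back to the ball
    return exp_map_0(weights @ v_t, c), weights


torch.manual_seed(0)
points = exp_map_0(0.5 * torch.randn(6, 8), c=1.0)  # six random ball points
out, attn = tangent_space_attention(points, points, points, c=1.0)
print(out.shape, attn.shape)
```

An alternative for step 3 is to aggregate directly on the manifold (for instance with a weighted midpoint), which avoids the distortion of the origin's tangent space for points far from the origin.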
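Hyperbolic multi-head attention
Each head operates in its own independent Poincaré ball, and the head outputs are then combined.
Lorentz attention
A more stable implementation uses the inner product of the Lorentz model:
$$\langle x, y \rangle_{\mathcal{L}} = -x_0 y_0 + \sum_{i=1}^{n} x_i y_i$$
The Lorentz attention score between a query and a key is computed from this inner product.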
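One way to turn the Lorentz inner product into attention weights is to score each query-key pair by the negative squared Lorentzian distance, $-\|q - k\|_{\mathcal{L}}^2 = 2/c + 2\langle q, k\rangle_{\mathcal{L}}$. This particular score, the `temperature` parameter, and the helper names are assumptions made for illustration; the text above does not fix a specific formula.

```python
import torch


def lift_to_hyperboloid(u, c):
    """Map u in R^n to x = (x0, u) on the hyperboloid <x, x>_L = -1/c."""
    x0 = torch.sqrt(1.0 / c + (u * u).sum(dim=-1, keepdim=True))
    return torch.cat([x0, u], dim=-1)


def lorentz_inner(x, y):
    """Pairwise <x_i, y_j>_L = -x_0 y_0 + sum_k x_k y_k, shape (n, m)."""
    return -x[:, :1] @ y[:, :1].t() + x[:, 1:] @ y[:, 1:].t()


def lorentz_attention_weights(q, k, c, temperature=1.0):
    """Softmax over negative squared Lorentzian distances (illustrative choice)."""
    scores = (2.0 / c + 2.0 * lorentz_inner(q, k)) / temperature  # always <= 0
    return torch.softmax(scores, dim=-1)


# Hand-picked spatial coordinates, lifted onto the hyperboloid with c = 1
u = torch.tensor([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, -1.5],
                  [1.0, 1.0, 1.0]])
x = lift_to_hyperboloid(u, c=1.0)
self_inner = torch.diagonal(lorentz_inner(x, x))
weights = lorentz_attention_weights(x, x, c=1.0)
print(self_inner, weights.argmax(dim=-1))
```

The score is zero for identical points and negative otherwise, so each point attends most strongly to itself in this self-attention example. No `atanh` or division by a vanishing factor is needed, which is where the stability advantage comes from.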
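Hyperbolic normalization
Riemannian batch normalization (Riemannian BatchNorm)
Batch normalization in hyperbolic space requires the Fréchet mean (Riemannian centroid) of the data:
$$\mu = \arg\min_{y \in \mathbb{B}_c^n} \frac{1}{N} \sum_{i=1}^{N} d_c(x_i, y)^2$$
where $d_c$ is the geodesic distance on the ball.
The Riemannian BatchNorm algorithm:
- Compute the Riemannian centroid
- Compute the Riemannian covariance (in the tangent space)
- Apply the whitening transform
- Rescale and shift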
import torch
import torch.nn as nn

class RiemannianBatchNorm(nn.Module):
    """Riemannian batch normalization on the Poincaré ball of curvature -c."""
    def __init__(self, dim, c=1.0, momentum=0.1, eps=1e-5):
        super().__init__()
        self.c = c
        self.momentum = momentum
        self.eps = eps
        # Learnable parameters, applied in the tangent space at the origin
        self.weight = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))
        # Running statistics, registered as buffers so they follow .to() and state_dict()
        self.register_buffer("running_mean", torch.zeros(dim))
        self.register_buffer("running_var", torch.ones(dim))

    def forward(self, x):
        # Riemannian centroid: batch Fréchet mean in training, running estimate in eval
        mean = self._riemannian_mean(x) if self.training else self.running_mean
        # Center the batch at the centroid and move to the tangent space
        v = self._log_map(x, mean)
        if self.training:
            var = v.var(dim=0, unbiased=False)
            with torch.no_grad():
                # Euclidean interpolation of the centroid is an approximation;
                # the result stays inside the ball because the ball is convex
                self.running_mean.lerp_(mean, self.momentum)
                self.running_var.lerp_(var, self.momentum)
        else:
            var = self.running_var
        # Normalize per dimension (a diagonal approximation of whitening)
        v = v / torch.sqrt(var + self.eps)
        # Rescale and shift in the tangent space, then map back to the ball
        return self._project(self._exp_map(self.weight * v + self.bias))

    def _riemannian_mean(self, x, lr=0.1, max_iter=100):
        """Fréchet mean by Riemannian gradient descent."""
        y = self._project(x.mean(dim=0))  # Euclidean mean as the initial guess
        for _ in range(max_iter):
            # The mean of the log-mapped batch is the descent direction
            grad = self._log_map(x, y).mean(dim=0)
            y = self._project(self._exp_map(lr * grad, base=y))
        return y

    def _exp_map(self, v, base=None):
        """Exponential map: tangent vector (in origin coordinates) to a ball point."""
        sqrt_c = self.c ** 0.5
        v_norm = torch.norm(v, dim=-1, keepdim=True).clamp(min=1e-10)
        y = torch.tanh(sqrt_c * v_norm) * v / (sqrt_c * v_norm)
        # For a base point, translate the result by Möbius addition
        return y if base is None else self._mobius_add(base, y)

    def _log_map(self, x, base=None):
        """Logarithmic map: ball point to a tangent vector (in origin coordinates)."""
        if base is not None:
            # Translate so that the base point sits at the origin
            x = self._mobius_add(-base, x)
        sqrt_c = self.c ** 0.5
        x_norm = torch.norm(x, dim=-1, keepdim=True).clamp(min=1e-10)
        return torch.atanh((sqrt_c * x_norm).clamp(max=1 - 1e-5)) * x / (sqrt_c * x_norm)

    def _mobius_add(self, u, v):
        """Möbius addition u ⊕_c v."""
        c = self.c
        uv = torch.sum(u * v, dim=-1, keepdim=True)
        u2 = torch.sum(u * u, dim=-1, keepdim=True)
        v2 = torch.sum(v * v, dim=-1, keepdim=True)
        num = (1 + 2 * c * uv + c * v2) * u + (1 - c * u2) * v
        return num / (1 + 2 * c * uv + c ** 2 * u2 * v2).clamp(min=1e-10)

    def _project(self, x):
        """Project into the open Poincaré ball of radius 1/sqrt(c)."""
        max_norm = (1 - 1e-5) / self.c ** 0.5
        norm = torch.norm(x, dim=-1, keepdim=True).clamp(min=1e-10)
        return x * torch.clamp(max_norm / norm, max=1.0)
Hyperbolic residual connection
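Residual connections are a key component of deep networks. In hyperbolic space, a Möbius residual block is defined as:
$$y = x \oplus_c f(x)$$
where $f$ is a hyperbolic linear transformation.
Möbius residual connection: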
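import torch.nn as nn

# HyperbolicLinear, HyperbolicActivation, and hyperbolic_add (Möbius addition)
# are assumed to be defined elsewhere.
class HyperbolicResidualBlock(nn.Module):
    """Hyperbolic residual block: x ⊕_c f(x)."""
    def __init__(self, dim, c=1.0):
        super().__init__()
        self.c = c
        self.lin1 = HyperbolicLinear(dim, dim, c)
        self.lin2 = HyperbolicLinear(dim, dim, c)
        self.act = HyperbolicActivation()

    def forward(self, x):
        residual = x
        out = self.act(self.lin1(x))
        out = self.lin2(out)
        # Möbius addition replaces the Euclidean skip connection
        return hyperbolic_add(residual, out, self.c)
Complete hyperbolic MLP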
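import torch.nn as nn

# HyperbolicLinear, HyperbolicToEuclidean, exp_map_0, and hyperbolic_activation
# are assumed to be defined elsewhere.
class HyperbolicMLP(nn.Module):
    """Hyperbolic multilayer perceptron."""
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers, c=1.0):
        super().__init__()
        self.c = c
        # Embedding layer: Euclidean features, mapped into hyperbolic space in forward()
        self.embedding = nn.Linear(input_dim, hidden_dim)
        # Hyperbolic hidden layers
        self.layers = nn.ModuleList([
            HyperbolicLinear(hidden_dim, hidden_dim, c)
            for _ in range(num_layers - 1)
        ])
        # Output layer: hyperbolic -> Euclidean (assumed to keep size hidden_dim)
        self.readout = HyperbolicToEuclidean(c)
        # Linear head so that output_dim is actually used
        self.head = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Embed and map into hyperbolic space
        x = self.embedding(x)
        x = exp_map_0(x, self.c)
        # Hyperbolic layers
        for layer in self.layers:
            x = hyperbolic_activation(layer(x), self.c)
        # Map back to Euclidean space for classification/regression
        return self.head(self.readout(x))
Comparison with Euclidean networks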
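Expressive power
| Property | Euclidean networks | Hyperbolic networks |
|---|---|---|
| Parameter efficiency | Linear | Exponential (tree structures) |
| Hierarchy representation | Requires explicit encoding | Embeds naturally |
| Gradient flow | Isotropic | Anisotropic (concentrated toward the root) |
| Computational cost | | |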
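When to use hyperbolic networks
Suitable scenarios:
- The data has a clear tree-like hierarchical structure
- The hierarchy is deep (e.g. deep knowledge graphs)
- Hierarchical relations must be embedded efficiently
Unsuitable scenarios:
- The data has no clear hierarchical structure
- The dataset is small (the parameter-efficiency advantage of hyperbolic networks is not pronounced)
- Euclidean geometric assumptions are required (e.g. Euclidean-distance semantics)
Hybrid architectures
In practice, hybrid hyperbolic-Euclidean architectures are common: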
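import torch.nn as nn

# HyperbolicLinear, exp_map_0, log_map_0, and hyperbolic_activation
# are assumed to be defined elsewhere.
class HybridNet(nn.Module):
    """
    Hybrid hyperbolic-Euclidean network.
    Early layers use hyperbolic space to capture hierarchy;
    later layers use Euclidean space for classification.
    """
    def __init__(self, dim, num_classes, c=1.0):
        super().__init__()
        self.c = c  # stored because forward() reads self.c
        self.hyper_layers = nn.ModuleList([
            HyperbolicLinear(dim, dim, c) for _ in range(3)
        ])
        self.euclidean_layers = nn.ModuleList([
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim), nn.ReLU()
        ])
        # num_classes is now a constructor argument (it was undefined before)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x):
        # Hyperbolic stage
        x = exp_map_0(x, c=self.c)
        for layer in self.hyper_layers:
            x = hyperbolic_activation(layer(x), c=self.c)
        # Map to Euclidean space
        x = log_map_0(x, c=self.c)
        # Euclidean stage
        for layer in self.euclidean_layers:
            x = layer(x)
        return self.classifier(x)
Optimization strategies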
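Riemannian adaptive optimizers
Standard Adam/SGD must be modified for Riemannian geometry: the Euclidean gradient is rescaled by the inverse metric, and the parameter is updated along the manifold (via the exponential map or a retraction) rather than by plain subtraction.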
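Before the adaptive variant, a single Riemannian SGD step on the Poincaré ball may help make the two modifications concrete. The sketch assumes curvature $-c$, for which the conformal factor is $\lambda_p = 2 / (1 - c\|p\|^2)$ and $\exp_p(v) = p \oplus_c \exp_0(\lambda_p v / 2)$. The helper names and the toy objective (squared Euclidean distance to a fixed target) are illustrative assumptions.

```python
import torch


def mobius_add(x, y, c):
    """Möbius addition x ⊕_c y."""
    xy = (x * y).sum(dim=-1, keepdim=True)
    x2 = (x * x).sum(dim=-1, keepdim=True)
    y2 = (y * y).sum(dim=-1, keepdim=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c ** 2 * x2 * y2).clamp_min(1e-15)


def exp_map_0(v, c, eps=1e-15):
    """Exponential map at the origin."""
    sqrt_c = c ** 0.5
    v_norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * v_norm) * v / (sqrt_c * v_norm)


def project_to_ball(x, c, eps=1e-5):
    """Keep points strictly inside the ball of radius 1/sqrt(c)."""
    max_norm = (1 - eps) / c ** 0.5
    norm = x.norm(dim=-1, keepdim=True).clamp_min(1e-15)
    return x * torch.clamp(max_norm / norm, max=1.0)


def riemannian_sgd_step(p, euclid_grad, lr, c):
    """One Riemannian SGD step for a ball point p."""
    # Conformal factor; the inverse metric is 1 / lambda_p^2
    lam = 2.0 / (1.0 - c * (p * p).sum(dim=-1, keepdim=True)).clamp_min(1e-15)
    riem_grad = euclid_grad / lam ** 2
    # exp_p(-lr * riem_grad) = p ⊕_c exp_0(-lr * lambda_p * riem_grad / 2)
    step = exp_map_0(-lr * lam * riem_grad / 2.0, c)
    return project_to_ball(mobius_add(p, step, c), c)


# Toy problem: move a ball point toward a fixed target
target = torch.tensor([0.5, -0.3])
p = torch.zeros(2, requires_grad=True)
losses = []
for _ in range(50):
    loss = ((p - target) ** 2).sum()
    losses.append(loss.item())
    (grad,) = torch.autograd.grad(loss, p)
    p = riemannian_sgd_step(p.detach(), grad, lr=0.5, c=1.0).requires_grad_()
print(losses[0], losses[-1])
```

Near the boundary the factor $1/\lambda_p^2$ shrinks the step, which is what keeps the iterate inside the ball without relying on the final projection alone.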
import torch

# mobius_add, exp_map_0, and project_to_ball are assumed to be defined elsewhere.
class RiemannianAdam(torch.optim.Optimizer):
    """Simplified Riemannian Adam for parameters on the Poincaré ball of curvature -c."""
    def __init__(self, params, lr=1e-3, c=1.0, beta1=0.9, beta2=0.999, eps=1e-8):
        defaults = dict(lr=lr, c=c, beta1=beta1, beta2=beta2, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            c = group['c']
            beta1, beta2 = group['beta1'], group['beta2']
            for p in group['params']:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    state['exp_avg'] = torch.zeros_like(p)
                    state['exp_avg_sq'] = torch.zeros_like(p)
                    state['step'] = 0
                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                state['step'] += 1
                # Riemannian gradient: Euclidean gradient times the inverse metric (1 - c|p|^2)^2 / 4
                p_sq = torch.sum(p * p, dim=-1, keepdim=True)
                riemannian_grad = ((1 - c * p_sq) ** 2 / 4) * p.grad
                # Moment estimates (simplification: momentum is not parallel-transported)
                exp_avg.mul_(beta1).add_(riemannian_grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(riemannian_grad, riemannian_grad, value=1 - beta2)
                # Bias correction
                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']
                step_size = group['lr'] / bias_correction1
                update = exp_avg / (torch.sqrt(exp_avg_sq / bias_correction2) + group['eps'])
                # Exponential map at p: exp_p(v) = p ⊕_c exp_0(lambda_p * v / 2)
                lam = 2 / (1 - c * p_sq).clamp(min=1e-10)
                new_p = mobius_add(p, exp_map_0(-step_size * lam / 2 * update, c), c)
                p.copy_(project_to_ball(new_p, c))  # project back into the ball
        return loss
Application examples
Hierarchical text classification
Input: "The dog is eating food"
↓
[Hyperbolic embedding layer]
↓
┌─────────────┐
│ hyper.attention │
│ ┌───────────┐ │
│ │ animal ←─┼──── food │ (hierarchical attention)
│ │ dog ←─┘ │ (semantic clustering)
│ └───────────┘ │
└─────────────┘
↓
[Euclidean classifier]
↓
Category: "Living Things" > "Animals" > "Dogs"
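Knowledge graph embedding
Hyperbolic space naturally represents "is-a" hierarchical relations:
Dog ⊂ Mammal ⊂ Animal ⊂ LivingThing
- For each level up the hierarchy, the distance is approximately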
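To make the is-a chain concrete, the following sketch computes geodesic distances on the Poincaré ball between hand-placed points standing in for the four concepts. The coordinates are hypothetical (chosen by hand, not learned), with the most general concept at the origin and more specific ones closer to the boundary. The distance formula is the standard closed form for curvature $-c$.

```python
import math
import torch


def poincare_distance(x, y, c=1.0):
    """Geodesic distance on the Poincaré ball of curvature -c."""
    sq_dist = ((x - y) ** 2).sum(dim=-1)
    denom = (1 - c * (x * x).sum(dim=-1)) * (1 - c * (y * y).sum(dim=-1))
    return torch.acosh(1 + 2 * c * sq_dist / denom.clamp_min(1e-15)) / math.sqrt(c)


# Hypothetical embeddings on a single ray: general concepts near the origin
names = ["LivingThing", "Animal", "Mammal", "Dog"]
points = torch.tensor([[0.0, 0.0],
                       [0.5, 0.0],
                       [0.8, 0.0],
                       [0.95, 0.0]])

# Distance of every concept from the root concept at the origin
d_root = poincare_distance(points, points[0])
for name, d in zip(names, d_root.tolist()):
    print(f"{name}: {d:.3f}")
```

For a point at Euclidean radius $r$ in the unit ball, the distance to the origin is $2\tanh^{-1}(r)$, so equal Euclidean steps toward the boundary correspond to increasingly large hyperbolic distances.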