持久同调基础

持久同调（Persistent Homology, PH）是拓扑数据分析（Topological Data Analysis, TDA）的核心工具，通过追踪数据在不同尺度上的拓扑特征变化，揭示数据的内在几何结构。¹

1. 拓扑不变性基础

1.1 什么是拓扑

拓扑学研究在连续变形下保持不变的几何性质。与欧几里得几何不同，拓扑不关心精确的距离和角度，只关心”本质结构”。

1.2 核心拓扑不变量

不变量	维度	直观理解
连通分量	0维	数据点是否连成一片
环路	1维	是否有”洞”（如甜甜圈的孔）
空洞	2维	是否有”空腔”（如球体内部）
高维洞	k维	$k$ -维边界

定义：如果两个对象可以通过拉伸、扭曲（但不撕裂或粘合）相互转换，则它们在拓扑上是等价的。

1.3 同调群

$k$ 维同调群 $H_{k}$ 是代数结构，用于量化 $k$ 维洞的数量：

H_{k} ≅ Z^{b_{k}} \oplus (torsion terms)

其中 $b_{k}$ 是第 $k$ 维洞的数量（Betti数）。

2. 持久同调核心概念

2.1 过滤函数

过滤函数（Filtration）是定义尺度参数 $τ$ 的函数：

f : X \to R

常见的过滤函数包括：

Vietoris-Rips过滤：基于点间距离
Čech过滤：基于覆盖半径
sublevel过滤：基于函数值（如密度）

2.2 单纯复形

单纯复形是拓扑空间的多面体表示：

维度	单纯形	例子
0维	顶点	点
1维	边	线段
2维	三角形	平面三角形
3维	四面体	立体四面体

Vietoris-Rips复形：给定距离阈值 $ϵ$ ，包含所有距离 $\leq ϵ$ 的点构成的单纯复形。

2.3 持久图定义

持久图（Persistence Diagram）是 $R^{2}$ 中有限的点集，每个点 $(b, d)$ 表示一个拓扑特征：

$b$ ：特征”出生”（出现）的尺度
$d$ ：特征”死亡”（消失）的尺度
持久度： $p ers i s t e n ce = d - b$ ，衡量特征的重要性

2.4 持久条形码

持久条形码（Persistence Barcode）是持久图的另一种可视化：

特征1: |----------|
特征2: |------|
特征3: |-----|
特征4: |-----|
背景:  |

长条表示高持久度的重要特征，短条表示可能的噪声。

3. 稳定性理论

3.1 Bottleneck距离

Bottleneck距离度量两个持久图之间的最大偏差：

d_{B} (D_{1}, D_{2}) = γ in f x \in D_{1} \cup Δ sup ∥ x - γ (x) ∥_{\infty}

其中 $Δ$ 是对角线， $γ$ 是匹配。

3.2 Wasserstein距离

$p$ -Wasserstein距离是更精细的度量：

d_{W}^{p} (D_{1}, D_{2}) = (γ in f x \in D_{1} \cup Δ \sum ∥ x - γ (x) ∥_{\infty}^{p})^{1/ p}

稳定性定理：如果过滤函数 $f$ 和 $g$ 满足 $∣ f (x) - g (x) ∣ \leq ϵ$ ，则：

d_{B} (D g m (f), D g m (g)) \leq ϵ

这保证了拓扑特征对噪声的鲁棒性。

4. 计算算法

4.1 标准算法流程

def compute_persistent_homology(points, max_distance):
    # 1. 构建距离矩阵
    dist_matrix = compute_pairwise_distances(points)
    
    # 2. 按距离排序边
    edges = sort_edges_by_distance(dist_matrix)
    
    # 3. 逐步添加边，构建VR复形
    filtration = []
    for epsilon in range(0, max_distance, step):
        add_edges_within_distance(filtration, epsilon)
        add_triangles_from_edges(filtration)
        filtration.append(current_complex)
    
    # 4. 计算边界矩阵
    boundary_matrix = compute_boundary_matrix(filtration)
    
    # 5. 约简算法计算持久性
    persistence_pairs = matrix_reduction(boundary_matrix)
    
    return persistence_pairs

4.2 Ripser算法

Ripser是目前最高效的持久同调计算库之一：

import numpy as np
from ripser import ripser
from persim import plot_diagrams
import matplotlib.pyplot as plt
 
# 生成示例数据：两个圆环
np.random.seed(42)
circle1 = generate_circle(n_points=50, radius=1, center=(0, 0))
circle2 = generate_circle(n_points=50, radius=1, center=(3, 0))
data = np.vstack([circle1, circle2])
 
# 计算持久同调
result = ripser(data, maxdim=2, thresh=2.0)
 
# 可视化持久图
plot_diagrams(result['dgms'], show=True)

4.3 GUDHI实现

from gudhi import RipsComplex, PersistentHomologyDiagram
import numpy as np
 
# 创建Rips复形
points = np.random.random((100, 2))
rips = RipsComplex(points=points, max_edge_length=1.0)
 
# 构建复形并计算持久性
simplex_tree = rips.create_simplex_tree(max_dimension=2)
diag = simplex_tree.persistence()
 
# 可视化
gudhi.plot_persistence_diagram(diag)
gudhi.plot_persistence_barcode(diag)

5. 深度学习中的持久同调

5.1 拓扑特征作为输入

将持久图转化为向量作为额外输入：

import numpy as np
from ripser import ripser
from sklearn.preprocessing import StandardScaler
 
def extract_topological_features(points, max_dim=2):
    """提取拓扑特征向量"""
    # 计算持久同调
    result = ripser(points, maxdim=max_dim)
    
    features = []
    for dim in range(max_dim + 1):
        # 持久度统计
        pers = result['dgms'][dim]
        if len(pers) > 0:
            # 移除无穷远点
            pers_finite = pers[pers[:, 1] < np.inf]
            
            features.extend([
                len(pers_finite),                    # 特征数量
                np.mean(pers_finite[:, 1] - pers_finite[:, 0]),  # 平均持久度
                np.max(pers_finite[:, 1] - pers_finite[:, 0]),   # 最大持久度
                np.sum(pers_finite[:, 1] - pers_finite[:, 0]),  # 总持久度
            ])
        else:
            features.extend([0, 0, 0, 0])
    
    return np.array(features)
 
# 使用示例
points = np.random.random((100, 2))
topo_features = extract_topological_features(points)
print(f"拓扑特征维度: {len(topo_features)}")

5.2 可微分持久同调层

为支持端到端学习，需要可微分的拓扑层：

import torch
import torch.nn as nn
 
class SoftPersistentHomology(nn.Module):
    """软化持久同调层 - 可微分近似"""
    
    def __init__(self, max_samples=100, temperature=0.1):
        super().__init__()
        self.max_samples = max_samples
        self.temperature = temperature
    
    def forward(self, points):
        """
        points: (batch_size, n_points, dim)
        返回: (batch_size, n_features)
        """
        batch_size = points.shape[0]
        
        # 计算距离矩阵（可微分）
        dist_matrix = self._compute_distances(points)
        
        # 软化排序（可微分）
        weights = torch.softmax(-dist_matrix / self.temperature, dim=-1)
        
        # 估计持久度统计量
        features = self._estimate_persistence(weights, dist_matrix)
        
        return features
    
    def _compute_distances(self, points):
        """计算成对距离"""
        n = points.shape[1]
        # 欧几里得距离
        diff = points.unsqueeze(2) - points.unsqueeze(1)  # (B, N, N, D)
        dist = torch.norm(diff, dim=-1)  # (B, N, N)
        return dist
    
    def _estimate_persistence(self, weights, distances):
        """估计持久度特征"""
        # 加权平均持久度
        weighted_persistence = (weights * distances).sum(dim=[-1, -2])
        return weighted_persistence

5.3 拓扑损失函数

添加拓扑约束到损失函数：

class TopologyRegularizer(nn.Module):
    """拓扑正则化器"""
    
    def __init__(self, weight=0.1, target_dim=1):
        super().__init__()
        self.weight = weight
        self.target_dim = target_dim
    
    def forward(self, generated_points, reference_dgm):
        """
        generated_points: 生成的点云
        reference_dgm: 参考持久图
        """
        # 计算生成分布的持久图
        gen_dgm = self._compute_diagram(generated_points)
        
        # Wasserstein距离损失
        wasserstein_loss = self._wasserstein_loss(gen_dgm, reference_dgm)
        
        return self.weight * wasserstein_loss
    
    def _compute_diagram(self, points):
        """计算持久图（需要外部库或近似）"""
        # 实际实现中可调用Ripser
        pass
    
    def _wasserstein_loss(self, dgm1, dgm2):
        """计算Wasserstein距离"""
        # 使用Sinkhorn近似
        return torch.cdist(dgm1, dgm2, p=2).mean()

6. 应用实例

6.1 分子结构分析

import numpy as np
from ripser import ripser
 
def analyze_molecular_pockets(atom_positions):
    """
    分析分子口袋的拓扑结构
    atom_positions: 原子坐标 (N, 3)
    """
    # 计算持久同调
    result = ripser(atom_positions, maxdim=2, thresh=10.0)
    
    # 提取特征
    diagrams = result['dgms']
    
    # 口袋体积估计（基于1维特征）
    pocket_volume_est = np.sum(diagrams[1][:, 1] - diagrams[1][:, 0])
    
    # 连接性分析（基于0维特征）
    n_components = len(diagrams[0])
    
    return {
        'n_components': n_components,
        'pocket_volume_est': pocket_volume_est,
        'diagram_1d': diagrams[1]
    }
 
# 示例：分析药物分子口袋
atom_positions = load_drug_molecule()
features = analyze_molecular_pockets(atom_positions)

6.2 图像纹理分析

def extract_image_topology(image, patch_size=8):
    """
    提取图像拓扑纹理特征
    image: (H, W) 灰度图像
    """
    from skimage import io, color
    from scipy.ndimage import distance_transform_edt
    
    # 转换为一维信号并计算距离变换
    dist_transform = distance_transform_edt(image)
    
    # 提取局部补丁的拓扑特征
    features = []
    for i in range(0, image.shape[0] - patch_size, patch_size // 2):
        for j in range(0, image.shape[1] - patch_size, patch_size // 2):
            patch = dist_transform[i:i+patch_size, j:j+patch_size]
            
            # 计算子补丁的持久同调
            result = ripser(patch.reshape(-1, 1), maxdim=1)
            pers = result['dgms'][1]
            
            if len(pers) > 0:
                features.append(np.max(pers[:, 1] - pers[:, 0]))
            else:
                features.append(0)
    
    return np.array(features)

7. 数学形式化

7.1 持久同调代数定义

给定过滤函数 $f : X \to R$ ，定义子水平集：

X_{t} = f^{- 1} ((- \infty, t])

子水平集随 $t$ 单调递增，构成嵌套序列：

X_{t_{1}} \subseteq X_{t_{2}} \subseteq X_{t_{3}} \subseteq \dots

持久同调追踪包含关系诱导的同调群之间的映射：

\dots \to H_{k} (X_{t_{1}}) \to H_{k} (X_{t_{2}}) \to H_{k} (X_{t_{3}}) \to \dots

7.2 条形码与持久图等价性

定理（稳定性）：持久图完全编码了过滤的拓扑结构，且与条形码双射等价。

7.3 计算复杂度

算法	时间复杂度	空间复杂度
朴素矩阵约简	$O (n^{3})$	$O (n^{2})$
标准持久同调	$O (n^{ω})$	$O (n^{2})$
Ripser	$O (n^{2})$	$O (n^{2})$
稀疏持久同调	$O (n lo g n)$	$O (n)$

8. 实践建议

8.1 参数选择

参数	建议	注意事项
maxdim	2-3	过高维度计算量大
thresh	数据直径的0.3-0.5	影响特征检测范围
metric	根据数据选择	常用：euclidean, cosine

8.2 常见陷阱

维度选择：过高维度引入噪声，过低维度丢失信息
阈值设置：过小阈值产生过多短条，过大阈值计算量大
特征融合：简单拼接可能导致维度灾难

8.3 性能优化

# 并行计算多个样本的持久同调
from concurrent.futures import ProcessPoolExecutor
 
def parallel_persistent_homology(data_list, n_workers=4):
    with ProcessPoolExecutor(max_workers=n_workers) as executor:
        results = list(executor.map(ripser, data_list))
    return results
 
# 使用子采样处理大数据集
def subsample_persistent_homology(points, n_samples=1000):
    if len(points) > n_samples:
        indices = np.random.choice(len(points), n_samples, replace=False)
        points = points[indices]
    return ripser(points)

Metaphor

探索

持久同调基础

持久同调基础

1. 拓扑不变性基础

1.1 什么是拓扑

1.2 核心拓扑不变量

1.3 同调群

2. 持久同调核心概念

2.1 过滤函数

2.2 单纯复形

2.3 持久图定义

2.4 持久条形码

3. 稳定性理论

3.1 Bottleneck距离

3.2 Wasserstein距离

4. 计算算法

4.1 标准算法流程

4.2 Ripser算法

4.3 GUDHI实现

5. 深度学习中的持久同调

5.1 拓扑特征作为输入

5.2 可微分持久同调层

5.3 拓扑损失函数

6. 应用实例

6.1 分子结构分析

6.2 图像纹理分析

7. 数学形式化

7.1 持久同调代数定义

7.2 条形码与持久图等价性

7.3 计算复杂度

8. 实践建议

8.1 参数选择

8.2 常见陷阱

8.3 性能优化

参考文献

相关文档

关系图谱

目录

反向链接

Metaphor

探索

持久同调基础

持久同调基础

1. 拓扑不变性基础

1.1 什么是拓扑

1.2 核心拓扑不变量

1.3 同调群

2. 持久同调核心概念

2.1 过滤函数

2.2 单纯复形

2.3 持久图定义

2.4 持久条形码

3. 稳定性理论

3.1 Bottleneck距离

3.2 Wasserstein距离

4. 计算算法

4.1 标准算法流程

4.2 Ripser算法

4.3 GUDHI实现

5. 深度学习中的持久同调

5.1 拓扑特征作为输入

5.2 可微分持久同调层

5.3 拓扑损失函数

6. 应用实例

6.1 分子结构分析

6.2 图像纹理分析

7. 数学形式化

7.1 持久同调代数定义

7.2 条形码与持久图等价性

7.3 计算复杂度

8. 实践建议

8.1 参数选择

8.2 常见陷阱

8.3 性能优化

参考文献

相关文档

Footnotes

关系图谱

目录

反向链接