图卷积神经网络（GCN）

图卷积神经网络（Graph Convolutional Network, GCN）是将卷积操作推广到图结构数据的里程碑工作，由Thomas Kipf和Max Welling在2016年提出。¹

1. 图论基础

1.1 图的定义

图 $G = (V, E)$ 由以下要素组成：

$V = {v_{1}, v_{2}, \dots, v_{N}}$ ：顶点集合， $N = ∣ V ∣$ 为节点数
$E \subseteq V \times V$ ：边集合
$A \in R^{N \times N}$ ：邻接矩阵

邻接矩阵：

A_{ij} = {10 如果 (v_{i}, v_{j}) \in E 否则

度矩阵 $D$ ：

D_{ii} = j \sum A_{ij}, D_{ij} = 0 (i \neq = j)

1.2 图的类型

类型	定义	示例
无向图	边无方向， $A_{ij} = A_{ji}$	社交网络好友关系
有向图	边有方向	引用网络、知识图谱
加权图	边有权重	道路网络距离
有属性图	节点/边有特征	分子图

1.3 图的拉普拉斯矩阵

拉普拉斯矩阵是GCN的核心工具：

L = D - A

归一化拉普拉斯矩阵：

L_{sy m} = D^{- 1/2} L D^{- 1/2} = I - D^{- 1/2} A D^{- 1/2}

随机游走归一化拉普拉斯：

L_{r w} = D^{- 1} L = I - D^{- 1} A

1.4 拉普拉斯矩阵的性质

谱分解：

L = U Λ U^{T}

其中：

$U = [u_{0}, u_{1}, \dots, u_{N - 1}]$ ：特征向量矩阵
$Λ = diag (λ_{0}, λ_{1}, \dots, λ_{N - 1})$ ：特征值对角矩阵

性质：

$L$ 是半正定的
$0 = λ_{0} \leq λ_{1} \leq \dots \leq λ_{N - 1} \leq 2$
最小特征值 $λ_{0} = 0$ 对应特征向量 $u_{0} = 1$ （全1向量）
$x^{T} L x = \frac{1}{2} \sum_{i, j} A_{ij} (x_{i} - x_{j})^{2}$ （图的Dirichlet能量）

2. 图上的卷积

2.1 卷积定理

传统傅里叶变换的卷积定理：

(f * g) (t) = \int_{- \infty}^{\infty} \hat{f} (ω) \overset{g}{^} (ω) e^{iω t} d ω

将这个思想推广到图上：

图傅里叶变换：

\hat{f} = U^{T} f

图逆傅里叶变换：

f = U \hat{f}

图卷积：

(f * g)_{G} = U ((U^{T} f) ⊙ (U^{T} g))

其中 $⊙$ 是Hadamard积（逐元素乘积）。

2.2 谱域图卷积

将图卷积表示为滤波器：

g_{θ} * x = U g_{θ} U^{T} x

其中 $g_{θ} = diag (θ)$ 是谱域中的滤波器参数。

问题：计算 $U$ 需要对整个图做特征分解，复杂度 $O (N^{3})$ ，对于大图不可行。

3. Chebyshev多项式近似

3.1 动机

直接在谱域计算卷积的复杂度太高。Chebyshev多项式提供了一种多项式近似方法：

g_{θ} (Λ) \approx k = 0 \sum K θ_{k} T_{k} (\tilde{Λ})

其中：

$\tilde{Λ} = \frac{2}{λ _{ma x}} Λ - I$ ：缩放后的特征值（确保在 $[- 1, 1]$ 内）
$T_{k}$ 是Chebyshev多项式

3.2 Chebyshev多项式定义

T_{k} (x) = ⎩ ⎨ ⎧ 1 x 2 x T_{k - 1} (x) - T_{k - 2} (x) k = 0 k = 1 k \geq 2

前几项：

$T_{0} (x) = 1$
$T_{1} (x) = x$
$T_{2} (x) = 2 x^{2} - 1$
$T_{3} (x) = 4 x^{3} - 3 x$

3.3 ChebNet卷积

ChebNet (Defferrard et al., 2016) 的卷积操作：

g_{θ} * x \approx k = 0 \sum K θ_{k} T_{k} (\tilde{L}) x

其中 $T_{k} (\tilde{L})$ 可以通过递推高效计算：

T_{k} (\tilde{L}) = 2 \tilde{L} T_{k - 1} (\tilde{L}) - T_{k - 2} (\tilde{L})

复杂度： $O (K ∣ E ∣)$ ，仅依赖边数，与节点数无关。

4. GCN：一阶近似

4.1 Kipf & Welling的简化

GCN的核心创新是对ChebNet的进一步简化：

假设：

$K = 1$ （只考虑一阶邻居）
$λ_{ma x} \approx 2$

得到：

g_{θ} * x \approx θ_{0} x + θ_{1} (L - I) x = θ_{0} x - θ_{1} D^{- 1} A x

对称归一化：

θ_{0} x - θ_{1} D^{- 1/2} A D^{- 1/2} x

4.2 GCN层定义

最终形式（令 $θ_{0} = θ_{1} = θ$ ）：

H^{(l + 1)} = σ (\tilde{D}^{- 1/2} \tilde{A} \tilde{D}^{- 1/2} H^{(l)} W^{(l)})

其中：

$\tilde{A} = A + I$ ：添加自环的邻接矩阵
$\tilde{D}$ ：对应的度矩阵
$H^{(l)}$ ：第 $l$ 层的节点特征
$W^{(l)}$ ：可学习的权重矩阵
$σ$ ：非线性激活函数

4.3 数学推导

为什么使用 $\tilde{D}^{- 1/2} \tilde{A} \tilde{D}^{- 1/2}$ ？

考虑消息传递：

h_{i}^{(l + 1)} = σ W^{(l)} j \in N (i) \cup {i} \sum \frac{1}{d _{i} d _{j}} h_{j}^{(l)}

其中 $d_{i} = deg (v_{i})$ 是节点度数。

物理解释：

每个节点从自身和邻居接收消息
消息权重与度的几何平均成反比
归一化保证特征缩放不变性

4.4 单层GCN的数学形式

H^{(l + 1)} = ReLU (\hat{A} H^{(l)} W^{(l)})

其中 $\hat{A} = \tilde{D}^{- 1/2} \tilde{A} \tilde{D}^{- 1/2}$ 。

维度变化：

H^{(l)} \in R^{N \times F^{(l)}} W^{(l)} \in R^{F^{(l)} \times F^{(l + 1)}} R^{N \times F^{(l + 1)}}

5. GCN的实现

5.1 NumPy实现

import numpy as np
 
def normalize_adjacency(A):
    """对称归一化邻接矩阵"""
    # 添加自环
    A = A + np.eye(A.shape[0])
    
    # 度矩阵
    D = np.sum(A, axis=1)
    D_inv_sqrt = np.power(D, -0.5)
    D_inv_sqrt[np.isinf(D_inv_sqrt)] = 0.0
    
    # 归一化
    D_inv_sqrt_mat = np.diag(D_inv_sqrt)
    A_norm = D_inv_sqrt_mat @ A @ D_inv_sqrt_mat
    
    return A_norm
 
 
def gcn_layer(A_norm, H, W):
    """
    单层GCN前向传播
    
    参数:
        A_norm: 归一化邻接矩阵 (N, N)
        H: 输入特征 (N, F_in)
        W: 权重矩阵 (F_in, F_out)
    
    返回:
        输出特征 (N, F_out)
    """
    # 消息传递：聚合邻居信息
    H_agg = A_norm @ H
    
    # 线性变换
    H_out = H_agg @ W
    
    return H_out
 
 
def gcn_forward(X, A, W_list, activations):
    """
    完整GCN前向传播
    
    参数:
        X: 输入特征 (N, F_0)
        A: 邻接矩阵 (N, N)
        W_list: 权重矩阵列表
        activations: 激活函数列表
    
    返回:
        最终输出
    """
    H = X
    A_norm = normalize_adjacency(A)
    
    for W, activation in zip(W_list, activations):
        H = gcn_layer(A_norm, H, W)
        H = activation(H)
    
    return H

5.2 PyTorch实现

import torch
import torch.nn as nn
import torch.nn.functional as F
 
 
class GraphConvolution(nn.Module):
    """单层图卷积"""
    
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        
        self.weight = nn.Parameter(torch.FloatTensor(in_features, out_features))
        if bias:
            self.bias = nn.Parameter(torch.FloatTensor(out_features))
        else:
            self.register_parameter('bias', None)
        
        self.reset_parameters()
    
    def reset_parameters(self):
        nn.init.xavier_uniform_(self.weight)
        if self.bias is not None:
            nn.init.zeros_(self.bias)
    
    def forward(self, input, adj):
        """
        前向传播
        
        参数:
            input: 节点特征 (batch, N, F_in) 或 (N, F_in)
            adj: 邻接矩阵 (N, N)
        
        返回:
            输出特征 (batch, N, F_out) 或 (N, F_out)
        """
        support = torch.matmul(input, self.weight)
        output = torch.matmul(adj, support)
        
        if self.bias is not None:
            output = output + self.bias
        
        return output
 
 
class GCN(nn.Module):
    """图卷积网络"""
    
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers=2, 
                 dropout=0.5):
        super().__init__()
        
        self.num_layers = num_layers
        self.dropout = dropout
        
        # 构建层
        self.layers = nn.ModuleList()
        
        # 第一层
        self.layers.append(
            GraphConvolution(input_dim, hidden_dim)
        )
        
        # 中间层
        for _ in range(num_layers - 2):
            self.layers.append(
                GraphConvolution(hidden_dim, hidden_dim)
            )
        
        # 输出层
        self.layers.append(
            GraphConvolution(hidden_dim, output_dim)
        )
        
        self.activation = nn.ReLU()
    
    def forward(self, x, adj):
        """
        前向传播
        
        参数:
            x: 节点特征 (N, F_in)
            adj: 邻接矩阵 (N, N)
        
        返回:
            输出特征 (N, output_dim)
        """
        for i, layer in enumerate(self.layers):
            x = layer(x, adj)
            if i < len(self.layers) - 1:  # 非输出层
                x = self.activation(x)
                x = F.dropout(x, p=self.dropout, training=self.training)
        
        return x
 
 
class GCNWithSelfLoops(nn.Module):
    """带可学习自环权重的GCN"""
    
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        
        self.gc1 = GraphConvolution(input_dim, hidden_dim)
        self.gc2 = GraphConvolution(hidden_dim, output_dim)
        
        # 可学习的自环权重
        self.self_loop_weight = nn.Parameter(torch.ones(1))
    
    def normalized_adj(self, adj):
        """归一化邻接矩阵（带自环）"""
        N = adj.shape[0]
        
        # 添加自环
        adj = adj + torch.eye(N, device=adj.device)
        
        # 度矩阵
        D = torch.sum(adj, dim=1)
        D_inv_sqrt = torch.pow(D, -0.5)
        D_inv_sqrt[torch.isinf(D_inv_sqrt)] = 0
        
        # 归一化
        D_inv_sqrt_mat = torch.diag(D_inv_sqrt)
        adj_norm = D_inv_sqrt_mat @ adj @ D_inv_sqrt_mat
        
        return adj_norm
    
    def forward(self, x, adj):
        adj_norm = self.normalized_adj(adj)
        
        # 第一层
        x = F.relu(self.gc1(x, adj_norm))
        
        # 第二层
        x = self.gc2(x, adj_norm)
        
        return x

5.3 批量处理版本

class BatchGCN(nn.Module):
    """支持批量处理的GCN"""
    
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.gc1 = GraphConvolution(input_dim, hidden_dim)
        self.gc2 = GraphConvolution(hidden_dim, output_dim)
    
    @staticmethod
    def normalize_adj(adj):
        """批量归一化邻接矩阵"""
        # adj: (batch, N, N)
        batch_size = adj.shape[0]
        N = adj.shape[1]
        
        # 添加自环
        adj = adj + torch.eye(N, device=adj.device).unsqueeze(0)
        
        # 度矩阵
        D = torch.sum(adj, dim=2)  # (batch, N)
        D_inv_sqrt = torch.pow(D, -0.5)
        D_inv_sqrt[torch.isinf(D_inv_sqrt)] = 0
        
        # 构建归一化矩阵
        D_inv_sqrt_mat = torch.diag_embed(D_inv_sqrt)  # (batch, N, N)
        adj_norm = D_inv_sqrt_mat @ adj @ D_inv_sqrt_mat
        
        return adj_norm
    
    def forward(self, x, adj):
        """
        批量前向传播
        
        参数:
            x: (batch, N, F_in)
            adj: (batch, N, N)
        
        返回:
            (batch, N, output_dim)
        """
        adj_norm = self.normalize_adj(adj)
        
        x = F.relu(self.gc1(x, adj_norm))
        x = self.gc2(x, adj_norm)
        
        return x

6. 半监督节点分类

6.1 任务定义

给定：

图结构 $G = (V, E)$
部分节点标签 $y_{i}, i \in V_{L}$ （ $V_{L}$ 是有标签节点集）
节点特征 $X$

目标：预测无标签节点 $i \in V_{U}$ 的标签

6.2 训练框架

def train_gcn半监督():
    """
    GCN半监督节点分类训练
    """
    # 数据：Cora引用网络
    # 2708节点，7类，1433维特征
    adj, features, labels, train_mask, val_mask, test_mask = load_cora()
    
    # 模型
    model = GCN(
        input_dim=features.shape[1],
        hidden_dim=16,
        output_dim=7,
        num_layers=2
    )
    
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()
    
    for epoch in range(200):
        model.train()
        optimizer.zero_grad()
        
        # 前向传播
        output = model(features, adj)
        
        # 仅在有标签节点上计算损失
        loss = criterion(output[train_mask], labels[train_mask])
        
        # 反向传播
        loss.backward()
        optimizer.step()
        
        # 评估
        if epoch % 20 == 0:
            model.eval()
            with torch.no_grad():
                output = model(features, adj)
                acc = evaluate(output, labels, test_mask)
                print(f"Epoch {epoch}: Loss={loss.item():.4f}, Test Acc={acc:.4f}")
 
 
def evaluate(output, labels, mask):
    """评估准确率"""
    model.eval()
    pred = output[mask].max(1)[1]
    acc = pred.eq(labels[mask]).sum().item() / mask.sum().item()
    return acc

6.3 图自编码器变体

class GCNEncoder(nn.Module):
    """GCN编码器（用于图自编码器）"""
    
    def __init__(self, input_dim, hidden_dim, embedding_dim):
        super().__init__()
        self.gc1 = GraphConvolution(input_dim, hidden_dim)
        self.gc2 = GraphConvolution(hidden_dim, embedding_dim)
    
    def forward(self, x, adj):
        x = F.relu(self.gc1(x, adj))
        x = self.gc2(x, adj)
        return x
 
 
class GraphAutoEncoder(nn.Module):
    """图自编码器"""
    
    def __init__(self, input_dim, hidden_dim, embedding_dim):
        super().__init__()
        self.encoder = GCNEncoder(input_dim, hidden_dim, embedding_dim)
    
    def decode(self, z):
        """内积解码器"""
        # z: (N, embedding_dim)
        adj_reconstructed = torch.sigmoid(z @ z.T)
        return adj_reconstructed
    
    def forward(self, x, adj):
        z = self.encoder(x, adj)
        adj_recon = self.decode(z)
        return z, adj_recon

7. GCN的表达能力

7.1 与Weisfeiler-Lehman测试的关系

GCN的表达能力受限于1-WL测试（也称颜色细化）：

WL测试：

初始化所有节点颜色
迭代聚合邻居颜色
如果颜色分布不同，则节点不同

GCN ≈ 谱域的1-WL：

GCN的消息传递机制与WL测试类似
无法区分同构图中结构不同的节点

7.2 GCN的局限性

局限性	说明
过平滑	多层GCN后，节点表示趋向相同
感受野受限	只考虑K-hop邻居
无法学习边的权重/类型	原始GCN忽略边特征
Transductive	需要全图进行推理

7.3 解决方案

1. 过平滑缓解

class DenseGCN(nn.Module):
    """Dense连接缓解过平滑"""
    
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
        super().__init__()
        
        self.layers = nn.ModuleList()
        self.skip_connections = nn.ModuleList()
        
        for i in range(num_layers):
            in_dim = input_dim if i == 0 else hidden_dim
            self.layers.append(GraphConvolution(in_dim, hidden_dim))
            
            # 跳跃连接
            if i > 0:
                self.skip_connections.append(
                    nn.Linear(in_dim, hidden_dim)
                )
            else:
                self.skip_connections.append(None)
    
    def forward(self, x, adj):
        inputs = [x]
        
        for i, layer in enumerate(self.layers):
            # 当前层输出
            h = layer(x, adj)
            h = F.relu(h)
            
            # 跳跃连接
            if self.skip_connections[i] is not None:
                skip = self.skip_connections[i](inputs[-1])
                h = h + skip
            
            inputs.append(h)
            x = h
        
        return x

2. 注意力机制（GAT）

class GraphAttentionLayer(nn.Module):
    """图注意力层（GAT的核心组件）"""
    
    def __init__(self, in_features, out_features, dropout=0.6, alpha=0.2):
        super().__init__()
        
        self.in_features = in_features
        self.out_features = out_features
        self.dropout = dropout
        self.alpha = alpha
        
        self.W = nn.Parameter(torch.zeros(in_features, out_features))
        nn.init.xavier_uniform_(self.W)
        
        self.a = nn.Parameter(torch.zeros(2 * out_features, 1))
        nn.init.xavier_uniform_(self.a)
    
    def forward(self, input, adj):
        """
        参数:
            input: (N, F_in)
            adj: (N, N) 邻接矩阵
        
        返回:
            (N, F_out)
        """
        h = torch.matmul(input, self.W)  # (N, F_out)
        
        N = h.size(0)
        
        # 计算注意力系数
        a_input = torch.cat([
            h.repeat(1, N).view(N * N, -1),
            h.repeat(N, 1)
        ], dim=1).view(N, -1, 2 * self.out_features)
        
        e = F.leaky_relu(
            torch.matmul(a_input, self.a).squeeze(2),  # (N, N)
            self.alpha
        )
        
        # 掩码（使用-adj使非邻居注意力为-∞）
        attention = torch.where(adj > 0, e, -9e15 * torch.ones_like(e))
        attention = F.softmax(attention, dim=1)
        attention = F.dropout(attention, self.dropout, training=self.training)
        
        # 加权聚合
        h_prime = torch.matmul(attention, h)
        
        return F.elu(h_prime)

8. GCN的应用场景

领域	应用	示例
社交网络	节点分类	用户兴趣预测
知识图谱	链接预测	实体关系推理
生物信息	分子性质预测	药物发现
推荐系统	协同过滤	商品推荐
交通网络	流量预测	道路拥堵预测
计算机视觉	点云处理	3D物体识别

9. 总结

GCN的核心贡献：

谱域到空域的桥梁：通过一阶近似，将谱域卷积转化为高效的空域消息传递
简洁优雅： $\hat{A} H W$ 的形式简单而强大
可扩展性： $O (∣ E ∣)$ 的计算复杂度
半监督学习：利用图结构进行归纳学习

GCN的局限推动了后续研究：

GAT：引入注意力机制
GraphSAGE：归纳学习和采样策略
GNN的深度学习理论：表达能力与WL测试的关系

参考文献

Kipf, T. N., & Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. ICLR 2017. https://arxiv.org/abs/1609.02907 ↩

Metaphor

探索

图卷积神经网络（GCN）

图卷积神经网络（GCN）

1. 图论基础

1.1 图的定义

1.2 图的类型

1.3 图的拉普拉斯矩阵

1.4 拉普拉斯矩阵的性质

2. 图上的卷积

2.1 卷积定理

2.2 谱域图卷积

3. Chebyshev多项式近似

3.1 动机

3.2 Chebyshev多项式定义

3.3 ChebNet卷积

4. GCN：一阶近似

4.1 Kipf & Welling的简化

4.2 GCN层定义

4.3 数学推导

4.4 单层GCN的数学形式

5. GCN的实现

5.1 NumPy实现

5.2 PyTorch实现

5.3 批量处理版本

6. 半监督节点分类

6.1 任务定义

6.2 训练框架

6.3 图自编码器变体

7. GCN的表达能力

7.1 与Weisfeiler-Lehman测试的关系

7.2 GCN的局限性

7.3 解决方案

8. GCN的应用场景

9. 总结

参考文献

关系图谱

目录

Metaphor

探索

图卷积神经网络（GCN）

图卷积神经网络（GCN）

1. 图论基础

1.1 图的定义

1.2 图的类型

1.3 图的拉普拉斯矩阵

1.4 拉普拉斯矩阵的性质

2. 图上的卷积

2.1 卷积定理

2.2 谱域图卷积

3. Chebyshev多项式近似

3.1 动机

3.2 Chebyshev多项式定义

3.3 ChebNet卷积

4. GCN：一阶近似

4.1 Kipf & Welling的简化

4.2 GCN层定义

4.3 数学推导

4.4 单层GCN的数学形式

5. GCN的实现

5.1 NumPy实现

5.2 PyTorch实现

5.3 批量处理版本

6. 半监督节点分类

6.1 任务定义

6.2 训练框架

6.3 图自编码器变体

7. GCN的表达能力

7.1 与Weisfeiler-Lehman测试的关系

7.2 GCN的局限性

7.3 解决方案

8. GCN的应用场景

9. 总结

参考文献

Footnotes

关系图谱

目录