分子动力学模拟与ML增强

1. 引言

分子动力学(Molecular Dynamics, MD)模拟是计算化学和生物物理学的基石方法,通过数值积分牛顿运动方程,模拟分子系统的原子级动力学行为。

MD模拟的应用

领域应用时间尺度
生物化学蛋白质折叠、配体结合ns - ms
材料科学相变、缺陷扩散ps - μs
药物发现靶点结合、ADMET预测ns - μs
化学工程反应动力学、流体性质ps - ns

2. 经典分子动力学基础

2.1 牛顿运动方程

对于包含 个原子的系统:

其中 是势能面, 是作用在原子 上的力。

2.2 积分算法

Velocity Verlet算法

最常用的积分算法:

def velocity_verlet(positions, velocities, forces, masses, dt):
    """
    Velocity Verlet积分
    
    Args:
        positions: [N, 3] 原子位置
        velocities: [N, 3] 原子速度
        forces: [N, 3] 原子受力
        masses: [N] 原子质量
        dt: 时间步长
    """
    # 1. 半步更新速度
    velocities_half = velocities + 0.5 * forces / masses * dt
    
    # 2. 全步更新位置
    positions_new = positions + velocities_half * dt
    
    # 3. 计算新力
    forces_new = compute_forces(positions_new)
    
    # 4. 完成速度更新
    velocities_new = velocities_half + 0.5 * forces_new / masses * dt
    
    return positions_new, velocities_new, forces_new

Leapfrog算法

def leapfrog(positions, velocities, forces, masses, dt):
    """
    Leapfrog积分算法
    """
    # 1. 半步跳跃速度
    velocities += 0.5 * forces / masses * dt
    
    # 2. 全步更新位置
    positions += velocities * dt
    
    # 3. 计算新力
    forces_new = compute_forces(positions)
    
    # 4. 完成速度跳跃
    velocities += 0.5 * forces_new / masses * dt
    
    return positions, velocities, forces_new

2.3 势能函数

经典MD使用经验力场,势能分解为:

成键项

# 键长项 (Harmonic)
V_{\text{bond}} = \sum_{\text{bonds}} k_b (r - r_0)^2
 
# 键角项 (Harmonic)
V_{\text{angle}} = \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2
 
# 二面角项 (Fourier)
V_{\text{dihedral}} = \sum_{\text{dihedrals}} \sum_n \frac{V_n}{2} [1 + \cos(n\phi - \gamma)]

非键项

# 范德华相互作用 (Lennard-Jones)
V_{\text{LJ}} = \sum_{i < j} 4\epsilon_{ij} \left[ \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^6 \right]
 
# 静电相互作用 (Coulomb)
V_{\text{elec}} = \sum_{i < j} \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}}

2.4 系综与温度控制

系综描述应用
NVE粒子数、体积、能量守恒能量守恒验证
NVT粒子数、体积、温度恒定热力学平衡
NPT粒子数、压力、温度恒定相变研究

温度控制:Langevin动力学

class LangevinThermostat:
    """
    Langevin热浴
    
    朗之万方程:
    m dv/dt = F - γv + √(2mγkT) ζ(t)
    其中ζ(t)是高斯白噪声
    """
    def __init__(self, damping=1.0, temperature=300):
        self.gamma = damping
        self.temp = temperature
        self.kB = 0.0019872041  # kcal/(mol·K)
    
    def step(self, velocities, forces, masses, dt):
        # 随机力
        noise_std = np.sqrt(2 * self.gamma * self.kB * self.temp / dt)
        random_force = np.random.normal(0, noise_std, velocities.shape)
        
        # 更新速度
        velocities += (forces - self.gamma * velocities + random_force) / masses * dt
        
        return velocities

3. 经典力场

3.1 常见力场

力场适用体系特点
AMBER蛋白质、核酸ff19SB, ff14SB
CHARMM蛋白质、脂质C36m, 糖类支持好
OPLS有机分子适合小分子模拟
GROMOS生物膜联合原子模型
GAFF通用小分子与AMBER兼容

3.2 力场参数化

# 使用OpenFF生成小分子力场
from openff.toolkit import Molecule
from openff.toolkit.forces import ForceField
 
# 加载分子
molecule = Molecule.from_smiles("CCO")  # 乙醇
 
# 生成GAFF力场
forcefield = ForceField("openff-2.1.0.offxml")
 
# 参数化
system = forcefield.create_openmm_system(molecule)

4. 边界条件与截断

4.1 周期性边界条件

class PeriodicBoundary:
    """
    周期性边界条件
    """
    def __init__(self, box_vectors):
        self.box = box_vectors  # [3, 3] 盒子的三个向量
    
    def minimum_image(self, r_ij):
        """
        最小镜像约定
        
        选择最近的周期性映像
        """
        # 计算盒子长度的一半
        half_box = np.diag(self.box) / 2
        
        # 镜像位移
        r_ij = r_ij - self.box @ np.floor(r_ij / half_box + 0.5)
        
        return r_ij
    
    def pbc_distance(self, r_i, r_j):
        """
        计算周期性距离
        """
        r_ij = r_j - r_i
        return np.linalg.norm(self.minimum_image(r_ij))

4.2 截断与长程相互作用

class PME:
    """
    Particle Mesh Ewald求和方法
    处理周期性体系的静电相互作用
    """
    def __init__(self, alpha=0.34, grid_dim=80):
        self.alpha = alpha
        self.grid = grid_dim
    
    def compute_ewald(self, charges, positions, box):
        """
        Ewald求和
        """
        # 1. 真实空间贡献(截断)
        real_space = self.compute_real_space(charges, positions, cutoff=1.0)
        
        # 2. 倒空间贡献(Particle Mesh Ewald)
        reciprocal_space = self.compute_reciprocal_space(charges, positions, box)
        
        # 3. 自能校正
        self_energy = self.compute_self_energy(charges)
        
        return real_space + reciprocal_space - self_energy

5. 机器学习增强MD

5.1 ML势能函数

将神经网络替代经验势能函数:

class MLPotentialMD:
    """
    机器学习势能函数的MD模拟
    """
    def __init__(self, model, cutoff=6.0):
        self.model = model
        self.cutoff = cutoff
    
    def compute_forces(self, positions):
        """
        使用ML模型计算力
        力 = -能量对位置的梯度
        """
        positions.requires_grad_(True)
        
        # 预测能量
        energy = self.model(positions)
        
        # 自动微分计算力
        forces = -torch.autograd.grad(
            outputs=energy.sum(),
            inputs=positions,
            create_graph=False
        )[0]
        
        return energy.detach(), forces.detach()
    
    def simulation_step(self, positions, velocities, dt):
        """
        MD步
        """
        # 计算能量和力
        energy, forces = self.compute_forces(positions)
        
        # Velocity Verlet
        velocities += 0.5 * forces / self.masses * dt
        positions += velocities * dt
        forces_new, _ = self.compute_forces(positions)
        velocities += 0.5 * forces_new / self.masses * dt
        
        return positions, velocities, energy

5.2 增强采样方法

Metadynamics

class Metadynamics:
    """
    Metadynamics增强采样
    
    通过在已探索区域填充高斯势能,
    鼓励系统探索新的构象空间
    """
    def __init__(self, cv_indices, height=1.0, sigma=0.5, stride=100):
        self.cv_indices = cv_indices  # 反应坐标
        self.height = height
        self.sigma = sigma
        self.stride = stride
        self.history = []
    
    def add_gaussian(self, positions, step):
        """
        添加高斯势能
        """
        if step % self.stride == 0:
            cv = self.compute_cv(positions)
            self.history.append(cv)
    
    def bias_potential(self, cv):
        """
        计算偏置势能
        """
        bias = 0.0
        for cv_hist in self.history:
            diff = cv - cv_hist
            bias += self.height * np.exp(-0.5 * (diff / self.sigma) ** 2)
        return bias
    
    def compute_cv(self, positions):
        """
        计算集合变量(如RMSD、距离等)
        """
        return positions[self.cv_indices].norm()

Umbrella Sampling

class UmbrellaSampling:
    """
    伞形采样
    
    在不同窗口中模拟,计算自由能剖面
    """
    def __init__(self, cv_range, num_windows, k_spring=10.0):
        self.windows = np.linspace(cv_range[0], cv_range[1], num_windows)
        self.k_spring = k_spring
    
    def run_window(self, window_center, positions, n_steps=100000):
        """
        运行单个窗口的模拟
        """
        for step in range(n_steps):
            # MD步
            positions, velocities = self.md_step(positions, velocities)
            
            # 计算CV
            cv = self.compute_cv(positions)
            
            # 伞形偏置力
            f_spring = self.k_spring * (cv - window_center)
            
            # 应用力到系统
            
        return self.collect_trajectory()
    
    def wham_analysis(self, trajectories, temperatures):
        """
        加权直方图分析法(WHAM)
        重建自由能剖面
        """
        # 合并所有窗口的数据
        # 求解WHAM方程
        F = self.solve_wham_equations(trajectories)
        
        return F

5.3 自由能计算

TI(热力学积分)

class ThermodynamicIntegration:
    """
    热力学积分
    
    通过在λ路径上积分计算自由能差
    ΔG = ∫ ⟨∂V/∂λ⟩_λ dλ
    """
    def __init__(self, model):
        self.model = model
        self.lambda_values = np.linspace(0, 1, 20)
    
    def run(self, positions):
        """
        运行TI计算
        """
        dG = 0.0
        
        for i, lam in enumerate(self.lambda_values):
            # 设置混合势能
            V_mixed = (1 - lam) * self.VA + lam * self.VB
            
            # 运行模拟,计算⟨∂V/∂λ⟩
            dV_dlambda = self.compute_dV_dlambda(positions, lam)
            
            # 梯形法则积分
            if i > 0:
                dG += 0.5 * (dV_dlambda + dV_dlambda_prev) * (lam - lam_prev)
            
            dV_dlambda_prev = dV_dlambda
            lam_prev = lam
        
        return dG

6. 粗粒化模型

6.1 粗粒化原理

将多个原子合并为单个珠子:

class CoarseGraining:
    """
    粗粒化映射
    
    CG beads = f(atom positions)
    """
    def __init__(self, mapping):
        """
        Args:
            mapping: {bead_id: [atom_indices]}
        """
        self.mapping = mapping
    
    def atom_to_cg(self, positions):
        """
        原子坐标 -> CG珠子坐标
        """
        cg_positions = {}
        
        for bead_id, atom_indices in self.mapping.items():
            # 取原子位置的平均作为珠子位置
            cg_positions[bead_id] = positions[atom_indices].mean(axis=0)
        
        return cg_positions
    
    def cg_to_atom(self, cg_positions, resolution=1):
        """
        CG珠子坐标 -> 原子坐标(回映射)
        使用反向传播学习
        """
        # 简化的反向映射
        atom_positions = np.zeros((self.num_atoms, 3))
        
        for bead_id, atom_indices in self.mapping.items():
            cg_pos = cg_positions[bead_id]
            # 均匀分布
            for atom_idx in atom_indices:
                atom_positions[atom_idx] = cg_pos + np.random.randn(3) * 0.5
        
        return atom_positions

6.2 图神经网络的粗粒化

class GraphCoarseGraining(nn.Module):
    """
    基于GNN的自适应粗粒化
    """
    def __init__(self, d_model, num_coarse):
        super().__init__()
        self.num_coarse = num_coarse
        
        # 节点聚合
        self.aggregator = nn.ModuleList([
            nn.Linear(d_model, d_model) for _ in range(num_coarse)
        ])
    
    def forward(self, node_features, edge_index, batch):
        """
        层次化粗粒化
        """
        # 第一次粗粒化:N -> N/2
        coarse_1 = self.coarse_level(node_features, edge_index, factor=2)
        
        # 第二次粗粒化:N/2 -> N/4
        coarse_2 = self.coarse_level(coarse_1, None, factor=2)
        
        return coarse_1, coarse_2
    
    def fine_level(self, coarse_features):
        """
        从粗粒化特征恢复细粒度特征
        """
        return self.decoder(coarse_features)

7. 计算软件与工具

7.1 MD软件生态

软件特点GPU支持
LAMMPS通用、开源CUDA/OpenCL
GROMACS高性能、易用CUDA/OpenCL
OpenMMPython API、灵活CUDA/OpenCL/Metal
NAMD大规模并行CUDA
AMBER生物分子CUDA

7.2 OpenMM示例

import openmm as mm
from openmm import app
 
# 创建系统
pdb = app.PDBFile("protein.pdb")
forcefield = app.ForceField("amber14-all.xml", "amber14/tip3p.xml")
system = forcefield.createSystem(
    pdb.topology,
    nonbondedMethod=app.PME,
    constraints=app.HBonds
)
 
# 添加自定义力(ML势能)
class MLForce(mm.CustomExternalForce):
    def __init__(self, model):
        super().__init__("0")
        self.model = model
        
        # 添加粒子
        positions = pdb.getPositions()
        for i in range(len(positions)):
            self.addParticle(i)
    
    def calcForceAndEnergy(self, positions, velocities, forces, energy):
        # ML计算
        pass
 
# 创建模拟器
platform = mm.Platform.getPlatformByName("CUDA")
simulation = app.Simulation(pdb.topology, system, mm.LangevinIntegrator(300, 1, 1), platform)
simulation.context.setPositions(pdb.getPositions())
 
# 能量最小化
simulation.minimizeEnergy(maxIterations=1000)
 
# 运行MD
simulation.step(1000000)  # 1 ns
 
# 保存轨迹
with open("trajectory.dcd", "wb") as f:
    # 写入轨迹
    pass

7.3 MDAnalysis

import MDAnalysis as mda
import numpy as np
 
# 加载轨迹
u = mda.Universe("protein.gro", "trajectory.xtc")
 
# 分析RMSD
rmsd = []
for ts in u.trajectory:
    ca = u.select_atoms("name CA")
    rmsd.append(mda.analysis.rms.rmsd(ca.positions, u.trajectory[0][ca.indices]))
 
# 计算回旋半径
for ts in u.trajectory:
    rg = u.select_atoms("protein").radius_of_gyration()

8. 未来趋势

方向现状发展预测
ML力场NequIP、Allegro更大体系、更长模拟
量子-经典混合QM/MM端到端可微
自适应采样基于不确定性主动学习集成
逆设计结构→序列功能驱动设计
量子计算早期探索量子化学加速

参考文献


相关主题NequIP与Allegro, PINNs, 等变图神经网络