MergeKit in Detail

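1. Overview

MergeKit [1] is a widely used open-source toolkit for merging large language models. It supports many merge methods (linear averaging, SLERP, task arithmetic, TIES, DARE, passthrough and others) and is driven by a simple YAML configuration file.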

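2. Installation

pip install mergekit

Or install from source, as the project README describes (an editable install that also puts the mergekit-yaml command on your PATH):

git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .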

3. Basic Usage

3.1 YAML Configuration Template

# merge.yaml
models:
  - model: meta-llama/Llama-2-7b-hf
    parameters:
      weight: 1.0
  - model: meta-llama/Llama-2-7b-chat-hf
    parameters:
      weight: 1.0
 
merge_method: ties
base_model: meta-llama/Llama-2-7b-hf
 
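parameters:        # global parameters, applied to every model
  density: 0.5     # fraction of each task vector's entries to keep
  normalize: true  # divide by the sum of weights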

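3.2 Running the Merge

The first argument is the configuration file and the second is the output directory; run mergekit-yaml --help to see the options your installed version supports.

mergekit-yaml merge.yaml ./output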

4. Merge Method Configurations

4.1 Simple Average (linear)

# average.yaml
models:
  - model: model1
    parameters:
      weight: 0.5
  - model: model2
    parameters:
      weight: 0.5
 
merge_method: linear   # weighted average of the model parameters

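For intuition, the following pure-Python sketch computes what a weighted linear merge does to a single tensor, flattened to a list of floats. It assumes the behaviour of `normalize: true` (divide by the sum of the weights), which as far as we know is the default for the `linear` method; `linear_merge` is a hypothetical helper, not part of MergeKit.

```python
from typing import List, Sequence


def linear_merge(tensors: Sequence[Sequence[float]],
                 weights: Sequence[float],
                 normalize: bool = True) -> List[float]:
    """Weighted average of same-shaped flattened tensors (hypothetical helper)."""
    if len(tensors) != len(weights):
        raise ValueError("need exactly one weight per tensor")
    total = sum(weights) if normalize else 1.0
    # Element-wise weighted sum, divided by the weight sum when normalizing
    return [sum(w * t[i] for w, t in zip(weights, tensors)) / total
            for i in range(len(tensors[0]))]


print(linear_merge([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5]))  # [2.0, 3.0]
```

With equal weights of 0.5, as in average.yaml, this reduces to the plain element-wise mean of the two models.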
4.2 TIES-Merging

# ties.yaml
models:
  - model: model1
  - model: model2
  - model: model3
 
merge_method: ties
base_model: base_model
 
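parameters:
  density: 0.7      # keep the top 70% (by magnitude) of each task vector
  weight: 0.5       # weight of each model's task vector
  normalize: true   # divide by the sum of weights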

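The three TIES steps (trim, elect sign, disjoint merge) can be sketched in pure Python on flattened tensors. This follows the description in the TIES-Merging paper with a weighted disjoint mean; MergeKit's own implementation works on torch tensors and differs in details, and `ties_merge` is a hypothetical name used only for illustration.

```python
from typing import List, Sequence


def ties_merge(base: Sequence[float],
               finetuned_models: Sequence[Sequence[float]],
               weights: Sequence[float],
               density: float,
               normalize: bool = True) -> List[float]:
    """TIES on flattened tensors: trim, elect sign, disjoint (weighted) mean."""
    # Task vectors: fine-tuned parameters minus base parameters
    deltas = [[f - b for f, b in zip(ft, base)] for ft in finetuned_models]
    # 1. Trim: keep the top `density` fraction of each task vector by magnitude
    k = max(1, round(density * len(base)))
    trimmed = []
    for d in deltas:
        keep = set(sorted(range(len(d)), key=lambda i: abs(d[i]), reverse=True)[:k])
        trimmed.append([d[i] if i in keep else 0.0 for i in range(len(d))])
    merged = []
    for i, b in enumerate(base):
        # 2. Elect sign: sign of the weighted sum of the trimmed entries
        elected_positive = sum(w * t[i] for w, t in zip(weights, trimmed)) >= 0
        # 3. Disjoint merge: combine only nonzero entries agreeing with that sign
        num = den = 0.0
        for w, t in zip(weights, trimmed):
            if t[i] != 0.0 and (t[i] > 0) == elected_positive:
                num += w * t[i]
                den += w
        merged.append(b + (num / den if normalize and den else num))
    return merged


base = [0.0, 0.0, 0.0, 0.0]
m1 = [1.0, -2.0, 0.5, 3.0]
m2 = [2.0, 1.5, -0.1, -1.0]
print(ties_merge(base, [m1, m2], weights=[1.0, 1.0], density=0.5))
# [2.0, -2.0, 0.0, 3.0]
```

A plain average of the two task vectors would give [1.5, -0.25, 0.2, 1.0]. TIES instead drops the small entries and, at index 1, keeps model1's -2.0 rather than letting the opposite-signed 1.5 cancel most of it.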
4.3 DARE

# dare.yaml
models:
  - model: model1
  - model: model2
 
merge_method: dare_ties   # or dare_linear (no sign election)
base_model: base_model
 
parameters:
  density: 0.3      # randomly keep 30% of delta entries and rescale them
  weight: 0.5       # weight of each model's task vector

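DARE's core step is random drop-and-rescale: each delta entry is kept with probability `density` and, if kept, divided by `density`, so the expected value of the delta is unchanged. The sketch below applies that step and then combines the sparsified deltas linearly, which corresponds to `dare_linear`; `dare_ties` would instead pass them through the TIES sign election. Both function names are hypothetical, and MergeKit's real implementation operates on torch tensors.

```python
import random
from typing import List, Sequence


def dare_sparsify(base: Sequence[float], finetuned: Sequence[float],
                  density: float, seed: int = 0) -> List[float]:
    """Drop each delta entry with probability 1 - density; rescale survivors by 1/density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    rng = random.Random(seed)
    out = []
    for f, b in zip(finetuned, base):
        delta = f - b
        # Keep with probability `density`, rescaled so the expected delta is unchanged
        out.append(delta / density if rng.random() < density else 0.0)
    return out


def dare_linear_merge(base: Sequence[float],
                      finetuned_models: Sequence[Sequence[float]],
                      weights: Sequence[float],
                      density: float, seed: int = 0) -> List[float]:
    """base + sum_i w_i * sparsified delta_i (linear combination, no sign election)."""
    merged = list(base)
    for k, (ft, w) in enumerate(zip(finetuned_models, weights)):
        for i, d in enumerate(dare_sparsify(base, ft, density, seed + k)):
            merged[i] += w * d
    return merged
```

With density 0.3, each surviving entry is multiplied by about 3.33, which is why DARE tolerates dropping most of a task vector.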
4.4 SLERP

# slerp.yaml
models:
  - model: model1
  - model: model2
 
merge_method: slerp
base_model: model1  # SLERP requires a base model; t is measured from it
 
parameters:
  t: 0.5           # interpolation factor: 0 = base model, 1 = the other model

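SLERP interpolates along the arc between two weight vectors rather than along the straight line, which better preserves their norm. The sketch below uses the common formulation: the angle is computed from the normalized vectors, the interpolation is applied to the original vectors, and it falls back to linear interpolation when the vectors are nearly colinear. We believe MergeKit's `slerp` follows a similar scheme, but this is an illustration, not its code.

```python
import math
from typing import List, Sequence


def slerp(t: float, v0: Sequence[float], v1: Sequence[float],
          eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation between two flattened tensors."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    if n0 < eps or n1 < eps:
        return lerp  # the angle is undefined for a zero vector
    # Cosine of the angle between the two directions, clamped for acos
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    if abs(cos_theta) > 1 - 1e-6:
        return lerp  # nearly colinear: sin(theta) ~ 0, use linear interpolation
    theta = math.acos(cos_theta)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

For two orthogonal unit vectors at t = 0.5, slerp returns a vector of norm 1, while linear interpolation would shrink it to a norm of about 0.71. At t = 0 the result is v0, matching the convention that t = 0 gives the base model.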
4.5 Task Arithmetic

# task_arithmetic.yaml
models:
  - model: model1
    parameters:
      weight: 1.0
  - model: model2
    parameters:
      weight: -0.5  # negative weight = forgetting: subtracts model2's task vector
 
merge_method: task_arithmetic
base_model: base_model

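Task arithmetic adds weighted task vectors (fine-tuned minus base) back onto the base model, so a negative weight moves the result away from what that model learned. A minimal pure-Python sketch, with a hypothetical function name; it omits weight normalization, and since MergeKit exposes a `normalize` parameter for this family of methods, its default is worth checking in the version you use.

```python
from typing import List, Sequence


def task_arithmetic(base: Sequence[float],
                    finetuned_models: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """merged = base + sum_i w_i * (model_i - base), without weight normalization."""
    merged = list(base)
    for ft, w in zip(finetuned_models, weights):
        for i, (f, b) in enumerate(zip(ft, base)):
            merged[i] += w * (f - b)  # add (or, for w < 0, subtract) the task vector
    return merged


base = [1.0, 1.0]
model1 = [2.0, 1.0]   # moved the first coordinate during fine-tuning
model2 = [1.0, 3.0]   # moved the second coordinate during fine-tuning
print(task_arithmetic(base, [model1, model2], [1.0, -0.5]))  # [2.0, 0.0]
```

Here model1's change is kept in full, while half of model2's change is subtracted, pushing the second coordinate below the base value.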
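5. Advanced Configuration

5.1 Per-Tensor Parameters (Partial Merging)

# partial_merge.yaml
models:
  - model: model1
  - model: model2

merge_method: ties
base_model: base_model

# Different density for attention and MLP tensors
parameters:
  weight: 0.5
  density:
    - filter: self_attn   # tensors whose name contains "self_attn"
      value: 0.8
    - filter: mlp
      value: 0.5
    - value: 1.0          # fallback for all other tensors, e.g. embeddings (no trimming)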

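MergeKit's example configs let a parameter be given as a list of `filter`/`value` entries instead of a scalar. The sketch below shows one way such a list can be resolved for each tensor name. The semantics are an assumption stated in the code (first match wins, substring matching, an entry without `filter` acts as the fallback); `resolve_param` is hypothetical, and the tensor names follow the Hugging Face Llama naming scheme.

```python
from typing import Any, Dict, List, Union

ParamSpec = Union[float, List[Dict[str, Any]]]


def resolve_param(tensor_name: str, spec: ParamSpec) -> float:
    """Pick a parameter value for one tensor from a scalar or a filter list.

    Assumed semantics: entries are checked in order, an entry matches when its
    "filter" string is a substring of the tensor name, and an entry without a
    "filter" key matches every tensor (so it belongs last, as a fallback).
    """
    if not isinstance(spec, list):
        return spec  # a plain scalar applies to every tensor
    for entry in spec:
        pattern = entry.get("filter")
        if pattern is None or pattern in tensor_name:
            return entry["value"]
    raise KeyError(f"no value matches tensor {tensor_name!r}")


density_spec = [
    {"filter": "self_attn", "value": 0.8},
    {"filter": "mlp", "value": 0.5},
    {"value": 1.0},
]
for name in ["model.layers.0.self_attn.q_proj.weight",
             "model.layers.0.mlp.up_proj.weight",
             "model.embed_tokens.weight"]:
    print(name, resolve_param(name, density_spec))
```

Because matching is ordered in this sketch, putting the unfiltered fallback first would shadow every other entry.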
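5.2 Model Mixing (Layer Stacking with passthrough)

# mixture.yaml: concatenate layer ranges taken from several models
slices:
  - sources:
      - model: expert1
        layer_range: [0, 16]
  - sources:
      - model: expert2
        layer_range: [16, 32]
  - sources:
      - model: expert3
        layer_range: [32, 48]
merge_method: passthrough   # copy each slice's layers unchanged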

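With passthrough, slices are stacked in order, so the output model's depth is the sum of the slice lengths. The sketch below expands slices into the resulting layer list, assuming end-exclusive ranges as in MergeKit's examples (a 40-layer model is sliced as [0, 40]). It also makes visible that [32, 48] only exists if expert3 has at least 48 layers. `layer_plan` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Slice = Tuple[str, Tuple[int, int]]


def layer_plan(slices: Sequence[Slice]) -> List[Tuple[str, int]]:
    """Expand (model, [start, end)) slices into the stacked model's layer list."""
    plan: List[Tuple[str, int]] = []
    for model, (start, end) in slices:
        if not 0 <= start < end:
            raise ValueError(f"bad layer_range for {model}: [{start}, {end}]")
        # Each source layer becomes the next layer of the output model
        plan.extend((model, layer) for layer in range(start, end))
    return plan


plan = layer_plan([("expert1", (0, 16)), ("expert2", (16, 32)), ("expert3", (32, 48))])
print(len(plan))   # 48
print(plan[16])    # ('expert2', 16)
```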
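5.3 Parameter Scaling

# scaling.yaml
models:
  - model: model1
    parameters:
      weight: 1.2
  - model: model2
    parameters:
      weight: 0.8
 
# Weights are normalized by default, so 1.2 / 0.8 act as 0.6 / 0.4;
# set "normalize: false" under parameters to apply the raw weights.
merge_method: linear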
base_model: base_model

6. Python API

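6.1 Basic Usage

# API as used in MergeKit's example notebook; module paths may differ between versions
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge
 
# Option 1: load the YAML configuration
with open("merge.yaml", "r", encoding="utf-8") as fp:
    config = MergeConfiguration.model_validate(yaml.safe_load(fp))
 
# Option 2: build the configuration from a Python dict
config = MergeConfiguration.model_validate({
    "models": [
        {"model": "model1", "parameters": {"density": 0.5, "weight": 0.5}},
        {"model": "model2", "parameters": {"density": 0.5, "weight": 0.5}},
    ],
    "merge_method": "ties",
    "base_model": "base_model",
})
 
run_merge(
    config,
    out_path="./merged_model",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)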

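6.2 Custom Merge Methods

import torch
from typing import List
 
# Recent MergeKit versions provide this decorator for simple methods;
# older versions instead require subclassing mergekit.merge_methods.base.MergeMethod.
from mergekit.merge_methods.easy_define import merge_method
 
@merge_method(
    name="custom",
    pretty_name="Custom Weighted Sum",
)
def custom_merge(
    tensors: List[torch.Tensor],  # one tensor per input model
    weight: List[float],          # per-model "weight" parameter from the YAML
) -> torch.Tensor:
    # Custom weighting logic: weighted sum over the model axis
    stacked = torch.stack(tensors, dim=0)
    w = torch.tensor(weight, dtype=stacked.dtype, device=stacked.device)
    w = w.view(-1, *([1] * (stacked.dim() - 1)))  # broadcast over the tensor's own dims
    return (w * stacked).sum(dim=0)
 
# Once the module defining it has been imported, use it in YAML:
# merge_method: custom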

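6.3 Gradient Merging (Federated Learning)

import torch
from typing import Dict, List
 
# MergeKit does not, to our knowledge, ship a mergekit.federated module;
# this is a plain-PyTorch FedAvg aggregation (weighted average of client updates).
def federated_aggregate(
    client_updates: List[Dict[str, torch.Tensor]],
    client_weights: List[float],
) -> Dict[str, torch.Tensor]:
    """
    Federated learning aggregation: weighted average of each parameter.
    """
    total = sum(client_weights)
    return {
        name: sum(w * update[name] for w, update in zip(client_weights, client_updates)) / total
        for name in client_updates[0]
    }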

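7. Best Practices

7.1 Preparing Models

# 1. Make sure the models share the same architecture and checkpoint format
# MergeKit reads Hugging Face checkpoints in safetensors or PyTorch (.bin) format; it does not read GGUF
 
# 2. Download the model
huggingface-cli download meta-llama/Llama-2-7b-hf
 
# 3. Verify that the model loads
python -c "from transformers import AutoModel; AutoModel.from_pretrained('model_path')"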

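7.2 Checking the Configuration

# Validate the YAML by parsing it with MergeKit's configuration schema
python -c "import yaml; from mergekit.config import MergeConfiguration; MergeConfiguration.model_validate(yaml.safe_load(open('merge.yaml')))"
 
# List the options your installed version supports (we know of no separate validate subcommand or --dry-run flag)
mergekit-yaml --help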

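7.3 Performance Tuning

# Reduce RAM usage with the experimental lazy unpickler
mergekit-yaml merge.yaml output --lazy-unpickle
 
# Use GPU acceleration
mergekit-yaml merge.yaml output --cuda
 
# Keep intermediate results on the GPU (useful when VRAM exceeds RAM)
mergekit-yaml merge.yaml output --cuda --low-cpu-memory
 
# Parallelism options vary between versions; check mergekit-yaml --help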

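8. Troubleshooting

8.1 Out-of-Memory (OOM) Errors

# Reduce memory use: compute and save in half precision
models:
  - model: large_model_1
  - model: large_model_2
 
merge_method: ties
base_model: base_model
parameters:
  density: 0.5
  weight: 0.5
dtype: bfloat16
 
# On the command line: lazy loading and smaller output shards
mergekit-yaml merge.yaml output --lazy-unpickle --out-shard-size 1B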

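To judge how much a dtype change helps, a back-of-the-envelope estimate is enough: parameter count times bytes per parameter. The sketch assumes a nominal 7B parameter count and counts weights only. MergeKit is designed to process tensors piece by piece rather than holding every model in memory at once, so treat this as the size of one checkpoint (on disk, or when loaded with transformers as in 7.1), not as MergeKit's peak usage. `checkpoint_gib` is a hypothetical helper.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def checkpoint_gib(num_params: int, dtype: str) -> float:
    """Size of one full set of weights in GiB (no activations or framework overhead)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("float32", "bfloat16"):
    print(dtype, round(checkpoint_gib(7_000_000_000, dtype), 1))
```

This prints about 26.1 GiB for float32 and 13.0 GiB for bfloat16, which is why setting dtype to a 16-bit type halves the footprint.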
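8.2 Incompatible Models

# Convert a checkpoint to fp16 safetensors with transformers
python -c "import torch; from transformers import AutoModelForCausalLM; AutoModelForCausalLM.from_pretrained('model1', torch_dtype=torch.float16).save_pretrained('./converted', safe_serialization=True)"
 
# Compare the architecture fields that must match for a merge
python -c "from transformers import AutoConfig; a, b = AutoConfig.from_pretrained('model1'), AutoConfig.from_pretrained('model2'); print(a.model_type, b.model_type, a.hidden_size, b.hidden_size, a.num_hidden_layers, b.num_hidden_layers)"
 
# If the vocabularies differ, set the tokenizer source in the merge YAML:
# tokenizer_source: union   # or: base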

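8.3 Poor Merge Quality

# Check how far apart the models' parameters are (plain PyTorch; MergeKit has
# no mergekit.align module that we know of). model1 and model2 are assumed loaded.
import torch.nn.functional as F
 
def param_cosine_similarity(sd1, sd2):
    """Per-tensor cosine similarity for tensors with matching names and shapes."""
    sims = {}
    for name, t1 in sd1.items():
        t2 = sd2.get(name)
        if t2 is not None and t2.shape == t1.shape:
            sims[name] = F.cosine_similarity(t1.flatten().float(), t2.flatten().float(), dim=0).item()
    return sims
 
sims = param_cosine_similarity(model1.state_dict(), model2.state_dict())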

9. Evaluation Tools

# MergeKit does not ship an evaluation CLI that we know of; use lm-evaluation-harness
lm_eval --model hf \
    --model_args pretrained=./merged_model \
    --tasks mmlu,truthfulqa,hellaswag \
    --batch_size 8

10. References

  1. Goddard, C., et al. (2024). Arcee's MergeKit: A Toolkit for Merging Large Language Models. arXiv:2403.13257. https://github.com/arcee-ai/mergekit