MergeKit in Detail

1. Overview

MergeKit[^1] is the most widely used open-source model-merging toolkit. It supports a range of merge methods behind a simple, easy-to-use YAML configuration interface.
2. Installation

Install with the optional extras:

```shell
pip install mergekit[full]
```

Or install only the core package:

```shell
pip install mergekit
```

3. Basic Usage
3.1 YAML configuration template

In current MergeKit releases, method parameters such as `density` and `weight` are given under a `parameters:` block (per model, or globally as a default), not under a method-named key:

```yaml
# merge.yaml
models:
  - model: meta-llama/Llama-2-7b-hf
    parameters:
      density: 0.5
      weight: 1.0
  - model: meta-llama/Llama-2-7b-chat-hf
    parameters:
      density: 0.5
      weight: 1.0
merge_method: ties
base_model: meta-llama/Llama-2-7b-hf
parameters:
  normalize: true
dtype: float16
```

3.2 Running the merge

```shell
mergekit-yaml merge.yaml output --copy-tokenizer
```

4. Merge Method Configuration
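All of the methods below combine checkpoints tensor by tensor. The simplest case, a weighted average (MergeKit's `linear` method), can be sketched as follows; `merge_linear` and the toy two-element "tensors" are illustrative, not MergeKit code:

```python
def merge_linear(tensors, weights):
    """Weighted average of corresponding parameter tensors.

    tensors: list of same-shaped parameter vectors (toy lists here)
    weights: one scalar weight per model, normalized to sum to 1
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    return [sum(w * t[i] for w, t in zip(norm, tensors))
            for i in range(len(tensors[0]))]

# Two toy "models" with equal weights -> elementwise mean
print(merge_linear([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5]))  # [2.0, 3.0]
```

Every method in this section is some refinement of this operation: choosing which entries to average, how to weight them, or what curve to interpolate along.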
4.1 Simple averaging

Plain weighted averaging is exposed as the `linear` method:

```yaml
# average.yaml
models:
  - model: model1
    parameters:
      weight: 0.5
  - model: model2
    parameters:
      weight: 0.5
merge_method: linear
```

4.2 TIES-Merging
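TIES operates on task vectors (fine-tuned minus base weights) in three steps: trim each vector to its largest-magnitude entries, elect a majority sign per parameter, then average only the entries agreeing with that sign. A toy sketch of the procedure (illustrative code, not MergeKit internals):

```python
def ties_merge(base, task_vectors, density):
    """Toy TIES: trim, elect sign, disjoint merge (per parameter)."""
    # 1. Trim: keep only the top-`density` fraction of each task
    #    vector by magnitude, zeroing the rest.
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(len(tv) * density))
        cutoff = sorted((abs(x) for x in tv), reverse=True)[k - 1]
        trimmed.append([x if abs(x) >= cutoff else 0.0 for x in tv])
    merged = []
    for i in range(len(base)):
        vals = [tv[i] for tv in trimmed]
        # 2. Elect the dominant sign by total mass at this position.
        sign = 1.0 if sum(vals) >= 0 else -1.0
        # 3. Average only the values that agree with the elected sign.
        agree = [v for v in vals if v * sign > 0]
        delta = sum(agree) / len(agree) if agree else 0.0
        merged.append(base[i] + delta)
    return merged

# Conflicting deltas at positions 1 and 2 are resolved by sign election
print(ties_merge([0.0, 0.0, 0.0],
                 [[1.0, -0.2, 0.5], [0.8, 0.3, -0.5]],
                 density=1.0))  # [0.9, 0.3, 0.5]
```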
```yaml
# ties.yaml
models:
  - model: model1
  - model: model2
  - model: model3
merge_method: ties
base_model: base_model
parameters:
  density: 0.7     # keep the top 70% of delta parameters by magnitude
  weight: 0.5      # weight given to each model
  normalize: true  # renormalize the weights
```

4.3 DARE
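DARE sparsifies each task vector by randomly dropping delta parameters and rescaling the survivors by `1/density`, so the expected delta is unchanged. A toy sketch (illustrative, not MergeKit internals):

```python
import random

def dare(task_vector, density, seed=0):
    """Toy DARE: Drop delta parameters And REscale the survivors.

    Each delta is kept with probability `density`; kept entries are
    rescaled by 1/density so the expected total delta is preserved.
    """
    rng = random.Random(seed)
    return [x / density if rng.random() < density else 0.0
            for x in task_vector]

# Expectation check: the sparsified, rescaled vector has roughly the
# same sum as the original even though ~70% of entries are zero.
tv = [0.01 * i for i in range(1000)]
sparse = dare(tv, density=0.3)
print(sum(tv), sum(sparse))
```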
In MergeKit, DARE is available combined with sign election (`dare_ties`) or with plain averaging (`dare_linear`):

```yaml
# dare.yaml
models:
  - model: model1
  - model: model2
merge_method: dare_ties
base_model: base_model
parameters:
  density: 0.3   # keep 30% of delta parameters, rescaled by 1/0.3
  weight: 0.5    # weight given to each model
```

4.4 SLERP
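SLERP interpolates along the great-circle arc between two weight tensors instead of the straight line, which preserves their norm better than linear interpolation. A minimal sketch on toy vectors (illustrative, not MergeKit internals):

```python
import math

def slerp(v0, v1, t, eps=1e-8):
    """Spherical linear interpolation between two parameter vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)  # angle between the two vectors
    if theta < eps:  # nearly parallel: fall back to linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# t=0.5 between orthogonal unit vectors stays on the unit circle
print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))  # ≈ [0.7071, 0.7071]
```

Note that a linear average of those two unit vectors would give `[0.5, 0.5]`, with norm ≈ 0.707 instead of 1, which is exactly the shrinkage SLERP avoids.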
SLERP works on exactly two models:

```yaml
# slerp.yaml
models:
  - model: model1
  - model: model2
merge_method: slerp
base_model: model1   # SLERP requires a base model
parameters:
  t: 0.5   # interpolation parameter: 0 = model1, 1 = model2
```

4.5 Task Arithmetic
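Task arithmetic builds a task vector for each model (fine-tuned weights minus base weights), scales each by its configured weight, and adds the sum back onto the base; a negative weight subtracts a capability. A toy sketch (illustrative, not MergeKit internals):

```python
def task_arithmetic(base, finetuned, weights):
    """Merge by adding weighted task vectors to the base weights."""
    merged = list(base)
    for model, w in zip(finetuned, weights):
        for i in range(len(base)):
            merged[i] += w * (model[i] - base[i])  # w * task vector
    return merged

base = [1.0, 1.0]
m1 = [2.0, 1.0]   # adds a skill along dimension 0
m2 = [1.0, 3.0]   # adds a behavior along dimension 1
# Keep m1's skill, partially "forget" m2's behavior
print(task_arithmetic(base, [m1, m2], [1.0, -0.5]))  # [2.0, 0.0]
```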
```yaml
# task_arithmetic.yaml
models:
  - model: model1
    parameters:
      weight: 1.0
  - model: model2
    parameters:
      weight: -0.5   # negative weight = forgetting
merge_method: task_arithmetic
base_model: base_model
```

5. Advanced Configuration
5.1 Merging only some parameters

MergeKit lets parameter values vary by tensor name using `filter` clauses, for example to use different densities for attention and MLP weights (the tensor-name filters shown are illustrative and depend on the model's naming):

```yaml
# partial_merge.yaml
models:
  - model: model1
    parameters:
      density:
        - filter: self_attn   # attention tensors
          value: 0.8
        - filter: mlp         # MLP tensors
          value: 0.5
        - value: 0.7          # default for everything else
  - model: model2
    parameters:
      density:
        - filter: self_attn
          value: 0.8
        - filter: mlp
          value: 0.5
        - value: 0.7
merge_method: ties
base_model: base_model
```

5.2 Model mixing (layer slicing)
Stacking layer ranges from different models ("frankenmerging") uses `slices` with the `passthrough` method; each slice copies a layer range from one source model:

```yaml
# mixture.yaml
slices:
  - sources:
      - model: expert1
        layer_range: [0, 16]
  - sources:
      - model: expert2
        layer_range: [16, 32]
  - sources:
      - model: expert3
        layer_range: [32, 48]
merge_method: passthrough
dtype: float16
```

5.3 Parameter scaling
With the `linear` method, per-model weights act as scale factors; weights are normalized to sum to 1 unless `normalize: false` is set:

```yaml
# scaling.yaml
models:
  - model: model1
    parameters:
      weight: 1.2   # up-weight model1
  - model: model2
    parameters:
      weight: 0.8   # down-weight model2
merge_method: linear
parameters:
  normalize: false  # treat the weights as raw scale factors
```

6. Python API
6.1 Basic usage

In current MergeKit versions, the Python entry point loads a YAML config into a `MergeConfiguration` and runs it with `run_merge`:

```python
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load and validate the YAML configuration
with open("merge.yaml", "r", encoding="utf-8") as f:
    config = MergeConfiguration.model_validate(yaml.safe_load(f))

# Execute the merge
run_merge(
    config,
    out_path="./merged_model",
    options=MergeOptions(
        cuda=False,            # set True to merge on GPU
        copy_tokenizer=True,   # copy the base model's tokenizer
        lazy_unpickle=True,    # lower peak memory usage
    ),
)
```

6.2 Custom merge methods
MergeKit's internal interfaces for merge methods change between versions, so treat the following as an illustrative sketch of the shape of a custom method rather than a stable API; consult the `mergekit.merge_methods` package in your installed version for the actual base class and registration mechanism:

```python
# Illustrative only: class and registration names vary by MergeKit version.
from typing import Dict

import torch


class CustomMerge:
    """Sketch of a custom merge: a plain weighted sum of input tensors."""

    name = "custom"

    def __call__(
        self,
        inputs: Dict[str, torch.Tensor],  # one tensor per input model
        weights: Dict[str, float],        # one weight per input model
        **kwargs,
    ) -> torch.Tensor:
        # Weighted sum over the corresponding tensors of each model
        return sum(w * inputs[name] for name, w in weights.items())
```

Once registered through your version's mechanism, the method is selected in YAML via `merge_method: custom`.

6.3 Gradient merging (federated learning)
The same weighted-averaging machinery underlies federated aggregation. MergeKit itself does not ship a federated-learning module, but the FedAvg aggregation step is a plain weighted average of client updates:

```python
def federated_aggregate(client_updates, client_weights):
    """FedAvg-style aggregation: weighted average of client updates.

    client_updates: list of parameter vectors (toy lists here)
    client_weights: one weight per client (e.g. local dataset sizes)
    """
    total = sum(client_weights)
    return [
        sum(w * u[i] for u, w in zip(client_updates, client_weights)) / total
        for i in range(len(client_updates[0]))
    ]
```

7. Best Practices
7.1 Preparing models

```shell
# 1. Use a consistent checkpoint format
#    (MergeKit reads safetensors and PyTorch checkpoints)

# 2. Download the models
huggingface-cli download meta-llama/Llama-2-7b-hf

# 3. Verify that each model loads
python -c "from transformers import AutoModel; AutoModel.from_pretrained('model_path')"
```

7.2 Checking the configuration
```shell
# mergekit-yaml validates merge.yaml as it loads it; a malformed
# config fails before any tensors are read.
# Check which options your installed version supports:
mergekit-yaml --help
```

7.3 Performance tuning
```shell
# Reduce peak memory while merging on CPU
mergekit-yaml merge.yaml output --lazy-unpickle --low-cpu-memory

# Use a GPU for the merge arithmetic
mergekit-yaml merge.yaml output --cuda
```

Flag names vary between releases; `mergekit-yaml --help` lists what your version supports.

8. Common Problems
8.1 OOM错误
# 减少内存使用
models:
- model: large_model_1
parameters:
trust_remote_code: true
- model: large_model_2
# 或使用分片加载
merge_method: ties
shard_model: true8.2 模型不兼容
MergeKit can only merge models that share an architecture and parameter shapes. If a merge fails with shape or configuration mismatches, first confirm that every model, including `base_model`, is the same architecture and size, and that all models use a compatible tokenizer (`mergekit-tokensurgeon` can graft one model's vocabulary onto another). As a last resort, the `--allow-crimes` flag lets MergeKit attempt merges it would otherwise refuse, at your own risk.

8.3 Poor merge quality
Poor merge quality usually means the ingredient models disagree too much. Merge models fine-tuned from the same base checkpoint: TIES, DARE, and task arithmetic all operate on deltas from `base_model`, so unrelated checkpoints produce noise. Tune `density` and `weight` rather than keeping defaults, since very low density discards useful deltas while density near 1.0 reintroduces parameter interference. Finally, evaluate each candidate configuration (see Section 9) rather than trusting a single merge.

9. Evaluation Tools
A merged model is an ordinary Hugging Face checkpoint, so standard harnesses apply, e.g. lm-evaluation-harness:

```shell
lm_eval --model hf \
    --model_args pretrained=./merged_model \
    --tasks mmlu,truthfulqa,hellaswag \
    --batch_size 8
```

10. References
Footnotes

[^1]: Goddard, C., et al. (2024). Arcee's MergeKit: A toolkit for merging large language models. https://github.com/arcee-ai/mergekit