# 💡 Summary
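Fusion ranking (Score Fusion)
## Essence
```text
Score fusion = a voting mechanism
Vector retrieval says: "post_1 is the most relevant (score 1.0)"
Keyword retrieval says: "post_3 is the most relevant (score 1.0)"
After fusion:
- post_1 gets a combined score of 0.947 (both systems endorse it)
- post_3 gets a combined score of 0.3 (only one system endorses it)
→ post_1 wins
```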
# 🧩 Cues
# 🪞Notes
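Principle: score fusion merges the results of multiple retrievers at the score level. It reuses the scores each retriever has already produced, so no relevance score has to be recomputed.
```python
# Assumes all_results (post_id -> score dict) and alpha (vector weight in [0, 1])
# are already defined; see the worked steps below.

# Collect the raw scores from both retrievers
vector_scores = [r['vector_score'] for r in all_results.values()]
keyword_scores = [r['keyword_score'] for r in all_results.values()]

# Fall back to 1 when a list is empty or its maximum is 0 (avoids division by zero)
vector_max = max(vector_scores, default=0) or 1
keyword_max = max(keyword_scores, default=0) or 1

# Compute the hybrid score
for post_id, scores in all_results.items():
    norm_vector = scores['vector_score'] / vector_max      # in [0, 1] for non-negative scores
    norm_keyword = scores['keyword_score'] / keyword_max   # in [0, 1] for non-negative scores
    scores['final_score'] = alpha * norm_vector + (1 - alpha) * norm_keyword
```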
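Written as a formula, the snippet above computes the following for each post $d$. The symbols $s_v$, $s_k$ and $D$ are notation introduced here for convenience; they do not appear in the original code.

$$
\text{final}(d) = \alpha \cdot \frac{s_v(d)}{\max_{d' \in D} s_v(d')} + (1 - \alpha) \cdot \frac{s_k(d)}{\max_{d' \in D} s_k(d')}
$$

Here $D$ is the set of all candidate posts, $s_v$ is the vector score, $s_k$ is the keyword score, and $\alpha \in [0, 1]$ is the weight on vector retrieval. As long as the raw scores are non-negative, both fractions lie in $[0, 1]$, and so does the final score.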
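Step-by-step breakdown
Problem background:
Vector retrieval returns scores: $[0.89, 0.75, 0.62, \ldots]$ (cosine similarity)
Keyword retrieval returns scores: $[12.5, 8.3, 5.1, \ldots]$ (BM25 scores)
The two kinds of score are on different scales, so they cannot be compared or added directly.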
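
To see why the raw scores cannot simply be mixed, here is a small sketch that applies the weighted sum directly to unnormalized scores. The sample values are the three posts used in the steps that follow, and `raw_weighted_sum` is a hypothetical helper introduced only for illustration.

```python
# Hypothetical helper: weighted sum WITHOUT normalization (what not to do)
def raw_weighted_sum(vector_score: float, keyword_score: float, alpha: float) -> float:
    return alpha * vector_score + (1 - alpha) * keyword_score

# (cosine similarity, BM25 score) per post, same values as the worked example
samples = {
    'post_1': (0.89, 12.5),
    'post_2': (0.75, 8.3),
    'post_3': (0.0, 15.2),
}

alpha = 0.7  # nominally 70% of the weight on vector retrieval
raw = {pid: raw_weighted_sum(v, k, alpha) for pid, (v, k) in samples.items()}
raw_ranking = sorted(raw, key=raw.get, reverse=True)
print(raw_ranking)  # ['post_3', 'post_1', 'post_2']
```

Even with 70% of the weight on the vector side, the BM25 values are an order of magnitude larger, so post_3, which vector retrieval did not return at all, ends up first.
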
Solution: normalization
Step 1: Extract all scores
```python
# Suppose there are 3 results
all_results = {
    'post_1': {'vector_score': 0.89, 'keyword_score': 12.5},
    'post_2': {'vector_score': 0.75, 'keyword_score': 8.3},
    'post_3': {'vector_score': 0.0, 'keyword_score': 15.2},  # recalled only by keyword retrieval
}
vector_scores = [0.89, 0.75, 0.0]
keyword_scores = [12.5, 8.3, 15.2]
```
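
The `all_results` dictionary has to be assembled from the two retrievers' outputs first. Below is a minimal sketch of one way to do that, assuming each retriever returns a list of `(post_id, score)` pairs. That input format and the `merge_results` helper are assumptions, not part of the original code. A post missing from one list gets 0.0 on that side, as with post_3.

```python
def merge_results(vector_hits, keyword_hits):
    """Merge two lists of (post_id, score) pairs into one dict keyed by post_id."""
    merged = {}
    for post_id, score in vector_hits:
        merged.setdefault(post_id, {'vector_score': 0.0, 'keyword_score': 0.0})
        merged[post_id]['vector_score'] = score
    for post_id, score in keyword_hits:
        merged.setdefault(post_id, {'vector_score': 0.0, 'keyword_score': 0.0})
        merged[post_id]['keyword_score'] = score
    return merged

# Hypothetical retriever outputs matching the example above
vector_hits = [('post_1', 0.89), ('post_2', 0.75)]
keyword_hits = [('post_1', 12.5), ('post_2', 8.3), ('post_3', 15.2)]

merged = merge_results(vector_hits, keyword_hits)
vector_scores = [r['vector_score'] for r in merged.values()]
keyword_scores = [r['keyword_score'] for r in merged.values()]
```
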
Step 2: Find the maximum of each score type
```python
vector_max = 0.89
keyword_max = 15.2
```
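Step 3: Normalize (max scaling: each score is divided by the maximum; this is not full min-max normalization, since the minimum is not subtracted)
```python
# post_1
norm_vector_1 = 0.89 / 0.89    # = 1.0, ranked first by vector retrieval
norm_keyword_1 = 12.5 / 15.2   # ≈ 0.822, ranked second by keyword retrieval
# post_2
norm_vector_2 = 0.75 / 0.89    # ≈ 0.843
norm_keyword_2 = 8.3 / 15.2    # ≈ 0.546
# post_3
norm_vector_3 = 0.0 / 0.89     # = 0.0, not recalled by vector retrieval
norm_keyword_3 = 15.2 / 15.2   # = 1.0, ranked first by keyword retrieval
```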
Step 4: Weighted fusion
```python
alpha = 0.7  # 70% weight on vector retrieval, 30% on keyword retrieval
# post_1
final_score_1 = 0.7 * 1.0 + 0.3 * 0.822    # = 0.7 + 0.247 ≈ 0.947
# post_2
final_score_2 = 0.7 * 0.843 + 0.3 * 0.546  # = 0.590 + 0.164 ≈ 0.754
# post_3
final_score_3 = 0.7 * 0.0 + 0.3 * 1.0      # = 0.0 + 0.3 = 0.3
```
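
The weight `alpha` decides how much each retriever's vote counts. As a sketch of its effect, the same normalized scores can be re-ranked under two weights. The `rank_with_alpha` helper and the second value `alpha = 0.3` are illustrative assumptions, not from the original.

```python
# Normalized (vector, keyword) scores, computed as in step 3
normalized = {
    'post_1': (0.89 / 0.89, 12.5 / 15.2),
    'post_2': (0.75 / 0.89, 8.3 / 15.2),
    'post_3': (0.0 / 0.89, 15.2 / 15.2),
}

def rank_with_alpha(alpha):
    """Return post ids sorted by alpha * vector + (1 - alpha) * keyword, best first."""
    fused = {pid: alpha * v + (1 - alpha) * k for pid, (v, k) in normalized.items()}
    return sorted(fused, key=fused.get, reverse=True)

rank_vector_heavy = rank_with_alpha(0.7)   # the weight used in step 4
rank_keyword_heavy = rank_with_alpha(0.3)  # hypothetical alternative weight
print(rank_vector_heavy)   # ['post_1', 'post_2', 'post_3']
print(rank_keyword_heavy)  # ['post_1', 'post_3', 'post_2']
```

With `alpha = 0.3`, post_3 moves ahead of post_2 because its keyword score now carries more weight. post_1 stays first under both weights, since both retrievers score it highly.
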
Step 5: Sort
```text
Ranking:
1. post_1: 0.947 ✅ (strong in both vector and keyword retrieval)
2. post_2: 0.754
3. post_3: 0.3 (keyword match only; not recalled by vector retrieval)
```
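
Putting the five steps together, a self-contained sketch might look like the following. The function name `fuse_and_rank` and its signature are assumptions for illustration, and the fallback to 1 for an empty or zero maximum is an added guard against division by zero.

```python
def fuse_and_rank(all_results, alpha=0.7):
    """Max-scale both score types, blend them with weight alpha, and sort best first."""
    # Step 2: maximum per score type (fall back to 1 if empty or zero)
    vector_max = max((r['vector_score'] for r in all_results.values()), default=0) or 1
    keyword_max = max((r['keyword_score'] for r in all_results.values()), default=0) or 1

    ranked = []
    for post_id, scores in all_results.items():
        # Step 3: divide by the maximum
        norm_vector = scores['vector_score'] / vector_max
        norm_keyword = scores['keyword_score'] / keyword_max
        # Step 4: weighted fusion
        final_score = alpha * norm_vector + (1 - alpha) * norm_keyword
        ranked.append((post_id, round(final_score, 3)))

    # Step 5: highest fused score first
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked

# Step 1: the three example posts
all_results = {
    'post_1': {'vector_score': 0.89, 'keyword_score': 12.5},
    'post_2': {'vector_score': 0.75, 'keyword_score': 8.3},
    'post_3': {'vector_score': 0.0, 'keyword_score': 15.2},
}

ranking = fuse_and_rank(all_results, alpha=0.7)
print(ranking)  # [('post_1', 0.947), ('post_2', 0.754), ('post_3', 0.3)]
```
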