Embedding vectors obtained through the OpenAI embedding library are wrong #2589

xiyuan-lee · 2024-11-26T12:23:08Z

System Info

Ubuntu 20.04

Running Xinference with Docker?

docker
pip install
installation from source

Version info

All versions

The command used to start Xinference

Docker deployment

Reproduction / 复现过程

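I deployed the m3e-large model with Xinference and set its model id to text-embedding-ada-002.
Embedding vectors obtained through the OpenAI embedding library (LangChain's OpenAIEmbeddings) are wrong: whenever two inputs start with the same character, they get the same vector.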

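import os

from langchain_openai import OpenAIEmbeddings

# Point the OpenAI-compatible client at the local Xinference server.
os.environ["OPENAI_API_KEY"] = "empty"  # placeholder key
os.environ["OPENAI_API_BASE"] = "http://127.0.0.1:9997/v1/"

# text[1] begins with the whole of text[0], then continues with more content.
text = ['《中华人民共和国突发事件应对法》\n\n\n','《中华人民共和国突发事件应对法》\n\n\n（2007年8月30日第十届全国人民代表大会常务委员会第二十九次\n会议通过）\n\n\n']

# m3e-large is served under the model id "text-embedding-ada-002".
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1000)
result1 = embedding_model.embed_query(text[0])
result2 = embedding_model.embed_query(text[1])

# Exact element-wise comparison of the two embedding vectors.
if result1 == result2:
    print("相同")  # "identical"
else:
    print("不相同")  # "different"

# Show the first and last five components of each vector.
print(result2[:5])
print(result1[:5])
print()
print(result2[-5:])
print(result1[-5:])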
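The equality check in the repro only reports whether the two vectors match exactly. A minimal, dependency-free sketch for quantifying how close they are: with the bug, the vectors are identical (cosine similarity 1.0, maximum difference 0.0), while for two different texts the similarity should generally come out below 1.0. The helper names are illustrative, not part of Xinference or LangChain.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute component-wise difference between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))
```

After running the repro, print(cosine_similarity(result1, result2)) and print(max_abs_diff(result1, result2)) show how far apart the two embeddings are.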

Expected behavior

不相同 ("different"): the two texts should produce different vectors.

qinxuye · 2024-11-27T07:56:47Z

@codingl2k1 could you take a look?

xiyuan-lee · 2024-11-27T11:26:27Z

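Found the problem: the OpenAI embedding client converts the query into tokens before sending it. After I added code to convert the tokens back into the query text, the results are correct. I'll submit a PR to fix this.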
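The fix described above turns token IDs back into text before embedding. A minimal server-side sketch of that idea, assuming the endpoint receives the OpenAI-style input field, which may be a string, a list of strings, a single token-ID array, or a list of token-ID arrays. The function normalize_embedding_input and its decode hook are hypothetical and not taken from the eventual PR. A real fix would decode with the tokenizer the client used (for text-embedding-ada-002, tiktoken's cl100k_base encoding); the toy vocabulary here only stands in for it.

```python
from typing import Callable, Dict, List, Sequence, Union

# Shapes the OpenAI embeddings API accepts for "input".
EmbeddingInput = Union[str, List[str], List[int], List[List[int]]]


def normalize_embedding_input(
    value: EmbeddingInput,
    decode: Callable[[Sequence[int]], str],
) -> List[str]:
    """Return one string per input text, decoding token-ID arrays with `decode`."""
    if isinstance(value, str):
        return [value]
    if not isinstance(value, list) or not value:
        raise ValueError("input must be a string or a non-empty list")
    if all(isinstance(item, str) for item in value):
        return list(value)
    if all(isinstance(item, int) for item in value):
        # A flat token array encodes a single text.
        return [decode(value)]
    if all(
        isinstance(item, list) and all(isinstance(t, int) for t in item)
        for item in value
    ):
        # A list of token arrays encodes one text per inner list.
        return [decode(tokens) for tokens in value]
    raise ValueError("mixed or unsupported input types")


# Hypothetical toy vocabulary standing in for a real tokenizer's decoder.
TOY_VOCAB: Dict[int, str] = {0: "《", 1: "中华人民共和国", 2: "突发事件应对法", 3: "》"}


def toy_decode(tokens: Sequence[int]) -> str:
    return "".join(TOY_VOCAB[t] for t in tokens)


if __name__ == "__main__":
    print(normalize_embedding_input([[0, 1, 2, 3], [0, 1]], toy_decode))
    # ['《中华人民共和国突发事件应对法》', '《中华人民共和国']
```

On the client side, constructing OpenAIEmbeddings(..., check_embedding_ctx_length=False) should make langchain_openai send the raw strings instead of token IDs; check this against your installed version.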

XprobeBot added this to the v1.x milestone Nov 26, 2024

codingl2k1 self-assigned this Nov 27, 2024

