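System Info
Ubuntu 20.04
Version info
All versions
The command used to start Xinference
Docker deployment
Reproduction
Deploy the m3e-large model with Xinference and set its model id to text-embedding-ada-002. The vectors returned through the OpenAI embedding library are wrong: whenever two inputs start with the same character, the resulting vectors are identical.
import os
from langchain_openai import OpenAIEmbeddings

# Point the OpenAI client at the local Xinference endpoint.
os.environ["OPENAI_API_KEY"] = "empty"
os.environ["OPENAI_API_BASE"] = "http://127.0.0.1:9997/v1/"

# Two different inputs that begin with the same text.
text = ['《中华人民共和国突发事件应对法》\n\n\n','《中华人民共和国突发事件应对法》\n\n\n(2007年8月30日第十届全国人民代表大会常务委员会第二十九次\n会议通过)\n\n\n']

embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1000)
result1 = embedding_model.embed_query(text[0])
result2 = embedding_model.embed_query(text[1])

# Report whether the two embeddings came back identical.
if str(result1) == str(result2):
    print("identical")
else:
    print("different")

# Show the first and last five components of each vector.
print(result2[:5])
print(result1[:5])
print()
print(result2[-5:])
print(result1[-5:])
Expected behavior
The two embeddings should be different.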
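The reproduction compares the two vectors by their string form. A numeric comparison can make the symptom easier to read, since it also shows how close two vectors are when they are not exactly equal. The following is only a sketch: `cosine_similarity` and `looks_identical` are hypothetical helper names, not part of Xinference or langchain_openai.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def looks_identical(
    a: Sequence[float], b: Sequence[float], tolerance: float = 1e-9
) -> bool:
    """True when both vectors match component by component within a tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, abs_tol=tolerance) for x, y in zip(a, b)
    )
```

With the script above, `looks_identical(result1, result2)` returning True for two different texts would confirm the bug.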
@codingl2k1 could you take a look?
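Found the problem: the OpenAI embedding client converts the query into tokens before sending it. I added code that converts the tokens back into the query text, and that fixes it. I'll submit a PR for this shortly.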
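To illustrate the kind of fix described here (this is not the code from the actual PR): the OpenAI embeddings endpoint accepts `input` as a string, a list of strings, a list of token ids, or a list of token-id lists, so a compatible server can decode token arrays back into text before calling the model. `normalize_embedding_input` and `tiktoken_decoder` are hypothetical names, and using tiktoken as the decoder is an assumption about which tokenizer the client used.

```python
from typing import Callable, List, Sequence, Union

EmbeddingInput = Union[str, List[str], List[int], List[List[int]]]


def normalize_embedding_input(
    payload: EmbeddingInput,
    decode: Callable[[Sequence[int]], str],
) -> List[str]:
    """Return plain strings for any of the four accepted input shapes."""
    if isinstance(payload, str):
        return [payload]
    if not payload:
        return []
    if all(isinstance(item, str) for item in payload):
        return list(payload)
    if all(isinstance(item, int) for item in payload):
        # A single text that the client already tokenized.
        return [decode(payload)]
    if all(
        isinstance(item, list) and all(isinstance(tok, int) for tok in item)
        for item in payload
    ):
        # Several pre-tokenized texts: decode each one separately.
        return [decode(item) for item in payload]
    raise TypeError("unsupported embedding input shape")


def tiktoken_decoder(model_name: str = "text-embedding-ada-002"):
    """Build a decoder from tiktoken (third-party, imported lazily).

    Assumption: the client tokenized with the encoding tiktoken maps to
    this model name.
    """
    import tiktoken

    return tiktoken.encoding_for_model(model_name).decode
```

A server handler could then call `normalize_embedding_input(request_input, tiktoken_decoder())` and pass the resulting strings to m3e-large. Keeping the decoder as a parameter makes the shape handling testable without the tokenizer installed.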