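Major bug in Milvus semantic search: after milvusClient.insert() is called on a collection with an integer id, the returned id is a string that has lost precision along the way, so records in Mongo and Milvus no longer match up #2895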

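**Routine checks**

[//]: # 'Put an x in the box to tick it'

- [√] I have confirmed there is no similar issue at the moment
- [√] I have read the project README in full, as well as the [project documentation](https://doc.tryfastgpt.ai/docs/intro/)
- [√] I used my own key and confirmed that it works
- [√] I understand and am willing to follow up on this issue, help with testing, and provide feedback
- [x] I understand and accept the above, and understand that the maintainers have limited time; **issues that do not follow the rules may be ignored or closed directly**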

**Your version**

- [ ] Public cloud version
- [ ] Self-hosted version, specific version number: 

**Problem description, logs and screenshots**

  init = async () => {
    const client = await this.getClient();

    // init db(zilliz cloud will error)
    try {
      const { db_names } = await client.listDatabases();

      if (!db_names.includes(DatasetVectorDbName)) {
        await client.createDatabase({
          db_name: DatasetVectorDbName
        });
      }

      await client.useDatabase({
        db_name: DatasetVectorDbName
      });
    } catch (error) {}

    // init collection and index
    const { value: hasCollection } = await client.hasCollection({
      collection_name: DatasetVectorTableName
    });
    if (!hasCollection) {
      const result = await client.createCollection({
        collection_name: DatasetVectorTableName,
        description: 'Store dataset vector',
        enableDynamicField: true,
        fields: [
          {
            name: 'id',
            data_type: DataType.Int64,
            is_primary_key: true,
            autoID: true
          },
          {
            name: 'vector',
            data_type: DataType.FloatVector,
            dim: 1536
          },
          { name: 'teamId', data_type: DataType.VarChar, max_length: 64 },
          { name: 'datasetId', data_type: DataType.VarChar, max_length: 64 },
          { name: 'collectionId', data_type: DataType.VarChar, max_length: 64 },
          {
            name: 'createTime',
            data_type: DataType.Int64
          }
        ],
------------------------
In the snippet above, the id of the Milvus vector data is of type Int64.
------------------------
  insert = async (props: InsertVectorControllerProps): Promise<{ insertId: string }> => {
    const client = await this.getClient();
    const { teamId, datasetId, collectionId, vector, retry = 3 } = props;

    try {
      const result = await client.insert({
        collection_name: DatasetVectorTableName,
        data: [
          {
            vector,
            teamId: String(teamId),
            datasetId: String(datasetId),
            collectionId: String(collectionId),
            createTime: Date.now()
          }
        ]
      });

      console.log("result1234", result)
      // console.log("result123456", result.IDs.str_id.data)

      const insertId = (() => {
        if ('int_id' in result.IDs) {
          return `${result.IDs.int_id.data?.[0]}`;
        }
        return `${result.IDs.str_id.data?.[0]}`;
      })();

      return {
        insertId: insertId
      };
    } catch (error) {
      if (retry <= 0) {
        return Promise.reject(error);
      }
      await delay(500);
      return this.insert({
        ...props,
        retry: retry - 1
      });
    }
  };
--------------------------------
In the code above, the value of result from const result = await client.insert({...}) is:
{
  succ_index: [ 0 ],
  err_index: [],
  status: {
    extra_info: {},
    error_code: 'Success',
    reason: '',
    code: 0,
    retriable: false,
    detail: ''
  },
  IDs: { int_id: { data: [Array] }, id_field: 'int_id' },
  acknowledged: false,
  insert_cnt: '1',
  delete_cnt: '0',
  upsert_cnt: '0',
  timestamp: '453151970661498903'
}
In this output, in IDs: { int_id: { data: [Array] }, id_field: 'int_id' }, the data array looks like ['453132573482114000'].
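One way to make this kind of problem visible early is to normalize whatever comes back in the data array into a decimal string and refuse any value that may already have been rounded. The sketch below is only an illustration: normalizeInsertId is a hypothetical helper, not part of FastGPT or the Milvus SDK, and it assumes the raw id can arrive as a string, a bigint, or a number.

```typescript
// Hypothetical helper (not from FastGPT or the Milvus SDK): convert a raw
// primary-key value to a decimal string, failing loudly instead of silently
// accepting a number that may already have lost digits.
function normalizeInsertId(raw: unknown): string {
  if (typeof raw === "string") {
    // Strings keep every digit; only check that it really is an integer.
    if (!/^-?\d+$/.test(raw)) {
      throw new Error(`Not a decimal integer string: ${raw}`);
    }
    return raw;
  }
  if (typeof raw === "bigint") {
    return raw.toString();
  }
  if (typeof raw === "number") {
    // Outside the safe range a number cannot be trusted to be the real id.
    if (!Number.isSafeInteger(raw)) {
      throw new Error(`Unsafe integer ${raw}: digits may already be lost`);
    }
    return String(raw);
  }
  throw new Error(`Unsupported id type: ${typeof raw}`);
}
```

With a check like this, an id that has passed through a JavaScript number somewhere upstream raises an error at insert time instead of being written to Mongo as a wrong id.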
----------------------------------
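The problem is that the id of the inserted row may actually be '453132573482114001', but the value in this result has lost precision and been rounded to '453132573482114000'. Ids of this size are larger than Number.MAX_SAFE_INTEGER (2^53 - 1 = 9007199254740991), so a JavaScript number cannot represent them exactly.
Here are a few values we tried: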
[@LIN-99F1B266A15 milvus]$ node 
Welcome to Node.js v20.14.0.
Type ".help" for more information.
> aa = 453132573482114001
453132573482114000
> aa = 453132573482114399
453132573482114370
> aa = 453132573482113999
453132573482114000
> 
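As a result, the mapping between the original text and its vector is wrong.
It also affects the vector search code, which suffers from the same precision loss, so the ids it returns are wrong as well.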
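The collision can be shown outside Milvus entirely. The following minimal sketch takes two of the ids from the Node session above as decimal strings and compares what happens when they go through Number versus BigInt. The variable names are only for illustration.

```typescript
// Two distinct 18-digit ids, kept as strings so nothing is lost yet.
const idA = "453132573482114001";
const idB = "453132573482113999";

// Converting to number rounds each one to the nearest representable double.
const numA = Number(idA);
const numB = Number(idB);

// BigInt keeps every digit.
const bigA = BigInt(idA);
const bigB = BigInt(idB);

const collideAsNumber = numA === numB;  // both became the same double
const distinctAsBigInt = bigA !== bigB; // still two different ids

console.log(String(numA));    // prints "453132573482114000"
console.log(bigA.toString()); // prints "453132573482114001"
console.log(collideAsNumber, distinctAsBigInt); // prints "true true"
```

Once two different rows share the same numeric id, a lookup from Mongo to Milvus (or back) can land on the wrong row, which is the mismatch described above.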
------------------------------------
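We hope the maintainers can look into this: PG most likely has the same problem. It probably has not shown up yet only because PG datasets are not large enough, so the ids have fewer digits and no precision is lost.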
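Whether a given id is large enough to be at risk can be checked directly against the safe integer limit. This is a rough sketch: exceedsSafeRange is a hypothetical helper, and the small id used below is a made-up stand-in for a short sequential id, not a value taken from a real PG table.

```typescript
// Largest integer a JavaScript number represents exactly: 2^53 - 1.
const MAX_SAFE = BigInt(Number.MAX_SAFE_INTEGER);

// Hypothetical helper: true when the decimal id string cannot be held
// in a JavaScript number without a risk of rounding.
function exceedsSafeRange(id: string): boolean {
  const value = BigInt(id);
  return value > MAX_SAFE || value < -MAX_SAFE;
}

console.log(exceedsSafeRange("1234567"));            // prints "false" (short id, assumed example)
console.log(exceedsSafeRange("9007199254740991"));   // prints "false" (exactly the limit)
console.log(exceedsSafeRange("453132573482114001")); // prints "true" (18-digit id from this report)
```

Ids stay safe as long as they have at most 15 digits. 16-digit ids are safe only up to 9007199254740991, and the 18-digit ids shown in this report are well past that.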
------------------------------------
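Workaround we have already confirmed:
Change the Milvus id to VarChar. However, this invalidates previously trained data, so we hope the maintainers can come up with a better solution.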


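**Steps to reproduce**
Please do not try to reproduce this with your own pnpm dev setup; it does not reproduce that way, and we do not yet know why. It does reproduce with the official images: we tried versions 4.8.3 through 4.8.11 and the problem is present in all of them.
Also, https://github.com/labring/FastGPT/issues/2836, where someone already reported inconsistent data ids, is the same problem. We hope the maintainers will take this seriously rather than dismissing it as "could not reproduce".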

**Expected result**

**Related screenshots**

