Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Significant improvement
Please provide a clear description of problem this feature solves
Currently, when I perform an extraction, upload the results to the Milvus VDB, and then query the VDB with LlamaIndex, the node IDs of the retrieved results change on every retrieval, even if the content is the same. For example, if I upload a document to the VDB:
import logging
import time

from nv_ingest_client.client import NvIngestClient
from nv_ingest_client.primitives import JobSpec
from nv_ingest_client.primitives.tasks import ExtractTask
from nv_ingest_client.primitives.tasks import SplitTask
from nv_ingest_client.primitives.tasks import EmbedTask
from nv_ingest_client.primitives.tasks import VdbUploadTask
from nv_ingest_client.util.file_processing.extract import extract_file_content

logger = logging.getLogger("nv_ingest_client")

# Read the test document and detect its type
file_name = "data/multimodal_test.pdf"
file_content, file_type = extract_file_content(file_name)

# Describe the ingestion job for this document
job_spec = JobSpec(
    document_type=file_type,
    payload=file_content,
    source_id=file_name,
    source_name=file_name,
    extended_options={"tracing_options": {"trace": True, "ts_send": time.time_ns()}},
)

# Extract text and tables, but not images
extract_task = ExtractTask(
    document_type=file_type,
    extract_text=True,
    extract_images=False,
    extract_tables=True,
)

# Embed the extracted text and tables
embed_task = EmbedTask(
    text=True,
    tables=True,
)

# Upload the embeddings to the VDB
vdb_upload_task = VdbUploadTask()

job_spec.add_task(extract_task)
job_spec.add_task(embed_task)
job_spec.add_task(vdb_upload_task)

# Submit the job and wait for the result
client = NvIngestClient()
job_id = client.add_job(job_spec)
client.submit_job(job_id, "morpheus_task_queue")
result = client.fetch_job_result(job_id, timeout=60)
And then connect to the VDB with LlamaIndex and retrieve a document:
res = retriever.retrieve("What was the dog doing?")
And get the ID:
res[0].id_
I get:
'd87a82ea-c968-42b7-84ae-f628b759eac6'
However, if I do it again:
res = retriever.retrieve("What was the dog doing?")
res[0].id_
I get:
'ab450386-7d9b-4d1f-8b58-2735d4cacd76'
Even though the text is the same in both cases:
'locations. Animal Activity Place Giraffe Driving a car. At the beach Lion Putting on sunscreen At the park. Cat Jumping onto a laptop In a home office Dog Chasing a squirrel In the front yard'
This might be a LlamaIndex issue, but when I upload documents to Milvus through LlamaIndex and set the ID with LlamaIndex, I get a stable ID when retrieving.
Describe the feature, and optionally a solution or implementation and any alternatives
Ideally, I would like the ID to be consistent and mapped to the pk field in the nv_ingest_collection.
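To illustrate the requested behavior, here is a minimal sketch of deriving a deterministic node ID from a pk value using a name-based UUID. It is not how NV-Ingest or LlamaIndex currently assign IDs. The changing IDs above look like freshly generated random UUIDs, but that is an assumption about their origin. The names NAMESPACE, stable_node_id, and random_node_id are hypothetical, and the pk value is made up.

```python
import uuid

# Hypothetical fixed namespace for this sketch; any constant UUID works,
# as long as it stays the same across runs.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "nv_ingest_collection")


def stable_node_id(pk):
    """Map a Milvus primary key (int or str) to a deterministic UUID string."""
    return str(uuid.uuid5(NAMESPACE, str(pk)))


def random_node_id():
    """Return a fresh random UUID string, which differs on every call."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    pk = 453718927992172545  # made-up primary key, for illustration only
    # The same pk always yields the same ID
    print(stable_node_id(pk) == stable_node_id(pk))
    # Random IDs are regenerated on every call
    print(random_node_id() == random_node_id())
```

With a mapping like this, two retrievals of the same row would report the same ID, which is the property this request asks for.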
Additional context
No response
ChrisJar changed the title from "[FEA]: Consistent ids when connecting llamaIndex to Milvus" to "[FEA]: Consistent ids when connecting llamaIndex to a Milvus VDB populated by NV-Ingest" on Oct 30, 2024