[Bug]: Document docstore/data not being created on occasion #15888
Comments
Hey there, @jcpaulco! I'm here to help you with any bugs, questions, or contributions you have. Let's squash those bugs together!
The issue with the
Here is the relevant code snippet:

    def _get_kv_pairs_for_insert(
        self, node: BaseNode, ref_doc_info: Optional[RefDocInfo], store_text: bool
    ) -> Tuple[
        Optional[Tuple[str, dict]],
        Optional[Tuple[str, dict]],
        Optional[Tuple[str, dict]],
    ]:
        node_kv_pair = None
        metadata_kv_pair = None
        ref_doc_kv_pair = None

        node_key = node.node_id
        data = doc_to_json(node)
        if store_text:
            node_kv_pair = (node_key, data)

        # update doc_collection if needed
        metadata = {"doc_hash": node.hash}
        if ref_doc_info is not None and node.ref_doc_id:
            if node.node_id not in ref_doc_info.node_ids:
                ref_doc_info.node_ids.append(node.node_id)
            if not ref_doc_info.metadata:
                ref_doc_info.metadata = node.metadata or {}

            # update metadata with map
            metadata["ref_doc_id"] = node.ref_doc_id

            metadata_kv_pair = (node_key, metadata)
            ref_doc_kv_pair = (node.ref_doc_id, ref_doc_info.to_dict())
        else:
            metadata_kv_pair = (node_key, metadata)

        return node_kv_pair, metadata_kv_pair, ref_doc_kv_pair

In the

    def add_documents(
        self,
        nodes: Sequence[BaseNode],
        allow_update: bool = True,
        batch_size: Optional[int] = None,
        store_text: bool = True,
    ) -> None:
        ...
        for node in nodes:
            ...
            (
                node_kv_pair,
                metadata_kv_pair,
                ref_doc_kv_pair,
            ) = self._get_kv_pairs_for_insert(node, ref_doc_info, store_text)

            if node_kv_pair is not None:
                node_kv_pairs.append(node_kv_pair)
            if metadata_kv_pair is not None:
                metadata_kv_pairs.append(metadata_kv_pair)
            if ref_doc_kv_pair is not None:
                key = ref_doc_kv_pair[0]
                if key not in ref_doc_kv_pairs:
                    ref_doc_kv_pairs[key] = []
                ref_doc_kv_pairs[key].append(ref_doc_kv_pair)

        self._kvstore.put_all(
            node_kv_pairs,
            collection=self._node_collection,
            batch_size=batch_size,
        )
        self._kvstore.put_all(
            metadata_kv_pairs,
            collection=self._metadata_collection,
            batch_size=batch_size,
        )
        ...

Therefore, if |
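To make the role of store_text in the quoted snippet concrete, here is a minimal, self-contained sketch of that branching. `FakeNode` and `kv_pairs_for_insert` are hypothetical stand-ins written for this illustration, not llama_index APIs, and the `ref_doc_info` handling from the real method is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeNode:
    """Hypothetical stand-in for BaseNode with only the fields used here."""

    node_id: str
    hash: str
    text: str


def kv_pairs_for_insert(
    node: FakeNode, store_text: bool
) -> Tuple[Optional[Tuple[str, dict]], Tuple[str, dict]]:
    # The data pair is only produced when store_text is True.
    node_kv_pair = (node.node_id, {"text": node.text}) if store_text else None
    # The metadata pair is produced unconditionally.
    metadata_kv_pair = (node.node_id, {"doc_hash": node.hash})
    return node_kv_pair, metadata_kv_pair


node = FakeNode(node_id="n1", hash="abc", text="hello")
with_text = kv_pairs_for_insert(node, store_text=True)
without_text = kv_pairs_for_insert(node, store_text=False)
```

In this sketch, `without_text` carries a metadata pair but no data pair, which matches the symptom reported in the issue. It only shows that this code path could produce the symptom, not that it is the actual cause.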
We're not passing any value for |
@jcpaulco looking at the source code for llama_index/llama-index-core/llama_index/core/storage/docstore/keyval_docstore.py Line 206 in 25e8e1b
The only thing that comes to mind is, maybe for the specific docstore/kvstore backend you are using, |
Maybe |
we thought perhaps this but it's never getting created - the issues we're seeing date back several months |
it looks like |
The async version is here (and is the exact same 😅) llama_index/llama-index-core/llama_index/core/storage/docstore/keyval_docstore.py Line 308 in 25e8e1b
|
ok good yeah it looks like this should take care of it appropriately so it seems very odd that it's not being created
|
@jcpaulco which docstore/kvstore are you using, out of curiosity? Perhaps there's something about its specific implementation of |
here's the docstore we're creating. It looks like the
|
ah ok, so postgres. I'm far from an expert on Postgres, but maybe you can spot something in this code that might cause this to happen? |
thanks - i'm getting thrown into a new project here and have little to no idea what i'm digging through so i appreciate the patience. I'm no expert on postgres either but from my eyes it seems alright..? |
in your opinion could the line before be an issue?
|
it looks like this sync call perhaps that isn't finished adding the documents and then |
could this be related actually? |
Hmm, I don't think that issue is related, that's more so talking about the lack of an async insert function on the index class |
Nope - no threading here |
@logan-markewich Update here - We found some logs that provide a good entry point to the issue. It looks like we're getting a
|
@logan-markewich what are your thoughts on this? |
It feels like these kvstore.aput_all() calls should return either the success or failure of the doc creation and perhaps include a retry as well |
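As a rough illustration of the suggestion above, the following sketch wraps an async bulk write so the caller gets an explicit success or failure value, with retries in between. `put_all_with_retry` and `FlakyStore` are hypothetical names invented for this example; nothing like this is confirmed to exist in llama_index, and the wrapped callable here takes only the key-value pairs.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

KVPairs = List[Tuple[str, dict]]


async def put_all_with_retry(
    put_all: Callable[[KVPairs], Awaitable[None]],
    kv_pairs: KVPairs,
    attempts: int = 3,
    base_delay: float = 0.1,
) -> bool:
    """Call put_all, retrying on any exception; return True on success."""
    for attempt in range(attempts):
        try:
            await put_all(kv_pairs)
            return True
        except Exception:
            if attempt == attempts - 1:
                return False
            # Exponential backoff before the next attempt.
            await asyncio.sleep(base_delay * 2**attempt)
    return False


class FlakyStore:
    """Hypothetical in-memory store whose first two writes fail."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.calls = 0

    async def aput_all(self, kv_pairs: KVPairs) -> None:
        self.calls += 1
        if self.calls <= 2:
            raise ConnectionError("simulated transient failure")
        self.data.update(dict(kv_pairs))


store = FlakyStore()
ok = asyncio.run(
    put_all_with_retry(store.aput_all, [("n1", {"text": "hello"})], base_delay=0.001)
)
```

Returning a boolean keeps the example short. Re-raising the last exception, or logging it before returning, may be preferable so a failed write to docstore/data cannot go unnoticed. Extra arguments such as a collection name could be bound with `functools.partial` before passing the callable in.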
Bug Description
Occasionally the document docstore/data is not being created (KVDocumentStore) when add_documents is called.
However, docstore/metadata is consistently being created.
This is leading to an issue when aget_document is called and the document cannot be found because it does not exist for the /data collection.
Version
0.10.67.post1
Steps to Reproduce
I've been unable to intentionally reproduce this. However, we have several thousand occurrences of the docstore/data document not being created when add_documents is called.
99% of the time it works and I can't determine what the cause of failure to create this is.
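Since the failure cannot be reproduced on demand, one way to measure it after the fact is to compare the two collections and list the node ids that have metadata but no data. The sketch below uses plain dicts as stand-ins for docstore/data and docstore/metadata; `find_missing_data` is a hypothetical helper, and how the real collections are loaded (for example through the key-value store's `get_all`, if the backend supports it) is an assumption.

```python
from typing import Dict, List


def find_missing_data(
    data_collection: Dict[str, dict], metadata_collection: Dict[str, dict]
) -> List[str]:
    """Return node ids that have a metadata entry but no data entry."""
    return sorted(key for key in metadata_collection if key not in data_collection)


# Plain dicts stand in for the docstore/data and docstore/metadata collections.
data = {"n1": {"text": "hello"}}
metadata = {"n1": {"doc_hash": "abc"}, "n2": {"doc_hash": "def"}}
missing = find_missing_data(data, metadata)
```

Here `missing` contains only "n2", the id present in the metadata stand-in but absent from the data stand-in. Running a check like this periodically could help correlate the gaps with timestamps in the logs.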
Relevant Logs/Tracebacks