feat: document search sources string resolver #264
Code Coverage Summary
Diff against main
Results for commit: b2a9ee3
- Basic file URI ingestion
- Wildcard pattern matching
Trivy scanning results.
- .venv/lib/python3.10/site-packages/PyJWT-2.9.0.dist-info/METADATA (secrets): Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0). MEDIUM: JWT (jwt-token)
- .venv/lib/python3.10/site-packages/litellm/llms/huggingface/huggingface_llms_metadata/hf_text_generation_models.txt (secrets): Total: 1 (MEDIUM: 0, HIGH: 0, CRITICAL: 1). CRITICAL: HuggingFace (hugging-face-access-token)
- .venv/lib/python3.10/site-packages/litellm/proxy/_types.py (secrets): Total: 1 (MEDIUM: 1, HIGH: 0, CRITICAL: 0). MEDIUM: Slack (slack-web-hook)
packages/ragbits-document-search/src/ragbits/document_search/_main.py
packages/ragbits-document-search/src/ragbits/document_search/documents/sources.py
path_obj = Path.cwd() / path_obj

if "*" in str(path_obj):
    # If path contains wildcards, use its parent as base
This seems confusing to me. Could you explain? I don't see how this code supports the cases described in the method's docstring (e.g. a simple "*.py" or patterns with "?").
To be honest, in my mind the whole method could just be this one line:
return [cls(path=f) for f in Path().glob(pattern)]
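A minimal sketch of that one-line approach, for illustration only. The class name `LocalFileSource`, the method name `list_sources`, and the explicit `base` argument are assumptions, not names taken from the PR; the point is just that `Path.glob` already handles `*`, `?` and `**` without special-casing wildcards.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LocalFileSource:
    """Hypothetical stand-in for the source class under review."""

    path: Path

    @classmethod
    def list_sources(cls, base: Path, pattern: str) -> list["LocalFileSource"]:
        # Path.glob interprets "*", "?", "[seq]" and the recursive "**" itself,
        # so the pattern does not need to be inspected or split by hand.
        # `base` is an assumption of this sketch; the reviewer's one-liner
        # globs relative to the current directory via Path().
        return [cls(path=f) for f in sorted(base.glob(pattern)) if f.is_file()]
```

Called as `LocalFileSource.list_sources(Path.cwd(), "*.py")`, this would list the Python files directly in the working directory; a `**/` prefix makes the match recursive.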
thanks, you are right
document_search = DocumentSearch.from_config(CONFIG)

# Test ingesting from URI with wildcard
await document_search.ingest(f"file://{temp_dir}/test*.txt")
It would be good to test other supported syntax options, for example file://{temp_dir}/test?.txt, file://*.txt, file://**/test?.txt
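One way to see what each of these URI forms would be expected to match is a small standalone check outside DocumentSearch. The helper below, `expand_file_uri`, is hypothetical and is not the resolver in this PR; it assumes a `file://` URI is resolved by stripping the prefix and globbing, with relative patterns taken against a caller-supplied working directory.

```python
from pathlib import Path


def expand_file_uri(uri: str, cwd: Path) -> list[Path]:
    """Hypothetical helper: expand a file:// URI that may contain wildcards."""
    prefix = "file://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a file URI: {uri}")
    raw = Path(uri[len(prefix):])
    if raw.is_absolute():
        # Glob from the filesystem root with the remainder as a relative pattern.
        base = Path(raw.anchor)
        pattern = str(raw.relative_to(raw.anchor))
    else:
        # Relative patterns such as "*.txt" or "**/test?.txt" are resolved
        # against `cwd` (an assumption of this sketch).
        base = cwd
        pattern = str(raw)
    return sorted(p for p in base.glob(pattern) if p.is_file())
```

With files `test1.txt`, `test10.txt` and `sub/test3.txt` under a directory, `test?.txt` should pick up only `test1.txt`, while `**/test?.txt` should also reach `sub/test3.txt`; assertions along those lines would cover the syntax options listed above.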
This PR should probably also include some documentation.
…main.py Co-authored-by: Ludwik Trammer <[email protected]>
Solves #221