First of all, thank you for sharing these projects with the public! I tried running your knowledge graph RAG on a fresh WSL install of Ubuntu 22.04.4 LTS (GNU/Linux 5.15.153.1-microsoft-standard-WSL2 x86_64). In this PR, I documented the changes I had to make in order to run the code. I am not sure all of these steps are necessary in every setup, but I hope these improvements to the README and the code make life easier for the next person trying out this project.
Please let me know how these fixes should be adapted so that they can be merged into the repository :)
External Dependencies
I had to install some system packages that are called by Python packages.
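To see which of the system dependencies described in this section are still missing on a machine, a short standard-library check can help. This is a sketch, not part of the project: `missing_dependencies` is a hypothetical helper, and it assumes that poppler-utils provides the `pdftoppm` executable (used by pdf2image), that tesseract-ocr provides `tesseract`, and that cv2 needs the GL, SM and Xext shared libraries.

```python
from __future__ import annotations

import ctypes.util
import shutil
from typing import Mapping, Sequence

# Assumed mapping: executable name -> apt package that provides it.
DEFAULT_EXECUTABLES = {
    "pdftoppm": "poppler-utils",   # called by pdf2image
    "tesseract": "tesseract-ocr",  # called by pytesseract
    "ffmpeg": "ffmpeg",
}
# Assumed shared libraries needed by cv2 (libGL.so.1 is the one reported missing).
DEFAULT_LIBRARIES = ("GL", "SM", "Xext")


def missing_dependencies(
    executables: Mapping[str, str] = DEFAULT_EXECUTABLES,
    libraries: Sequence[str] = DEFAULT_LIBRARIES,
) -> list[str]:
    """Return readable names of executables and libraries that cannot be found."""
    missing = []
    for exe, package in executables.items():
        # shutil.which searches PATH the same way a subprocess call would.
        if shutil.which(exe) is None:
            missing.append(f"{exe} (apt package: {package})")
    for lib in libraries:
        # find_library returns a name such as "libGL.so.1" on Linux, or None.
        if ctypes.util.find_library(lib) is None:
            missing.append(f"lib{lib}")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("Missing:", ", ".join(problems) if problems else "nothing")
```

Running it as a script prints whatever is still missing, which narrows down which of the apt packages below still need to be installed.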
poppler-utils is required to read PDF files. Python packages such as pdf2image use it.
ffmpeg, libsm6 and libxext6 are common dependencies of cv2 (OpenCV). Before installing them, processing files into the system failed with:
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
This set of packages works, but a smaller set might be enough to provide the necessary libraries.
tesseract-ocr and libtesseract-dev are required by pytesseract, which is used to extract text from PDFs via OCR.
Changes in requirements.txt
The requirements.txt file pins Requests==2.32.3, but another listed dependency requires version 2.31.0, so pip cannot resolve the dependencies.
Without the model feature, preprocessing files eventually fails with a runtime exception.
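When pip reports a version conflict like this, it can help to see which installed packages actually declare a requirement on requests, so the pin can be set to a version they all accept. The sketch below uses only `importlib.metadata`; the helper names are hypothetical, and because it only sees installed distributions it is most useful after installing with the pin relaxed (pip's own error message and `pip check` report similar information).

```python
from __future__ import annotations

import re
from importlib import metadata

# A requirement string starts with the project name (PEP 508), e.g. "requests>=2.21,<3".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a project name as in PEP 503 (case- and separator-insensitive)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str | None:
    """Return the normalized project name of a requirement string, or None."""
    match = _NAME_RE.match(requirement)
    return normalize(match.group(1)) if match else None


def dependents_of(target: str) -> list[tuple[str, str]]:
    """List (distribution, requirement) pairs for installed packages that require `target`."""
    wanted = normalize(target)
    found = []
    for dist in metadata.distributions():
        for requirement in dist.requires or []:
            if requirement_name(requirement) == wanted:
                found.append((dist.metadata["Name"] or "<unknown>", requirement))
    return sorted(found)


if __name__ == "__main__":
    for name, requirement in dependents_of("requests"):
        print(f"{name}: {requirement}")
```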
Changes in the code
I added this snippet at the top of the code to make sure the necessary files are available when they are needed. I am fairly sure there is a more suitable place for it, so please let me know!
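One possible alternative to a check at the top of the module is to verify a file lazily, right before the code that reads it, and download it only if it is missing. The `ensure_file` helper below is a hypothetical sketch using only the standard library; the path and URL it takes are placeholders, not the files this project actually needs.

```python
from __future__ import annotations

import os
import tempfile
import urllib.request
from pathlib import Path


def ensure_file(path: str | os.PathLike, url: str) -> Path:
    """Return `path`, downloading it from `url` first if it does not exist yet."""
    target = Path(path)
    if target.exists():
        return target  # Already present: no network access at all.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Download into a temporary file in the same directory, then rename it,
    # so an interrupted download never leaves a partial file at `path`.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    os.close(fd)
    try:
        urllib.request.urlretrieve(url, tmp_name)
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target
```

Calling such a helper inside the function that first opens the file keeps the check next to its only user, and repeated calls are cheap because an existing file returns immediately.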
Running the project as described in the README caused errors in the import statements. Using the full module path in the imports fixed them for me.
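A sketch of why full module paths can help, assuming the project is started from its root directory with `python -m`: in that mode Python puts the current directory (the project root) on `sys.path`, not the package folder, so a bare module name does not resolve while the full path does. The package and module names `pkg`, `doc_helpers` and `greet` are hypothetical stand-ins for the project's own modules.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_module(import_line: str) -> subprocess.CompletedProcess:
    """Build a throwaway package `pkg` and run `python -m pkg.main` from its root."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "doc_helpers.py").write_text("def greet():\n    return 'hello'\n")
        # `import_line` is the import statement used inside pkg/main.py.
        (pkg / "main.py").write_text(f"{import_line}\nprint(greet())\n")
        # With -m, the working directory (the root), not pkg/, goes on sys.path.
        return subprocess.run(
            [sys.executable, "-m", "pkg.main"],
            cwd=root, capture_output=True, text=True,
        )


if __name__ == "__main__":
    short = run_module("from doc_helpers import greet")     # bare module name
    full = run_module("from pkg.doc_helpers import greet")  # full module path
    print("bare import exit code:", short.returncode)
    print("full import output:", full.stdout.strip())
```

The bare import fails with ModuleNotFoundError, while the full-path import prints the greeting.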