Knowledge Graph RAG: fix setup #148

Open. Wants to merge 4 commits into base: main.

gschup commented Jul 12, 2024

First of all, thank you for sharing these projects with the public! I tried running your knowledge graph RAG on a fresh WSL install of Ubuntu 22.04.4 LTS (GNU/Linux 5.15.153.1-microsoft-standard-WSL2 x86_64). This PR documents the changes I had to make to get the code running. I am not sure all of these steps are correct in every setup, but I hope these improvements to the README and code make life easier for the next person trying out this project.

Please let me know how these fixes should be adapted so that they can be merged into the repository :)

External Dependencies

sudo apt install poppler-utils ffmpeg libsm6 libxext6 tesseract-ocr libtesseract-dev

I had to install some system packages that the Python packages call into.

  • poppler-utils is needed to read PDF files. Python packages such as pdf2image use it.
  • ffmpeg libsm6 libxext6 are common cv2 dependencies. Before installing them, I got an ImportError: libGL.so.1: cannot open shared object file: No such file or directory when trying to ingest files into the system. This list of packages works, but there might be a more compact way to provide the necessary libraries.
  • tesseract-ocr libtesseract-dev are needed by pytesseract, which is used to parse a PDF into a string.
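A quick way to confirm the command-line tools are actually available before starting the app could look like the sketch below. It is not part of this PR. The executable names are assumptions (pdftoppm ships with poppler-utils, ffmpeg and tesseract with the packages of the same name), and the check does not cover shared libraries such as libsm6 or libxext6.

```python
import shutil

# Executable names are assumptions: pdftoppm is one of the binaries in
# poppler-utils; ffmpeg and tesseract come from the same-named packages.
REQUIRED_TOOLS = ("pdftoppm", "ffmpeg", "tesseract")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing external tools:", ", ".join(missing))
    else:
        print("All external tools found.")
```

The `which` parameter only exists so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.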

Changes in requirements.txt

Requests==2.31.0

The requirements.txt file pins Requests==2.32.3, but another listed dependency requires 2.31.0, so pip cannot resolve the dependency set.
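To see which installed packages constrain Requests, something like the following sketch may help. It relies only on the standard library's importlib.metadata; the helper name `constraints_on` is made up for illustration, and the name matching is deliberately simple (it does not normalise `-`, `_` and `.` the way pip does).

```python
import re
from importlib import metadata


def constraints_on(package, dists=None):
    """Map distribution name -> requirement string for every
    distribution that declares a requirement on `package`.

    `dists` is an iterable of (name, requirement strings); by default
    it is built from the distributions installed in this environment.
    """
    if dists is None:
        dists = (
            (d.metadata["Name"], d.requires or [])
            for d in metadata.distributions()
        )
    # Match the package name only when it is not a prefix of a longer
    # name, so "requests" does not match "requests-oauthlib".
    pattern = re.compile(
        rf"^{re.escape(package)}(?![A-Za-z0-9._-])", re.IGNORECASE
    )
    found = {}
    for name, requires in dists:
        for req in requires:
            if pattern.match(req.strip()):
                found[name] = req
    return found


if __name__ == "__main__":
    for name, req in sorted(constraints_on("requests").items()):
        print(f"{name}: {req}")
```

Running it inside the project's virtual environment lists every installed distribution that names Requests in its requirements, which should make the package that needs 2.31.0 easy to spot.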

pymilvus[model]==2.4.3

Without the model extra, preprocessing of files eventually fails with a runtime exception.

Changes in the code

import nltk
nltk.download('averaged_perceptron_tagger')

I added this snippet at the top of the code simply to make sure the necessary NLTK data is there when needed. I am very sure there is a more suitable spot for this. Please let me know!
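One possible refinement, sketched here and not included in the PR, is to download the tagger only when it is missing, so that each start does not go through nltk.download again. The helper names are hypothetical, and the resource path "taggers/averaged_perceptron_tagger" is an assumption about where NLTK stores this package.

```python
def ensure_resource(resource_path, package, find, download):
    """Call `download(package)` only if `find(resource_path)` raises
    LookupError. Returns True when a download was triggered."""
    try:
        find(resource_path)
        return False
    except LookupError:
        download(package)
        return True


def ensure_tagger():
    """Make sure the perceptron tagger data is available locally."""
    # Imported lazily so the generic helper above works without NLTK.
    import nltk

    return ensure_resource(
        "taggers/averaged_perceptron_tagger",  # assumed resource path
        "averaged_perceptron_tagger",
        nltk.data.find,
        nltk.download,
    )
```

Calling `ensure_tagger()` once at application start-up would replace the unconditional `nltk.download(...)` call.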

from utils.preprocessor import extract_triples

Running the project as described in the README led to import errors. Using the full module path in the import statements fixed them for me.
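For context, an absolute import such as the one above resolves only when the directory that contains utils/ is on sys.path. The sketch below demonstrates this with a throwaway directory tree. The stub `extract_triples` is a placeholder, not the project's implementation, and `import_from_root` is a hypothetical helper.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_root(root, module_name):
    """Import `module_name` with `root` temporarily first on sys.path."""
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(str(root))


# Throwaway stand-in for the repository layout (hypothetical stub).
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "utils"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "preprocessor.py").write_text(
        "def extract_triples(text):\n"
        "    return []  # stub, not the project's implementation\n"
    )
    module = import_from_root(tmp, "utils.preprocessor")
    print(module.extract_triples("some text"))
```

In the real project the same effect comes from starting the app from the directory that contains utils/, since Python puts the script's directory (or the current directory for `python -m`) on sys.path.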
