Memory leak? #26
Comments
Hey @iLoveBug, I'm happy to hear that fastmlx works well for graphrag. Could you give me a reproducible example? For instance, how you start the server and the requests you send.
Hello,
Thanks for your reply.
Attached are my configuration and data files for graphrag. For the test I use a Chinese translation of the famous novel Around the World in Eighty Days; you can replace it with the English version or any other text.
I use Llama 3.1 70B as the LLM; I also tested the 8B version and got the same issue.
I use two virtual environments, both with Python 3.11.9:
One for graphrag, version 0.2.1
Another for mlx, version 0.16.1
I don't remember the exact reason why I had to use two separate environments; mostly it was a conflict between certain packages.
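Roughly, a setup like this reproduces the two environments (a sketch based on the versions above; installing fastmlx via pip into the mlx environment is an assumption):

```
# Hypothetical setup matching the versions mentioned above
python3.11 -m venv graphrag-env
graphrag-env/bin/pip install graphrag==0.2.1

python3.11 -m venv mlx-env
mlx-env/bin/pip install mlx==0.16.1 fastmlx
```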
To reproduce the issue:
1/ Ollama is used for local embedding; I use the nomic-embed-text model
2/ In the mlx environment, run fastmlx
3/ In the graphrag environment, run python -m graphrag.index --root ./80days-ollama-llam3.1
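Put together, the reproduction looks roughly like this (a sketch; the fastmlx CLI entry point and the ollama pull command are assumed to be the standard way of starting those tools):

```
# Hypothetical shell transcript of the three steps above
ollama pull nomic-embed-text        # 1/ local embedding model served by Ollama

# in the mlx environment
fastmlx                             # 2/ start the fastmlx server

# in the graphrag environment
python -m graphrag.index --root ./80days-ollama-llam3.1    # 3/ run the indexing pipeline
```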
Regards,

Can you share an exact example of how to replicate this issue? Please include as much detail as possible :)
The request and the response you get.
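For illustration, a chat request from graphrag to a locally running fastmlx server would look something like the sketch below; the port, endpoint path, and model identifier are assumptions based on fastmlx exposing an OpenAI-compatible API, not details taken from this report.

```
# Hypothetical request; adjust host, port and model to the actual setup
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",
        "max_tokens": 200,
        "messages": [{"role": "user", "content": "Extract the entities mentioned in this text: ..."}]
      }'
```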
80days-ollama-llama3.1.zip
Thanks for the example @iLoveBug! But I'm afraid I don't understand what the error is. Can you elaborate on what you mean by "except that the memory consumption grows as time goes by"?
Sorry for the confusion. My problem is that graphrag consumes a lot of tokens through many requests to the LLM: for example, it first sends requests to extract entities and relationships from the text chunks, and then writes community reports based on the extracted entities and relationships. During this process I saw the memory consumption grow, and the server got slower and slower. That is why I suspect there may be a memory leak. Normally the system should release the memory after it finishes responding to a request, am I right?
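One way to narrow this down is to watch MLX's Metal memory counters on the server side between requests. The snippet below is only a sketch, assuming an mlx version (around 0.16) that exposes the mx.metal memory APIs:

```python
# Hypothetical monitoring helper; assumes mlx exposes the mx.metal memory APIs
import mlx.core as mx

def report_memory(tag: str) -> None:
    """Print MLX's active and cached Metal memory in MB."""
    active = mx.metal.get_active_memory() / 1e6
    cached = mx.metal.get_cache_memory() / 1e6
    print(f"[{tag}] active={active:.1f} MB, cached={cached:.1f} MB")

# Call report_memory() before and after each generation. If only the cached
# number keeps climbing, the growth comes from MLX's allocator cache rather
# than a true leak, and mx.metal.clear_cache() will release it.
```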
I think this was due to a bug in mlx that has since been fixed. Here is the original issue in the Apple MLX repo.
Thanks, guys, for sharing this great repo.
I tried to use Llama 3.1 with tools for graphrag on my MacBook Pro M3 Max (128GB). Although Ollama supports this model, I found the entity extraction results were very strange.
Fortunately, fastmlx works quite well with Llama 3.1 for graphrag 0.2.1 (the version I use), except that the memory consumption grows as time goes by.
I am not sure whether it's a memory leak or not.
I downloaded a novel from a website and fed it into graphrag; the file size is less than 200KB.
Hope to get some support here.
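For context, pointing graphrag at a local fastmlx server and at Ollama for embeddings is usually done in the llm and embeddings sections of settings.yaml. The excerpt below is only a sketch: the key names follow graphrag 0.2.x defaults, while the ports, endpoints, and model identifiers are assumptions rather than the contents of the attached configuration.

```yaml
# Hypothetical excerpt of graphrag's settings.yaml for this setup
llm:
  api_key: ${GRAPHRAG_API_KEY}           # any non-empty value for a local server
  type: openai_chat
  model: mlx-community/Meta-Llama-3.1-70B-Instruct-4bit
  api_base: http://localhost:8000/v1     # fastmlx's OpenAI-compatible endpoint

embeddings:
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: nomic-embed-text
    api_base: http://localhost:11434/v1  # Ollama's OpenAI-compatible endpoint
```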