Timeout Error while loading /pull llava in the chatbar #33

Open
Paramjethwa opened this issue Sep 20, 2024 · 23 comments

Comments

@Paramjethwa

Paramjethwa commented Sep 20, 2024

This is what I got in the Streamlit app after 5 minutes of waiting after running /pull llava:

TimeoutError

Traceback:
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
File "/app/app.py", line 171, in <module>
    main()
File "/app/app.py", line 123, in main
    response = command(user_input)
File "/app/utils.py", line 15, in command
    return pull_model_in_background(splitted_input[1])
File "/app/utils.py", line 73, in pull_model_in_background
    return asyncio.run(pull_ollama_model_async(model_name, stream=stream))
File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
File "/app/utils.py", line 39, in pull_ollama_model_async
    async with session.post(url, json=json_data) as response:
File "/usr/local/lib/python3.10/site-packages/aiohttp/client.py", line 1353, in __aenter__
    self._resp = await self._coro
File "/usr/local/lib/python3.10/site-packages/aiohttp/client.py", line 684, in _request
    await resp.start(conn)
File "/usr/local/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 994, in start
    with self._timer:
File "/usr/local/lib/python3.10/site-packages/aiohttp/helpers.py", line 713, in __exit__
    raise asyncio.TimeoutError from None

/pull nomic-embed-text loads pretty quickly, but when I upload a PDF file it gets stuck on "processing pdf..." and does not respond to anything, even when I ask something in the chat bar.

@Leon-Sander
Owner

It seems that you solved the problems from the other two issues you created, can you write the solution under them please and close the issue?

For your issue, it seems to be an ollama related problem, so looking into their issues might help. Also how many pages does your pdf file have?

@Paramjethwa
Author

Paramjethwa commented Sep 22, 2024

yes i have updated the solution for previous issues

My PDF was only 1.5 MB. When I try to /pull the llava model, it shows "processing" and then a timeout error.

/pull nomic-embed-text loads pretty quickly, but when I upload a PDF file it gets stuck on "processing pdf..." and does not respond to anything, even when I ask something in the chat bar.

@Leon-Sander
Owner

I updated the code yesterday to track timings. Can you insert the PDF that is inside the pdf folder of this repository and then paste the command-line output here, so I can see what's happening and how long it takes?

@Paramjethwa
Author

OK, I updated the code to your latest commit.

I opened the Streamlit app (app.py) in my browser, ran /pull nomic-embed-text, and uploaded the same PDF that is in the repository (hover.pdf),

but I still get the same issue: ConnectionError: timeout.

This is what i got in my browser :
ERROR FILE 1.txt

and this is what i got in my Ubuntu terminal :
ERROR FILE 2.txt

@Leon-Sander
Owner

Leon-Sander commented Sep 22, 2024

In another issue you wrote that you changed the ollama url, did you also change it in the vectordb_handler.py file?

Also, as a side note, you don't need to pull the same model over and over; after pulling it once, you have it locally. Ollama also did not recognize your GPU, so make sure that you have the latest version of the code and that WSL is set up if you work on Windows.

@Paramjethwa
Author

Paramjethwa commented Sep 22, 2024

i have solved the timeout error and will upload the entire solution tomorrow in a proper format

I've encountered a different issue where the /pull command for 'nomic-embed-text' completes successfully, and the PDF loads without any problems. However, 'nomic-embed-text' does not appear in the model selection section, while other random models I've pulled show up there, allowing me to choose them for chatting or asking questions about images.

Image for reference (llava-phi3 is working fine for both chat and images):
[image]

Also, how can I remove ("unpull") a model that I don't want to use, and where is it stored locally? I can't find the model in my project directory.

@Paramjethwa
Author

Paramjethwa commented Sep 22, 2024

> In another issue you wrote that you changed the ollama url, did you also change it in the vectordb_handler.py file?
>
> Also sidenote, you dont need to pull the same model over and over, after pulling it once, you have it locally. Also ollama did not recognize your gpu, make sure that you have the latest version of the code and wsl set up if you work on windows.

Yes, I did change it in the vectordb file as well, but with the latest repo it now works with the Ollama URL too. I don't know why it is not recognizing the GPU; I do have WSL2 (Ubuntu). Even with CPU only, though, loading a PDF now takes around 30 seconds and the response time is 10-15 seconds.

@Leon-Sander
Owner

Leon-Sander commented Sep 22, 2024

> However, 'nomic-embed-text' does not appear in the model selection section

I built a filter that hides models with "embed" in their name, because you can't chat with embedding models, so it makes no sense to have them in the model selection. Also, an embedding model is usually chosen once and then used continuously. You can define which embedding model to use in the config.yaml file.
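
For readers following along, a filter like this might look roughly as follows. This is only a minimal sketch: the function name `chat_models`, the `hidden_marker` parameter, and the sample model list are hypothetical and not taken from the repository.

```python
def chat_models(model_names, hidden_marker="embed"):
    """Return only the models that should appear in the chat dropdown.

    Hypothetical helper: any model whose name contains the marker
    (case-insensitive) is treated as an embedding model and hidden.
    """
    return [name for name in model_names if hidden_marker not in name.lower()]


# Example list of locally pulled models (made up for illustration).
pulled = ["llava-phi3:latest", "nomic-embed-text:latest", "gemma2:2b"]
visible = chat_models(pulled)
print(visible)
```

With a name-based filter like this, nomic-embed-text is still pulled and usable for embeddings; it just is not offered as a chat model.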

@Paramjethwa
Author

Paramjethwa commented Sep 22, 2024

> However, 'nomic-embed-text' does not appear in the model selection section
>
> I build a filter to not show models that have "embed" in their name, because you cant chat with embedding models, therefore it makes no sense to have them in the model selection. Also embeddings usually are chosen once and then being used continuously. You can define which embedding model to use in the config.yaml file

Oh okay, that makes sense now. So which model is used to summarize/chat with the PDF after the PDF has been embedded?

Also, to "unpull" a model I don't want to use, do I have to delete it from the .ollama directory?

@Leon-Sander
Owner

Leon-Sander commented Sep 22, 2024

Yes, you would need to delete it from the directory you defined, but it's a bit complicated since the files are not named after their models.
The model used for the chat is the model you have chosen in the Select a Model dropdown.

Also, a side note: the code is not built to summarize the PDF directly. It's a RAG approach: the three text snippets most similar to your query are retrieved from the vector database and given to the LLM as context to answer your question.
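
To make the retrieval step concrete, here is a small sketch of "retrieve the three most similar snippets, then build the prompt". It is not the repository's code: it assumes cosine similarity over toy two-dimensional vectors, whereas the real app gets its embeddings from the configured embedding model and stores them in a vector database. All names (`retrieve_top_k`, `build_prompt`, the sample chunks) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vector, indexed_chunks, k=3):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, snippets):
    """Put the retrieved snippets in front of the question as context."""
    context = "\n\n".join(snippets)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy index of (chunk text, embedding) pairs; the vectors are made up.
chunks = [
    ("hover basics", [1.0, 0.0]),
    ("pricing", [0.0, 1.0]),
    ("hover physics", [0.9, 0.1]),
    ("contact", [-1.0, 0.0]),
    ("hover history", [0.7, 0.7]),
]
query = [1.0, 0.0]
top = retrieve_top_k(query, chunks)
print(build_prompt("How does hovering work?", top))
```

Because only the top three snippets reach the LLM, a request like "summarize the whole PDF" sees just a small part of the document, which is why the app answers questions better than it summarizes.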

@Leon-Sander
Owner

Leon-Sander commented Sep 23, 2024

Okay, so I switched to Windows and tested it myself. Llama3.1 would not load on GPU for me; I also got a timeout error. Smaller models did load, but it took a long time: for example, llava took 5 minutes to load on GPU, while it took only 5 seconds on Linux. Without GPU support it worked, though.

Also, this is an Ollama-related issue, so you might want to look for answers over there: ollama/ollama#4427

@Paramjethwa
Author

Okay, sure, I'll check out the Ollama thread for a solution to the GPU usage.

Also, once you have uploaded the PDF using the nomic-embed-text model, which model do you use to chat with the PDF?

And can you suggest a model similar to llava but smaller in size, so that we don't get a timeout error for it?

@Leon-Sander
Owner

You can choose whatever model you want for chatting. Look into the Ollama library and try what fits your system best: https://ollama.com/library
For example:
gemma2:2b
qwen2.5:3b

@Paramjethwa
Author

So here is the solution for the HTTP connection error / connection timeout error.

What I did was change the permission settings for the .ollama folder: open Properties, go to the Security tab, and give your user (your laptop's account) Full control, so that the unnecessary blob files in the .ollama/model folder can be removed automatically.

After doing this, restart your PC and the HTTP error should be gone.

However, whenever I load/pull a bigger model (3 GB to 5 GB) I still get a timeout error, while small models work fine.

> ok so i did update the code with the latest commit you have done.
>
> streamlit app.py in my browser, /pull nomic-embed-text upload the same pdf present in the repository( hover.pdf)
>
> but still same issue of connectionerror : timeout
>
> This is what i got in my browser : ERROR FILE 1.txt
>
> and this is what i got in my Ubuntu terminal : ERROR FILE 2.txt

@Paramjethwa
Author

Paramjethwa commented Sep 23, 2024

> You can choose what ever model you want for chatting. Look into the ollama library and select and try what fits your system best https://ollama.com/library For example gemma2:2b qwen2.5:3b

I tried to pull these models and was getting a timeout error, but then I plugged in the charger and set the system to performance/turbo mode, and the model loaded successfully.

However, I still cannot load the llava:7b model. It ends with a timeout error every time, I guess because of its 4-5 GB size, while the smaller models load perfectly.

Edit: if I download the llava model manually and put it in the .ollama/model folder, will it be available in the Select a Model dropdown in the Streamlit app?

@Paramjethwa
Author

Paramjethwa commented Sep 23, 2024

AUDIO ERROR: I get this when I record audio from the Streamlit app.

LibsndfileError: Error opening <_io.BytesIO object at 0x7f71cd288f40>: Format not recognised.

I think the recorded audio is not in WAV or another accepted format, hence this error.

Audioerror.txt

Edit: when I upload an audio file (ogg format) and summarize it, it works fine but takes a few minutes to load and process, whereas recording directly with the record audio button produces the error above.
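
One way to check that guess is to look at the first bytes of the recorded audio before handing it to the decoder. The sketch below is not the maintainer's fix; `sniff_audio_container` is a hypothetical helper that only recognizes the well-known magic bytes of WAV (RIFF/WAVE), Ogg, FLAC, and WebM/Matroska containers.

```python
def sniff_audio_container(data: bytes) -> str:
    """Guess the container format of raw audio bytes from its magic bytes."""
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"\x1a\x45\xdf\xa3":  # EBML header used by WebM/Matroska
        return "webm/matroska"
    return "unknown"


# Tiny hand-made headers, just enough to exercise each branch.
wav_header = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE"
print(sniff_audio_container(wav_header))
print(sniff_audio_container(b"OggS\x00\x02"))
```

If the recorded bytes come back as something other than "wav", "ogg", or "flac", that would be consistent with the "Format not recognised" error, and the recording would need converting before decoding.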

@Leon-Sander
Owner

Leon-Sander commented Sep 23, 2024

@Paramjethwa Just pushed a fix for the audio error and the timeout error. Make sure to rebuild the docker image for the app.
Can you test it out and give feedback please if it fixed the problem?

@Paramjethwa
Author

@Leon-Sander I tested the commit. At first the audio didn't work, but after I started a normal chat with the keyboard and then used Record Audio, it worked perfectly fine.

As for the timeout error with /pull llava, it still exists. I tried a few times and ended up with the same error, asyncio.exceptions.TimeoutError:
ERROR FILE TIMEOUT ERROR.txt

@Leon-Sander
Owner

Leon-Sander commented Sep 25, 2024

Okay, I think I got it now: it is an aiohttp error. I pushed a fix on a new branch, timout_error_fix. On line 53 in utils.py you can set how long to wait before a timeout error is thrown; I set it to 30 minutes for now. Can you please test whether this fixes your problem, and adjust the value if you think you need more time to download the model?

If this does not work, I got another workaround idea for you which will surely work.
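
For anyone who hits the same traceback: aiohttp's default total timeout is 5 minutes, which matches the 5-minute wait reported at the top of this issue. Below is a rough sketch of raising that limit. It is not the code from the branch. The base URL, the `build_pull_request` and `pull_model` names, and the payload shape (Ollama's /api/pull with a "model" field; older Ollama versions used "name") are assumptions here.

```python
DEFAULT_OLLAMA_URL = "http://localhost:11434"  # assumption: adjust to your setup
PULL_TIMEOUT_SECONDS = 30 * 60  # 30 minutes, as in the fix described above


def build_pull_request(model_name, base_url=DEFAULT_OLLAMA_URL, stream=False):
    """Return the (url, json_payload) pair for a pull request to Ollama."""
    url = f"{base_url.rstrip('/')}/api/pull"
    payload = {"model": model_name, "stream": stream}
    return url, payload


async def pull_model(model_name, timeout_seconds=PULL_TIMEOUT_SECONDS):
    """Pull a model, allowing up to timeout_seconds for the whole request."""
    # aiohttp is a third-party package; it is imported here so the helper
    # above can be used even where aiohttp is not installed.
    import aiohttp

    url, payload = build_pull_request(model_name)
    # total= bounds the entire request, including the time spent downloading.
    timeout = aiohttp.ClientTimeout(total=timeout_seconds)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.post(url, json=payload) as response:
            response.raise_for_status()
            return await response.text()
```

It could be called with `asyncio.run(pull_model("llava"))`. On a slow connection a 4-5 GB model can easily take longer than 5 minutes, so the timeout should be sized to the download rather than left at the default.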

@Paramjethwa
Author

Hey Leon, it took 15-20 minutes to load the model and it completed successfully. This works perfectly fine now.
You are amazing!!
I aspire to become a skilled problem solver like you!

Now the chat app is running everything successfully. The only remaining issue is that Ollama is not detecting/using my GPU to load models or answer queries. I am looking for a solution in the Ollama GitHub issue threads and hope to find it soon.

@Leon-Sander
Owner

Thanks happy to hear that.

Do you have wsl enabled in docker?

@Paramjethwa
Author

Paramjethwa commented Sep 26, 2024

> Thanks happy to hear that.
>
> Do you have wsl enabled in docker?

Yes, WSL is already enabled in Docker Desktop.

@Paramjethwa
Author

Hi Leon, I finally found the solution for GPU usage. Here is what I did:

  1. Reinstalled Docker completely and removed all the tmp.json files from the .docker folder (there were multiple temporary files causing conflicts).
  2. Ran Docker Desktop as an administrator.
  3. Made sure CUDA (nvcc --version) and nvidia-smi respond correctly.
  4. Checked and enabled the WSL integration.
  5. Restarted the system.
  6. Rebuilt the image (docker compose up --build).
  7. Finally, it recognizes the GPU and uses it properly while running the project.
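
The check in step 3 can also be scripted. This is a small sketch under the assumption that both tools should be on PATH inside the WSL/Ubuntu environment; the function names are made up for illustration and this is not part of the project.

```python
import shutil
import subprocess


def check_gpu_tools(tools=("nvidia-smi", "nvcc")):
    """Report, per tool, whether an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def nvidia_smi_output():
    """Return nvidia-smi's output, or None if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return result.stdout if result.returncode == 0 else None


report = check_gpu_tools()
print(report)
```

A tool being on PATH does not by itself prove that Docker passes the GPU through to the container, so this only covers the first half of the checklist.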

Major credit and thanks to you Leon for helping me solve every single problem super grateful!
