This repository has been archived by the owner on May 28, 2024. It is now read-only.

Error when serve run #144

Open
andakai opened this issue Mar 25, 2024 · 1 comment

andakai commented Mar 25, 2024

I built the image and ran the container:

docker run -d -it --gpus all --shm-size 1g -p 8000:8000 -e HF_HOME=~/data -v $cache_dir:/home/ray/data anyscale/ray-llm:latest

Inside the container, I then ran:

serve run ~/serve_configs/amazon--LightGPT.yaml

This fails with the following error:

2024-03-25 02:40:33,460 INFO scripts.py:411 -- Running config file: '/home/ray/serve_configs/amazon--LightGPT.yaml'.
2024-03-25 02:40:35,709 WARNING services.py:1996 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 1073741824 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2024-03-25 02:40:36,866 INFO worker.py:1715 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
(ServeController pid=22583) WARNING 2024-03-25 02:40:39,253 controller 22583 logging_utils.py:247 - 'RAY_SERVE_ENABLE_JSON_LOGGING' is deprecated, please use 'LoggingConfig' to enable json format.
(ProxyActor pid=22664) WARNING 2024-03-25 02:40:40,795 proxy 172.17.0.2 logging_utils.py:247 - 'RAY_SERVE_ENABLE_JSON_LOGGING' is deprecated, please use 'LoggingConfig' to enable json format.
(ProxyActor pid=22664) INFO 2024-03-25 02:40:40,795 proxy 172.17.0.2 proxy.py:1141 - Proxy actor 28469263dc5907e200fa9fe201000000 starting on node 87617dc5750c5f36331a1ea5935a849259fee4d4a42262c695d9e0ca.
(ProxyActor pid=22664) INFO 2024-03-25 02:40:40,801 proxy 172.17.0.2 proxy.py:1346 - Starting HTTP server on node: 87617dc5750c5f36331a1ea5935a849259fee4d4a42262c695d9e0ca listening on port 8000
(ProxyActor pid=22664) INFO:     Started server process [22664]
(ProxyActor pid=22664) WARNING 2024-03-25 02:40:40,824 proxy 172.17.0.2 logging_utils.py:247 - 'RAY_SERVE_ENABLE_JSON_LOGGING' is deprecated, please use 'LoggingConfig' to enable json format.
2024-03-25 02:40:40,848 SUCC scripts.py:480 -- Submitted deploy config successfully.
(ServeController pid=22583) INFO 2024-03-25 02:40:40,841 controller 22583 application_state.py:414 - Building application 'ray-llm'.
(build_serve_application pid=21125) There was a problem when trying to write in your cache folder (/home/adk/data/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
(ServeController pid=22583) WARNING 2024-03-25 02:40:48,038 controller 22583 application_state.py:742 - Deploying app 'ray-llm' failed with exception:
(ServeController pid=22583) Traceback (most recent call last):
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/application_state.py", line 994, in build_serve_application
(ServeController pid=22583)     app = call_app_builder_with_args_if_necessary(import_attr(import_path), args)
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/_private/utils.py", line 1182, in import_attr
(ServeController pid=22583)     module = importlib.import_module(module_name)
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/importlib/__init__.py", line 127, in import_module
(ServeController pid=22583)     return _bootstrap._gcd_import(name[level:], package, level)
(ServeController pid=22583)   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
(ServeController pid=22583)   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
(ServeController pid=22583)   File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
(ServeController pid=22583)   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
(ServeController pid=22583)   File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
(ServeController pid=22583)   File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
(ServeController pid=22583)   File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
(ServeController pid=22583)   File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
(ServeController pid=22583)   File "<frozen importlib._bootstrap_external>", line 850, in exec_module
(ServeController pid=22583)   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/site-packages/rayllm/__init__.py", line 1, in <module>
(ServeController pid=22583)     from rayllm.backend.observability.tracing import setup_tracing
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/site-packages/rayllm/backend/__init__.py", line 1, in <module>
(ServeController pid=22583)     from rayllm.backend.server.run import router_application
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/site-packages/rayllm/backend/server/run.py", line 10, in <module>
(ServeController pid=22583)     from rayllm.backend.llm.vllm.vllm_engine import VLLMEngine
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/site-packages/rayllm/backend/llm/vllm/vllm_engine.py", line 15, in <module>
(ServeController pid=22583)     from rayllm.backend.llm.vllm.vllm_compatibility import AviaryAsyncLLMEngine
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/site-packages/rayllm/backend/llm/vllm/vllm_compatibility.py", line 31, in <module>
(ServeController pid=22583)     init_hf_modules()
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 52, in init_hf_modules
(ServeController pid=22583)     os.makedirs(HF_MODULES_CACHE, exist_ok=True)
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/os.py", line 215, in makedirs
(ServeController pid=22583)     makedirs(head, exist_ok=exist_ok)
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/os.py", line 215, in makedirs
(ServeController pid=22583)     makedirs(head, exist_ok=exist_ok)
(ServeController pid=22583)   File "/home/ray/anaconda3/lib/python3.9/os.py", line 225, in makedirs
(ServeController pid=22583)     mkdir(name, mode)
(ServeController pid=22583) PermissionError: [Errno 13] Permission denied: '/home/adk'
(ServeController pid=22583) 
(build_serve_application pid=21125) [8b5cff370a16:21125] [[48252,1],0] ORTE_ERROR_LOG: Unreachable in file runtime/ompi_mpi_finalize.c at line 262
@kunalchamoli

Hello @darrenglow, I faced the same issue. Just change ~/data to /data in the docker run command and the issue will be resolved.
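
The log above reports the cache folder as /home/adk/data/hub, which suggests the host shell expanded the ~ in HF_HOME=~/data to the host user's home directory before Docker received the value (bash does this for assignment-like arguments, at least outside POSIX mode). That path does not exist inside the container, hence the PermissionError on /home/adk. A minimal sketch of the adjusted command follows. It only prints the command rather than running it. The default for cache_dir, the helper name print_docker_cmd, and mounting the volume at /data as well are assumptions for illustration; the reply does not say whether the -v target was changed too.

```shell
# Sketch only: prints the docker command instead of running it.
# Assumption: host directory holding the Hugging Face cache (hypothetical default).
cache_dir="${cache_dir:-$HOME/hf-cache}"
# Assumption: absolute in-container cache path, following the /data suggestion above.
container_cache="/data"

# Hypothetical helper: assembles the command with an absolute HF_HOME.
print_docker_cmd() {
  # HF_HOME is an absolute container path, so the host shell has no "~" to expand.
  echo docker run -d -it --gpus all --shm-size 1g -p 8000:8000 \
    -e "HF_HOME=${container_cache}" \
    -v "${cache_dir}:${container_cache}" \
    anyscale/ray-llm:latest
}

print_docker_cmd
```

If the printed command looks right, it can be run as-is. Whether the container user can write to the mounted directory still depends on the permissions of the host directory.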
