
[Bug]: DeepDoc not running on server without GPU #4664

Open
senovr opened this issue Jan 27, 2025 · 4 comments
Labels
bug Something isn't working

Comments


senovr commented Jan 27, 2025

Is there an existing issue for the same bug?

  • I have checked the existing issues.

RAGFlow workspace code commit ID

System v0.15.1-176-gd970d0ef slim-nightly

RAGFlow image version

gd970d0e

Other environment information

OS: Ubuntu 22.04
No GPU

Actual behavior

It seems that after the recent DeepDoc update (enabling GPU usage), a GPU is now required to be present in the system.
I deployed RAGFlow on a server without a GPU, and when trying to parse a .pdf file I get the traceback shown under Additional information below.

Expected behavior

A .pdf can be parsed on machines both with and without a GPU.

Steps to reproduce

1. Deploy RAGFlow on a server without a GPU.
2. Start parsing any .pdf using "general" chunking.
3. Get the traceback below.

Additional information

2025-01-27 18:40:33,188 INFO     31 TextRecognizer det uses GPU
2025-01-27 18:40:33,188 INFO     31 task_consumer_0 reported heartbeat: {"name": "task_consumer_0", "now": "2025-01-27T18:40:33.182+08:00", "boot_at": "2025-01-27T18:40:02.636+08:00", "pending": 16, "lag": 42, "done": 0, "failed": 0, "current": {"id": "20e0cab6dc9b11efb7e00242ac120006", "doc_id": "8afa953cdc9411ef9fc90242ac120006", "from_page": 156, "to_page": 168, "retry_count": 0, "kb_id": "4b58c5d4dc9411efafa10242ac120006", "parser_id": "naive", "parser_config": {"auto_keywords": 3, "auto_questions": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n      {cluster_content}\nThe above is the content you need to summarize.", "max_token": 1024, "threshold": 0.1, "max_cluster": 128, "random_seed": 0}, "graphrag": {"use_graphrag": false}, "chunk_token_num": 256, "delimiter": "\\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": false}, "name": "testfile.pdf", "type": "pdf", "location": "testfile.pdf", "size": 9166052, "tenant_id": "f5ba73a0c6ab11ef93290242ac120006", "language": "English", "embd_id": "hellord/e5-mistral-7b-instruct:Q4_0@Ollama", "pagerank": 0, "kb_parser_config": {"auto_keywords": 3, "auto_questions": 3, "raptor": {"use_raptor": true, "prompt": "Please summarize the following paragraphs. Be careful with the numbers, do not make things up. Paragraphs as following:\n      {cluster_content}\nThe above is the content you need to summarize.", "max_token": 1024, "threshold": 0.1, "max_cluster": 128, "random_seed": 0}, "graphrag": {"use_graphrag": false}, "chunk_token_num": 256, "delimiter": "\\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": "DeepDOC", "html4excel": false}, "img2txt_id": "llama3.2-vision@Ollama", "asr_id": "", "llm_id": "mistral-nemo:latest@Ollama", "update_time": 1737974430175, "task_type": ""}}
2025-01-27 18:40:33,845 INFO     31 TextRecognizer rec uses GPU
2025-01-27 18:40:34,487 INFO     31 Recognizer layout uses GPU
2025-01-27 18:40:34,533 INFO     31 Recognizer tsr uses GPU
2025-01-27 18:40:34,545 INFO     31 set_progress(20e0cab6dc9b11efb7e00242ac120006), progress: None, progress_msg: 18:40:34 Page(157~169): OCR started
2025-01-27 18:40:39,565 DEBUG    31 Images converted.
2025-01-27 18:40:54,751 INFO     31 set_progress(20e0cab6dc9b11efb7e00242ac120006), progress: -1, progress_msg: 18:40:54 Page(157~169): [ERROR]Internal server error while chunking: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Did not find an arena based allocator registered for device-id  combination in the memory arena shrink list: gpu:0
2025-01-27 18:40:54,757 ERROR    31 Chunking testfile.pdf got exception
Traceback (most recent call last):
  File "/ragflow/rag/svr/task_executor.py", line 218, in build_chunks
    cks = chunker.chunk(task["name"], binary=binary, from_page=task["from_page"],
  File "/ragflow/rag/app/naive.py", line 237, in chunk
    sections, tables = pdf_parser(filename if not binary else binary, from_page=from_page, to_page=to_page,
  File "/ragflow/rag/app/naive.py", line 133, in __call__
    self.__images__(
  File "/ragflow/deepdoc/parser/pdf_parser.py", line 1018, in __images__
    self.__ocr(i + 1, img, chars, zoomin*2)
  File "/ragflow/deepdoc/parser/pdf_parser.py", line 280, in __ocr
    bxs = self.ocr.detect(np.array(img))
  File "/ragflow/deepdoc/vision/ocr.py", line 583, in detect
    dt_boxes, elapse = self.text_detector(img)
  File "/ragflow/deepdoc/vision/ocr.py", line 479, in __call__
    raise e
  File "/ragflow/deepdoc/vision/ocr.py", line 475, in __call__
    outputs = self.predictor.run(None, input_dict, self.run_options)
  File "/ragflow/.venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Did not find an arena based allocator registered for device-id  combination in the memory arena shrink list: gpu:0
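
For context, a minimal sketch of how this INVALID_ARGUMENT can arise in onnxruntime (illustrative only, not RAGFlow's actual code; the model path, input name and shape below are placeholders). What appears to happen is that the run options ask ONNX Runtime to shrink the memory arena of gpu:0 while the session only has a CPU allocator registered:

import numpy as np
import onnxruntime as ort

# Placeholder model path; only the CPU execution provider is loaded.
sess = ort.InferenceSession("det.onnx", providers=["CPUExecutionProvider"])

run_options = ort.RunOptions()
# Requesting arena shrinkage for "gpu:0" on a CPU-only session triggers the same
# "Did not find an arena based allocator ... gpu:0" error at run() time.
run_options.add_run_config_entry("memory.enable_memory_arena_shrinkage", "gpu:0")
# Using "cpu:0" (or omitting the entry) when no CUDA provider is active avoids it.

dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)  # placeholder input
outputs = sess.run(None, {"x": dummy}, run_options)   # "x" is a placeholder input name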
senovr added the bug label on Jan 27, 2025
KevinHuSh (Collaborator) commented

Check the docker compose file: there are no GPU configurations in it, are there?
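
For reference, a quick check (a sketch, not part of RAGFlow) that can be run inside the ragflow-server container to see what onnxruntime itself reports:

import onnxruntime as ort

print(ort.get_device())               # "GPU" when the onnxruntime-gpu wheel is installed
print(ort.get_available_providers())  # e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider']
# Note: the CUDA provider is listed whenever the GPU wheel is installed, even on a
# machine without a GPU or driver, so its presence here does not prove CUDA is usable.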


senovr commented Jan 29, 2025

No, apparently not.
I am using slightly tweaked versions of the YAML files, but the main tweaks relate to volume mounts and to the upgraded MinIO/Elasticsearch versions I am using.

Could it be due to adding these rows in pyproject.toml (and, apparently, uv.lock)?
#4643

    "onnxruntime==1.19.2; sys_platform == 'darwin' or platform_machine != 'x86_64'",
    "onnxruntime-gpu==1.19.2; sys_platform != 'darwin' and platform_machine == 'x86_64'",

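For reference, a small sketch (not part of RAGFlow) to confirm which of the two distributions those markers actually resolved to inside the image's virtual environment:

from importlib import metadata

for name in ("onnxruntime", "onnxruntime-gpu"):
    try:
        print(name, metadata.version(name))
    except metadata.PackageNotFoundError:
        print(name, "not installed")
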
docker-compose.yml:

include:
  - ./docker-compose-base.yml

services:
  ragflow:
    depends_on:
      mysql:
        condition: service_healthy
    image: ${RAGFLOW_IMAGE}
    container_name: ragflow-server
    ports:
      - ${SVR_HTTP_PORT}:9380
      - 80:80
      - 443:443
    volumes:
      - ./ragflow-logs:/ragflow/logs
      - ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
      - ./nginx/proxy.conf:/etc/nginx/proxy.conf
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
    env_file: .env
    environment:
      - TZ=${TIMEZONE}
      - HF_ENDPOINT=${HF_ENDPOINT}
      - MACOS=${MACOS}
    networks:
      - ragflow
    restart: on-failure
    # https://docs.docker.com/engine/daemon/prometheus/#create-a-prometheus-configuration
    # If you're using Docker Desktop, the --add-host flag is optional. This flag makes sure that the host's internal IP gets exposed to the Prometheus container.
    extra_hosts:
      - "host.docker.internal:host-gateway"

docker-compose-base.yml:

services:
  es01:
    container_name: ragflow-es-01
    profiles:
      - elasticsearch
    image: elasticsearch:${STACK_VERSION}
    volumes:
      - ./esdata01:/usr/share/elasticsearch/data
    ports:
      - ${ES_PORT}:9200
    env_file: .env
    environment:
      - node.name=es01
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=false
      - discovery.type=single-node
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=false
      - cluster.routing.allocation.disk.watermark.low=5gb
      - cluster.routing.allocation.disk.watermark.high=3gb
      - cluster.routing.allocation.disk.watermark.flood_stage=2gb
      - TZ=${TIMEZONE}
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: ["CMD-SHELL", "curl http://localhost:9200"]
      interval: 10s
      timeout: 10s
      retries: 120
    networks:
      - ragflow
    restart: on-failure

  infinity:
    container_name: ragflow-infinity
    profiles:
      - infinity
    image: infiniflow/infinity:v0.6.0-dev2
    volumes:
      - ./infinity_data:/var/infinity
      - ./infinity_conf.toml:/infinity_conf.toml
    command: ["-f", "/infinity_conf.toml"]
    ports:
      - ${INFINITY_THRIFT_PORT}:23817
      - ${INFINITY_HTTP_PORT}:23820
      - ${INFINITY_PSQL_PORT}:5432
    env_file: .env
    environment:
      - TZ=${TIMEZONE}
    mem_limit: ${MEM_LIMIT}
    ulimits:
      nofile:
        soft: 500000
        hard: 500000
    networks:
      - ragflow
    healthcheck:
      test: ["CMD", "curl", "http://localhost:23820/admin/node/current"]
      interval: 10s
      timeout: 10s
      retries: 120
    restart: on-failure


  mysql:
    # mysql:5.7 linux/arm64 image is unavailable.
    image: mysql:8.0.39
    container_name: ragflow-mysql
    env_file: .env
    environment:
      - MYSQL_ROOT_PASSWORD=${MYSQL_PASSWORD}
      - TZ=${TIMEZONE}
    command:
      --max_connections=1000
      --character-set-server=utf8mb4
      --collation-server=utf8mb4_unicode_ci
      --default-authentication-plugin=mysql_native_password
      --tls_version="TLSv1.2,TLSv1.3"
      --init-file /data/application/init.sql
    ports:
      - ${MYSQL_PORT}:3306
    volumes:
      - ./mysql_data:/var/lib/mysql
      - ./init.sql:/data/application/init.sql
    networks:
      - ragflow
    healthcheck:
      test: ["CMD", "mysqladmin" ,"ping", "-uroot", "-p${MYSQL_PASSWORD}"]
      interval: 10s
      timeout: 10s
      retries: 3
    restart: on-failure

  minio:
    image: quay.io/minio/minio:RELEASE.2024-12-18T13-15-44Z
    container_name: ragflow-minio
    command: server --console-address ":9001" /data
    ports:
      - ${MINIO_PORT}:9000
      - ${MINIO_CONSOLE_PORT}:9001
    env_file: .env
    environment:
      - MINIO_ROOT_USER=${MINIO_USER}
      - MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
      - TZ=${TIMEZONE}
    volumes:
      - ./minio_data:/data
    networks:
      - ragflow
    restart: on-failure

  redis:
    image: valkey/valkey:8
    container_name: ragflow-redis
    command: redis-server --requirepass ${REDIS_PASSWORD} --maxmemory 128mb --maxmemory-policy allkeys-lru
    env_file: .env
    ports:
      - ${REDIS_PORT}:6379
    volumes:
      - ./redis_data:/data
    networks:
      - ragflow
    restart: on-failure



volumes:
  esdata01:
    driver: local
  infinity_data:
    driver: local
  mysql_data:
    driver: local
  minio_data:
    driver: local
  redis_data:
    driver: local

networks:
  ragflow:
    driver: bridge


senovr commented Jan 29, 2025

Even after a completely clean install, the error message still persists.
By clean install I mean a docker system prune and re-building/re-downloading all required images.
I will try to investigate further a bit later and will let you know if I find out the cause.


senovr commented Jan 29, 2025

@KevinHuSh
I can confirm that this is due to onnxruntime-gpu.
I created a .venv with uv sync, and the initially installed version of onnxruntime was onnxruntime-gpu==1.19.2.
The command:
python deepdoc/vision/t_recognizer.py --inputs=test_pdf_file/input/testfile.pdf --threshold=0.2 --mode=tsr --output_dir=test_pdf_file/output/
fails with the same traceback as in the message above, and additionally asks for CUDA (which is not present on the non-GPU server).

After I manually installed the non-GPU version of onnxruntime, and after fixing a few more small issues (such as missing nltk dictionaries), the command above ran with no errors.
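
For anyone hitting the same thing, a sketch of those two fixes. The onnxruntime swap itself was done with pip (uninstall onnxruntime-gpu, then install onnxruntime==1.19.2); for the nltk part, the exact resources that are missing may differ per setup, punkt and wordnet are assumptions here:

import nltk

# Fetch the tokenizer/corpus data that nltk reports as missing; the exact
# resource names depend on which lookups fail in your environment.
for resource in ("punkt", "wordnet"):
    nltk.download(resource)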
