
[Bug]: Docker image build does not call warm_up_vectordb to pre-download the NLTK punkt data (nltk.download("punkt")) #1971

Open
awwaawwa opened this issue Sep 20, 2024 · 2 comments

Comments

@awwaawwa
Contributor

Installation Method and Platform

Others (Please Describe)

Version

Latest | 最新版

OS

Docker

Describe the bug

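Dockerfiles such as docs/GithubAction+NoLocal+Latex only warm up modules during the image build:

RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'

warm_up_vectordb is never called, so the NLTK "punkt" data is not downloaded when the image is built and has to be fetched when the container runs.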

Screenshot

With a poor network connection, running the Docker container gets stuck at this point:
[Screenshot: CleanShot 2024-09-20 at 12 50 46@2x]

Terminal Traceback & Material to Help Reproduce Bugs (if any)

No response

@awwaawwa
Contributor Author

Changing that line to the following should fix it:

RUN python3 -c 'from check_proxy import warm_up_modules, warm_up_vectordb; warm_up_modules(); warm_up_vectordb()'
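If the warm-up step may run against a directory that already holds the punkt data (for example a reused volume), a small idempotent helper can skip the network call entirely. This is a minimal sketch, not check_proxy's code: `punkt_present` and `ensure_punkt` are hypothetical names, and the on-disk layout it checks (`tokenizers/punkt.zip`, unpacked to `tokenizers/punkt/`) is taken from the nltk download log later in this thread.

```python
import os


def punkt_present(download_dir: str) -> bool:
    """Return True if NLTK 'punkt' data already exists under download_dir."""
    tokenizers = os.path.join(download_dir, "tokenizers")
    # nltk.download() saves tokenizers/punkt.zip and unzips it into tokenizers/punkt/
    return (os.path.isdir(os.path.join(tokenizers, "punkt"))
            or os.path.isfile(os.path.join(tokenizers, "punkt.zip")))


def ensure_punkt(download_dir: str) -> bool:
    """Download 'punkt' into download_dir only if it is missing.

    Returns True if a download was attempted, False if the data was already there.
    """
    if punkt_present(download_dir):
        return False
    import nltk  # imported lazily: the presence check above does not need nltk
    os.makedirs(download_dir, exist_ok=True)
    nltk.download("punkt", download_dir=download_dir, quiet=True)
    return True
```

Calling `ensure_punkt(...)` from a build-time RUN step would then download once and be a no-op on later runs against the same directory.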

@hongyi-zhao
Collaborator

hongyi-zhao commented Sep 20, 2024

When running directly from source, I ended up using the following approach:

$ proxychains-ng-socks5 python -c "import nltk, os; nltk.download('punkt', download_dir=os.path.expanduser('~') + '/.pyenv/versions/gpt_academic/lib/python3.11/site-packages/llama_index/core/_static/nltk_cache/')"
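The command above hardcodes a pyenv path (`~/.pyenv/versions/gpt_academic/lib/python3.11/...`), which only matches that one environment. A minimal sketch that resolves the directory from the current interpreter instead, assuming `llama_index.core` is installed there and keeps its NLTK cache under `_static/nltk_cache` as the path above suggests; `nltk_cache_dir` and `find_llama_index_core_dir` are hypothetical helper names.

```python
import importlib.util
import os
from typing import Optional


def nltk_cache_dir(core_pkg_dir: str) -> str:
    """Build the nltk_cache path inside an installed llama_index.core package."""
    return os.path.join(core_pkg_dir, "_static", "nltk_cache")


def find_llama_index_core_dir() -> Optional[str]:
    """Locate the installed llama_index.core package; None if it is not importable."""
    try:
        spec = importlib.util.find_spec("llama_index.core")
    except ModuleNotFoundError:
        return None  # the parent package llama_index is not installed
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at llama_index/core/__init__.py
    return os.path.dirname(spec.origin)
```

With `core_dir = find_llama_index_core_dir()` returning a path, the download becomes `nltk.download('punkt', download_dir=nltk_cache_dir(core_dir))`, independent of where the virtual environment lives.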

Alternatively, do the same from Python, setting the proxy through environment variables:

In [1]: import nltk
   ...: import os
   ...: 
   ...: # Set the proxy
   ...: proxy_url = 'http://127.0.0.1:8080'
   ...: os.environ['HTTP_PROXY'] = proxy_url
   ...: os.environ['HTTPS_PROXY'] = proxy_url
   ...: 
   ...: # Set the download directory
   ...: home = os.path.expanduser('~')
   ...: download_dir = f"{home}/.pyenv/versions/gpt_academic/lib/python3.11/site-packages/llama_index/core/_static/nltk_cache/"
   ...: 
   ...: # Make sure the download directory exists
   ...: os.makedirs(download_dir, exist_ok=True)
   ...: 
   ...: # Download the 'punkt' data package
   ...: nltk.download('punkt', download_dir=download_dir, quiet=False)
   ...: 
   ...: print(f"NLTK data downloaded to {download_dir}")
[nltk_data] Downloading package punkt to /home/werner/.pyenv/versions/
[nltk_data]     gpt_academic/lib/python3.11/site-
[nltk_data]     packages/llama_index/core/_static/nltk_cache/...
[nltk_data]   Unzipping tokenizers/punkt.zip.
NLTK data downloaded to /home/werner/.pyenv/versions/gpt_academic/lib/python3.11/site-packages/llama_index/core/_static/nltk_cache/
