
Unable to build Docker #176

Open
eladrave opened this issue Dec 10, 2024 · 2 comments

Comments

@eladrave

I get an error when the build reaches this step:

```dockerfile
RUN playwright install --with-deps && \
    python -c "from unstructured.nlp.tokenize import download_nltk_packages; download_nltk_packages()" && \
    python -c "import nltk;nltk.download('punkt_tab'); nltk.download('averaged_perceptron_tagger_eng')" && \
    python -c "from unstructured.partition.model_init import initialize; initialize()"
```

Here is the error:

```
66.71 Webkit 18.0 (playwright build v2083) downloaded to /root/.cache/ms-playwright/webkit-2083
69.38 Traceback (most recent call last):
69.38   File "<string>", line 1, in <module>
69.38   File "/usr/local/lib/python3.11/site-packages/unstructured/nlp/tokenize.py", line 88, in download_nltk_packages
69.38     urllib.request.urlretrieve(NLTK_DATA_URL, tgz_file_path)
69.38   File "/usr/local/lib/python3.11/urllib/request.py", line 241, in urlretrieve
69.39     with contextlib.closing(urlopen(url, data)) as fp:
69.39                             ^^^^^^^^^^^^^^^^^^
69.39   File "/usr/local/lib/python3.11/urllib/request.py", line 216, in urlopen
69.39     return opener.open(url, data, timeout)
69.39            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
69.39   File "/usr/local/lib/python3.11/urllib/request.py", line 525, in open
69.39     response = meth(req, response)
69.39                ^^^^^^^^^^^^^^^^^^^
69.39   File "/usr/local/lib/python3.11/urllib/request.py", line 634, in http_response
69.39     response = self.parent.error(
69.39                ^^^^^^^^^^^^^^^^^^
69.39   File "/usr/local/lib/python3.11/urllib/request.py", line 563, in error
69.39     return self._call_chain(*args)
69.39            ^^^^^^^^^^^^^^^^^^^^^^^
69.39   File "/usr/local/lib/python3.11/urllib/request.py", line 496, in _call_chain
69.39     result = func(*args)
69.39              ^^^^^^^^^^^
69.39   File "/usr/local/lib/python3.11/urllib/request.py", line 643, in http_error_default
69.39     raise HTTPError(req.full_url, code, msg, hdrs, fp)
69.39 urllib.error.HTTPError: HTTP Error 403: Forbidden
------
failed to solve: process "/bin/sh -c playwright install --with-deps &&     python -c \"from unstructured.nlp.tokenize import download_nltk_packages; download_nltk_packages()\" &&     python -c \"import nltk;nltk.download('punkt_tab'); nltk.download('averaged_perceptron_tagger_eng')\" &&     python -c \"from unstructured.partition.model_init import initialize; initialize()\"" did not complete successfully: exit code: 1
```
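The failing call is `download_nltk_packages()` from `unstructured.nlp.tokenize`, which fetches `NLTK_DATA_URL` with `urllib.request.urlretrieve`; the server answered HTTP 403, and that one failure aborts the whole `&&` chain. A minimal sketch of a workaround, assuming the 403 may be transient (it later cleared up for another commenter) and assuming that fetching the same resources directly with `nltk.download` is an acceptable substitute: retry the primary step a few times, then fall back. `run_with_fallback` and `default_steps` are hypothetical helpers, not part of unstructured or megaparse.

```python
import time
import urllib.error
from typing import Callable, List, Sequence, Tuple


def run_with_fallback(
    primary: Callable[[], object],
    fallbacks: Sequence[Callable[[], object]],
    attempts: int = 3,
    delay_seconds: float = 5.0,
) -> str:
    """Run `primary`, retrying on HTTP errors; if every attempt fails, run `fallbacks`.

    Returns "primary" or "fallback" so the caller can log which path was taken.
    """
    for attempt in range(1, attempts + 1):
        try:
            primary()
            return "primary"
        except urllib.error.HTTPError as err:
            print(f"attempt {attempt}/{attempts} failed: HTTP {err.code}")
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Primary never succeeded: run each fallback step; errors here propagate.
    for step in fallbacks:
        step()
    return "fallback"


def default_steps() -> Tuple[Callable[[], object], List[Callable[[], None]]]:
    """Wire up the real calls used in the Dockerfile (imports deferred until called).

    Assumption: nltk.download("punkt_tab") and nltk.download("averaged_perceptron_tagger_eng")
    cover what download_nltk_packages() was fetching.
    """
    import nltk
    from unstructured.nlp.tokenize import download_nltk_packages

    def fetch(resource: str) -> None:
        # nltk.download returns False rather than raising when a download fails.
        if not nltk.download(resource):
            raise RuntimeError(f"nltk.download({resource!r}) failed")

    return download_nltk_packages, [
        lambda: fetch("punkt_tab"),
        lambda: fetch("averaged_perceptron_tagger_eng"),
    ]
```

Inside the image this might be called as `run_with_fallback(*default_steps())` from a small script in place of the first `python -c` command. Only `HTTPError` triggers a retry, so unrelated failures (a missing package, a syntax error) still fail the build immediately.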
@lmpentland

I had this problem all day yesterday, and it went away this morning. The only things I've done between then and now are:

downloaded and installed Node.js and its dependencies on the host machine (Chocolatey etc., all included with Node),
added C:/...../megaparse/src to the PYTHONPATH environment variable,
a restart,
and included my username in the image tag when building, i.e. `docker build -t usernamehere/megaparse .`

One of those things seems to have fixed it, or at least allowed me to build the image. I'm still getting an error, but now it's a ModuleNotFoundError saying megaparse.api can't be found.
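A `ModuleNotFoundError` for `megaparse.api` means the interpreter that raised it cannot find the `megaparse` package on its import path. Note that setting `PYTHONPATH` on the Windows host does not affect the image: `docker build` does not inherit host environment variables, so any path has to be set (or the package installed) inside the Dockerfile. The sketch below is a quick diagnostic to run inside the container; `module_available` is a hypothetical helper, and `megaparse.api` is just the module name from the error.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` (e.g. "megaparse.api") can be found for import.

    find_spec imports the parent packages of a dotted name, so a missing parent
    raises ModuleNotFoundError instead of returning None; treat both as absent.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    target = "megaparse.api"
    print(f"{target} importable: {module_available(target)}")
    # List where this interpreter searches for modules (PYTHONPATH entries included).
    for entry in sys.path:
        print("  ", entry)
```

If `megaparse.api` is reported as missing, the printed `sys.path` shows whether the directory containing the `megaparse` package is among the search locations.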

@StanGirard
Contributor

Unstructured changed the way it downloads models. We fixed that in the latest version.
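Since the fix is in a newer release, it may help to confirm which versions are actually installed in the image before rebuilding. A minimal sketch using the standard library's `importlib.metadata`; the distribution names `unstructured`, `nltk`, and `megaparse` are assumptions about how these packages are published, and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names assumed to match their package-index project names.
    for dist in ("unstructured", "nltk", "megaparse"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
```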

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants