Getting connection aborted message when processing pdf files #68

Open
minump opened this issue Nov 7, 2023 · 0 comments

minump commented Nov 7, 2023

When a PDF file is uploaded, the pdf2text-extractor kicks in. After the "submission for extraction" step, multiple "Connection aborted" messages are logged.

The file is then submitted for extraction again and the cycle repeats. I see this on the local instance when extracting large PDF files. I have not been able to check it on the consort instance because of the radiant volume detaching issue.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pyclowder/connectors.py", line 434, in _process_message
    self.process_message(self, source_host, secret_key, resource, body)
  File "//pdf2text.py", line 56, in process_message
    output_xml_file, output_json_file, output_txt_file = process_pdf_file(input_file, input_filename, temp_dir, output_dir)
  File "/doc2txt/grobid2json/process_pdf.py", line 74, in process_pdf_file
    client.process_pdf(input_file, input_filename, temp_dir, "processFulltextDocument")
  File "/doc2txt/grobid2json/grobid/grobid_client.py", line 154, in process_pdf
    tei_text = self.process_pdf_stream(input_filename, pdf_strm, output, service)
  File "/doc2txt/grobid2json/grobid/grobid_client.py", line 125, in process_pdf_stream
    res, status = self.post(
  File "/doc2txt/grobid2json/grobid/client.py", line 205, in post
    return self.call_api(
  File "/doc2txt/grobid2json/grobid/client.py", line 126, in call_api
    r = requests.request(
  File "/usr/local/lib/python3.10/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

2023-11-07 15:56:34,679 [Thread-15 (_process_message)] INFO    : pyclowder.connectors - [654a5da2e4b051a0ae4d5257] : StatusMessage.error: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
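
One possible mitigation, not a fix for whatever makes the remote end drop the connection: wrap the GROBID call in a bounded retry with exponential backoff, so a dropped connection is retried a few times inside the extractor and then fails cleanly instead of cycling. This is only a sketch. `call_with_backoff` and its parameters are hypothetical names, not part of pyclowder or the grobid client. It relies on `requests.exceptions.ConnectionError` being a subclass of `OSError`, which is why `OSError` is the default retryable type.

```python
import time


def call_with_backoff(func, max_attempts=3, base_delay=1.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call func(); on a retryable error, wait and try again.

    Hypothetical helper for illustration. Re-raises the last error once
    max_attempts calls have failed, so the caller can report it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical usage in process_pdf_file (names taken from the traceback):
# call_with_backoff(
#     lambda: client.process_pdf(input_file, input_filename, temp_dir,
#                                "processFulltextDocument"))
```

If the aborts come from the server running out of memory or time on large PDFs, retrying will not help and the limit has to be raised on the server side. The sketch only bounds the number of attempts.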
minump self-assigned this Nov 7, 2023
minump added the bug ("Something isn't working") label Nov 7, 2023