preview randomly not working, again... #592

Open
info-sic opened this issue Apr 25, 2024 · 14 comments

@info-sic

Aloha,
My Pandora instance is rebooted every night and started by systemd. The systemd start command includes poetry run update --yes.

For unknown reasons, the preview fails randomly (see also #93):

2024-04-25 07:57:41,662 extractor INFO:[a0eef483-5267-44e5-881d-3ca9fb821bd7] Runtime: 0.22s
File: /home/pandora/pandora/tasks/2024/04/a0eef483-5267-44e5-881d-3ca9fb821bd7/pandoratestmacro.docx

Unencrypted xlsm file

[Loading Cells]
[Starting Deobfuscation]
[END of Deobfuscation]
time elapsed: 0.1948859691619873
2024-04-25 07:57:41,760 xmldeobfuscator INFO:[a0eef483-5267-44e5-881d-3ca9fb821bd7] Runtime: 0.26s
2024-04-25 07:57:41,776 yara_signature_base INFO:[a0eef483-5267-44e5-881d-3ca9fb821bd7] Runtime: 0.36s
2024-04-25 07:57:41,835 observables INFO:[a0eef483-5267-44e5-881d-3ca9fb821bd7] Runtime: 0.41s
2024-04-25 07:57:41,827 yarahq_full INFO:[a0eef483-5267-44e5-881d-3ca9fb821bd7] Runtime: 0.40s
2024-04-25 07:57:41,847 hashlookup INFO:[a0eef483-5267-44e5-881d-3ca9fb821bd7] Runtime: 0.39s
2024-04-25 07:57:44,059 comodo INFO:[a0eef483-5267-44e5-881d-3ca9fb821bd7] Runtime: 2.65s
2024-04-25 07:57:56,460 qrcode INFO:[a0eef483-5267-44e5-881d-3ca9fb821bd7] Runtime: 15.02s
2024-04-25 07:58:41,420 preview ERROR:
Traceback (most recent call last):
 File "/home/pandora/pandora/pandora/workers/preview.py", line 15, in analyse
   task.file.convert()
 File "/home/pandora/pandora/pandora/file.py", line 276, in convert
   self.libreoffice_client.convert(inpath=str(self.path), outpath=f'{self.path}.pdf')
 File "/home/pandora/.cache/pypoetry/virtualenvs/pandora-Y9WI11B0-py3.10/lib/python3.10/site-packages/unoserver/client.py", line 98, in convert
   result = proxy.convert(
 File "/usr/lib/python3.10/xmlrpc/client.py", line 1122, in __call__
   return self.__send(self.__name, args)
 File "/usr/lib/python3.10/xmlrpc/client.py", line 1464, in __request
   response = self.__transport.request(
 File "/usr/lib/python3.10/xmlrpc/client.py", line 1166, in request
   return self.single_request(host, handler, request_body, verbose)
 File "/usr/lib/python3.10/xmlrpc/client.py", line 1179, in single_request
   resp = http_conn.getresponse()
 File "/usr/lib/python3.10/http/client.py", line 1375, in getresponse
   response.begin()
 File "/usr/lib/python3.10/http/client.py", line 318, in begin
   version, status, reason = self._read_status()
 File "/usr/lib/python3.10/http/client.py", line 279, in _read_status
   line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
 File "/usr/lib/python3.10/socket.py", line 705, in readinto
   return self._sock.recv_into(b)
 File "/home/pandora/pandora/pandora/workers/base.py", line 102, in _raise_timeout
   raise TimeoutError
TimeoutError
2024-04-25 07:58:41,421 preview WARNING:Unable to generate preview, this is suspicious: 

The error disappears after systemctl stop pandora followed by systemctl start pandora.
Any ideas for a workaround?
Manu

@info-sic
Author

info-sic commented Apr 25, 2024

Update: I can reproduce the error with a multi-sheet XLSX file (which I can't publish). I can open this file in LibreOffice and export it to PDF there without problems. To me it looks like the timeout in Pandora might be too short, and that hitting it somehow crashes unoserver/LibreOffice?
So one workaround might be to check the socket after an analysis to see whether LibreOffice is still reachable?
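A minimal sketch of such a check, assuming the preview worker reaches unoserver over TCP on localhost (unoserver's XML-RPC port defaults to 2003 in recent releases; verify against your own setup). is_port_reachable is a hypothetical helper, not Pandora code. Note that a successful connect only proves something is listening: a hung soffice.bin behind a still-running unoserver may well accept the connection and then never answer a convert call.

```python
import socket


def is_port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection() resolves the host and applies the timeout to connect()
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, connect timeouts and name-resolution errors
        return False
```

For example, is_port_reachable("127.0.0.1", 2003) could be run after each preview task, assuming the default unoserver port.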
Just my 2ct.

@Rafiot
Contributor

Rafiot commented Apr 25, 2024

Can you increase the timeout value in the settings of the preview worker?

@info-sic
Author

info-sic commented Apr 25, 2024

Update: I tried, and failed. I couldn't find the timeout parameter for preview.py.

I could reproduce the timeout on pandora.circl.lu: 0d92c168-4c01-465c-8fd9-08b56f359abb. You can check the file yourself.
I politely ask you to srm it afterwards and keep all information confidential.
Manu

@Rafiot
Contributor

Rafiot commented Apr 25, 2024

Quick update on that: the file is huge (30+ MB), and generating the PDF actually works. What fails is creating the images out of the exported PDF. I'm increasing the timeout until it works, but I'm not sure how practical that is.

@info-sic
Author

That's worth a try. But anyway, the bigger problem is that once the preview has crashed, it doesn't work at all for subsequent uploads. So a check and restart might be more important. But thanks a lot, as always.
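A check-and-restart loop could look roughly like this minimal sketch. It assumes you supply a health probe as check (for instance a TCP connect to unoserver) and a restart action as restart (for instance subprocess.run(["systemctl", "restart", "pandora"], check=True), the service name used in this thread). watchdog and its parameters are hypothetical; nothing like it exists in Pandora as far as this thread shows.

```python
import time
from typing import Callable


def watchdog(check: Callable[[], bool], restart: Callable[[], None],
             max_restarts: int = 3, settle: float = 5.0) -> bool:
    """Run `check`; on failure call `restart`, wait `settle` seconds and re-check.

    Returns True once the check passes, or False if it still fails after
    `max_restarts` restart attempts.
    """
    if check():
        return True
    for _ in range(max_restarts):
        restart()
        time.sleep(settle)  # give LibreOffice/unoserver time to come back up
        if check():
            return True
    return False
```

Keeping the probe and the restart action as plain callables means the same loop works whether the restart is a full systemctl restart or something narrower.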
Manu

@Rafiot
Contributor

Rafiot commented Apr 25, 2024

Yeah, it seems to cause LibreOffice processes to get stuck.
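One plausible reason they stay stuck: judging by the traceback above, the TimeoutError is raised inside the Python worker (_raise_timeout in base.py), which abandons the XML-RPC call but does nothing to the soffice.bin that is still converting. The sketch below shows a different approach, assuming the conversion runs as a separate command (for example a headless soffice call) rather than through unoserver: start it in its own process group and kill the whole group on timeout. run_killable is a hypothetical helper, POSIX only.

```python
import os
import signal
import subprocess
from typing import Optional, Sequence


def run_killable(cmd: Sequence[str], timeout: float) -> Optional[int]:
    """Run cmd; if it exceeds `timeout` seconds, SIGKILL its whole process group.

    Returns the exit code, or None if the command timed out and was killed.
    """
    # start_new_session=True makes the child a session/process-group leader, so
    # helpers it forks (e.g. soffice.bin) normally share its process group
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # pgid == pid for a session leader
        except ProcessLookupError:
            pass  # the group already exited on its own
        proc.wait()  # reap the child so it does not linger as a zombie
        return None
```

Killing the group rather than just the direct child matters because the soffice launcher and the soffice.bin doing the actual work can be separate processes.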

@info-sic
Author

I might be getting this completely wrong, but does Pandora first convert Office files to PDF and then produce a PNG for the preview? LibreOffice can export PNG directly, at least from the UI... I have no clue whether that's scriptable, though.
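It is scriptable through LibreOffice's headless command line (--headless, --convert-to, --outdir). A minimal sketch that only builds the command; png_export_cmd is a hypothetical helper and the soffice binary name is an assumption about your install. As far as I know, this export renders only the first page (or sheet), so it would not replace a multi-page PDF preview on its own.

```python
from pathlib import Path
from typing import List


def png_export_cmd(infile: Path, outdir: Path, soffice: str = "soffice") -> List[str]:
    """Build a headless LibreOffice command that exports `infile` straight to PNG."""
    return [
        soffice,
        "--headless",             # run without any UI
        "--convert-to", "png",    # target format; LibreOffice picks the export filter
        "--outdir", str(outdir),  # directory the .png is written to
        str(infile),
    ]
```

The resulting list can be passed to subprocess.run as-is, without going through a shell.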

@Rafiot
Contributor

Rafiot commented Apr 25, 2024

The problem with that approach was (I need to try it again) that I couldn't set the output filenames myself, so the images couldn't easily be shown in the web interface without reprocessing them individually. And we want the PDF export anyway, so this approach was more efficient.
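One possible workaround for the naming problem, sketched under the assumption that soffice --convert-to writes its output as the input's basename with the new extension in --outdir (so sample.docx becomes sample.png, whereas Pandora currently uses f'{self.path}.pdf'). The helpers expected_output and move_to are hypothetical: predict the produced path, then move it to a name the web interface already knows.

```python
import shutil
from pathlib import Path


def expected_output(infile: Path, outdir: Path, ext: str) -> Path:
    """Path that soffice --convert-to is assumed to write: input stem + new extension."""
    return outdir / f"{infile.stem}.{ext}"


def move_to(infile: Path, outdir: Path, ext: str, target: Path) -> Path:
    """Move the converted file to a name we control (e.g. one derived from the task)."""
    produced = expected_output(infile, outdir, ext)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(produced), str(target))
    return target
```

Converting into a per-task temporary directory avoids clashes when two uploads share a basename.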

But yes, as of now, it breaks the preview generator. I'll investigate.

In the meantime, I got it to work by adding the following to pandora/workers/preview.yml:

settings:
  cache: 1h
  timeout: 30m
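The values are duration strings (1h, 30m). How Pandora itself parses them isn't shown in this thread; the sketch below is a hypothetical parser that only illustrates what those settings amount to in seconds (30m is 1800 s, 1h is 3600 s).

```python
import re

# Hypothetical unit table: seconds per suffix
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value: str) -> int:
    """Convert strings such as '30m', '1h' or '90s' into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhd])\s*", value)
    if not match:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]
```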

@info-sic
Author

The settings did not help with this specific XLSX: it timed out after 30 min with the same crash:

 File "/usr/lib/python3.10/socket.py", line 705, in readinto
   return self._sock.recv_into(b)
 File "/home/pandora/pandora/pandora/workers/base.py", line 102, in _raise_timeout
   raise TimeoutError

Meanwhile, soffice.bin was stuck at 100% CPU, so I'm quite sure the next preview will fail too. Will tell you in 24 minutes ;)
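To spot leftover soffice.bin processes after a timeout, a Linux-only sketch that scans /proc for processes with a given command name. find_pids is hypothetical; it does not measure CPU usage (that would need /proc/PID/stat or a tool such as ps or top), it only tells you whether such a process is still around.

```python
import os
from typing import List


def find_pids(name: str) -> List[int]:
    """Return PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/comm") as fh:
                comm = fh.read().strip()
        except OSError:
            continue  # process exited meanwhile or is not readable
        if comm == name:
            pids.append(int(entry))
    return pids
```

Note that the kernel truncates comm to 15 characters; soffice.bin fits, so find_pids("soffice.bin") matches it exactly.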

@Rafiot
Contributor

Rafiot commented Apr 25, 2024

Just making sure: you restarted the workers? Otherwise the settings won't be taken into account.

@info-sic
Author

I sure did; otherwise the timeout/crash would have occurred much sooner...

@info-sic
Author

I assume that fixing soffice errors is beyond the scope of this project, so focusing on restoring the preview capability after unavoidable crashes might be a better option? Maybe #187 could be done at the same time? I'd really like to see that for the workers too ;) If users don't get feedback for a long time, they will eventually give up.

@Rafiot
Contributor

Rafiot commented Apr 25, 2024

Yeah, I can see that, but the bar would do nothing more than move somewhat randomly until the job is done and the interface stops telling you to wait.

@info-sic
Author

Never underestimate the power of Microsoft minutes (anywhere between 2 s and 2 h long) and some random movement to keep users on a site. As long as we don't use an RTOS, a user agent will never know when the job is done. Btw, within the LibreOffice UI, it took about 20 s to convert this particular XLSX.
