worker: fix deadlock when LoggingThread wrote into its own Queue #257
Conversation
Such failures were seen with …
I'm a little confused about exactly what conditions this occurs under. I don't suppose you have a backtrace from the deadlock case, i.e. how exactly did we arrive at the deadlock?

I'm asking because the first thing that comes to mind is that there's a logger attached to `sys.stdout` or `sys.stderr` which feeds back into the `LoggingThread`, so if such a logger is activated from within the `LoggingThread` it can get stuck waiting for space in the queue forever. But in this change, you've also added a logging statement in this case.

But if so, will that lead to infinite recursion? It seems like that log statement can bring us back into the `LoggingThread` again.
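As a rough illustration of the scenario described above, here is a minimal sketch with hypothetical names (`QueueWriter`, `upload`, `drain`), not the actual kobo classes: once `sys.stderr` is replaced by a proxy feeding a bounded queue, any write performed from inside the thread that drains that queue can block forever in `Queue.put()`.

```python
import sys
import threading
from queue import Queue


class QueueWriter:
    """File-like proxy that forwards every write into a bounded queue."""

    def __init__(self, queue):
        self._queue = queue

    def write(self, data):
        # Blocks when the queue is full -- harmless for ordinary producer
        # threads, fatal when the consumer thread itself ends up here.
        self._queue.put(data)

    def flush(self):
        pass


def upload(chunk):
    # Stand-in for an upload call: imagine it fails and reports the error
    # to sys.stderr, which is now the QueueWriter proxy.
    print("upload of %d bytes failed, retrying" % len(chunk), file=sys.stderr)


def drain(queue):
    # Consumer loop running in the logging thread.
    while True:
        chunk = queue.get()
        upload(chunk)   # re-enters QueueWriter.write() -> Queue.put(),
                        # which blocks forever once the queue is full


log_queue = Queue(maxsize=2)
sys.stderr = QueueWriter(log_queue)
threading.Thread(target=drain, args=(log_queue,), daemon=True).start()
```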
@rohanpm Good point. Here is the backtrace from the deadlock case:

```
Traceback (most recent call first):
  File "/usr/lib64/python3.9/threading.py", line 312, in wait
    waiter.acquire()
  File "/usr/lib64/python3.9/queue.py", line 140, in put
    self.not_full.wait()
  File "/usr/lib/python3.9/site-packages/kobo/worker/logger.py", line 96, in write
    self._queue.put(data)
  File "/usr/lib/python3.9/site-packages/kobo/worker/logger.py", line 119, in write
    self._thread.write(data)
  <built-in method print of module object at remote 0x7f8697bcfae0>
  File "/usr/lib/python3.9/site-packages/kobo/xmlrpc.py", line 487, in request
    print("XML-RPC connection to %s failed: %s, %s" % (args[0], ex.args[1:], retries), file=sys.stderr)
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1464, in __request
    response = self.__transport.request(
  File "/usr/lib64/python3.9/xmlrpc/client.py", line 1122, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.9/site-packages/kobo/client/__init__.py", line 510, in upload_task_log
    self._hub.worker.upload_task_log(task_id, remote_file_name, mode, chunk_start, chunk_len, chunk_checksum, encoded_chunk)
  File "/usr/lib/python3.9/site-packages/kobo/worker/logger.py", line 69, in run
    self._hub.upload_task_log(BytesIO(self._send_data), self._task_id, "stdout.log", append=True)
  File "/usr/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
```
If `self._hub.upload_task_log()` called `self._queue.put()`, it would cause a deadlock because:

1. `self._queue` uses locks that are not reentrant.
2. `put()` will block if the `Queue` is already full.

Co-authored-by: Kamil Dudka <[email protected]>
Co-authored-by: Lukáš Zaoral <[email protected]>
Force-pushed from 2072322 to bc3df3e.
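Both reasons listed in the commit message can be seen with nothing but the standard library's `queue` module; the snippet below only demonstrates `queue.Queue` semantics and is not code from this patch.

```python
import threading
from queue import Queue

q = Queue(maxsize=1)

# (1) The queue's internal mutex is a plain, non-reentrant lock, so code
#     that already holds it must not call back into the queue.
assert isinstance(q.mutex, type(threading.Lock()))

# (2) put() blocks once maxsize is reached; if the only consumer is the very
#     thread doing the put(), no slot will ever be freed again.
q.put("first chunk")      # fills the queue
# q.put("second chunk")   # would block forever in not_full.wait()
```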
@rohanpm I've amended the commit to avoid infinite recursion when the logger has a handler attached to `sys.stdout` or `sys.stderr`. Note that it is not the case of the backtrace above, which comes from a plain `print()` call rather than a logger.
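For illustration, here is a minimal sketch of a `write()` that avoids both the deadlock and the recursion discussed above, assuming the fix (a) detects writes coming from the logging thread itself and (b) guards against re-entrant log records. `GuardedWriter`, the logger name and the timeout are hypothetical; the actual patch may be implemented differently.

```python
import logging
import sys
import threading
from queue import Full

logger = logging.getLogger("kobo.worker.logger")   # hypothetical logger name


class GuardedWriter:
    """File-like proxy that avoids blocking on, or recursing into, itself."""

    def __init__(self, queue, logging_thread):
        self._queue = queue
        self._thread = logging_thread           # the thread draining the queue
        self._in_write = threading.local()      # per-thread re-entrancy flag

    def write(self, data):
        if threading.current_thread() is self._thread:
            # Already inside the logging thread: a put() into our own queue
            # could block forever, so fall back to the real stderr instead.
            sys.__stderr__.write(data)
            return
        if getattr(self._in_write, "active", False):
            # A log record emitted from within write() came back here;
            # drop it rather than recurse without bound.
            return
        try:
            self._queue.put(data, timeout=30)
        except Full:
            self._in_write.active = True
            try:
                # Losing one log line is better than deadlocking the worker.
                logger.warning("log queue full, dropped %d bytes", len(data))
            finally:
                self._in_write.active = False
```

The essential point either way is that the thread draining the queue must never block on its own queue, and any diagnostic logging performed inside `write()` must be prevented from re-entering `write()`.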