Odd one just now. The atproto-hub instance died at 18:43:17 PT and restarted, but subscribe then kept dying with an ndb context error every time it tried to load the Cursor from the datastore, so it never got far enough to connect to the relay. Ugh. This lasted until ~19:33 PT, when I noticed and restarted it.
Also, our firehose processing delay metric was absent the whole time, so its alert didn't fire until after the restart. Double ugh. Need to do something about that.
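One way to keep an absent metric from silently suppressing the alert is to treat "no data" as unhealthy. A minimal sketch, assuming a hypothetical `firehose_is_healthy` check fed by the timestamp of the last processed firehose event; the function name, the 300-second threshold, and how that timestamp gets stored are all assumptions, not the project's actual monitoring setup:

```python
import time
from typing import Optional


def firehose_is_healthy(
    last_event_time: Optional[float],
    max_delay_secs: float = 300.0,
    now: Optional[float] = None,
) -> bool:
    """Returns False if the delay is too high OR was never reported.

    Hypothetical helper. last_event_time is the Unix timestamp of the last
    firehose event we processed, or None if nothing has been recorded.
    """
    if last_event_time is None:
        # Absent metric: fail closed instead of staying quiet.
        return False
    if now is None:
        now = time.time()
    return (now - last_event_time) <= max_delay_secs
```

The point of the `None` branch is that a subscriber that never starts looks the same as one that is badly behind, so both would trip the alert.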
Excerpted logs:
2025-01-13 18:43:17.000 [2025-01-14 02:43:17 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:49631)
2025-01-13 18:43:17.000 Exception ignored in: <module 'threading' from '/layers/google.python.runtime/python/lib/python3.12/threading.py'>
2025-01-13 18:43:17.000 Traceback (most recent call last):
File "/layers/google.python.runtime/python/lib/python3.12/threading.py", line 1594, in _shutdown
atexit_call()
File "/layers/google.python.runtime/python/lib/python3.12/concurrent/futures/thread.py", line 31, in _python_exit
t.join()
File "/layers/google.python.runtime/python/lib/python3.12/threading.py", line 1149, in join
self._wait_for_tstate_lock()
File "/layers/google.python.runtime/python/lib/python3.12/threading.py", line 1169, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
File "/layers/google.python.pip/pip/lib/python3.12/site-packages/gunicorn/workers/base.py", line 204, in handle_abort
sys.exit(1)
2025-01-13 18:43:17.000 SystemExit: 1
2025-01-13 18:43:18.000 [1] [ERROR] Worker (pid:49631) was sent SIGKILL! Perhaps out of memory?
...
Traceback (most recent call last):
File "/workspace/atproto_firehose.py", line 123, in subscriber
subscribe()
File "/workspace/atproto_firehose.py", line 140, in subscribe
cursor = Cursor.get_or_insert(
File "google/cloud/ndb/_options.py", line 102, in wrapper
return wrapped(*pass_args, **kwargs)
File "google/cloud/ndb/utils.py", line 150, in positional_wrapper
return wrapped(*args, **kwds)
File "google/cloud/ndb/model.py", line 5995, in _get_or_insert
return _cls._get_or_insert_async(_name, *args, **kwargs).result()
File "google/cloud/ndb/tasklets.py", line 210, in result
self.check_success()
File "google/cloud/ndb/tasklets.py", line 157, in check_success
raise self._exception
File "google/cloud/ndb/tasklets.py", line 319, in _advance_tasklet
yielded = self.generator.throw(type(error), error, traceback)
File "google/cloud/ndb/model.py", line 6098, in get_or_insert
entity = yield key.get_async(_options=options)
File "google/cloud/ndb/tasklets.py", line 319, in _advance_tasklet
yielded = self.generator.throw(type(error), error, traceback)
File "google/cloud/ndb/key.py", line 943, in get
entity_pb = yield _datastore_api.lookup(self._key, _options)
File "google/cloud/ndb/tasklets.py", line 319, in _advance_tasklet
yielded = self.generator.throw(type(error), error, traceback)
File "google/cloud/ndb/_datastore_api.py", line 165, in lookup
entity_pb = yield batch.add(key)
File "google/cloud/ndb/tasklets.py", line 319, in _advance_tasklet
yielded = self.generator.throw(type(error), error, traceback)
File "google/cloud/ndb/_retry.py", line 97, in retry_wrapper
raise error
File "google/cloud/ndb/_retry.py", line 82, in retry_wrapper
result = yield result
File "google/cloud/ndb/tasklets.py", line 323, in _advance_tasklet
yielded = self.generator.send(send_value)
File "google/cloud/ndb/_datastore_api.py", line 89, in rpc_call
context = context_module.get_toplevel_context()
File "google/cloud/ndb/context.py", line 151, in get_toplevel_context
raise exceptions.ContextError()
google.cloud.ndb.exceptions.ContextError: No current context. NDB calls must be made in context established by google.cloud.ndb.Client.context.
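The root cause isn't established here, but the error message itself says every NDB call has to run inside a context from `google.cloud.ndb.Client.context`. A minimal defensive sketch, assuming a hypothetical `run_with_fresh_context` wrapper around the subscriber: it enters a new context on every attempt and gives up after a bounded number of failures rather than looping forever. The helper name, the retry count, and the delay are assumptions; in production `context_factory` would be something like `ndb_client.context`.

```python
import logging
import time
from contextlib import AbstractContextManager
from typing import Callable

logger = logging.getLogger(__name__)


def run_with_fresh_context(
    work: Callable[[], None],
    context_factory: Callable[[], AbstractContextManager],
    max_attempts: int = 5,
    delay_secs: float = 1.0,
) -> bool:
    """Runs work() inside a newly created context on every attempt.

    Hypothetical helper, not taken from atproto_firehose.py.
    Returns True if work() returned normally, False if every attempt raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Enter a new context per attempt so a restarted thread never
            # depends on a context that an earlier, dead attempt set up.
            with context_factory():
                work()
            return True
        except Exception:
            logger.exception("attempt %d of %d failed", attempt, max_attempts)
            if attempt < max_attempts:
                time.sleep(delay_secs)
    return False
```

A `False` return gives the caller a place to escalate, for example by exiting the process so the instance gets restarted instead of sitting disconnected from the relay.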
Somewhat related to #1315.