atproto-hub startup failure, subscribe never got an ndb context #1687

Open
snarfed opened this issue Jan 14, 2025 · 0 comments

snarfed commented Jan 14, 2025

Odd one just now. The atproto-hub instance died at 18:43:17 PT and restarted, but after that subscribe kept dying with an ndb context error every time it tried to load the Cursor from the datastore. It never managed to get past that and connect to the relay. Ugh. This lasted until ~19:33 PT, when I noticed and restarted it.

Somewhat related to #1315.

Also, our firehose processing delay metric was absent during the outage, so its alert didn't fire until after the restart. Double ugh. Need to do something about that.
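
One possible direction for the absent metric, as a rough sketch only: compute the delay from a clock rather than only when an event is handled, so a subscriber that never connects still produces a growing value instead of no data points. `DelayGauge`, `record`, and `delay` are hypothetical names, and how the value gets emitted to monitoring is not shown because this issue doesn't say how the metric is reported today.

```python
import time


class DelayGauge:
    """Tracks firehose processing delay, even while processing is stalled.

    Hypothetical helper, not part of the existing codebase.
    """

    def __init__(self, clock=time.time):
        self._clock = clock
        self._started = clock()
        self._last_event_time = None  # timestamp of the newest processed event

    def record(self, event_time):
        """Call this each time an event is processed."""
        self._last_event_time = event_time

    def delay(self):
        """Seconds between now and the newest processed event.

        Before any event has been processed, fall back to time since startup,
        so a subscriber that never connects still reports a growing delay.
        """
        if self._last_event_time is None:
            reference = self._started
        else:
            reference = self._last_event_time
        return self._clock() - reference
```

Something outside the subscriber, such as a timer thread, would need to call `delay()` periodically and report it. Otherwise the gauge goes silent along with the subscriber, which is the failure described above.
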

Excerpted logs:

2025-01-13 18:43:17.000 [2025-01-14 02:43:17 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:49631) 
2025-01-13 18:43:17.000 Exception ignored in: <module 'threading' from '/layers/google.python.runtime/python/lib/python3.12/threading.py'>
2025-01-13 18:43:17.000 Traceback (most recent call last):
  File "/layers/google.python.runtime/python/lib/python3.12/threading.py", line 1594, in _shutdown
    atexit_call()
  File "/layers/google.python.runtime/python/lib/python3.12/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/layers/google.python.runtime/python/lib/python3.12/threading.py", line 1149, in join
    self._wait_for_tstate_lock()
  File "/layers/google.python.runtime/python/lib/python3.12/threading.py", line 1169, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
  File "/layers/google.python.pip/pip/lib/python3.12/site-packages/gunicorn/workers/base.py", line 204, in handle_abort
    sys.exit(1)
2025-01-13 18:43:17.000 SystemExit: 1
2025-01-13 18:43:18.000 [1] [ERROR] Worker (pid:49631) was sent SIGKILL! Perhaps out of memory?
...
Traceback (most recent call last):
  File "/workspace/atproto_firehose.py", line 123, in subscriber
    subscribe()
  File "/workspace/atproto_firehose.py", line 140, in subscribe
    cursor = Cursor.get_or_insert(
  File "google/cloud/ndb/_options.py", line 102, in wrapper
    return wrapped(*pass_args, **kwargs)
  File "google/cloud/ndb/utils.py", line 150, in positional_wrapper
    return wrapped(*args, **kwds)
  File "google/cloud/ndb/model.py", line 5995, in _get_or_insert
    return _cls._get_or_insert_async(_name, *args, **kwargs).result()
  File "google/cloud/ndb/tasklets.py", line 210, in result
    self.check_success()
  File "google/cloud/ndb/tasklets.py", line 157, in check_success
    raise self._exception
  File "google/cloud/ndb/tasklets.py", line 319, in _advance_tasklet
    yielded = self.generator.throw(type(error), error, traceback)
  File "google/cloud/ndb/model.py", line 6098, in get_or_insert
    entity = yield key.get_async(_options=options)
  File "google/cloud/ndb/tasklets.py", line 319, in _advance_tasklet
    yielded = self.generator.throw(type(error), error, traceback)
  File "google/cloud/ndb/key.py", line 943, in get
    entity_pb = yield _datastore_api.lookup(self._key, _options)
  File "google/cloud/ndb/tasklets.py", line 319, in _advance_tasklet
    yielded = self.generator.throw(type(error), error, traceback)
  File "google/cloud/ndb/_datastore_api.py", line 165, in lookup
    entity_pb = yield batch.add(key)
  File "google/cloud/ndb/tasklets.py", line 319, in _advance_tasklet
    yielded = self.generator.throw(type(error), error, traceback)
  File "google/cloud/ndb/_retry.py", line 97, in retry_wrapper
    raise error
  File "google/cloud/ndb/_retry.py", line 82, in retry_wrapper
    result = yield result
  File "google/cloud/ndb/tasklets.py", line 323, in _advance_tasklet
    yielded = self.generator.send(send_value)
  File "google/cloud/ndb/_datastore_api.py", line 89, in rpc_call
    context = context_module.get_toplevel_context()
  File "google/cloud/ndb/context.py", line 151, in get_toplevel_context
    raise exceptions.ContextError()
google.cloud.ndb.exceptions.ContextError: No current context. NDB calls must be made in context established by google.cloud.ndb.Client.context.
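
The ContextError above says NDB calls must be made in a context established by google.cloud.ndb.Client.context. One possible guard, not necessarily the fix for this issue, is to have the subscriber thread enter a fresh context itself on every attempt instead of relying on one set up elsewhere. The sketch below assumes the context has to be entered in the same thread that makes the call. `run_with_context`, `fake_context`, and `load_cursor` are hypothetical names, and the stand-in context only mimics a per-thread context so the example runs without ndb installed.

```python
import logging
import threading
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


def run_with_context(context_factory, target, max_attempts=5, delay=1.0):
    """Call target() inside a fresh context, retrying if it raises.

    context_factory: zero-argument callable returning a context manager.
      With google-cloud-ndb this would be something like ndb.Client().context
      (assumption: one context per attempt is acceptable here).
    Returns target()'s result, or re-raises the last error.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Enter the context inside the loop so every retry gets a new one.
            with context_factory():
                return target()
        except Exception:
            logger.exception('attempt %d of %d failed', attempt, max_attempts)
            if attempt == max_attempts:
                raise
            time.sleep(delay)


# Stand-in for a per-thread context, so the sketch runs without ndb.
_state = threading.local()


@contextmanager
def fake_context():
    _state.active = True
    try:
        yield
    finally:
        _state.active = False


def load_cursor():
    """Hypothetical stand-in for the Cursor.get_or_insert call."""
    if not getattr(_state, 'active', False):
        raise RuntimeError('No current context')
    return 'cursor'


results = []
thread = threading.Thread(
    target=lambda: results.append(
        run_with_context(fake_context, load_cursor, delay=0)))
thread.start()
thread.join()
```

Calling `load_cursor()` directly from a thread that hasn't entered the context raises, which is the shape of the failure in the traceback. Going through `run_with_context` in the worker thread succeeds.
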
snarfed added the infra label Jan 14, 2025