What happened?
I deployed a K8ssandraCluster with 96 replicas and Medusa enabled, and one of the pods never passed its readiness probe.
k get sts cs-95f5cdf50d-cs-95f5cdf50d-default-sts -n platform
NAME READY AGE
cs-95f5cdf50d-cs-95f5cdf50d-default-sts 95/96
I identified the faulty pod: it was failing its readiness probe because of the medusa container.
The medusa gRPC server never started because load_config() failed (see logs below).
Since the gRPC server was not running, the readiness probe could not succeed.
The medusa container was "blocked": the process stayed alive but never attempted to restart the gRPC server.
I restarted the pod manually by deleting it, and the medusa gRPC server started successfully.
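For reference, the same check and restart can be done programmatically; this is a minimal sketch using the official Kubernetes Python client, assuming the "platform" namespace and "medusa" container name from this report (equivalent to kubectl get pods / kubectl delete pod):

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod("platform").items:
    for cs in (pod.status.container_statuses or []):
        if cs.name == "medusa" and not cs.ready:
            print(f"medusa not ready in {pod.metadata.name} (restarts={cs.restart_count})")
            # Deleting the pod lets the StatefulSet recreate it, which is the
            # manual workaround described above.
            v1.delete_namespaced_pod(pod.metadata.name, namespace="platform")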
Did you expect to see something different?
I expect the pod to restart and end up in CrashLoopBackOff if an uncaught exception is raised by the medusa Python process, instead of blocking indefinitely.
I believe this behavior was introduced by the following change: #731
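To illustrate the two behaviors (a minimal sketch, not Medusa's actual entrypoint; SWALLOW_STARTUP_ERRORS is a hypothetical flag): letting the startup exception escape makes the process exit non-zero, so the kubelet restarts the container and it eventually shows CrashLoopBackOff; swallowing the exception and keeping the process alive without serving leaves the container Running but never Ready, which matches the "blocked" state described above.

import asyncio
import logging
import time


async def main():
    # Stand-in for the real startup path, where load_config() raised NXDOMAIN.
    raise RuntimeError("load_config() failed")


if __name__ == "__main__":
    SWALLOW_STARTUP_ERRORS = False  # hypothetical flag, for illustration only
    if not SWALLOW_STARTUP_ERRORS:
        # Uncaught exception -> non-zero exit -> the kubelet restarts the
        # container, leading to CrashLoopBackOff after repeated failures.
        asyncio.run(main())
    else:
        try:
            asyncio.run(main())
        except Exception:
            logging.exception("gRPC server failed to start")
            while True:
                # Process stays alive but never serves: Running, never Ready.
                time.sleep(3600)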
How to reproduce it (as minimally and precisely as possible):
Start the medusa container with an invalid configuration
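For the failure in this report specifically, the trigger is the reverse (PTR) lookup of the pod IP that Medusa's hostname resolution performs via dnspython (see the traceback below); an address without a PTR record raises NXDOMAIN, which aborts load_config(). A sketch of that step in isolation (the IP is derived from the log; substitute your pod's IP):

import dns.resolver
import dns.reversename

pod_ip = "172.20.49.92"  # corresponds to 92.49.20.172.in-addr.arpa. in the log below
reverse_name = dns.reversename.from_address(pod_ip)

try:
    answers = dns.resolver.resolve(reverse_name, "PTR")
    print([rdata.to_text() for rdata in answers])
except dns.resolver.NXDOMAIN:
    print(f"no PTR record for {pod_ip}; medusa's fqdn resolution fails at this point")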
Environment
K8ssandra Operator version: 1.18
Medusa version: 0.21
Kubernetes version information: 1.29
Kubernetes cluster kind: GKE
Medusa logs
MEDUSA_MODE = GRPC
sleeping for 0 sec
Starting Medusa gRPC service
WARNING:root:The CQL_USERNAME environment variable is deprecated and has been replaced by the MEDUSA_CQL_USERNAME variable
WARNING:root:The CQL_PASSWORD environment variable is deprecated and has been replaced by the MEDUSA_CQL_PASSWORD variable
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/cassandra/.venv/lib/python3.11/site-packages/medusa/service/grpc/server.py", line 424, in <module>
asyncio.run(main())
File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/home/cassandra/.venv/lib/python3.11/site-packages/medusa/service/grpc/server.py", line 419, in main
server = Server(config_file_path)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cassandra/.venv/lib/python3.11/site-packages/medusa/service/grpc/server.py", line 53, in __init__
self.medusa_config = self.create_config()
^^^^^^^^^^^^^^^^^^^^
File "/home/cassandra/.venv/lib/python3.11/site-packages/medusa/service/grpc/server.py", line 88, in create_config
return load_config(args, config_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cassandra/.venv/lib/python3.11/site-packages/medusa/config.py", line 315, in load_config
config = parse_config(args, config_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cassandra/.venv/lib/python3.11/site-packages/medusa/config.py", line 280, in parse_config
config.set('storage', 'fqdn', hostname_resolver.resolve_fqdn())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cassandra/.venv/lib/python3.11/site-packages/medusa/network/hostname_resolver.py", line 48, in resolve_fqdn
hostname = self.compute_k8s_hostname(ip_address_to_resolve)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cassandra/.venv/lib/python3.11/site-packages/medusa/network/hostname_resolver.py", line 56, in compute_k8s_hostname
fqdns = dns.resolver.resolve(reverse_name, 'PTR')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cassandra/.venv/lib/python3.11/site-packages/dns/resolver.py", line 1565, in resolve
return get_default_resolver().resolve(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cassandra/.venv/lib/python3.11/site-packages/dns/resolver.py", line 1307, in resolve
(request, answer) = resolution.next_request()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cassandra/.venv/lib/python3.11/site-packages/dns/resolver.py", line 749, in next_request
raise NXDOMAIN(qnames=self.qnames_to_try, responses=self.nxdomain_responses)
dns.resolver.NXDOMAIN: The DNS query name does not exist: 92.49.20.172.in-addr.arpa.
┆Issue is synchronized with this Jira Story by Unito
┆Reviewer: Alexander Dejanovski
┆Fix Versions: 2024-10
┆Issue Number: MED-97
See k8ssandra-operator issue: k8ssandra/k8ssandra-operator#1406