WebSocketConnectionClosedException Connection to remote host was lost. #38

Open
craigwalton-dsit opened this issue Dec 20, 2024 · 0 comments
Labels
3rd party errors Errors observed from 3rd party code such as websocket or SSL errors

Comments

Collaborator

craigwalton-dsit commented Dec 20, 2024

Migrated from internal repo.
Complete stack trace and logs (sensitive) https://github.com/AI-Safety-Institute/aisi-inspect-tools/issues/142
Original date: 23 Oct 2024
Originally raised by @willpayne23

Traceback (most recent call last):

  File "redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py", line 260, in task_run
    sample_results = await asyncio.gather(*sample_coroutines)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py", line 424, in task_run_sample
    error = sample_error(ex)
            ^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/error.py", line 22, in __call__
    raise ex

  File "redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py", line 416, in task_run_sample
    state = await plan(state, generate)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/inspect_ai/solver/_plan.py", line 105, in __call__
    state = await solver(state, generate)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "//.venv/lib/python3.12/site-packages/inspect_ai/solver/_basic_agent.py", line 159, in solve
    tool_results = await call_tools(state.output.message, state.tools)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py", line 149, in call_tools
    results = await asyncio.gather(*tasks)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py", line 75, in call_tool_task
    result = await call_tool(tdefs, call)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/ubu/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py", line 203, in call_tool
    result = await tool_def.tool(**arguments)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/run/src/agents/tools/python.py", line 29, in execute
    result = await sandbox().exec(
             ^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/sandbox_environment.py", line 105, in exec
    return await self._pod.exec(cmd, input, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py", line 56, in exec
    result = await self._run_asynchronously(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py", line 100, in _run_asynchronously
    return await loop.run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py", line 57, in <lambda>
    lambda: executor.exec(cmd, stdin, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py", line 175, in exec
    result = self._handle_stream_output(response, timeout is not None)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py", line 208, in _handle_stream_output
    response.run_forever()

  File "redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py", line 229, in run_forever
    self.update(timeout=None)

  File "redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py", line 197, in update
    op_code, frame = self.sock.recv_data_frame(True)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/websocket/_core.py", line 437, in recv_data_frame
    frame = self.recv_frame()
            ^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/websocket/_core.py", line 478, in recv_frame
    return self.frame_buffer.recv_frame()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/websocket/_abnf.py", line 377, in recv_frame
    payload = self.recv_strict(length)
              ^^^^^^^^^^^^^^^^^^^^^^^^

  File "/hom/.venv/lib/python3.12/site-packages/websocket/_abnf.py", line 398, in recv_strict
    bytes_ = self.recv(min(16384, shortage))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/websocket/_core.py", line 563, in _recv
    return recv(self.sock, bufsize)
           ^^^^^^^^^^^^^^^^^^^^^^^^

  File "redacted/.venv/lib/python3.12/site-packages/websocket/_socket.py", line 132, in recv
    raise WebSocketConnectionClosedException("Connection to remote host was lost.")

websocket._exceptions.WebSocketConnectionClosedException: Connection to remote host was lost.
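The trace shows `run_forever()` ending up in `self.update(timeout=None)`, i.e. a wait with no upper bound. One possible mitigation, sketched here under assumptions rather than taken from the sandbox code, is to poll with a bounded timeout and give up after an overall deadline. `wait_for_stream` and `ExecStalledError` are hypothetical names; the `stream` argument is duck-typed on `is_open()` and `update(timeout=...)`, which is the shape of the Kubernetes client's `WSClient`. This would not help if a single read blocks mid-frame.

```python
import time


class ExecStalledError(RuntimeError):
    """Hypothetical error: the exec stream stayed open past the deadline."""


def wait_for_stream(stream, deadline_s, poll_s=1.0, clock=time.monotonic):
    """Poll `stream` until it closes, or raise once `deadline_s` has elapsed.

    `stream` only needs is_open() and update(timeout=...).
    `clock` is injectable so the behaviour can be checked without sleeping.
    """
    start = clock()
    while stream.is_open():
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            raise ExecStalledError(f"stream still open after {deadline_s}s")
        # Bounded poll: never wait indefinitely for the next frame.
        stream.update(timeout=min(poll_s, remaining))
```

The deadline here is an overall budget for the exec and would need to sit above any per-command timeout already enforced inside the pod.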

Another instance of this, with the improved logging below (26 Oct 2024)

WebSocketConnectionClosedException: Connection to remote host was lost.
...
K8sError: Error during: Execute command in pod. {"pod": "agent-env-nqmhh6q4-default-0", ...

With timestamps (from py log file). Why did nearly an hour elapse between starting the command and the failure?

2024-10-26 23:53:24,982 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-nqmhh6q4-default-0", ...
2024-10-27 00:48:42,974 - ERROR - K8S: Error during: Execute command in pod. {"cause": "Connection to remote host was lost.", "pod": "agent-env-nqmhh6q4-default-0", ...
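
To put an exact figure on "nearly an hour", the two log timestamps above can be subtracted directly. This is a small sketch that assumes every py log line starts with a 23-character `YYYY-MM-DD HH:MM:SS,mmm` timestamp, as the two lines above do; `parse_log_ts` and `elapsed` are illustrative helper names.

```python
from datetime import datetime

LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_ts(line):
    """Parse the leading timestamp of a py log line (first 23 characters)."""
    return datetime.strptime(line[:23], LOG_TS_FORMAT)


def elapsed(start_line, end_line):
    """Time between two log lines as a timedelta."""
    return parse_log_ts(end_line) - parse_log_ts(start_line)


started = '2024-10-26 23:53:24,982 - SANDBOX - K8S: Starting: Execute command in pod.'
failed = '2024-10-27 00:48:42,974 - ERROR - K8S: Error during: Execute command in pod.'
print(elapsed(started, failed))  # 0:55:17.992000
```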

Kubernetes cluster events. Note the "node not ready".

2024-10-26T23:50:36Z   Normal    agent-env-nqmhh6q4-default-0                  Scheduled               Successfully assigned agent/agent-env-nqmhh6q4-default-0 to ip-192-168-102-178.eu-west-2.compute.internal
2024-10-26T23:50:37Z   Normal    agent-env-nqmhh6q4-default-0                  Started                 Started container resolve-coredns-ip
2024-10-26T23:50:37Z   Normal    agent-env-nqmhh6q4-default-0                  Created                 Created container resolve-coredns-ip
2024-10-26T23:50:37Z   Normal    agent-env-nqmhh6q4-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-10-26T23:50:38Z   Normal    agent-env-nqmhh6q4-default-0                  Pulled                  Container image "redacted" already present on machine
2024-10-26T23:50:39Z   Normal    agent-env-nqmhh6q4-default-0                  Created                 Created container default
2024-10-26T23:50:39Z   Normal    agent-env-nqmhh6q4-default-0                  Started                 Started container default
2024-10-26T23:55:46Z   Warning   agent-env-nqmhh6q4-default-0                  NodeNotReady            Node is not ready
2024-10-27T00:00:51Z   Normal    agent-env-nqmhh6q4-default                    SuccessfulCreate        create Pod agent-env-nqmhh6q4-default-0 in StatefulSet agent-env-nqmhh6q4-default successful
2024-10-27T00:00:51Z   Normal    agent-env-nqmhh6q4-default-0                  TaintManagerEviction    Marking for deletion Pod agent/agent-env-nqmhh6q4-default-0
2024-10-27T00:00:51Z   Normal    agent-env-nqmhh6q4-default-0                  Scheduled               Successfully assigned agent/agent-env-nqmhh6q4-default-0 to ip-192-168-108-64.eu-west-2.compute.internal
2024-10-27T00:00:52Z   Normal    agent-env-nqmhh6q4-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-10-27T00:00:53Z   Normal    agent-env-nqmhh6q4-default-0                  Created                 Created container resolve-coredns-ip
2024-10-27T00:00:53Z   Normal    agent-env-nqmhh6q4-default-0                  Started                 Started container resolve-coredns-ip
2024-10-27T00:00:54Z   Normal    agent-env-nqmhh6q4-default-0                  Pulled                  Container image "redacted" already present on machine
2024-10-27T00:00:54Z   Normal    agent-env-nqmhh6q4-default-0                  Started                 Started container default
2024-10-27T00:00:54Z   Normal    agent-env-nqmhh6q4-default-0                  Created                 Created container default
2024-10-27T00:48:44Z   Normal    agent-env-nqmhh6q4-default-0                  Killing                 Stopping container default
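
In the events above, the pod is marked for deletion 305 seconds after the `NodeNotReady` warning. That is consistent with Kubernetes' default 300-second `tolerationSeconds` for the not-ready and unreachable taints, though whether this cluster uses the defaults is an assumption. A minimal sketch for extracting that gap from an event dump in the whitespace-separated layout shown here (the helper names are illustrative):

```python
from datetime import datetime


def parse_event(line):
    """Split an event line into (timestamp, type, object, reason, message)."""
    ts, kind, obj, reason, message = line.split(None, 4)
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ"), kind, obj, reason, message


def not_ready_to_eviction_seconds(lines):
    """Seconds from the first NodeNotReady to the next TaintManagerEviction."""
    not_ready_at = None
    for line in lines:
        ts, _, _, reason, _ = parse_event(line)
        if reason == "NodeNotReady" and not_ready_at is None:
            not_ready_at = ts
        elif reason == "TaintManagerEviction" and not_ready_at is not None:
            return (ts - not_ready_at).total_seconds()
    return None  # no NodeNotReady followed by an eviction


events = [
    "2024-10-26T23:55:46Z   Warning   agent-env-nqmhh6q4-default-0   NodeNotReady   Node is not ready",
    "2024-10-27T00:00:51Z   Normal    agent-env-nqmhh6q4-default-0   TaintManagerEviction   Marking for deletion Pod agent/agent-env-nqmhh6q4-default-0",
]
gap = not_ready_to_eviction_seconds(events)  # 305.0
```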

Another instance of this (02 Nov 2024)

│ redacted/.venv/lib/python3.12/site-packages/websocket/_socket.py:132 in recv                    │
│                                                                                                                      │
│   129 │   │   │   raise                                                                                              │
│   130 │                                                                                                              │
│   131 │   if not bytes_:                                                                                             │
│ > 132 │   │   raise WebSocketConnectionClosedException("Connection to remote host was lost.")                        │
│   133 │                                                                                                              │
│   134 │   return bytes_                                                                                              │
│   135                                                                                                                │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
WebSocketConnectionClosedException: Connection to remote host was lost.
...
K8sError: Error during: Execute command in pod. {"pod": "agent-env-z2z8u7np-default-0", ...
2024-11-02 12:31:25,541 - ERROR - K8S: Error during: Execute command in pod. {"cause": "Connection to remote host was lost.", "pod": "agent-env-z2z8u7np-default-0",...

cluster events:

2024-11-02T11:36:42Z   Normal    agent-env-z2z8u7np-default-0                  Scheduled               Successfully assigned agent/agent-env-z2z8u7np-default-0 to ip-192-168-156-230.eu-west-2.compute.internal
2024-11-02T11:36:43Z   Normal    agent-env-z2z8u7np-default-0                  Created                 Created container resolve-coredns-ip
2024-11-02T11:36:43Z   Normal    agent-env-z2z8u7np-default-0                  Started                 Started container resolve-coredns-ip
2024-11-02T11:36:43Z   Normal    agent-env-z2z8u7np-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-02T11:36:44Z   Normal    agent-env-z2z8u7np-default-0                  Pulled                  Container image "redacted" already present on machine
2024-11-02T11:36:44Z   Normal    agent-env-z2z8u7np-default-0                  Started                 Started container default
2024-11-02T11:36:44Z   Normal    agent-env-z2z8u7np-default-0                  Created                 Created container default
2024-11-02T11:39:42Z   Warning   agent-env-z2z8u7np-default-0                  NodeNotReady            Node is not ready
2024-11-02T11:44:47Z   Normal    agent-env-z2z8u7np-default-0                  TaintManagerEviction    Marking for deletion Pod agent/agent-env-z2z8u7np-default-0
2024-11-02T11:44:48Z   Normal    agent-env-z2z8u7np-default                    SuccessfulCreate        create Pod agent-env-z2z8u7np-default-0 in StatefulSet agent-env-z2z8u7np-default successful
2024-11-02T11:44:48Z   Normal    agent-env-z2z8u7np-default-0                  Scheduled               Successfully assigned agent/agent-env-z2z8u7np-default-0 to ip-192-168-129-237.eu-west-2.compute.internal
2024-11-02T11:44:49Z   Normal    agent-env-z2z8u7np-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-02T11:44:49Z   Normal    agent-env-z2z8u7np-default-0                  Created                 Created container resolve-coredns-ip
2024-11-02T11:44:49Z   Normal    agent-env-z2z8u7np-default-0                  Started                 Started container resolve-coredns-ip
2024-11-02T11:44:50Z   Normal    agent-env-z2z8u7np-default-0                  Created                 Created container default
2024-11-02T11:44:50Z   Normal    agent-env-z2z8u7np-default-0                  Started                 Started container default
2024-11-02T11:44:50Z   Normal    agent-env-z2z8u7np-default-0                  Pulled                  Container image "redacted" already present on machine

06 November 2024

│ redacted/.venv/lib/python3.12/site-packages/websocket/_core.py:563 in _recv                     │
│                                                                                                                      │
│   560 │                                                                                                              │
│   561 │   def _recv(self, bufsize):                                                                                  │
│   562 │   │   try:                                                                                                   │
│ > 563 │   │   │   return recv(self.sock, bufsize)                                                                    │
│   564 │   │   except WebSocketConnectionClosedException:                                                             │
│   565 │   │   │   if self.sock:                                                                                      │
│   566 │   │   │   │   self.sock.close()                                                                              │
│                                                                                                                      │
│ redacted/.venv/lib/python3.12/site-packages/websocket/_socket.py:132 in recv                    │
│                                                                                                                      │
│   129 │   │   │   raise                                                                                              │
│   130 │                                                                                                              │
│   131 │   if not bytes_:                                                                                             │
│ > 132 │   │   raise WebSocketConnectionClosedException("Connection to remote host was lost.")                        │
│   133 │                                                                                                              │
│   134 │   return bytes_                                                                                              │
│   135                                                                                                                │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
WebSocketConnectionClosedException: Connection to remote host was lost.
...
K8sError: Error during: Execute command in pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:32:23,797 - SANDBOX - K8S: Starting: Write file to pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:32:23,944 - SANDBOX - K8S: Completed: Write file to pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:32:23,944 - SANDBOX - K8S: Starting: Write file to pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:32:24,107 - SANDBOX - K8S: Completed: Write file to pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:32:25,185 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:32:25,398 - SANDBOX - K8S: Completed: Execute command in pod. {"result": "ExecResult(success=True, returncode=0, ...
2024-11-06 01:32:25,398 - SANDBOX - K8S: Starting: Read file from pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:32:25,533 - SANDBOX - K8S: Completed: Read file from pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:32:26,484 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-r8s8i9gs-default-0",...
2024-11-06 01:32:26,649 - SANDBOX - K8S: Completed: Execute command in pod. {"result": "ExecResult(success=True, returncode=0, ...
2024-11-06 01:32:26,649 - SANDBOX - K8S: Starting: Read file from pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:32:26,775 - SANDBOX - K8S: Completed: Read file from pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:32:39,921 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:39:47,174 - SANDBOX - K8S: Error during: Execute command in pod. {"cause": "Command timed out after 300s. ExecResult(success=False, returncode=124, ...
2024-11-06 01:39:52,744 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 01:39:52,922 - SANDBOX - K8S: Completed: Execute command in pod. {"result": "ExecResult(success=False, returncode=1, ...
2024-11-06 01:40:07,160 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-r8s8i9gs-default-0", ...
2024-11-06 02:32:15,986 - ERROR - K8S: Error during: Execute command in pod. {"cause": "Connection to remote host was lost.", "pod": "agent-env-r8s8i9gs-default-0", ...

Note the roughly 52 minutes between starting (or technically queueing) the command at 01:40:07 and the actual error being raised at 02:32:15.

2024-11-06T01:32:19Z   Normal    agent-env-r8s8i9gs-coredns                    ScalingReplicaSet       Scaled up replica set agent-env-r8s8i9gs-coredns-84dcf44548 to 1
2024-11-06T01:32:19Z   Normal    agent-env-r8s8i9gs-coredns-84dcf44548         SuccessfulCreate        Created pod: agent-env-r8s8i9gs-coredns-84dcf44548-vxb8l
2024-11-06T01:32:19Z   Normal    agent-env-r8s8i9gs-shared-volume              Provisioning            External provisioner is provisioning volume for claim "agent/agent-env-r8s8i9gs-shared-volume"
2024-11-06T01:32:19Z   Normal    agent-env-r8s8i9gs-ghidra                     SuccessfulCreate        create Pod agent-env-r8s8i9gs-ghidra-0 in StatefulSet agent-env-r8s8i9gs-ghidra successful
2024-11-06T01:32:19Z   Normal    agent-env-r8s8i9gs-shared-volume              ExternalProvisioning    Waiting for a volume to be created either by the external provisioner 'nfs.csi.k8s.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
2024-11-06T01:32:19Z   Normal    agent-env-r8s8i9gs-shared-volume              ProvisioningSucceeded   Successfully provisioned volume pvc-13d0b8e9-9e15-4ced-b627-69ea8416d136
2024-11-06T01:32:19Z   Normal    agent-env-r8s8i9gs-ghidra-0                   Scheduled               Successfully assigned agent/agent-env-r8s8i9gs-ghidra-0 to ip-192-168-104-100.eu-west-2.compute.internal
2024-11-06T01:32:19Z   Normal    agent-env-r8s8i9gs-coredns-84dcf44548-vxb8l   Scheduled               Successfully assigned agent/agent-env-r8s8i9gs-coredns-84dcf44548-vxb8l to ip-192-168-104-100.eu-west-2.compute.internal
2024-11-06T01:32:20Z   Normal    agent-env-r8s8i9gs-default-0                  Scheduled               Successfully assigned agent/agent-env-r8s8i9gs-default-0 to ip-192-168-181-54.eu-west-2.compute.internal
2024-11-06T01:32:20Z   Normal    agent-env-r8s8i9gs-ghidra-0                   Started                 Started container resolve-coredns-ip
2024-11-06T01:32:20Z   Normal    agent-env-r8s8i9gs-ghidra-0                   Created                 Created container resolve-coredns-ip
2024-11-06T01:32:20Z   Normal    agent-env-r8s8i9gs-ghidra-0                   Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-06T01:32:20Z   Normal    agent-env-r8s8i9gs-coredns-84dcf44548-vxb8l   Pulled                  Container image "coredns/coredns:1.8.3" already present on machine
2024-11-06T01:32:20Z   Normal    agent-env-r8s8i9gs-coredns-84dcf44548-vxb8l   Created                 Created container coredns
2024-11-06T01:32:20Z   Normal    agent-env-r8s8i9gs-coredns-84dcf44548-vxb8l   Started                 Started container coredns
2024-11-06T01:32:21Z   Normal    agent-env-r8s8i9gs-default-0                  Created                 Created container resolve-coredns-ip
2024-11-06T01:32:21Z   Normal    agent-env-r8s8i9gs-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-06T01:32:21Z   Normal    agent-env-r8s8i9gs-ghidra-0                   Started                 Started container ghidra
2024-11-06T01:32:21Z   Normal    agent-env-r8s8i9gs-ghidra-0                   Created                 Created container ghidra
2024-11-06T01:32:21Z   Normal    agent-env-r8s8i9gs-default-0                  Started                 Started container resolve-coredns-ip
2024-11-06T01:32:21Z   Normal    agent-env-r8s8i9gs-ghidra-0                   Pulled                  Container image "redacted" already present on machine
2024-11-06T01:32:22Z   Normal    agent-env-r8s8i9gs-default-0                  Created                 Created container default
2024-11-06T01:32:22Z   Normal    agent-env-r8s8i9gs-default-0                  Started                 Started container default
2024-11-06T01:32:22Z   Normal    agent-env-r8s8i9gs-default-0                  Pulled                  Container image "redacted" already present on machine
2024-11-06T01:33:37Z   Warning   agent-env-r8s8i9gs-default-0                  NodeNotReady            Node is not ready
2024-11-06T01:38:42Z   Normal    agent-env-r8s8i9gs-default                    SuccessfulCreate        create Pod agent-env-r8s8i9gs-default-0 in StatefulSet agent-env-r8s8i9gs-default successful
2024-11-06T01:38:42Z   Normal    agent-env-r8s8i9gs-default-0                  TaintManagerEviction    Marking for deletion Pod agent/agent-env-r8s8i9gs-default-0
2024-11-06T01:38:42Z   Normal    agent-env-r8s8i9gs-default-0                  Scheduled               Successfully assigned agent/agent-env-r8s8i9gs-default-0 to ip-192-168-129-171.eu-west-2.compute.internal
2024-11-06T01:38:42Z   Normal    agent-env-r8s8i9gs-default-0                  TaintManagerEviction    Cancelling deletion of Pod agent/agent-env-r8s8i9gs-default-0
2024-11-06T01:38:43Z   Normal    agent-env-r8s8i9gs-default-0                  Created                 Created container resolve-coredns-ip
2024-11-06T01:38:43Z   Normal    agent-env-r8s8i9gs-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-06T01:38:43Z   Normal    agent-env-r8s8i9gs-default-0                  Started                 Started container resolve-coredns-ip
2024-11-06T01:38:44Z   Normal    agent-env-r8s8i9gs-default-0                  Started                 Started container default
2024-11-06T01:38:44Z   Normal    agent-env-r8s8i9gs-default-0                  Created                 Created container default
2024-11-06T01:38:44Z   Normal    agent-env-r8s8i9gs-default-0                  Pulled                  Container image "redacted" already present on machine
2024-11-06T01:39:48Z   Normal    agent-env-r8s8i9gs-default-0                  Killing                 Stopping container default
2024-11-06T01:40:57Z   Warning   agent-env-r8s8i9gs-default-0                  NodeNotReady            Node is not ready
2024-11-06T02:32:16Z   Normal    agent-env-r8s8i9gs-ghidra-0                   Killing                 Stopping container ghidra
2024-11-06T02:32:16Z   Normal    agent-env-r8s8i9gs-coredns-84dcf44548-vxb8l   Killing                 Stopping container coredns

It looks like the default container was started at 01:32:22, then the node was marked as not ready at 01:33:37 (by which point some write_files had already taken place). Note that the exec started at 01:32:39 did not error until 01:39:47, about 7 minutes later, despite the reported 300s timeout.
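
Reading the sandbox log and the cluster events side by side is easier once they are merged into one timeline. The sketch below does that for a few of the lines above. It assumes the py log timestamps are in UTC like the event timestamps (the comparison above already relies on that), and the function names are illustrative.

```python
from datetime import datetime


def parse_sandbox_log(line):
    """py log line: 23-character timestamp, then ' - ', then the message."""
    ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
    return ts, "log", line[26:]


def parse_cluster_event(line):
    """Event line: ISO timestamp with a Z suffix, then padded columns."""
    ts_text, rest = line.split(None, 1)
    ts = datetime.strptime(ts_text, "%Y-%m-%dT%H:%M:%SZ")
    return ts, "event", " ".join(rest.split())  # collapse column padding


def merged_timeline(log_lines, event_lines):
    """Return (timestamp, source, text) tuples sorted by timestamp."""
    entries = [parse_sandbox_log(line) for line in log_lines]
    entries += [parse_cluster_event(line) for line in event_lines]
    return sorted(entries, key=lambda entry: entry[0])


log_lines = [
    '2024-11-06 01:32:39,921 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-r8s8i9gs-default-0", ...',
    '2024-11-06 01:39:47,174 - SANDBOX - K8S: Error during: Execute command in pod. {"cause": "Command timed out after 300s. ExecResult(success=False, returncode=124, ...',
]
event_lines = [
    "2024-11-06T01:33:37Z   Warning   agent-env-r8s8i9gs-default-0   NodeNotReady   Node is not ready",
    "2024-11-06T01:38:42Z   Normal    agent-env-r8s8i9gs-default-0   TaintManagerEviction   Marking for deletion Pod agent/agent-env-r8s8i9gs-default-0",
]
timeline = merged_timeline(log_lines, event_lines)
for ts, source, text in timeline:
    print(ts.isoformat(), source, text[:60])
```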


06 Nov 2024

│ redacted/.venv/lib/python3.12/site-packages/websocket/_socket.py:132 in recv                    │
│                                                                                                                      │
│   129 │   │   │   raise                                                                                              │
│   130 │                                                                                                              │
│   131 │   if not bytes_:                                                                                             │
│ > 132 │   │   raise WebSocketConnectionClosedException("Connection to remote host was lost.")                        │
│   133 │                                                                                                              │
│   134 │   return bytes_                                                                                              │
│   135                                                                                                                │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
WebSocketConnectionClosedException: Connection to remote host was lost.
...
K8sError: Error during: Execute command in pod. {"pod": "agent-env-san7tnhu-default-0", ...
2024-11-05T22:40:18Z   Warning   agent-env-san7tnhu-default-0                  FailedScheduling        0/46 nodes are available: 2 node(s) had untolerated taint {CriticalAddonsOnly: true}, 2 node(s) had untolerated taint {aisi.gov.uk/dev: true}, 2 node(s) had untolerated taint {aisi.gov.uk/devpods: true}, 2 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 38 Insufficient memory. preemption: 0/46 nodes are available: 38 No preemption victims found for incoming pod, 8 Preemption is not helpful for scheduling.
2024-11-05T22:40:22Z   Normal    agent-env-san7tnhu-default-0                  Scheduled               Successfully assigned agent/agent-env-san7tnhu-default-0 to ip-192-168-160-39.eu-west-2.compute.internal
2024-11-05T22:40:23Z   Normal    agent-env-san7tnhu-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-05T22:40:23Z   Normal    agent-env-san7tnhu-default-0                  Created                 Created container resolve-coredns-ip
2024-11-05T22:40:24Z   Normal    agent-env-san7tnhu-default-0                  Created                 Created container default
2024-11-05T22:40:24Z   Normal    agent-env-san7tnhu-default-0                  Started                 Started container resolve-coredns-ip
2024-11-05T22:40:24Z   Normal    agent-env-san7tnhu-default-0                  Pulled                  Container image "redacted" already present on machine
2024-11-05T22:40:25Z   Normal    agent-env-san7tnhu-default-0                  Started                 Started container default
2024-11-05T22:56:46Z   Warning   agent-env-san7tnhu-default-0                  NodeNotReady            Node is not ready
2024-11-05T23:01:51Z   Normal    agent-env-san7tnhu-default-0                  TaintManagerEviction    Cancelling deletion of Pod agent/agent-env-san7tnhu-default-0
2024-11-05T23:01:51Z   Normal    agent-env-san7tnhu-default-0                  TaintManagerEviction    Marking for deletion Pod agent/agent-env-san7tnhu-default-0
2024-11-05T23:01:51Z   Normal    agent-env-san7tnhu-default                    SuccessfulCreate        create Pod agent-env-san7tnhu-default-0 in StatefulSet agent-env-san7tnhu-default successful
2024-11-05T23:08:20Z   Warning   agent-env-san7tnhu-default-0                  FailedScheduling        0/46 nodes are available: 2 node(s) had untolerated taint {CriticalAddonsOnly: true}, 2 node(s) had untolerated taint {aisi.gov.uk/dev: true}, 2 node(s) had untolerated taint {aisi.gov.uk/devpods: true}, 36 Insufficient memory, 4 node(s) had untolerated taint {node.kubernetes.io/unreachable: }. preemption: 0/46 nodes are available: 10 Preemption is not helpful for scheduling, 36 No preemption victims found for incoming pod.
2024-11-05T23:09:01Z   Normal    agent-env-san7tnhu-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-05T23:09:01Z   Normal    agent-env-san7tnhu-default-0                  Started                 Started container resolve-coredns-ip
2024-11-05T23:09:01Z   Normal    agent-env-san7tnhu-default-0                  Created                 Created container resolve-coredns-ip
2024-11-05T23:09:02Z   Normal    agent-env-san7tnhu-default-0                  Created                 Created container default
2024-11-05T23:09:02Z   Normal    agent-env-san7tnhu-default-0                  Pulled                  Container image "redacted" already present on machine
2024-11-05T23:09:02Z   Normal    agent-env-san7tnhu-default-0                  Started                 Started container default
2024-11-05T23:40:06Z   Normal    agent-env-san7tnhu-default-0                  Killing                 Stopping container default
@craigwalton-dsit craigwalton-dsit added the 3rd party errors Errors observed from 3rd party code such as websocket or SSL errors label Dec 20, 2024