SSLEOFError EOF occurred in violation of protocol #37

Open
craigwalton-dsit opened this issue Dec 20, 2024 · 0 comments
Labels
3rd party errors Errors observed from 3rd party code such as websocket or SSL errors

craigwalton-dsit commented Dec 20, 2024

Migrated from internal repo.
Complete stack trace and logs (sensitive) https://github.com/AI-Safety-Institute/aisi-inspect-tools/issues/133
Original date: 18 Oct 2024

This occurred whilst running an eval set.

The trace below is taken from the Inspect log (hence the odd JSON formatting).

"message": "SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2406)')",
    "traceback": "Traceback (most recent call last):
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py\", line 263, in task_run
    sample_results = await asyncio.gather(*sample_coroutines)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py\", line 428, in task_run_sample
    error = sample_error(ex)
            ^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/error.py\", line 22, in __call__
    raise ex
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/_eval/task/run.py\", line 420, in task_run_sample
    state = await plan(state, generate)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/solver/_plan.py\", line 106, in __call__
    state = await solver(state, generate)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/solver/_basic_agent.py\", line 177, in solve
    tool_results = await call_tools(state.output.message, state.tools)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py\", line 162, in call_tools
    results = await asyncio.gather(*tasks)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py\", line 82, in call_tool_task
    result = await call_tool(tdefs, message.text, call)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/model/_call_tools.py\", line 225, in call_tool
    result = await tool_def.tool(**arguments)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/inspect_ai/tool/_tools/_execute.py\", line 84, in execute
    result = await sandbox().exec(
             ^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/sandbox_environment.py\", line 102, in exec
    return await self._pod.exec(cmd, input, cwd, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 82, in exec
    result = await self._run_asynchronously(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 150, in _run_asynchronously
    return await loop.run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"/usr/lib/python3.12/concurrent/futures/thread.py\", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 83, in <lambda>
    lambda: executor.exec(cmd, stdin, cwd, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 228, in exec
    result = self._handle_shell_output(shell, timeout)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 326, in _handle_shell_output
    result = stream_output()
             ^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/aisitools/k8s_sandbox/pod.py\", line 306, in stream_output
    if shell.peek_stdout():
       ^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py\", line 124, in peek_stdout
    return self.peek_channel(STDOUT_CHANNEL, timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py\", line 75, in peek_channel
    self.update(timeout=timeout)
  File \"redacted/.venv/lib/python3.12/site-packages/kubernetes/stream/ws_client.py\", line 197, in update
    op_code, frame = self.sock.recv_data_frame(True)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/websocket/_core.py\", line 457, in recv_data_frame
    self.send_close()
  File \"redacted/.venv/lib/python3.12/site-packages/websocket/_core.py\", line 494, in send_close
    self.send(struct.pack(\"!H\", status) + reason, ABNF.OPCODE_CLOSE)
  File \"redacted/.venv/lib/python3.12/site-packages/websocket/_core.py\", line 297, in send
    return self.send_frame(frame)
           ^^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/websocket/_core.py\", line 337, in send_frame
    l = self._send(data)
        ^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/websocket/_core.py\", line 559, in _send
    return send(self.sock, data)
           ^^^^^^^^^^^^^^^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/websocket/_socket.py\", line 179, in send
    return _send()
           ^^^^^^^
  File \"redacted/.venv/lib/python3.12/site-packages/websocket/_socket.py\", line 156, in _send
    return sock.send(data)
           ^^^^^^^^^^^^^^^
  File \"/usr/lib/python3.12/ssl.py\", line 1180, in send
    return self._sslobj.write(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2406)

I can reliably reproduce this issue by writing ~100MB to stdin on a WSClient that is exec'ing into a k8s pod.

However, I don't think the occurrences in the wild are caused by writes that large; I doubt the model would be able to produce that much input.
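
A rough sketch of the reproduction described above, assuming the payload is sent in a single `write_stdin` call (the report does not say how the ~100MB was written). `reproduce_ssl_eof` is a hypothetical helper name, and the pod name, namespace and `cat` command in the comments are placeholders.

```python
import ssl


def reproduce_ssl_eof(ws, size_bytes=100 * 1024 * 1024):
    """Write a large payload to stdin, then poll stdout.

    `ws` is expected to behave like kubernetes.stream.ws_client.WSClient,
    i.e. to provide write_stdin() and peek_stdout().
    Returns the SSLEOFError if one was raised, otherwise None.
    """
    payload = "x" * size_bytes  # assumption: one large write, not chunked
    try:
        ws.write_stdin(payload)
        # The traceback in this issue shows the error surfacing while
        # polling stdout, so poll once after the write.
        ws.peek_stdout(timeout=1)
    except ssl.SSLEOFError as ex:
        return ex
    return None


# Against a real cluster, a WSClient could be obtained roughly like this
# (placeholder pod name and namespace):
#
#   from kubernetes import client, config
#   from kubernetes.stream import stream
#
#   config.load_kube_config()
#   ws = stream(
#       client.CoreV1Api().connect_get_namespaced_pod_exec,
#       "my-pod", "my-namespace",
#       command=["cat"],
#       stdin=True, stdout=True, stderr=True, tty=False,
#       _preload_content=False,
#   )
#   print(reproduce_ssl_eof(ws))
```

Taking the client as an argument keeps the helper independent of any particular cluster, so it can also be exercised with a stub object.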


I'm not overly hopeful, but websocket-client/websocket-client#983 (soon to be released) might help.

Found via the issue websocket-client/websocket-client#942.


Another instance (26 Oct 2024)

SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2406)
...
K8sError: Error during: Execute command in pod. {"pod": "agent-env-hxszbbql-default-0", redacted ", "cwd": "None", "timeout": "300"}

With timestamps

2024-10-26 21:03:17,961 - SANDBOX - K8S: Starting: Execute command in pod. {"pod": "agent-env-hxszbbql-default-0",  "cmd": "['python3']", ...
2024-10-26 21:05:05,035 - ERROR - K8S: Error during: Execute command in pod. {"cause": "EOF occurred in violation of protocol (_ssl.c:2406)", "pod": "agent-env-hxszbbql-default-0", ...

Cluster events (note node not ready)

2024-10-26T20:42:30Z   Normal    agent-env-hxszbbql-default                    SuccessfulCreate        create Pod agent-env-hxszbbql-default-0 in StatefulSet agent-env-hxszbbql-default successful
2024-10-26T20:42:30Z   Normal    agent-env-hxszbbql-default-0                  Scheduled               Successfully assigned agent/agent-env-hxszbbql-default-0 to ip-192-168-96-203.eu-west-2.compute.internal
2024-10-26T20:42:31Z   Normal    agent-env-hxszbbql-default-0                  Started                 Started container resolve-coredns-ip
2024-10-26T20:42:31Z   Normal    agent-env-hxszbbql-default-0                  Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-10-26T20:42:31Z   Normal    agent-env-hxszbbql-default-0                  Created                 Created container resolve-coredns-ip
2024-10-26T21:01:40Z   Warning   agent-env-hxszbbql-default-0                  NodeNotReady            Node is not ready
2024-10-26T21:04:54Z   Normal    agent-env-hxszbbql-default-0                  Started                 Started container default
2024-10-26T21:04:54Z   Normal    agent-env-hxszbbql-default-0                  Pulled                  Container image "redacted" already present on machine
2024-10-26T21:04:54Z   Normal    agent-env-hxszbbql-default-0                  Created                 Created container default
2024-10-26T21:05:00Z   Normal    agent-env-hxszbbql-default-0                  TaintManagerEviction    Cancelling deletion of Pod agent/agent-env-hxszbbql-default-0
2024-10-26T21:05:09Z   Normal    agent-env-hxszbbql-default-0                  Killing                 Stopping container default
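
The ordering of these events relative to the exec call can be checked with a small script. This is only a sketch: the timestamps are copied from the log lines and cluster events above, and it assumes the Inspect log timestamps are in UTC like the cluster events. The function names are illustrative.

```python
from datetime import datetime, timezone

# Timestamps copied from the 26 Oct 2024 instance above.
EXEC_STARTED = "2024-10-26 21:03:17,961"    # "Starting: Execute command in pod"
EXEC_FAILED = "2024-10-26 21:05:05,035"     # "Error during: Execute command in pod"
NODE_NOT_READY = "2024-10-26T21:01:40Z"     # NodeNotReady warning
CONTAINER_KILLED = "2024-10-26T21:05:09Z"   # "Stopping container default"


def parse_log_time(text):
    # Assumption: log timestamps are UTC, matching the cluster events.
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")
    return parsed.replace(tzinfo=timezone.utc)


def parse_event_time(text):
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def timeline():
    """Return the gaps, in seconds, between the key events."""
    started = parse_log_time(EXEC_STARTED)
    failed = parse_log_time(EXEC_FAILED)
    not_ready = parse_event_time(NODE_NOT_READY)
    killed = parse_event_time(CONTAINER_KILLED)
    return {
        "not_ready_to_exec_start": (started - not_ready).total_seconds(),
        "exec_start_to_failure": (failed - started).total_seconds(),
        "failure_to_container_kill": (killed - failed).total_seconds(),
    }


if __name__ == "__main__":
    for name, seconds in timeline().items():
        print(f"{name}: {seconds:.3f}s")
```

Under the UTC assumption, the exec started about 98 seconds after the NodeNotReady warning, failed about 107 seconds later, and the container was stopped about 4 seconds after the failure.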


Another instance (11 Nov 2024)

│ /usr/lib/python3.12/ssl.py:1180 in send                                                                              │
│                                                                                                                      │
│   1177 │   │   │   │   raise ValueError(                                                                             │
│   1178 │   │   │   │   │   "non-zero flags not allowed in calls to send() on %s" %                                   │
│   1179 │   │   │   │   │   self.__class__)                                                                           │
│ > 1180 │   │   │   return self._sslobj.write(data)                                                                   │
│   1181 │   │   else:                                                                                                 │
│   1182 │   │   │   return super().send(data, flags)                                                                  │
│   1183                                                                                                               │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2406)
...
K8sError: Error during: Execute command in pod. {"pod": "agent-env-hjyrlaug-attacker-0", ...

With timestamps

2024-11-02 11:26:38,921 - ERROR - K8S: Error during: Execute command in pod. {"cause": "EOF occurred in violation of protocol (_ssl.c:2406)", "pod": "agent-env-hjyrlaug-attacker-0", ...

Cluster events (note node not ready)

2024-11-02T11:18:04Z   Warning   agent-env-hjyrlaug-attacker-0                 FailedScheduling        0/86 nodes are available: 2 node(s) had untolerated taint {CriticalAddonsOnly: true}, 2 node(s) had untolerated taint {aisi.gov.uk/dev: true}, 2 node(s) had untolerated taint {aisi.gov.uk/devpods: true}, 5 node(s) had untolerated taint {node.kubernetes.io/unreachable: }, 75 Insufficient memory. preemption: 0/86 nodes are available: 11 Preemption is not helpful for scheduling, 75 No preemption victims found for incoming pod.
2024-11-02T11:18:04Z   Normal    agent-env-hjyrlaug-attacker                   SuccessfulCreate        create Pod agent-env-hjyrlaug-attacker-0 in StatefulSet agent-env-hjyrlaug-attacker successful
2024-11-02T11:18:05Z   Normal    agent-env-hjyrlaug-attacker-0                 Scheduled               Successfully assigned agent/agent-env-hjyrlaug-attacker-0 to ip-192-168-107-140.eu-west-2.compute.internal
2024-11-02T11:18:06Z   Normal    agent-env-hjyrlaug-attacker-0                 Pulled                  Container image "toolbelt/dig:2024-09-23" already present on machine
2024-11-02T11:18:06Z   Normal    agent-env-hjyrlaug-attacker-0                 Created                 Created container resolve-coredns-ip
2024-11-02T11:18:06Z   Normal    agent-env-hjyrlaug-attacker-0                 Started                 Started container resolve-coredns-ip
2024-11-02T11:25:11Z   Warning   agent-env-hjyrlaug-attacker-0                 NodeNotReady            Node is not ready
2024-11-02T11:26:38Z   Normal    agent-env-hjyrlaug-attacker-0                 Pulled                  Container image "redacted" already present on machine
2024-11-02T11:26:38Z   Normal    agent-env-hjyrlaug-attacker-0                 Created                 Created container attacker
2024-11-02T11:26:38Z   Normal    agent-env-hjyrlaug-attacker-0                 Started                 Started container attacker
2024-11-02T11:26:40Z   Normal    agent-env-hjyrlaug-attacker-0                 Killing                 Stopping container attacker
2024-11-02T11:26:40Z   Normal    agent-env-hjyrlaug-attacker-0                 TaintManagerEviction    Cancelling deletion of Pod agent/agent-env-hjyrlaug-attacker-0
@craigwalton-dsit craigwalton-dsit added the 3rd party errors Errors observed from 3rd party code such as websocket or SSL errors label Dec 20, 2024