ROS2Connector Distribution Conflictions (Cryptic LLVM Out of Memory error) #60

Open
nathanredmond123 opened this issue Dec 17, 2024 · 0 comments

As one of the collaborators on the merged ROS2Connector PR, I am leaving this here for any users who might run into this issue.

I have been working closely with ROS2Connector for several months and have found a cryptic failure when instantiating ROS2Connector on a network where ROS2 nodes from different distributions are running. In my case, I am running ROS2 Humble in a container based on dustynv/nano_llm:humble-r36.3.0, while other robots on the same network are running ROS2 Jazzy.

This is the stack trace produced when instantiating ROS2Connector while Jazzy nodes were on the network:

LLVM ERROR: out of memory
Fatal Python error: Aborted

Thread 0x0000fffdfffff100 (most recent call first):
  File "/usr/lib/python3.10/ssl.py", line 1161 in read
  File "/usr/lib/python3.10/ssl.py", line 1288 in recv
  File "/usr/local/lib/python3.10/dist-packages/websockets/sync/connection.py", line 538 in recv_events
  File "/usr/local/lib/python3.10/dist-packages/websockets/sync/server.py", line 171 in recv_events
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffdff7ef100 (most recent call first):
  File "/usr/lib/python3.10/posixpath.py", line 431 in _joinrealpath
  File "/usr/lib/python3.10/posixpath.py", line 397 in realpath
  File "/usr/lib/python3.10/inspect.py", line 878 in getmodule
  File "/usr/lib/python3.10/inspect.py", line 952 in findsource
  File "/usr/lib/python3.10/inspect.py", line 1624 in getframeinfo
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/impl/rcutils_logger.py", line 47 in _find_caller
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/impl/rcutils_logger.py", line 59 in __new__
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/impl/rcutils_logger.py", line 287 in log
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/impl/rcutils_logger.py", line 329 in info
  File "/opt/NanoLLM/nano_llm/plugins/robotics/ros_connector.py", line 89 in __init__
  File "/opt/NanoLLM/nano_llm/plugins/dynamic_plugin.py", line 35 in __new__
  File "/opt/NanoLLM/nano_llm/agents/dynamic_agent.py", line 67 in add_plugin
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/opt/NanoLLM/nano_llm/agents/dynamic_agent.py", line 60 in add_plugin
  File "/opt/NanoLLM/nano_llm/agents/dynamic_agent.py", line 423 in invoke_handler
  File "/opt/NanoLLM/nano_llm/agents/dynamic_agent.py", line 441 in on_message
  File "/opt/NanoLLM/nano_llm/agents/dynamic_agent.py", line 451 in on_websocket
  File "/opt/NanoLLM/nano_llm/web/server.py", line 193 in on_message
  File "/opt/NanoLLM/nano_llm/web/server.py", line 393 in websocket_listener
  File "/opt/NanoLLM/nano_llm/web/server.py", line 314 in on_websocket
  File "/usr/local/lib/python3.10/dist-packages/websockets/sync/server.py", line 499 in conn_handler
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffe06a8f100 (most recent call first):
  File "/usr/lib/python3.10/selectors.py", line 416 in select
  File "/usr/lib/python3.10/socketserver.py", line 232 in serve_forever
  File "/usr/local/lib/python3.10/dist-packages/werkzeug/serving.py", line 810 in serve_forever
  File "/usr/local/lib/python3.10/dist-packages/werkzeug/serving.py", line 1116 in run_simple
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 625 in run
  File "/opt/NanoLLM/nano_llm/web/server.py", line 120 in <lambda>
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffe0729f100 (most recent call first):
  File "/usr/lib/python3.10/selectors.py", line 469 in select
  File "/usr/local/lib/python3.10/dist-packages/websockets/sync/server.py", line 227 in serve_forever
  File "/opt/NanoLLM/nano_llm/web/server.py", line 119 in <lambda>
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffe07aaf100 (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 1814 in cpu_percent
  File "/opt/NanoLLM/nano_llm/plugins/tegrastats.py", line 58 in read
  File "/opt/NanoLLM/nano_llm/plugins/tegrastats.py", line 96 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000fffe99fbf100 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000ffffa2158020 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 1116 in _wait_for_tstate_lock
  File "/usr/lib/python3.10/threading.py", line 1096 in join
  File "/opt/NanoLLM/nano_llm/agents/dynamic_agent.py", line 504 in run
  File "/opt/NanoLLM/nano_llm/studio.py", line 17 in <module>
  File "/usr/lib/python3.10/runpy.py", line 86 in _run_code
  File "/usr/lib/python3.10/runpy.py", line 196 in _run_module_as_main

We do not know the exact cause of this issue. We have struggled with network interference between different ROS2 distributions before; in one case it resulted in RAM OOM issues. We are unsure why the failure surfaces as an LLVM out-of-memory error in this case, though. Our initial guess is that node discovery across distributions leads to some form of memory leak. If anyone has run into similar issues and has discovered the root cause, please let us know.
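
For anyone who wants to test the discovery hypothesis, one option is to put the Humble container on its own ROS domain so that it no longer discovers the Jazzy nodes. The sketch below is only that: `ROS_DOMAIN_ID` is the standard ROS2 environment variable for separating DDS domains, but the helper name `isolated_ros_env` and the domain ID 42 are made up for illustration, and I have not confirmed that this avoids the crash.

```python
import os


def isolated_ros_env(domain_id, base_env=None):
    """Return a copy of the environment with ROS_DOMAIN_ID set.

    Hypothetical helper, not part of NanoLLM or rclpy. Nodes on different
    domain IDs do not discover each other, so giving the Humble container
    its own ID should keep it from seeing the Jazzy nodes.
    """
    # The ROS2 documentation recommends IDs from 0 to 101 for
    # compatibility across platforms.
    if not 0 <= domain_id <= 101:
        raise ValueError(f"domain_id must be in 0..101, got {domain_id}")
    env = dict(os.environ if base_env is None else base_env)
    env["ROS_DOMAIN_ID"] = str(domain_id)
    return env


if __name__ == "__main__":
    # Assumption: 42 is an ID that no other robot on the network uses.
    # This must run before rclpy creates its context, i.e. before
    # ROS2Connector is instantiated.
    os.environ.update(isolated_ros_env(42))
    print("ROS_DOMAIN_ID =", os.environ["ROS_DOMAIN_ID"])
```

The same effect can be had by passing the variable to the container at startup. If the LLVM error disappears with the domains separated, that would support the cross-distribution discovery guess; if it persists, the cause is probably elsewhere.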
