Simulations in Summit #4

Open
YC-Liang opened this issue Feb 20, 2023 · 17 comments

@YC-Liang

Dear author(s),

I had some issues when running the basic SUMMIT simulator from the summit_drivehard folder.
After loading Carla on port 23000 and running the command python simulator.py, I got the error:
Pyro4.errors.CommunicationError: cannot connect to ('localhost', 23010): [Errno 111] Connection refused

So I started another Carla window on port 23010 in a different terminal. The error disappears, but the car is not moving, there are no crowds in the environment, the other window (port 23010) is empty, and nothing is reported in the terminal.

[Screenshot from 2023-02-20 17-31-41]

Are any of the above steps wrong? If you could kindly give the steps for running the simulator, that would be awesome.

I am on Ubuntu 22.04 with an Nvidia 4080 GPU.

Thanks.

PS: I posted another issue in the Summit repo as I could not get the tutorial to work, not sure whether the issues are related.

@LeeYiyuan
Contributor

LeeYiyuan commented Feb 20, 2023

In principle everything should run with python3 simulator.py. It should internally launch an instance of the SUMMIT simulator, along with the gamma_crowd.py crowd controller, as well as an ego-agent that runs the handcrafted DESPOT strategy with macro-action length 1. In particular, you won't need to run the scripts in the SUMMIT repo separately. The exact details may be found in the script.

By default, the SUMMIT simulator is hidden in-memory and not shown on-screen. Could you try running python3 simulator.py --debug --visualize? This should enable the visualization of the spawned simulator, and print out a bunch of debug info to better pinpoint what went wrong.

@LeeYiyuan
Contributor

LeeYiyuan commented Feb 20, 2023

I suspect that simulator.py is not launching the SUMMIT simulator and the gamma_crowd.py script properly. These are on lines 344 and 475 respectively.

I had hardcoded the paths to the SUMMIT binary and the crowd script, which might have caused the error. Do you think you could debug near those lines and see if that is indeed the case?
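One way around the hardcoded paths would be to read the SUMMIT checkout location from an environment variable. This is a sketch, not what simulator.py actually does: the variable names, the CarlaUE4.sh binary name, and the PythonAPI/examples layout are assumptions based on a default SUMMIT checkout.

```python
import os

# Hypothetical rework of the hardcoded paths near lines 344/475 of
# simulator.py: take the SUMMIT checkout location from an environment
# variable, falling back to ~/summit, so each machine can point to its
# own installation without editing the script.
SUMMIT_PATH = os.environ.get("SUMMIT_PATH",
                             os.path.expanduser("~/summit"))

# Assumed default SUMMIT layout: binary at the repo root, crowd scripts
# (including gamma_crowd.py) under PythonAPI/examples.
SUMMIT_BINARY = os.path.join(SUMMIT_PATH, "CarlaUE4.sh")
SUMMIT_SCRIPTS_PATH = os.path.join(SUMMIT_PATH, "PythonAPI", "examples")
```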

@YC-Liang
Author

I have changed the code to point to the summit address on my machine, and the program gets stuck when it tries to launch the crowds. Specifically, the following errors are reported:

Launching environment process...
Launching simulator process...
    Delaying for simulator to start up...
    Creating client...
Resetting world...
    Spawning meshes...
    Spawning ego-agent...
        Launching controller...
    Launching GAMMA process...
        Delaying for crowd service to launch...
        Waiting for spawn target to be reached...
Traceback (most recent call last):
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/Pyro4/core.py", line 511, in connect_and_handshake
    sock = socketutil.createSocket(connect=connect_location,
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/Pyro4/socketutil.py", line 307, in createSocket
    sock.connect(connect)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "simulator.py", line 628, in <module>
    sim.reset_world()
  File "simulator.py", line 503, in reset_world
    while not self.crowd_service.spawn_target_reached:
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/Pyro4/core.py", line 275, in __getattr__
    self._pyroGetMetadata()
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/Pyro4/core.py", line 615, in _pyroGetMetadata
    self.__pyroCreateConnection()
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/Pyro4/core.py", line 596, in __pyroCreateConnection
    connect_and_handshake(conn)
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/Pyro4/core.py", line 549, in connect_and_handshake
    raise ce
Pyro4.errors.CommunicationError: cannot connect to ('localhost', 23010): [Errno 111] Connection refused

Does using a conda environment cause some networking issue? The problem seems to be at the following lines of code:

self.crowd_service = Pyro4.Proxy('PYRO:crowdservice.warehouse@localhost:{}'.format(self.pyro_port))
debug_print('        Delaying for crowd service to launch...')
time.sleep(3)
debug_print('        Waiting for spawn target to be reached...')
while not self.crowd_service.spawn_target_reached:
    time.sleep(0.2)

In particular, the above errors appear whenever crowd_service is used. I tried adjusting the sleep time to 10 s, but that didn't help.
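A fixed time.sleep() is fragile here, since the crowd service may take longer than the delay to start listening. A more robust alternative is to poll the port until it accepts a TCP connection; this is a generic stdlib sketch (wait_for_port is a hypothetical helper, not part of simulator.py), relying on the fact that the Pyro4 daemon started by gamma_crowd.py listens on a plain TCP socket.

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, poll=0.2):
    """Block until a TCP server accepts connections on (host, port).

    A successful connect only proves the daemon is listening, not that
    the crowd has spawned, but it replaces the guesswork of a fixed
    sleep before the first Pyro4 call.
    """
    deadline = time.time() + timeout
    while True:
        try:
            # Connection is closed immediately; we only probe readiness.
            with socket.create_connection((host, port), timeout=poll):
                return True
        except OSError:
            if time.time() > deadline:
                return False
            time.sleep(poll)
```

Usage would be e.g. `wait_for_port('localhost', self.pyro_port)` just before constructing the Pyro4.Proxy.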

@LeeYiyuan
Contributor

LeeYiyuan commented Feb 21, 2023

When using the --visualize flag, does the simulator show up on the screen? If so, the crowd should spawn once Delaying for crowd service to launch... is printed, and you will be able to see it visually...

...if it does not appear on-screen, I think you may need to modify SUMMIT_SCRIPTS_PATH too (defined on line 58 and used on line 478) to point to the gamma_crowd.py script. The error refers to an inability to access the agent states tracked by gamma_crowd.py, which likely means that gamma_crowd.py did not even launch successfully.

I believe there should be no networking issues caused by conda -- SUMMIT/CARLA relies on TCP to receive instructions, with spawning the maps being one such instruction. As long as the maps appear, the networking should be fine.

@YC-Liang
Author

The crowd service did not launch, as the picture at the very top of this thread shows; the agent is not moving either. I believe I have set SUMMIT_SCRIPTS_PATH correctly, as the images and meshes load correctly, and gamma_crowd.py is in the same directory.

@LeeYiyuan
Contributor

LeeYiyuan commented Feb 21, 2023

Okay, I think I know the issue. Your SUMMIT_SCRIPTS_PATH is definitely correct, as you pointed out from the fact that the meshes load correctly.

I had used a custom version of gamma_crowd.py with flags that don't exist in the official repository but are assumed to exist in simulator.py.

In simulator.py, delete lines 483 (--no-respawn) and 488 (--aim-center). This should let gamma_crowd.py launch correctly.

But you need a small change to gamma_crowd.py, since simulator.py assumes that gamma_crowd.py doesn't delete out-of-bound agents. In gamma_crowd.py, delete the lines:

            (car_agents, bike_agents, pedestrian_agents, destroy_list, statistics) = \
                    do_death(c, car_agents, bike_agents, pedestrian_agents, destroy_list, statistics)

which should be at or near line 1463.

I forgot how the --aim-center flag was implemented. It is not necessary for running, but was meant to ensure that spawned vehicles face the center of the red bounding box you see in the simulator, so that the problem is actually hard (otherwise cars would just move away from the ego-agent from the start, which wouldn't be very interesting). You can probably restore it in gamma_crowd.py's do_spawn method via rejection sampling: draw a vector from the vehicle's position to the center of the bounding box, and check that the dot product of the vehicle's heading with that vector is positive.
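The rejection-sampling idea described above could look roughly like this. It is only a sketch: faces_center and sample_facing_spawn are hypothetical helpers, and sample_spawn stands in for however gamma_crowd.py's do_spawn actually draws a candidate position and heading.

```python
def faces_center(position, heading, center):
    """True if `heading` points toward `center` from `position`, i.e.
    the dot product of the heading and the vector to the center is
    positive (2D x/y tuples assumed)."""
    to_center = (center[0] - position[0], center[1] - position[1])
    return heading[0] * to_center[0] + heading[1] * to_center[1] > 0.0

def sample_facing_spawn(sample_spawn, center, max_tries=100):
    """Rejection sampling: redraw candidate spawns until one faces the
    bounding-box center. `sample_spawn` is a hypothetical callable
    returning a (position, heading) pair."""
    for _ in range(max_tries):
        position, heading = sample_spawn()
        if faces_center(position, heading, center):
            return position, heading
    return None  # give up rather than loop forever
```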

@YC-Liang
Author

Yeah, I got the crowd to launch now; however, the next line throws another error:

File "simulator.py", line 629, in <module>
    sim.reset_world()
  File "simulator.py", line 503, in reset_world
    while not self.crowd_service.spawn_target_reached:
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/Pyro4/core.py", line 280, in __getattr__
    raise AttributeError("remote object '%s' has no exposed attribute or method '%s'" % (self._pyroUri, name))
AttributeError: remote object 'PYRO:crowdservice.warehouse@localhost:23010' has no exposed attribute or method 'spawn_target_reached'

Is the spawn_target_reached method also something from the modified gamma_crowd?

@LeeYiyuan
Contributor

LeeYiyuan commented Feb 21, 2023

Yup... 🤦

For the moment, it would be easiest to replace that line (while not self.crowd_service.spawn_target_reached:) with:

time.sleep(5)
while self.crowd_service.spawn_car:
    time.sleep(0.2)

@LeeYiyuan
Contributor

LeeYiyuan commented Feb 21, 2023

If it runs without error but crashes afterward with the error message ERROR: Incorrect number of exo agents!, try changing SPAWN_DESTROY_RATE_MAX to 3 (gamma_crowd.py:41) and SPAWN_DESTROY_REPETITIONS to 1 (gamma_crowd.py:45).

@YC-Liang
Author

Everything works more smoothly now, thanks.
One more issue: when visualising, I get the warning message shown below.
[Screenshot from 2023-02-21 15-58-19]
I can click OK and the simulator starts fine, but when visualisation is turned off, the warning message prevents the simulator from launching.
Also, sometimes even when I click OK, the simulator window still shuts down straight away.

@YC-Liang
Author

YC-Liang commented Feb 21, 2023

Also, sometimes even when I clicked ok, the simulator window still shuts down straight away.

In this case, I need to restart the system and then everything works; not sure what causes it. Potentially some processes are not killed in the background?

@LeeYiyuan
Contributor

LeeYiyuan commented Feb 21, 2023

We bumped UE to 4.26, which seems to have deprecated -opengl, so you can remove that from simulator.py:349.

If you want to run it headless (i.e. no visualization), you would now need to add -RenderOffscreen.
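Put together, the launch described above might be assembled like this. The binary name and the -carla-rpc-port flag are assumptions based on how CARLA-derived builds are usually invoked, not a quote from simulator.py.

```python
import subprocess

def summit_launch_args(binary, rpc_port, headless=False):
    """Build the launch command for a UE 4.26 SUMMIT binary.

    `-opengl` is deprecated in UE 4.26; a headless run now needs
    `-RenderOffscreen` instead of hiding the window."""
    args = [binary, "-carla-rpc-port={}".format(rpc_port)]
    if headless:
        args.append("-RenderOffscreen")
    return args

def launch_summit(binary, rpc_port, headless=False):
    # Fire-and-forget launch, mirroring what simulator.py does internally.
    return subprocess.Popen(summit_launch_args(binary, rpc_port, headless))
```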

I really need to update the codes here, as well as the SUMMIT docs, when I get the time for it...

@YC-Liang
Author

Thank you so much for answering patiently, I believe the core issues have been solved!

@LeeYiyuan
Contributor

Thank you! We've uncovered many existing issues related to getting the codes back up and running thanks to your help, really :)

Leaving this issue open until the docs/codes have been updated...

@YC-Liang
Author

Hi, sorry for bothering again! When using the magic model in the SUMMIT simulator, an error related to using CUDA in subprocesses is raised, as shown below:

Traceback (most recent call last):
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "simulator.py", line 122, in environment_process
    gen_model = MAGICGenNet_DriveHard(MACRO_LENGTH, True, True).float().to(device)
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 989, in to
    return self._apply(convert)
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I tried to change the start method to 'spawn', but that didn't solve the issue; instead I got the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/james/Documents/Uni/SCNC3021/magic-playground/python/summit_drivehard/simulator.py", line 16, in <module>
    from controller import Controller
  File "/home/james/Documents/Uni/SCNC3021/magic-playground/python/summit_drivehard/controller.py", line 12, in <module>
    import carla
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/home/james/summit/PythonAPI/carla/dist/carla-0.9.8-py3.8-linux-x86_64.egg/carla/__init__.py", line 8, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/home/james/summit/PythonAPI/carla/dist/carla-0.9.8-py3.8-linux-x86_64.egg/carla/libcarla.py", line 7, in <module>
  File "/home/james/summit/PythonAPI/carla/dist/carla-0.9.8-py3.8-linux-x86_64.egg/carla/libcarla.py", line 3, in __bootstrap__
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3260, in <module>
    def _initialize_master_working_set():
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3234, in _call_aside
    f(*args, **kwargs)
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/pkg_resources/__init__.py", line 3272, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/pkg_resources/__init__.py", line 572, in _build_master
    ws = cls()
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/pkg_resources/__init__.py", line 565, in __init__
    self.add_entry(entry)
  File "/home/james/miniconda3/envs/magic2/lib/python3.8/site-packages/pkg_resources/__init__.py", line 619, in add_entry
    self.entry_keys.setdefault(entry, [])
TypeError: unhashable type: 'list'

@LeeYiyuan
Contributor

LeeYiyuan commented Feb 23, 2023

I can't seem to reproduce the error. The error occurs because CUDA was used somewhere before environment_process started running. In principle, anything from torch should only ever be used inside the environment_process thread, precisely to prevent this error.

@LeeYiyuan
Contributor

... though my SUMMIT hangs whenever I use the GPU to call the generator.

I suspect that since UE4.26 dropped support for OpenGL, we now use Vulkan, and that somehow shares some memory with Torch/CUDA. This could explain why SUMMIT hangs for me and why your machine complains instead about CUDA being already initialized.

I've pushed a commit that integrates the changes in this thread. To fix the CUDA issue, I've switched the generator to be invoked on the CPU (simulator.py:116: use torch.device("cpu") instead).
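The fix amounts to a one-line device change; a standalone sketch of the pattern, with a tiny nn.Linear standing in for MAGICGenNet_DriveHard (which is not reproduced here):

```python
import torch
import torch.nn as nn

# Keep the generator entirely on the CPU: in a forked child process, any
# .to("cuda") call trips "Cannot re-initialize CUDA in forked subprocess",
# and on UE 4.26 + Vulkan it can hang SUMMIT instead.
device = torch.device("cpu")

# Stand-in for MAGICGenNet_DriveHard(MACRO_LENGTH, True, True).
gen_model = nn.Linear(4, 2).float().to(device)

out = gen_model(torch.zeros(1, 4))
```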

LeeYiyuan added a commit that referenced this issue Feb 23, 2023
Integrates the changes discussed in #4