
OSError: [Errno 24] Too many open files #298

Open

zichunxx opened this issue Jun 5, 2024 · 5 comments

Comments

zichunxx commented Jun 5, 2024

Hi!

I tried to store episodes with EpisodeBuffer and memmap=True to relieve RAM pressure, but ran into this error:

File "/home/xzc/Documents/dreamerv3-torch/test/buffer.py", line 92, in test_max_buffer_szie
    rb.add(episode)
  File "/home/xzc/miniforge3/envs/dreamerv3/lib/python3.9/site-packages/sheeprl/data/buffers.py", line 968, in add
    self._save_episode(self._open_episodes[env])
  File "/home/xzc/miniforge3/envs/dreamerv3/lib/python3.9/site-packages/sheeprl/data/buffers.py", line 1024, in _save_episode
    episode_to_store[k] = MemmapArray(
  File "/home/xzc/miniforge3/envs/dreamerv3/lib/python3.9/site-packages/sheeprl/utils/memmap.py", line 67, in __init__
    self._array = np.memmap(
  File "/home/xzc/miniforge3/envs/dreamerv3/lib/python3.9/site-packages/numpy/core/memmap.py", line 267, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OSError: [Errno 24] Too many open files
Exception ignored in: <function MemmapArray.__del__ at 0x7fc7cc249700>
Traceback (most recent call last):
  File "/home/xzc/miniforge3/envs/dreamerv3/lib/python3.9/site-packages/sheeprl/utils/memmap.py", line 220, in __del__
    if self._array is not None and self._has_ownership and getrefcount(self._file) <= 2:
  File "/home/xzc/miniforge3/envs/dreamerv3/lib/python3.9/site-packages/sheeprl/utils/memmap.py", line 236, in __getattr__
    raise AttributeError(f"'MemmapArray' object has no attribute '{attr}'")
AttributeError: 'MemmapArray' object has no attribute '_array'
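
Errno 24 means the process has run out of file descriptors. As a quick check (just a sketch, Linux-only, not part of the reproduction below), the number of descriptors the process currently holds can be listed from /proc:

import os

# Linux-only: /proc/self/fd contains one entry per file descriptor held by this process.
num_open_fds = len(os.listdir("/proc/self/fd"))
print(f"open file descriptors: {num_open_fds}")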

The following minimal code snippet reproduces the error:

import numpy as np
from sheeprl.data.buffers import EpisodeBuffer
import gymnasium as gym
from gymnasium.experimental.wrappers import PixelObservationV0

buf_size = 1000000
sl = 5
n_envs = 1
obs_keys = ("observation",)
rb = EpisodeBuffer(
    buf_size,
    sl,
    n_envs=n_envs,
    obs_keys=obs_keys,
    memmap=True,
    memmap_dir="", 
)
env = PixelObservationV0(gym.make("Walker2d-v4", render_mode="rgb_array", width=100, height=100), pixels_only=True)
keys = ("observation", "reward", "terminated", "truncated")
episode = {k: [] for k in keys}
steps = 0
obs, info = env.reset()
image_shape = obs.shape
while True:
    if steps % 1000 == 0:
        print("current steps: {}".format(steps))
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
    episode["observation"].append(observation)
    episode["reward"].append(reward)
    episode["terminated"].append(terminated)
    episode["truncated"].append(truncated)

    if terminated or truncated:
        episode_length = len(episode["observation"])
        episode["observation"] = np.array(episode["observation"]).reshape(episode_length, 1, *image_shape)
        episode["reward"] = np.array(episode["reward"]).reshape(episode_length, 1, -1)
        episode["terminated"] = np.array(episode["terminated"]).reshape(episode_length, 1, -1)
        episode["truncated"] = np.array(episode["truncated"]).reshape(episode_length, 1, -1)
        rb.add(episode)
        episode = {k: [] for k in keys}
        env.reset()

    steps += 1

Note that memmap_dir must be set to a valid directory.

Could you please tell me what causes this problem?

Many thanks for considering my request.

Update:

This problem seems to be triggered by saving too many episodes to disk (please correct me if I'm wrong).

I tried EpisodeBuffer because image observations consume almost all of my RAM (64 GB) during training, especially with frame stacking. I want to complete training without upgrading the hardware, so I set memmap=True to relieve the pressure on RAM, but I ran into the error above. Any advice on this problem?
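
One stopgap I am considering (just a sketch, not something I have confirmed works with sheeprl) is raising the per-process soft limit on open files to the hard limit before filling the buffer:

import resource

# Raise the soft limit on open file descriptors up to the hard limit for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(f"RLIMIT_NOFILE soft limit raised from {soft} to {hard}")

This would only delay the error if descriptors keep accumulating, but it might be enough to finish a run.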

Thanks in advance.

belerico (Member) commented Jun 6, 2024

Hi @zichunxx, I will have a look in the next few days after some deadlines.
Thank you

belerico (Member) commented Jun 6, 2024

Have you tried with another buffer, like the standard ReplayBuffer or the SequentialReplayBuffer? Does it give you the same error?

zichunxx (Author) commented Jun 8, 2024

Hi @zichunxx, I will have a look in the next few days after some deadlines. Thank you

No problem! I will try to fix it before you are done with your deadline.

Have you tried with another buffer, like the standard ReplayBuffer or the SequentialReplayBuffer? Does it give you the same error?

I have tried with ReplayBuffer and there is no OSError. The above error seems to be triggered by too many .memmap files generated on disk.
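
A rough way to check this (a sketch; "memmap_buffer" below is a placeholder for whatever directory is passed as memmap_dir) is to count the .memmap files written so far and compare that with the open-descriptor count from the snippet above; if each stored episode keeps its backing files open, the two numbers should grow together until the limit is hit:

import glob
import os

# Count the .memmap files the buffer has written under memmap_dir
# ("memmap_buffer" is a placeholder for the actual directory).
memmap_files = glob.glob(os.path.join("memmap_buffer", "**", "*.memmap"), recursive=True)
print(f".memmap files on disk: {len(memmap_files)}")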

belerico (Member) commented

Hi @zichunxx, I tried yesterday on my machine and reached more than 200k steps without errors: how many steps can you print before the error is raised?
P.S. I had to stop the experiment because I was running out of space on disk.

zichunxx (Author) commented

Hi! The above error is triggered within 5000 steps with a buffer size of 4990. Besides, I found that the error only occurs when I run the above program in the system terminal with the conda env activated. If I run it in the VSCode terminal, the error does not happen within 5000 steps, which puzzles me a lot.
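
My current guess (unverified) is that the two terminals start the process with different limits on open files; printing the limit in each terminal should show whether that is the case:

import resource

# Soft and hard limits on open file descriptors for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"RLIMIT_NOFILE: soft={soft}, hard={hard}")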
