
Opening the shared memory failed, os error 24 #54

Open
meua opened this issue Apr 24, 2023 · 11 comments

Comments

@meua (Contributor) commented Apr 24, 2023

Describe the bug

frame:  (1080, 1920, 4)
img:  (1080, 1920, 3)
output:  [[ 3.6737819  3.6716187  3.6644292 ... 12.203822  12.108404  12.077164 ]
 [ 3.6688473  3.6667528  3.6598594 ... 12.227848  12.146185  12.11989  ]
 [ 3.6578336  3.655894   3.6496665 ... 12.287019  12.240098  12.226307 ]
 ...
 [53.57742   53.672173  53.898468  ... 83.08049   83.24545   83.30725  ]
 [53.411537  53.528187  53.81199   ... 83.325745  83.47475   83.5309   ]
 [53.344387  53.46946   53.775078  ... 83.41604   83.56092   83.615654 ]]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: Dora Runtime raised an error.

Caused by:
   0: main task failed
   1: received error event: failed to map shared memory input

      Caused by:
          Opening the shared memory failed, os error 24

      Location:
          apis/rust/node/src/event.rs:64:14

Location:
    binaries/runtime/src/lib.rs:316:34
(dora3.7) jarvis@jia:~/coding/dora_home/dora-drives$ Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: Dora Runtime raised an error.

Caused by:
   0: main task failed
   1: failed to send node output
   2: failed to allocate shared memory
   3: Creating the shared memory failed, os error 24

Location:
    apis/rust/node/src/node.rs:169:22

To Reproduce
Steps to reproduce the behavior:

  1. Dora start daemon: dora up
  2. Start a new dataflow: dora start graphs/tutorials/webcam_single_dpt_frame.yaml --attach --hot-reload --name webcam-midas

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots or Video
[screenshot attached to the original issue]

Environments (please complete the following information):

  • System info: Linux jia 5.15.0-69-generic #76~20.04.1-Ubuntu SMP Mon Mar 20 15:54:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Dora version: 0.2.2
@haixuanTao (Collaborator)

Would be great if you could share the code as well. Thanks :)

Specifically: graphs/tutorials/webcam_single_dpt_frame.yaml

@meua (Contributor, Author) commented Apr 26, 2023

Would be great if you could share the code as well. Thanks :)

Specifically: graphs/tutorials/webcam_single_dpt_frame.yaml

Ok, I submitted the related PR

@phil-opp (Collaborator)

Thanks for reporting!

Ok, I submitted the related PR

You're talking about #55, right?

Regarding the error:

Did you see any warnings in the logs? There are some situations where we unmap shared memory regions after a timeout if the receiver did not react as expected. If this happened, you should see a warning in the log output. (@haixuanTao do we have tracing to stdout enabled for Python by default?)

Given that the shared memory allocation failed too, it is more likely that the issue is the number of open files. There is typically a limit on the number of open file handles, which you can query with ulimit -n. We currently allocate each message as a separate shared memory region (which requires a file handle), so it is easy to exhaust this limit if many messages are in transit. To work around this, you can temporarily double the file limit by running ulimit -n 2048; larger values are possible too.
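
For reference, here is a minimal Python sketch (not part of dora) of the same check and workaround from inside a process, using the standard resource module; the limit values are only examples:

    import resource

    # Query the current open-file limits (the soft limit is what `ulimit -n` reports).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open-file limit: soft={soft}, hard={hard}")

    # Raise the soft limit for this process; it cannot exceed the hard limit
    # without elevated privileges.
    new_soft = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))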

To fix this properly, we should reduce the number of allocated shared memory regions and reuse the same region for multiple messages. I opened dora-rs/dora#268 for that.
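
As a generic illustration of the difference (using Python's standard multiprocessing.shared_memory, not dora's actual implementation): every separately created region holds its own file handle, while a single reused region keeps the handle count constant no matter how many messages pass through it.

    from multiprocessing import shared_memory
    import numpy as np

    frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # example payload

    # One region per message: each SharedMemory object keeps a file handle
    # open until close()/unlink(), so many in-flight messages can exhaust
    # the `ulimit -n` limit.
    per_message = shared_memory.SharedMemory(create=True, size=frame.nbytes)
    per_message.buf[:frame.nbytes] = frame.tobytes()
    per_message.close()
    per_message.unlink()

    # One reused region: allocate once, overwrite the same buffer for each
    # new message, so only a single file handle is ever held.
    pool = shared_memory.SharedMemory(create=True, size=frame.nbytes)
    for _ in range(100):
        pool.buf[:frame.nbytes] = frame.tobytes()
    pool.close()
    pool.unlink()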

@haixuanTao (Collaborator)

@phil-opp, traces go to stdout with export RUST_LOG=trace; the only case where they don't is when DORA_JAEGER_TRACING is also activated.

@phil-opp (Collaborator)

OK, good. And the default log level is warn, right? Then it sounds like the open file handle limit is the issue.

@haixuanTao (Collaborator)

If the environment variable is empty or not set, or if it contains only invalid directives, a default directive enabling the ERROR level is added.

The default is the same as the Tokio tracing default, which is error. We can change it to warn.

@meua (Contributor, Author) commented Apr 26, 2023

The original trigger for issue #54 is that the bytes data (a numpy array) sent by send_output is relatively large. I have since replaced the sent content following haixuanTao's suggestion, so the code that triggers this problem is no longer present. To reproduce it, the code in dora-drives/operators/single_dpt_op.py needs to be modified as follows:

    # Upsample the model prediction to the input image resolution.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

    # Full-resolution float32 depth map, sent as raw bytes.
    depth_output = prediction.cpu().numpy()
    print("depth_output: ", depth_output)
    send_output("depth_frame", depth_output.tobytes(), dora_input["metadata"])

The depth_output array is relatively large, which makes this problem more likely to trigger.
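
For a rough sense of scale (assuming the prediction is float32 and matches the img shape printed in the log above), each depth_frame message is about 8 MB:

    import numpy as np

    depth_output = np.zeros((1080, 1920), dtype=np.float32)  # shape from the log
    print(depth_output.nbytes)          # 8294400 bytes
    print(depth_output.nbytes / 2**20)  # ~7.9 MiB per message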

@phil-opp (Collaborator)

The default is the same as the Tokio tracing default, which is error. We can change it to warn.

This would be a good idea in my opinion. We use warnings in dora to log abnormal events that are not yet critical but should still be observed by users.

@phil-opp (Collaborator)

@meua Thanks a lot for the info!

@phil-opp (Collaborator)

What's the status of this? Can we still reproduce the "failed to map shared memory input" error with the latest version?

@meua (Contributor, Author) commented Jun 27, 2023

What's the status of this? Can we still reproduce the "failed to map shared memory input" error with the latest version?

I don't have time to test it right now; I will verify it later when I have a chance.
