[Bug] slirp4netns, binding to interface (wlan0), network not available #65

Closed
xiota opened this issue Jul 27, 2023 · 19 comments

Comments

@xiota

xiota commented Jul 27, 2023

Output of bubblejail --version

AUR-git 0.8.0.r3.gc38a98f

Your distro name and version

Arch

Description

Frequently, when starting an instance with network limited to a specific interface (wlan0) using slirp4netns, internet is not available. Internet usually works after shutting down the program and restarting bubblejail. (Possibly some race condition causing the network interface to be unavailable during the first run.)

Would be nice if bubblejail quit with notification when it is unable to establish a connection to the interfaces through slirp4netns. This way, the user can fix the problem and try again without waiting for the program to fully load.

@igo95862
Owner

Hello @xiota

Would be nice if bubblejail quit with notification when it is unable to establish a connection to the interfaces through slirp4netns.

Definitely. Currently bubblejail is not actually using the --ready-fd option that slirp4netns provides. I believe slirp4netns should crash if the specified interface is not available. Could you verify that?

By the way, how well does interface binding work for the slirp4netns service? I never actually verified that it functions.

@xiota
Author

xiota commented Jul 27, 2023

I don't see any "crash". If I kill the network interface first and run bubblejail in terminal, I see the following:

WARNING: Support for --outbound-addr is experimental
outbound-addr has to be valid ipv4 address or interface name.[GFX1-]: glxtest: cannot access /sys/bus/pci

Then the application continues launching without internet access. (What I would like is to not launch the application, but send a notification with the reason for the failure. This could be behind an option if it's expected some people would want the application to continue running without internet access. In my case, it's a web browser that's useless without internet and slow to start/shutdown/restart.)

When slirp4netns is working, it seems to do what it's supposed to do. The app is unable to access other network interfaces.

@igo95862
Owner

It does exit with a non-zero exit code if an invalid interface was passed. Therefore --ready-fd would never be written to.

So it is only a matter of plumbing the --ready-fd to the subprocess.
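
The plumbing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from bubblejail: `start_and_wait_ready`, `HelperStartError`, `make_argv`, and `demo_argv` are hypothetical names, and a Python one-liner stands in for slirp4netns. The idea is that the parent keeps the read end of a pipe, the helper gets the write end as its ready fd, and end-of-file on the read end means the helper exited without ever signalling readiness.

```python
import asyncio
import os
import sys


class HelperStartError(RuntimeError):
    """The helper exited or stalled before it signalled readiness."""


async def start_and_wait_ready(make_argv, timeout=3.0):
    """Start a helper process and wait for it to write to its ready fd.

    make_argv(fd) is a hypothetical callback returning the command line
    for a given fd number, e.g. one containing f"--ready-fd={fd}".
    """
    read_fd, write_fd = os.pipe()
    try:
        proc = await asyncio.create_subprocess_exec(
            *make_argv(write_fd), pass_fds=(write_fd,)
        )
    except BaseException:
        os.close(read_fd)
        raise
    finally:
        # Close our copy of the write end so that the helper exiting
        # without writing shows up as end-of-file on the read end.
        os.close(write_fd)

    loop = asyncio.get_running_loop()
    first_read = loop.create_future()

    def on_readable():
        loop.remove_reader(read_fd)
        if not first_read.done():
            first_read.set_result(os.read(read_fd, 1))

    loop.add_reader(read_fd, on_readable)
    try:
        data = await asyncio.wait_for(first_read, timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        raise HelperStartError("helper did not become ready in time")
    finally:
        loop.remove_reader(read_fd)
        os.close(read_fd)

    if not data:
        # End-of-file: the helper exited without ever signalling readiness.
        code = await proc.wait()
        raise HelperStartError(f"helper exited early with status {code}")
    return proc


def demo_argv(fd):
    # Stand-in helper (assumption): writes one byte to the ready fd and exits.
    return [sys.executable, "-c", f"import os; os.write({fd}, b'1')"]
```

For slirp4netns, `make_argv` would return a command line that includes `--ready-fd=<fd>`; the remaining arguments are not shown in this thread, so they are left out here.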

@igo95862
Owner

Should be fixed with 2974128. @xiota can you give it a try?

@xiota
Author

xiota commented Jul 30, 2023

This is great. Works as expected.

When the network interface is available, the program runs with internet access. When the interface is not available and run from a shortcut, I get a notification. Otherwise, I see an error message in the terminal.

@xiota xiota closed this as completed Jul 30, 2023
@xiota
Author

xiota commented Aug 1, 2023

@igo95862 Having used this for a few days, I've noticed a couple issues:

  • When a new network interface is brought up (available and active according to network manager), slirp4netns isn't available on the first run. After waiting a second and trying again, it works. I believe this is the initial problem that prompted me to request this feature.

  • The notification is permanent. It does not time out and dismiss itself like standard notifications. The cause seems to be urgency is set to critical.

    try:
        subprocess_run(
            (
                'notify-send',
                '--urgency', 'critical',
                '--icon', 'bubblejail-config',
                f"Failed to run instance: {instance_name}",
                f"Exception: {format_exc(0)}"
            )
        )
    except FileNotFoundError:
        # Make notify-send optional
        ...
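
One possible way to make the notification dismiss itself is sketched below. It is an assumption about what a fix could look like, not the change that was actually made: `build_notify_command` and `notify_failure` are hypothetical helpers, and the 10-second timeout is an arbitrary choice. It uses `normal` urgency and passes `--expire-time`, which some notification servers may ignore.

```python
import subprocess


def build_notify_command(instance_name, details,
                         urgency="normal", expire_ms=10_000):
    """Build a notify-send command line for a failed instance start."""
    return (
        "notify-send",
        "--urgency", urgency,
        "--expire-time", str(expire_ms),
        "--icon", "bubblejail-config",
        f"Failed to run instance: {instance_name}",
        f"Exception: {details}",
    )


def notify_failure(instance_name, details):
    """Send the notification; do nothing if notify-send is not installed."""
    try:
        subprocess.run(build_notify_command(instance_name, details), check=False)
    except FileNotFoundError:
        # notify-send is optional
        pass
```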

@igo95862
Owner

igo95862 commented Aug 1, 2023

When a new network interface is brought up (available and active according to network manager), slirp4netns isn't available on the first run. After waiting a second and trying again, it works. I believe this is the initial problem that prompted me to request this feature.

This probably has something to do with interface binding. Could you check the stderr of bubblejail? bubblejail and slirp4netns share stderr, so whatever slirp4netns prints should be visible in bubblejail's output.

The notification is permanent. It does not time out and dismiss itself like standard notifications. The cause seems to be urgency is set to critical.

Sure. I will add this.

@xiota
Author

xiota commented Aug 1, 2023

Looks like the main difference between the first run error and error when disconnected from internet is:

setns(CLONE_NEWNET): Operation not permitted
child failed(1)

Also, looks like some output string is missing a newline somewhere. (See the error when disconnected from internet.)

First run, network manager connected to internet. Error.
WARNING: Support for --outbound-addr is experimental
setns(CLONE_NEWNET): Operation not permitted
child failed(1)
Traceback (most recent call last):
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 941, in post_init_hook
    await wait_for(slirp_ready_task, timeout=3)
  File "/usr/lib/python3.11/asyncio/tasks.py", line 479, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/bubblejail", line 32, in <module>
    bubblejail_main()
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 237, in bubblejail_main
    func(**args_dict)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 105, in run_bjail
    async_run(
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_instance.py", line 239, in async_run_init
    await runner.create_bubblewrap_subprocess(args_to_run)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 368, in create_bubblewrap_subprocess
    await self.task_post_init
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 378, in _run_post_init_hooks
    await hook(sandboxed_pid)
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 943, in post_init_hook
    raise BubblejailInitializationError(
bubblejail.exceptions.BubblejailInitializationError: Slirp4netns initialization failed

Second run, network manager connected to internet. No error.
WARNING: Support for --outbound-addr is experimental
sent tapfd=5 for tap0
received tapfd=5
Starting slirp
* MTU:             1500
* Network:         10.0.2.0
* Netmask:         255.255.255.0
* Gateway:         10.0.2.2
* DNS:             10.0.2.3
* DHCP begin:      10.0.2.15
* DHCP end:        10.0.2.30
* Recommended IP:  10.0.2.100
* Outbound IPv4:    192.168.1.114
WARNING: 127.0.0.1:* on the host is accessible as 10.0.2.2 (set --disable-host-loopback to prohibit connecting to 127.0.0.1:*)
[GFX1-]: glxtest: cannot access /sys/bus/pci
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.

When network manager disconnected from internet.
WARNING: Support for --outbound-addr is experimental
outbound-addr has to be valid ipv4 address or interface name.Traceback (most recent call last):
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 941, in post_init_hook
    await wait_for(slirp_ready_task, timeout=3)
  File "/usr/lib/python3.11/asyncio/tasks.py", line 479, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/bubblejail", line 32, in <module>
    bubblejail_main()
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 237, in bubblejail_main
    func(**args_dict)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 105, in run_bjail
    async_run(
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_instance.py", line 239, in async_run_init
    await runner.create_bubblewrap_subprocess(args_to_run)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 368, in create_bubblewrap_subprocess
    await self.task_post_init
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 378, in _run_post_init_hooks
    await hook(sandboxed_pid)
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 943, in post_init_hook
    raise BubblejailInitializationError(
bubblejail.exceptions.BubblejailInitializationError: Slirp4netns initialization failed

@igo95862
Owner

igo95862 commented Aug 1, 2023

Looks like the main difference between the first run error and error when disconnected from internet is

This one is interesting because I also encounter it when running the development environment and running an instance twice. Not sure what is causing it.

Also, looks like some output string is missing a newline somewhere.

Pretty sure this is a slirp4netns bug.

@xiota
Author

xiota commented Sep 15, 2023

Reopening because the underlying issue (slirp4netns not binding the interface on first run) still has not been resolved.

Having some way to see the slirp4netns command would be helpful. (#75)

Error output from 0.8.1.r0.g806acc9 (basically same as before).
WARNING: Support for --outbound-addr is experimental
setns(CLONE_NEWNET): Operation not permitted
child failed(1)
Traceback (most recent call last):
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 948, in post_init_hook
    await wait_for(slirp_ready_task, timeout=3)
  File "/usr/lib/python3.11/asyncio/tasks.py", line 479, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/bubblejail", line 32, in <module>
    bubblejail_main()
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 242, in bubblejail_main
    func(**args_dict)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 111, in run_bjail
    async_run(
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_instance.py", line 242, in async_run_init
    await runner.create_bubblewrap_subprocess(args_to_run)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 382, in create_bubblewrap_subprocess
    await self.task_post_init
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 392, in _run_post_init_hooks
    await hook(sandboxed_pid)
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 950, in post_init_hook
    raise BubblejailInitializationError(
bubblejail.exceptions.BubblejailInitializationError: Slirp4netns initialization failed

@xiota xiota changed the title slirp4netns, binding to interface (wlan0), network not available [Bug] slirp4netns, binding to interface (wlan0), network not available Sep 15, 2023
@igo95862
Owner

I can't reproduce it.

Having some way to see the slirp4netns command would be helpful.

Slirp4netns is there. It shares stderr and stdout with bubblejail.

WARNING: Support for --outbound-addr is experimental
setns(CLONE_NEWNET): Operation not permitted
child failed(1)

This is the output of slirp4netns.

@xiota
Author

xiota commented Sep 16, 2023

It is on first run only. Most reliable way to reproduce is after reboot.

In case this is a kernel issue, I am using 6.1.53. (Update: it is not the kernel. This also occurs with 6.5.3.)

Slirp4netns is there.

Not the output. The command and options. See #75

@igo95862
Owner

I never encountered such issues. Are you using the SUID bwrap by any chance?

@igo95862
Owner

Not the output. The command and options.

By the way, since it's Python, you can just insert a print statement somewhere here: https://github.com/igo95862/bubblejail/blob/806acc9064067f3f1342e1b10b7ff9c90066d4b1/src/bubblejail/services.py#L928C1-L928C1
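
As a rough sketch of such a print statement: the helper names below are hypothetical, and the example arguments are illustrative only, since the real slirp4netns argument list built by bubblejail is not shown in this thread. `shlex.join` quotes each argument, so the printed line can be pasted back into a shell.

```python
import shlex
import sys


def format_command(argv):
    """Render an argument list as a single shell-quoted line."""
    return shlex.join(str(arg) for arg in argv)


def log_command(argv, stream=sys.stderr):
    """Print the command to stderr (shared with the helper) and return it."""
    line = format_command(argv)
    print(f"Running: {line}", file=stream)
    return line


# Illustrative arguments only; not the command bubblejail actually builds.
log_command(["slirp4netns", "--outbound-addr=wlan0", 1234, "tap0"])
```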

@xiota
Author

xiota commented Sep 18, 2023

Are you using the SUID bwrap by any chance?

Not as far as I can tell.

$ pacman -Ss bubblewrap

extra/bubblewrap 0.8.0-1 [installed]
    Unprivileged sandboxing tool
extra/bubblewrap-suid 0.8.0-1
    Unprivileged sandboxing tool (setuid variant)
chaotic-aur/bubblejail 0.8.1-1
    Bubblewrap based sandboxing utility
chaotic-aur/bubblejail-git 0.8.1.r0.g806acc9-1 [installed]
    Bubblewrap based sandboxing utility

since it's Python you can just insert a print statement

I will try.

I also tried disabling and enabling different services. When only common and slirp4netns are enabled, this issue doesn't occur. When all of the following are enabled, it does occur. common, direct_rendering, home_share, notify, pulse_audio, root_share, slirp4netns, x11.

When direct_rendering is disabled / enabled, the problem sometimes doesn't occur, but it's inconsistent.

@xiota
Author

xiota commented Sep 19, 2023

Found this old issue that seems related. rootless-containers/slirp4netns#228

The last couple comments indicate that OP found a solution, but I have no idea what it was.

Also found this: rootless-containers/slirp4netns#311

You point to this as the solution:

target_namespace = UserNamespace.from_pid(pid)
parent_ns = target_namespace.get_parent_ns()
parent_ns.setns()

However, the equivalent lines in the current commit are different:

target_namespace = UserNamespace.from_pid(pid)
parent_ns = target_namespace.get_parent_ns()
parent_ns_fd = parent_ns._fd
parent_ns_path = f"/proc/{getpid()}/fd/{parent_ns_fd}"

parent_ns.setns() is missing. Is it important?

@igo95862
Owner

igo95862 commented Sep 19, 2023

Both the old and the new code solve the problem of a user namespace that is not bound to any process ID. Slirp4netns needs to be in the user namespace that owns the network namespace you want to use slirp4netns for.

The old code forked a separate process that then switched to that unbounded user namespace and launched slirp4netns. However, that does not play nicely with async code.

I realized there is a clever way to pass the unbounded user namespace to slirp4netns: open that unbounded namespace, then call slirp4netns with the --userns-path= option set to /proc/{my pid}/fd/{opened unbounded namespace file descriptor number}. This is a much cleaner solution than using subprocesses.
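
The path construction described above can be illustrated like this. It is only a sketch: `fd_proc_path` and `userns_path_argument` are hypothetical helper names, and opening the namespace itself is not shown. The point is that a `/proc/<pid>/fd/<fd>` path lets another process reach a file the parent has open, as long as the parent keeps that descriptor open.

```python
import os


def fd_proc_path(fd):
    """Return a /proc path through which another process can reach `fd`."""
    return f"/proc/{os.getpid()}/fd/{fd}"


def userns_path_argument(namespace_fd):
    """Build the --userns-path= argument for an already opened namespace fd.

    The caller must keep namespace_fd open until the child has used the path.
    """
    return f"--userns-path={fd_proc_path(namespace_fd)}"
```

Because the path names the parent's PID, the child does not need to inherit the descriptor.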

@igo95862
Owner

igo95862 commented Sep 19, 2023

The setns(CLONE_NEWNET): Operation not permitted error might be happening if the namespace hierarchy is different from what bubblejail expects.

Currently bubblejail expects something like this (visualized with lsns --tree):

├─4026532949         user        0       igo95862 
│ ├─4026532950       mnt        30  3060 igo95862 /usr/bin/python3 -IO /usr/lib/bubble
│ ├─4026532951       uts        30  3060 igo95862 /usr/bin/python3 -IO /usr/lib/bubble
│ ├─4026532952       ipc         4  3060 igo95862 /usr/bin/python3 -IO /usr/lib/bubble
│ ├─4026532953       pid        30  3060 igo95862 /usr/bin/python3 -IO /usr/lib/bubble
│ ├─4026532954       cgroup     30  3060 igo95862 /usr/bin/python3 -IO /usr/lib/bubble
│ └─4026532955       user        3  3060 igo95862 /usr/bin/python3 -IO /usr/lib/bubble

This hierarchy happens when the bwrap --dev option is used. Without --dev, the unbounded namespace is not created. I actually created a new option for util-linux's nsenter command to handle this issue.
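
For checking which namespaces a process belongs to without lsns, a small diagnostic sketch follows. `namespace_id` and `same_namespace` are hypothetical helpers, not part of bubblejail. They read the `/proc/<pid>/ns/<kind>` symlinks, whose targets carry the same namespace numbers that lsns prints in its first column. This does not show the parent/child hierarchy, only membership.

```python
import os
import re


def namespace_id(pid, kind="user"):
    """Return the number identifying the `kind` namespace of process `pid`.

    /proc/<pid>/ns/<kind> is a symlink with a target like 'user:[4026531837]'.
    """
    target = os.readlink(f"/proc/{pid}/ns/{kind}")
    match = re.fullmatch(rf"{re.escape(kind)}:\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return int(match.group(1))


def same_namespace(pid_a, pid_b, kind="user"):
    """Return True if both processes are in the same namespace of `kind`."""
    return namespace_id(pid_a, kind) == namespace_id(pid_b, kind)
```

Comparing the sandboxed PID with the PID of the process that launches slirp4netns, for both `user` and `net`, may help confirm whether the layout matches the tree shown above.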

@xiota
Author

xiota commented Oct 26, 2023

Fixed by 0.8.2 release (12307c4).

@xiota xiota closed this as completed Oct 26, 2023