Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Application still run after bubblejail fails/exits #76

Closed
xiota opened this issue Sep 15, 2023 · 18 comments
Closed

[Bug] Application still run after bubblejail fails/exits #76

xiota opened this issue Sep 15, 2023 · 18 comments

Comments

@xiota
Copy link

xiota commented Sep 15, 2023

Output of bubblejail --version

AUR-git 0.8.1.r0.g806acc9

Your distro name and version

Arch Linux

Description

After failure, the application associated with the instance is still run. When run from a terminal, the user is returned to the command line after the error messages are printed. bubblejail execution appears to have ended, and I don't see anything indicating that the application would still run.

Expected behavior is to not execute the application because setting up the sandbox had already failed.

To reproduce, enable slirp4netns, which currently fails on first run. (#65)

Here are the error messages from a recent run.
WARNING: Support for --outbound-addr is experimental
setns(CLONE_NEWNET): Operation not permitted
child failed(1)
Traceback (most recent call last):
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 948, in post_init_hook
    await wait_for(slirp_ready_task, timeout=3)
  File "/usr/lib/python3.11/asyncio/tasks.py", line 479, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/bubblejail", line 32, in <module>
    bubblejail_main()
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 242, in bubblejail_main
    func(**args_dict)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 111, in run_bjail
    async_run(
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_instance.py", line 242, in async_run_init
    await runner.create_bubblewrap_subprocess(args_to_run)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 382, in create_bubblewrap_subprocess
    await self.task_post_init
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 392, in _run_post_init_hooks
    await hook(sandboxed_pid)
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 950, in post_init_hook
    raise BubblejailInitializationError(
bubblejail.exceptions.BubblejailInitializationError: Slirp4netns initialization failed
@igo95862
Copy link
Owner

Hello @xiota

I can't reproduce this.

I've set the outbound address to a non-existing name and bubblejail fails to run.

WARNING: Support for --outbound-addr is experimental
outbound-addr has to be valid ipv4 address or interface name.Traceback (most recent call last):
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 948, in post_init_hook
    await wait_for(slirp_ready_task, timeout=3)
  File "/usr/lib/python3.11/asyncio/tasks.py", line 479, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/bubblejail", line 32, in <module>
    bubblejail_main()
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 243, in bubblejail_main
    func(**args_dict)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 111, in run_bjail
    async_run(
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_instance.py", line 242, in async_run_init
    await runner.create_bubblewrap_subprocess(args_to_run)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 382, in create_bubblewrap_subprocess
    await self.task_post_init
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 392, in _run_post_init_hooks
    await hook(sandboxed_pid)
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 950, in post_init_hook
    raise BubblejailInitializationError(
bubblejail.exceptions.BubblejailInitializationError: Slirp4netns initialization failed

@xiota
Copy link
Author

xiota commented Sep 16, 2023

Upon further testing, this seems to be only on the first run (most reliably reproduced after reboot). Possibly related to the slirp4netns first-run issue.

Seems to occur with GUI applications and X11. Does not occur with shell script.

It does not occur with network names that don't exist (eg, random strings). Only names do or should exist.

$ slirp4netns --version
slirp4netns version 1.2.2
commit: 0ee2d87523e906518d34a6b423271e4826f71faf
libslirp: 4.7.0
SLIRP_CONFIG_VERSION_MAX: 4
libseccomp: 2.5.4

@igo95862
Copy link
Owner

Hmm. I experienced this today but with the limit namespaces service:

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 1082, in set_namespaces_limits
    with open("/proc/sys/user/" + proc_file, mode="w") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: '/proc/sys/user/max_user_namespaces'
Traceback (most recent call last):
  File "/usr/bin/bubblejail", line 32, in <module>
    bubblejail_main()
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 243, in bubblejail_main
    func(**args_dict)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_cli.py", line 111, in run_bjail
    async_run(
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_instance.py", line 242, in async_run_init
    await runner.create_bubblewrap_subprocess(args_to_run)
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 382, in create_bubblewrap_subprocess
    await self.task_post_init
  File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 392, in _run_post_init_hooks
    await hook(sandboxed_pid)
  File "/usr/lib/bubblejail/python_packages/bubblejail/services.py", line 1136, in post_init_hook
    raise BubblejailInitializationError(
bubblejail.exceptions.BubblejailInitializationError: Limit namespaces subprocess failed

@igo95862
Copy link
Owner

igo95862 commented Sep 21, 2023

I should have ran the lsns --tree when it happened...

@xiota
Copy link
Author

xiota commented Sep 22, 2023

I tried running lsns --tree, but don't know how to interpret it's output. What would you be looking for? Also, after the error occurs, bubblejail exits. Wouldn't the namespaces of interest no longer be present / accessible? Is there a way to poll the namespaces to see what is happening to them in (near) real time?

@igo95862
Copy link
Owner

Wouldn't the namespaces of interest no longer be present / accessible?

Bwrap still runs successfully so the namespaces should be there.

@igo95862
Copy link
Owner

igo95862 commented Sep 25, 2023

I encountered the bug today and checked the namespaces. There was nothing abnormal about them.

My current theory is that there is some kind of file descriptor confusion. I am planning on adding the logging module. When set to DEBUG it will print every little detail like the file descriptors that are used.

@xiota
Copy link
Author

xiota commented Sep 29, 2023

Using build from 8477471.

After reboot. Desktop environment loaded. Bubblejail not yet run.
NS           TYPE   NPROCS   PID PPID COMMAND
4026531837   user       30   492  472 /usr/bin/startplasma-x11
├─4026531834 time       30   492  472 /usr/bin/startplasma-x11
├─4026531835 cgroup     30   492  472 /usr/bin/startplasma-x11
├─4026531836 pid        30   492  472 /usr/bin/startplasma-x11
├─4026531838 uts        30   492  472 /usr/bin/startplasma-x11
├─4026531839 ipc        30   492  472 /usr/bin/startplasma-x11
├─4026531840 net        30   492  472 /usr/bin/startplasma-x11
└─4026531841 mnt        30   492  472 /usr/bin/startplasma-x11
After first (unsuccessful) Bubblejail run. (Looks same as earlier output, except nprocs is higher.)
NS           TYPE   NPROCS   PID PPID COMMAND
4026531837   user       36   492  472 /usr/bin/startplasma-x11
├─4026531834 time       36   492  472 /usr/bin/startplasma-x11
├─4026531835 cgroup     36   492  472 /usr/bin/startplasma-x11
├─4026531836 pid        36   492  472 /usr/bin/startplasma-x11
├─4026531838 uts        36   492  472 /usr/bin/startplasma-x11
├─4026531839 ipc        36   492  472 /usr/bin/startplasma-x11
├─4026531840 net        36   492  472 /usr/bin/startplasma-x11
└─4026531841 mnt        36   492  472 /usr/bin/startplasma-x11
After second (successful) Bubblejail run.
NS                 TYPE   NPROCS   PID  PPID COMMAND
4026531837         user       40   492   472 /usr/bin/startplasma-x11
├─4026531834       time       45   492   472 /usr/bin/startplasma-x11
├─4026531835       cgroup     40   492   472 /usr/bin/startplasma-x11
├─4026531836       pid        40   492   472 /usr/bin/startplasma-x11
├─4026531838       uts        40   492   472 /usr/bin/startplasma-x11
├─4026531839       ipc        40   492   472 /usr/bin/startplasma-x11
├─4026531840       net        40   492   472 /usr/bin/startplasma-x11
├─4026531841       mnt        40   492   472 /usr/bin/startplasma-x11
└─4026532334       user        0
  ├─4026532335     mnt         5  1422  1420 /usr/bin/python3 -IO /usr/lib/bubblejail/bubblejail-helper --helper-socke
  ├─4026532336     uts         5  1422  1420 /usr/bin/python3 -IO /usr/lib/bubblejail/bubblejail-helper --helper-socke
  ├─4026532337     ipc         2  1422  1420 /usr/bin/python3 -IO /usr/lib/bubblejail/bubblejail-helper --helper-socke
  ├─4026532338     pid         5  1422  1420 /usr/bin/python3 -IO /usr/lib/bubblejail/bubblejail-helper --helper-socke
  ├─4026532339     cgroup      5  1422  1420 /usr/bin/python3 -IO /usr/lib/bubblejail/bubblejail-helper --helper-socke
  ├─4026532340     net         3  1422  1420 /usr/bin/python3 -IO /usr/lib/bubblejail/bubblejail-helper --helper-socke
  └─4026532574     user        2  1422  1420 /usr/bin/python3 -IO /usr/lib/bubblejail/bubblejail-helper --helper-socke
    ├─4026532576   user        1  1518  1427 /usr/lib/firefox-nightly/firefox -contentproc -parentBuildID 202309262347
    │ └─4026532577 ipc         1  1518  1427 /usr/lib/firefox-nightly/firefox -contentproc -parentBuildID 202309262347
    ├─4026532578   user        1  1544  1427 /usr/lib/firefox-nightly/firefox -contentproc -childID 1 -isForBrowser -p
    │ ├─4026532579 ipc         1  1544  1427 /usr/lib/firefox-nightly/firefox -contentproc -childID 1 -isForBrowser -p
    │ └─4026532580 net         1  1544  1427 /usr/lib/firefox-nightly/firefox -contentproc -childID 1 -isForBrowser -p
    └─4026532635   user        1  1589  1427 /usr/lib/firefox-nightly/firefox -contentproc -childID 2 -isForBrowser -p
      ├─4026532636 ipc         1  1589  1427 /usr/lib/firefox-nightly/firefox -contentproc -childID 2 -isForBrowser -p
      └─4026532637 net         1  1589  1427 /usr/lib/firefox-nightly/firefox -contentproc -childID 2 -isForBrowser -p

@igo95862
Copy link
Owner

igo95862 commented Oct 4, 2023

So I had a Firefox instance failing today after boot because it could not decode the JSON written by the bwrap:

Exception in callback BubblejailRunner.read_info_fd()
 handle: <Handle BubblejailRunner.read_info_fd()>
 Traceback (most recent call last):
   File "/usr/lib/python3.11/asyncio/events.py", line 80, in _run
     self._context.run(self._callback, *self._args)
   File "/usr/lib/bubblejail/python_packages/bubblejail/bubblejail_runner.py", line 309, in read_info_fd
     info_dict = json_load(f)
                 ^^^^^^^^^^^^
   File "/usr/lib/python3.11/json/__init__.py", line 293, in load
     return loads(fp.read(),
            ^^^^^^^^^^^^^^^^
   File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
     return _default_decoder.decode(s)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/lib/python3.11/json/decoder.py", line 353, in raw_decode
     obj, end = self.scan_once(s, idx)
                ^^^^^^^^^^^^^^^^^^^^^^
 json.decoder.JSONDecodeError: Expecting ',' delimiter: line 2 column 22 (char 23)

File descriptor confusion theory seems more likely.

@igo95862
Copy link
Owner

@xiota

I finally found time to work on adding logging. Right now it is in a separate branch add-logging.

It adds a new command line option --log-level. When it is set to DEBUG (--log-level DEBUG) will print all the arguments passed to subcommands.

@xiota
Copy link
Author

xiota commented Oct 22, 2023

Thanks. I tried it. Didn't notice anything indicating what the cause might be.

Just to refresh/summarize the issue, on first run of an instance after reboot (Firefox, other web browsers, and GUI network applications), bubblejail usually produces an error that includes the following, which appears to be produced by slirp4netns:

WARNING: Support for --outbound-addr is experimental
setns(CLONE_NEWNET): Operation not permitted
child failed(1)

Even though bubblejail exits after producing the error message, the program (eg, Firefox) continues to run. When I first reported the issue, this happened every time. Now it seems to happen about half the time.

Subsequent runs of the instance seem to work normally. Running a different instance (eg, different web browser) produces the same error and behavior on first run.

Command-line programs do not have the issue. Desktop is KDE Plasma. Network is managed by Network Manager. Configuring user.max_user_namespaces and kernel.unprivileged_userns_clone through sysctl has had no effect.

The first-run error message does not recur after disconnecting and reconnecting the network.

toml of an instance that has the problem.
[common]
executable_name = [
    "/usr/bin/firefox",
]
share_local_time = false
filter_disk_sync = false
dbus_name = ""

[x11]

[pulse_audio]

[home_share]
home_paths = [
    "Downloads",
    ".config/gtk-3.0",
    ".config/mimeapps.list",
]

[direct_rendering]
enable_aco = false

[root_share]
paths = [
    "/run/user/1000/bubblejail-RecentDocuments",
    "/run/user/1000/bubblejail-cache",
]
read_only_paths = []

[notify]

[slirp4netns]
dns_servers = []
outbound_addr = "wlan0"
disable_host_loopback = false

Paths at /run/user/1000/ are created by another script and definitely exist before bubblejail is run. They're used with symlinks to keep some transient cache data in tmpfs. They are garbage collected when applications are shut down, and the first-run error message does not recur after garbage collection.

@igo95862
Copy link
Owner

igo95862 commented Oct 22, 2023

I have a few ideas what might fix these issues.

21a01e5 Use network namespace and NS_GET_USERNS ioctl. This means the slirp4netns definitely be given the user namespace that owns the network namespace. I tested it a bit in the testing command line and previously running an instance more than once would also have had the setns(CLONE_NEWNET): Operation not permitted but after it can run multiple times.

8ad588d Check for magic value when reading from ready file descriptor. Looks like reading from closed pipe can return empty string. Previously only reading from pipe was checked.

Could you give those changes a test @xiota ? I uploaded them to the xiota-fixes branch.

@xiota
Copy link
Author

xiota commented Oct 22, 2023

@igo95862 The potential fixes look promising. Did not see the "Operation not permitted" error message after installing the new branch and rebooting. Would need a few days of use to be sure though. If they work out, would also close #65.

@igo95862
Copy link
Owner

@xiota how was your experience with those fixes?

I plan on releasing 0.8.2 version if they are successful.

@xiota
Copy link
Author

xiota commented Oct 24, 2023

No error messages with the fixes.

Error messages return when the old version is reinstalled and computer rebooted. Error messages gone again when fixed version is installed and computer rebooted.

@igo95862
Copy link
Owner

0.8.2 has been released with the fixes.

@xiota
Copy link
Author

xiota commented Oct 26, 2023

0.8.2 release (12307c4) is working well. Thank you.

@xiota xiota closed this as completed Oct 26, 2023
@igo95862
Copy link
Owner

igo95862 commented Jun 9, 2024

I think I figured out the original bug. It was a concurrency issue with a PID written by a bwrap in to the info file descriptor. It writes the PID as soon as it forks but not all namespaces are initialized. So the original code that used the UserNamespace would either get the owner user namespace or child user namespace based on how fast the code ran.

containers/bubblewrap#634

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants