This repository has been archived by the owner on Nov 10, 2023. It is now read-only.

make allocators and sanitizers work for processes created with multiprocessing's spawn method in dev mode #2657

Open
wants to merge 1 commit into
base: dev

@yifuwang yifuwang commented Sep 8, 2021

Summary:

Problem

Currently, the entrypoint for in-place Python binaries (i.e. built with dev
mode) executes the following steps to load system native dependencies (e.g.
sanitizers and allocators):

  • Back up the LD_PRELOAD set by the caller
  • Append system native dependencies to LD_PRELOAD
  • Inject a prologue into user code that restores the LD_PRELOAD set by the caller
  • execv the Python interpreter
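
The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual entrypoint code: the backup variable name `ORIGINAL_LD_PRELOAD_SKETCH` and all function names are hypothetical, and the sketch passes an explicit environment to `os.execve` instead of mutating the process environment before `execv`.

```python
import os

# Hypothetical name for the variable that stashes the caller's LD_PRELOAD.
BACKUP_VAR = "ORIGINAL_LD_PRELOAD_SKETCH"


def build_interpreter_env(env, native_deps):
    """Steps 1 and 2: back up LD_PRELOAD, then append the native deps to it."""
    new_env = dict(env)
    caller_preload = env.get("LD_PRELOAD", "")
    new_env[BACKUP_VAR] = caller_preload
    # LD_PRELOAD entries may be separated by colons (or spaces).
    entries = [p for p in caller_preload.split(":") if p] + list(native_deps)
    new_env["LD_PRELOAD"] = ":".join(entries)
    return new_env


def restore_ld_preload(env):
    """Step 3 (the injected prologue): put the caller's LD_PRELOAD back.

    By this point the dynamic loader has already loaded the native deps,
    so removing them only affects processes launched afterwards.
    """
    original = env.pop(BACKUP_VAR, "")
    if original:
        env["LD_PRELOAD"] = original
    else:
        env.pop("LD_PRELOAD", None)


def launch(interpreter, argv, native_deps):
    """Step 4: replace the current process with the Python interpreter."""
    env = build_interpreter_env(os.environ, native_deps)
    os.execve(interpreter, [interpreter] + list(argv), env)
```

In this sketch the prologue would call `restore_ld_preload(os.environ)` before any user code runs, which is why processes launched later no longer see the native dependencies in `LD_PRELOAD`.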

These steps work as intended for single-process Python programs. However, when a
Python program spawns child processes, the child processes do not load the
system native dependencies, since they simply execv the vanilla Python
interpreter. A few examples of why this is problematic:

  • The ASAN runtime library is a system native dependency. Without loading it, a
    child process that loads user native dependencies compiled with ASAN will
    crash during static initialization because it can't find __asan_init.
  • jemalloc is also a system native dependency, so child processes run without
    it and fall back to the interpreter's default allocator.
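
One way to observe the problem from inside a child process is to check which shared objects are actually mapped. The sketch below is a hypothetical diagnostic, not part of this change: it assumes a Linux host (it reads `/proc/self/maps`), and the helper names are made up for illustration.

```python
import os


def mapped_objects(maps_text):
    """Return the file paths mapped into a process, given /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            # Skip pseudo-entries such as [stack] or [heap].
            if path.startswith("/"):
                paths.add(path)
    return paths


def has_library(name_fragment, maps_text):
    """True if any mapped file's basename contains name_fragment."""
    return any(
        name_fragment in os.path.basename(path)
        for path in mapped_objects(maps_text)
    )


def current_process_has(name_fragment):
    """Linux only: check the current process's own mappings."""
    with open("/proc/self/maps") as maps_file:
        return has_library(name_fragment, maps_file.read())
```

Calling `current_process_has("libasan")` or `current_process_has("jemalloc")` in both the parent and a spawned child would show whether the child picked up the same native dependencies; the exact library file names depend on the toolchain.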

Many, if not most, ML use cases "ban" dev mode because of these problems. This
is very unfortunate considering the developer efficiency dev mode provides. In
addition, a large number of unit tests have to run in a more expensive build
mode because of these problems.

For an earlier discussion, see this post (https://fb.workplace.com/groups/fbpython/permalink/2897630276944987/).

Solution

Move the logic that loads system native dependencies out of the Python binary
entrypoint and into an interpreter wrapper, and set the interpreter wrapper as
sys.executable in the injected prologue:

  • The Python binary entrypoint now uses the interpreter wrapper, which has the
    same command line interface as the Python interpreter, to run the main
    module.
  • multiprocessing's spawn method now uses the interpreter wrapper to create
    child processes, ensuring system native dependencies get loaded correctly.
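
The prologue side of this can be sketched as below. This is an illustration under assumptions, not the code in this change: the function names and the wrapper path are hypothetical. The description above only says that `sys.executable` is set; the extra `multiprocessing.set_executable` call is an addition for the case where `multiprocessing.spawn` was imported earlier and has already captured the old value.

```python
import multiprocessing
import multiprocessing.spawn
import os
import sys


def install_wrapper_as_executable(wrapper_path):
    """Point sys.executable, and multiprocessing's spawn method, at the wrapper."""
    sys.executable = wrapper_path
    # multiprocessing.spawn records the executable when it is first imported,
    # so update it explicitly in case that import has already happened.
    multiprocessing.set_executable(wrapper_path)


def spawn_executable():
    """Return the executable that the spawn start method would launch."""
    # Depending on the Python version this is stored as str or bytes.
    return os.fsdecode(multiprocessing.spawn.get_executable())
```

After `install_wrapper_as_executable(...)` runs, a child created with the spawn start method is launched through the wrapper rather than the vanilla interpreter, so the wrapper gets a chance to set up `LD_PRELOAD` again before the real interpreter starts. The wrapper must accept the same command-line arguments as the interpreter for this to work.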

Alternative Considered

One alternative considered is to simply not remove system native dependencies
from LD_PRELOAD, so that they are present in the spawned processes. However,
this causes linking issues, which were perhaps the reason LD_PRELOAD was
restored in the first place: in-place Python binaries have access to binaries
installed on devservers that are not built for the target platform (e.g.
/bin/sh, which is used by some Python standard libraries). These binaries do
not link properly with the system native dependencies.

References

An old RFC for this change: D16210828
The counterpart for opt mode: D16350169

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D30802446

yifuwang added a commit to yifuwang/buck that referenced this pull request Sep 14, 2021
…rocessing's spawn method in dev mode (facebook#2657)

Summary:
Pull Request resolved: facebook#2657

fbshipit-source-id: 118d3a4657ba397b1c98b95d62f85ad01e234422
@yifuwang yifuwang force-pushed the export-D30802446-to-dev branch from 63d7d1b to a54cc5f on September 14, 2021 22:43
…rocessing's spawn method in dev mode (facebook#2657)

Summary:
Pull Request resolved: facebook#2657

Reviewed By: fried, bobyangyf, Reubend

fbshipit-source-id: 8c13de3517155cf3a8d69a212e30565c5c7277e0
@yifuwang yifuwang force-pushed the export-D30802446-to-dev branch from a54cc5f to ba64e24 on September 16, 2021 19:38

facebook-github-bot pushed a commit that referenced this pull request Sep 16, 2021
…rocessing's spawn method in dev mode (#2657)

Summary:
Pull Request resolved: #2657

Reviewed By: fried, bobyangyf, Reubend

fbshipit-source-id: e17696f5c6f31138d9ea7f5e56408097eb282859