plain invocation of python subprocess doesn't inherit sys.path for bootstrap_impl=script #2169

Open
rickeylev opened this issue Aug 31, 2024 · 6 comments · May be fixed by #2409

Comments

@rickeylev
Contributor

A side-effect of no longer propagating import paths using the PYTHONPATH envvar is that subprocesses don't inherit the paths. This is usually a good thing, but ends up breaking plain calls to python that assume they're going to inherit the current python's settings.

An example is pre-commit and its invocation of virtual env:

# add pre-commit to requirements and process through pip.parse

# BUILD.bazel
load("//python/entry_points:py_console_script_binary.bzl", "py_console_script_binary")

py_console_script_binary(
    name = "pre-commit",
    pkg = "@dev_pip//pre_commit",
    script = "pre-commit",
)

bazel run --@rules_python//python/config_settings:bootstrap_impl=script //:pre-commit

Eventually, it runs [sys.executable, '-mvirtualenv', ...]. The subprocess's sys.path contains only the stdlib, so it fails to import virtualenv.
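
To make the failure mode concrete, here is a minimal sketch of the general behavior, independent of Bazel or rules_python. It assumes nothing beyond the standard library; the path in `extra` is a made-up placeholder. Entries added to sys.path in the parent are not visible to a plain child interpreter:

```python
import json
import subprocess
import sys

# Hypothetical import root, used only for illustration.
extra = "/tmp/example_import_root"

# Mutating sys.path affects only the current interpreter.
sys.path.append(extra)

# A plain child interpreter computes its own sys.path from its defaults and
# the environment; it does not see the parent's in-memory list.
child = subprocess.run(
    [sys.executable, "-c", "import sys, json; print(json.dumps(sys.path))"],
    capture_output=True,
    text=True,
    check=True,
)
child_paths = json.loads(child.stdout)

print("parent has entry:", extra in sys.path)
print("child has entry:", extra in child_paths)
```

With the old PYTHONPATH-based bootstrap, the child would have picked the entry up from the environment instead.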


This is sort of working as intended (WAI). Part of the purpose of bootstrap_impl=script is to stop using the env var, so that PYTHONPATH doesn't get too long and doesn't bleed into subprocesses.

I'm not sure how to work around this. I guess a legacy option to set the env var?

I'm not sure how this is supposed to work outside of bazel, either. It must assume that it's invoked in a venv or something? The surrounding code seems to indicate it's setting up a venv for pre-commit itself...or something. This all seems odd -- I would have to create a venv with virtualenv in it to run pre-commit so pre-commit can create its own venv? That doesn't sound right.

@aignas
Collaborator

aignas commented Aug 31, 2024

At $dayjob I had a similar use case: I need a Python interpreter with all dependencies set up, and with the way the new bootstrap works, I am not sure how I could achieve that. I had a py_binary that used sys.executable and just forwarded the args to the interpreter. With the PYTHONPATH env var this would just work, but with the new method I would also need to set up sys.path myself before invoking the interpreter.

Maybe at the very least there is a way to work around this, where sys.executable could be set to something that sets up sys.path.
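
As a rough illustration of "setting up sys.path myself", a forwarding binary could export its own sys.path through PYTHONPATH for the child only. This is a sketch under assumptions, not anything rules_python provides: `run_python_with_parent_path` and the `example_dep` module are hypothetical names invented for the example.

```python
import os
import subprocess
import sys
import tempfile


def run_python_with_parent_path(args):
    """Run sys.executable with the parent's sys.path exported via PYTHONPATH."""
    env = dict(os.environ)
    # Drop empty entries such as the '' that stands for the current directory.
    env["PYTHONPATH"] = os.pathsep.join(p for p in sys.path if p)
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


# Hypothetical import root containing one module, standing in for a dependency.
import_root = tempfile.mkdtemp(prefix="import_root_")
with open(os.path.join(import_root, "example_dep.py"), "w") as f:
    f.write("VALUE = 42\n")
sys.path.append(import_root)

result = run_python_with_parent_path(
    ["-c", "import example_dep; print(example_dep.VALUE)"]
)
print(result.stdout.strip())
```

Note that this brings back the downsides the script bootstrap was meant to avoid: the variable can get long, and it is inherited by every further subprocess of the child.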

@rickeylev
Contributor Author

I was reading some Python docs (venv or site, I can't remember), and they gave me the following idea:

  • A binary creates a wrapper for the interpreter. This becomes sys.executable
  • In that same directory, there's a pth file
  • At interpreter startup (or site init?), it reads pth files from its directory.
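
The .pth mechanism this idea relies on can be tried in isolation. The sketch below only demonstrates how the standard `site` module treats a .pth file in a site directory; the temp directories are stand-ins, and which directory a real wrapper would use is not specified here.

```python
import os
import site
import sys
import tempfile

# Stand-ins for a binary's import root and a site directory (hypothetical layout).
import_root = tempfile.mkdtemp(prefix="import_root_")
site_dir = tempfile.mkdtemp(prefix="site_dir_")

# Each non-comment line of a .pth file names a directory to append to sys.path.
# Directories that do not exist are skipped.
with open(os.path.join(site_dir, "example.pth"), "w") as f:
    f.write(import_root + "\n")

# site.addsitedir() adds site_dir to sys.path and processes the .pth files in it,
# the same processing the site module applies to site-packages at startup.
site.addsitedir(site_dir)

print("site dir added:", site_dir in sys.path)
print("pth entry added:", import_root in sys.path)
```

Because .pth files are processed by every interpreter that initializes `site` for that directory, a subprocess started from the same environment would get the same entries without any env var.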

@ewianda
Contributor

ewianda commented Nov 1, 2024

> I was reading some Python docs (venv or site, I can't remember), and they gave me the following idea:
>
>   • A binary creates a wrapper for the interpreter. This becomes sys.executable
>   • In that same directory, there's a pth file
>   • At interpreter startup (or site init?), it reads pth files from its directory.

@rickeylev could you provide more details on how this can be implemented? We are having issues with the prefect library as well. When you say binary, do you mean a py_binary or just a script (bash/python)?

@ewianda
Contributor

ewianda commented Nov 1, 2024

Okay, I tried the following approach and it seems to work:

Create sitecustomize.py under $(bazel info output_base)/external/rules_python~~python~python_3_11_6_x86_64-unknown-linux-gnu/lib/python3.11/site-packages/

sitecustomize.py

import os
import sys


def find_site_packages_dirs(root_dir):
    """Return every directory named 'site-packages' under root_dir."""
    site_packages_dirs = []
    # Walk through the directory tree.
    for dirpath, dirnames, _ in os.walk(root_dir):
        # Check if 'site-packages' is in the list of directories at this level.
        if "site-packages" in dirnames:
            site_packages_dirs.append(os.path.join(dirpath, "site-packages"))
            # Prune 'site-packages' from dirnames so os.walk skips its subtree.
            dirnames.remove("site-packages")
    return site_packages_dirs


# Note: raises KeyError if RUNFILES_DIR is not set.
root_directory = os.environ["RUNFILES_DIR"]
site_packages_paths = find_site_packages_dirs(root_directory)
sys.path.extend(site_packages_paths)

I can't think of any situation where this fails.

@rickeylev
Contributor Author

rickeylev commented Nov 1, 2024

I've been poking this off and on over the last couple days and it looks pretty promising. It looks approximately like this:

It all looks to be working, but it's still prototype quality, so it needs cleanup, and I need to see what happens when all the tests are run.

> Okay I tried (putting site customize in the runtime dir) and it seems to work
> I can't think of any situation where this fails.

Some problematic cases I can think of are:

  • Import paths that don't involve site-packages. e.g. when the imports attribute is used
  • I've strayed away from sitecustomize.py because it's not as extensible: there can be only one, and you can't know if overriding it is going to prevent a previously existing one from running. Putting it in the runtime itself probably obviates most of this issue, but it's hard to say.
  • Adding a sitecustomize.py is possible for the hermetic runtimes (since we effectively control what is in those), but doesn't generalize to custom runtimes or system runtimes.
  • Walking the whole file tree is pretty expensive -- large apps having e.g. 40,000 files is not unheard of.
  • Nested binaries with different closures of dependencies would pick up each other's site-packages directories.
  • The RUNFILES_DIR env variable may not be set (there are some other vars that may be set). In general, there usually has to be some fallback code if the various runfiles env vars aren't set.
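
For illustration only, two of these points (the cost of the walk and a missing RUNFILES_DIR) could be softened with a bounded, defensive variant of the function above. This is a hypothetical sketch: the `max_depth` parameter is invented for the example, the other runfiles variables are not handled, and the remaining points (the imports attribute, nested binaries, non-hermetic runtimes) are not addressed at all.

```python
import os
import sys


def find_site_packages_dirs(root_dir, max_depth=3):
    """Collect 'site-packages' dirs under root_dir.

    Directories deeper than max_depth levels below root_dir are not entered.
    Returns an empty list if root_dir is None, empty, or not a directory.
    """
    found = []
    if not root_dir or not os.path.isdir(root_dir):
        return found
    root_dir = os.path.abspath(root_dir)
    for dirpath, dirnames, _ in os.walk(root_dir):
        if "site-packages" in dirnames:
            found.append(os.path.join(dirpath, "site-packages"))
            # Prune so os.walk does not descend into the site-packages subtree.
            dirnames.remove("site-packages")
        rel = os.path.relpath(dirpath, root_dir)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Clearing dirnames in place stops os.walk from going deeper here.
            dirnames[:] = []
    return found


# Use .get() so a missing RUNFILES_DIR leaves sys.path untouched instead of
# raising KeyError during interpreter startup.
sys.path.extend(find_site_packages_dirs(os.environ.get("RUNFILES_DIR")))
```

Bounding the depth trades completeness for startup time, so a site-packages directory nested deeper than the limit would be silently missed.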

EDIT: Just to clarify, thank you for hacking away on this :). I didn't mean to be overly critical.

@ewianda
Contributor

ewianda commented Nov 1, 2024

> EDIT: Just to clarify, thank you for hacking away on this :). I didn't mean to be overly critical.

Not at all. I love to learn.

I did think of some of the points you mentioned, with potential ways to address them, but that is not relevant since you are working on a potentially more robust solution.

Thanks.

@rickeylev rickeylev linked a pull request Nov 14, 2024 that will close this issue