plain invocation of python subprocess doesn't inherit sys.path for bootstrap_impl=script #2169

rickeylev · 2024-08-31T00:49:37Z

A side-effect of no longer propagating import paths using the PYTHONPATH envvar is that subprocesses don't inherit the paths. This is usually a good thing, but ends up breaking plain calls to python that assume they're going to inherit the current python's settings.

An example is pre-commit and its invocation of virtual env:

# add pre-commit to requirements and process through pip.parse

# BUILD.bazel
load("//python/entry_points:py_console_script_binary.bzl", "py_console_script_binary")

py_console_script_binary(
    name = "pre-commit",
    pkg = "@dev_pip//pre_commit",
    script = "pre-commit",
)

bazel run --@rules_python//python/config_settings:bootstrap_impl=script //:pre-commit

Eventually, it'll run: [sys.executable, '-mvirtualenv', ...], its sys.path will be just the stdlib, and fail to import virtualenv

This is sort of WAI. Part of the purpose of bootstrap_impl=script is to no longer use the envvar so that PYTHONPATH doesn't get too long and bleed into subprocesses.

I'm not sure how to work around this. I guess a legacy option to set the env var?

I'm not sure how this is supposed to work outside of bazel, either. It must assume that it's invoked in a venv or something? The surrounding code seems to indicate it's setting up a venv for pre-commit itself...or something. This all seems odd -- I would have to create a venv with virtualenv in it to run pre-commit so pre-commit can create its own venv? That doesn't sound right.

The text was updated successfully, but these errors were encountered:

aignas · 2024-08-31T05:14:03Z

At $dayjob I had a similar usecase where I need to have python interpreter with all dependencies set up and with the way the new thing is setup, I am not sure how I could achieve that using the new bootstrap. I had a py_binary that was using sys.executable and just forward the args to the interpreter and using pythonpath env var would just work, but with the new method, I would need to also setup the sys.path myself before invoking the interpreter.

Maybe at the very least there is a way to workaround this where sys.executable could be set to something that sets up the sys.path.

rickeylev · 2024-09-09T01:31:54Z

I was reading some Python docs (venv or site, i can't remember), and they gave me the following idea:

A binary creates a wrapper for the interpreter. This becomes sys.executable
In that same directory, there's a pth file
At interpreter startup (or site init?), it reads pth files from its directory.

ewianda · 2024-11-01T12:30:35Z

I was reading some Python docs (venv or site, i can't remember), and they gave me the following idea:

A binary creates a wrapper for the interpreter. This becomes sys.executable

In that same directory, there's a pth file

At interpreter startup (or site init?), it reads pth files from its directory.

@rickeylev could you provide more details on how this can be implemented, we are having issues with prefect library as well. When you say binary do you mean a py_binary or just a script (bash/python)

ewianda · 2024-11-01T14:29:55Z

Okay I tried the following approach and it seems to work

Create sitecustomize.py under $(bazel info output_base)/external/rules_python~~python~python_3_11_6_x86_64-unknown-linux-gnu/lib/python3.11/site-packages/

sitecustomize.py

import os
import sys


def find_site_packages_dirs(root_dir):
    # Store all paths with the name 'site-packages'
    site_packages_dirs = []
    # Walk through the directory tree
    for dirpath, dirnames, _ in os.walk(root_dir):
        # Check if 'site-packages' is in the list of directories at this level
        if "site-packages" in dirnames:
            site_packages_path = os.path.join(dirpath, "site-packages")
            site_packages_dirs.append(site_packages_path)

            # Optionally, remove 'site-packages' from dirnames to skip its subtree
            dirnames.remove(
                "site-packages"
            )  # Skip subdirs within 'site-packages' for efficiency

    return site_packages_dirs
root_directory = os.environ["RUNFILES_DIR"]
site_packages_paths = find_site_packages_dirs(root_directory)
sys.path.extend(site_packages_paths)

I can't think of any situation where this fails.

rickeylev · 2024-11-01T16:31:13Z

I've been poking this off and on over the last couple days and it looks pretty promising. It looks approximately like this:

create a venv structure
use a pth file to trigger an import
In that imported module, do the sys.path manipulations
main...rickeylev:rules_python:fix.script.boot.sys.exe

It all looks to be working, but its still prototype quality, so needs cleanup and to see what happens when all the tests are run.

Okay I tried (putting site customize in the runtime dir) and it seems to work
I can't think of any situation where this fails.

Some problematic cases I can think of are:

Import paths that don't involve site-packages. e.g. when the imports attribute is used
I've strayed away from sitecustomize.py because its not as extensible -- there can be only one, and you can't know if overriding it is going to prevent a previously existing one from running. Putting it in the runtime itself probably obviates most of this issue, but hard to say.
Adding a sitecustomize.py is possible for the hermetic runtimes (since we effectively control what is in those), but doesn't generalize to custom runtimes or system runtimes.
Walking the whole file tree is pretty expensive -- large apps having e.g. 40,000 files is not unheard of.
nested binaries with different closures of dependencies would pick up each others site-packages directories.
The RUNFILES_DIR env variable may not be set (there's some other vars that may be set). In general, there usually has to be some fallback code if the various runfiles env vars aren't set.

EDIT: Just to clarify, thank you for hacking away on this :). I didn't meant to be overly critical.

ewianda · 2024-11-01T21:08:15Z

EDIT: Just to clarify, thank you for hacking away on this :). I didn't meant to be overly critical.

Not at all. I love to learn

I did think of some of the points you mentioned, with potential ways to address them, but that is not relevant since you are working on a potentially more robust solution.

Thanks.

When `--bootstrap_impl=script` is used, `PYTHONPATH` is no longer used to set the import paths, which means subprocesses no longer inherit the Bazel paths. This is generally a good thing, but breaks when `sys.executable` is used to directly invoke the interpreter. Such an invocation assumes the interpreter will have the same packages available and works with the system_python bootstrap. To fix, have the script bootstrap use a basic virtual env. This allows it to intercept interpreter startup even when the Bazel executable isn't invoked. Under the hood, there's two pieces to make this work. The first is a binary uses `declare_symlink()` to write a relative-path based symlink that points to the underlying Python interpreter. The second piece is a site init hook (triggered by a `.pth` file using an `import` line) performs sys.path setup as part of site (`import site`) initialization. Fixes #2169 --------- Co-authored-by: Ignas Anikevicius <[email protected]>

rickeylev · 2024-11-23T18:10:35Z

#2409 might have broken bootstrap=script for windows using wsl

@ewianda reported on the PR:

Hmm actually it fails on WSL Ubuntu and is fine on a regular Linux Machine. Says a broken symbolic link

file  /home/ewianda/.cache/bazel/_bazel_ewianda/6091a940c516b5ac88940b423928af22/sandbox/linux-sandbox/7/execroot/_main/bazel-out/k8-fastbuild/bin/benchsci/test_pg13.runfiles/_main/benchsci/_test_pg13.venv/bin/python3
/home/ewianda/.cache/bazel/_bazel_ewianda/6091a940c516b5ac88940b423928af22/sandbox/linux-sandbox/7/execroot/_main/bazel-out/k8-fastbuild/bin/benchsci/test_pg13.runfiles/_main/benchsci/_test_pg13.venv/bin/python3: broken symbolic link to ../../../../../rules_python~~python~python_3_11_6_x86_64-unknown-linux-gnu/bin/python3

Unfortunately, I don't have a windows machine for development. Last time I tried to set one up for bazel dev it took days and I never got it actually working. Our CI doesn't cover windows+wsl. It might take me a bit of time to repro this and fix it.

Elvis, are you able to determine what the correct relative path should be? It kinda looks like there should 4, not 5, ../ in the path. The .. should end up at the runfiles root so that $runfilesRoot/rules_python~~python~python_3_11_6_x86_64-unknown-linux-gnu/bin/python3 is valid.

Building with --spawn_strategy=local should disable the sandbox and let you modify the files within runfiles after building. Then you should be able to recreate the symlink to try different values.

chowder · 2024-11-23T18:58:21Z

#2409 also seems to have broken py_binary and py_test targets in multi-python setups.

Trivial reproduction:

open examples/multi_python_versions
set --@rules_python//python/config_settings:bootstrap_impl=script
try to run any of the targets under tests

I get:

ERROR: Python interpreter not found: /home/chowder/.cache/bazel/_bazel_chowder/8b98838133be84956669659bfbfec2a7/execroot/_main/bazel-out/k8-fastbuild/bin/tests/version_default.runfiles/_main/tests/_version_default.venv/bin/python3

Where the symlink points to:

$ stat /home/chowder/.cache/bazel/_bazel_chowder/8b98838133be84956669659bfbfec2a7/execroot/_main/bazel-out/k8-fastbuild/bin/tests/version_default.runfiles/_main/tests/_version_default.venv/bin/python3
  File: /home/chowder/.cache/bazel/_bazel_chowder/8b98838133be84956669659bfbfec2a7/execroot/_main/bazel-out/k8-fastbuild/bin/tests/version_default.runfiles/_main/tests/_version_default.venv/bin/python3 -> ../../../../../rules_python~~python~python_3_9_x86_64-unknown-linux-gnu/bin/python3

I haven't had the time to look into the venv implementation yet so I'm not sure what the problem is.

~~But I would've expected ${target}.venv/bin/python to symlink to ../../${target}.runfiles/${python_repo}/bin/python3~~ edit:nvm

ewianda · 2024-11-23T19:41:24Z

Sorry @rickeylev I was quick to say that it happens only on Ubuntu WSL, it also happens on regular linux VM on gcp

~/rules_python/examples/bzlmod ((1.0.0-rc0))
$ uname -a
Linux elvis-dev 5.10.0-31-cloud-amd64 #1 SMP Debian 5.10.221-1 (2024-07-14) x86_64 GNU/Linux

~/rules_python/examples/bzlmod ((1.0.0-rc0))
$ bazelisk  test //tests:version_default_test  --@rules_python//python/config_settings:bootstrap_impl=script
INFO: Analyzed target //tests:version_default_test (0 packages loaded, 0 targets configured).
FAIL: //tests:version_default_test (see /home/ewianda/.cache/bazel/_bazel_ewianda/dfd9a9905bb9b0769c7fb751ab8ba191/execroot/_main/bazel-out/k8-fastbuild/testlogs/tests/version_default_test/test.log)
INFO: From Testing //tests:version_default_test:
==================== Test output for //tests:version_default_test:
ERROR: Python interpreter not found: /home/ewianda/.cache/bazel/_bazel_ewianda/dfd9a9905bb9b0769c7fb751ab8ba191/sandbox/linux-sandbox/5/execroot/_main/bazel-out/k8-fastbuild/bin/tests/version_default_test.runfiles/_main/tests/_version_default_test.venv/bin/python3
================================================================================
INFO: Found 1 test target...
Target //tests:version_default_test up-to-date:
  bazel-bin/tests/version_default_test
INFO: Elapsed time: 0.318s, Critical Path: 0.12s
INFO: 2 processes: 2 linux-sandbox.
INFO: Build completed, 1 test FAILED, 2 total actions
//tests:version_default_test                                             FAILED in 0.0s
  /home/ewianda/.cache/bazel/_bazel_ewianda/dfd9a9905bb9b0769c7fb751ab8ba191/execroot/_main/bazel-out/k8-fastbuild/testlogs/tests/version_default_test/test.log

Executed 1 out of 1 test: 1 fails locally.

ewianda · 2024-11-23T19:45:17Z

changing

rules_python/python/private/py_executable_bazel.bzl

Line 372 in 4ee7e25

parent = "/".join([".."] * (interpreter_actual_path.count("/") + 1))

to + 0 or removing the extra ../ fixes the problem.

I assume there is no automated test for --@rules_python//python/config_settings:bootstrap_impl=script

ewianda · 2024-11-23T19:48:50Z

Our CI doesn't cover windows+wsl.

Windows + wsl2 is pretty mature. It is indistinguishable from native Linux installation. So need need to cover this case in CI.

chowder · 2024-11-23T19:56:02Z

Ok - so I don't quite understand the original logic - but I think the correct logic for this:

rules_python/python/private/py_executable_bazel.bzl

Lines 370 to 374 in 4ee7e25

    
           interpreter = ctx.actions.declare_symlink("{}/bin/{}".format(venv, py_exe_basename)) 
        
           interpreter_actual_path = runtime.interpreter.short_path 
        
           parent = "/".join([".."] * (interpreter_actual_path.count("/") + 1)) 
        
           rel_path = parent + "/" + interpreter_actual_path 
        
           ctx.actions.symlink(output = interpreter, target_path = rel_path)

is supposed to be:

EDIT: ignore the below - i'm wrong and stupid

interpreter = ctx.actions.declare_symlink("{}/bin/{}".format(venv, py_exe_basename))
interpreter_actual_path = runtime.interpreter.short_path  # Always relative to .runfiles/_main

escapes = 2  # To escape out of <target>.venv/bin
escapes += executable.short_path.count("/") # To escape into .runfiles/_main

parent = "/".join([".."] * escapes)
rel_path = parent + "/" + interpreter_actual_path

ctx.actions.symlink(output = interpreter, target_path = rel_path)

(executable propagated from call-site)

Let me raise a PR in a moment so we can see this better.

ewianda · 2024-11-23T20:01:03Z

Ok - so I don't quite understand the original logic - but I think the correct logic for this:

rules_python/python/private/py_executable_bazel.bzl

Lines 370 to 374 in 4ee7e25

interpreter = ctx.actions.declare_symlink("{}/bin/{}".format(venv, py_exe_basename))

interpreter_actual_path = runtime.interpreter.short_path

parent = "/".join([".."] * (interpreter_actual_path.count("/") + 1))

rel_path = parent + "/" + interpreter_actual_path

ctx.actions.symlink(output = interpreter, target_path = rel_path)

is supposed to be:
interpreter = ctx.actions.declare_symlink("{}/bin/{}".format(venv, py_exe_basename))
interpreter_actual_path = runtime.interpreter.short_path  # Always relative to .runfiles/_main

escapes = 2  # To escape out of <target>.venv/bin
escapes += executable.short_path.count("/") # To escape into .runfiles/_main

parent = "/".join([".."] * escapes)
rel_path = parent + "/" + interpreter_actual_path

ctx.actions.symlink(output = interpreter, target_path = rel_path)
(executable propagated from call-site)

Let me raise a PR in a moment so we can see this better.

If read this correctly it translates to parent = "/".join([".."] * (interpreter_actual_path.count("/") + 2)),
which fails for me

chowder · 2024-11-23T20:07:54Z

They're not equivalent, no - can you try these changes out please?

https://github.com/bazelbuild/rules_python/pull/2439/files

ewianda · 2024-11-23T22:43:25Z

Thanks @chowder , bazel test is now passing, py_binary also runs with no issues. But bazel run is failing for py_console_script

ERROR: Python interpreter not found: /home/ewianda/.cache/bazel/_bazel_ewianda/6091a940c516b5ac88940b423928af22/sandbox/linux-sandbox/912/execroot/_main/bazel-out/k8-opt-exec-ST-d57f47055a04/bin/external/rules_python~/python/private/py_console_script_gen_py.runfiles/rules_python~/python/private/_py_console_script_gen_py.venv/bin/python3

chowder · 2024-11-24T01:14:33Z

OHH okay, I think see how this is supposed to work now.

@ewianda I've updated the PR

The only case it doesn't cover is an external rule using a Python interpreter stored in the main repository (but I reckon that's a bit of an unusual setup).

…`--bootstrap_impl=script` (#2439) Computing the relative path from the venv interpreter to the underlying interpreter was incorrectly using the actual interpreter's directory depth, not the venv interpreter's directory depth, when computing the distance from the venv interpreter to the runfiles root. The net effect is the correct relative path would only be computed for binaries with the same directory depth as the actual interpreter (e.g. 2). This went undetected in CI because the tests for this logic just happen to have the same directory depth as the actual interpreter used. To fix, compute the relative path to the runfiles root using the venv interpreter directory. Also added a test in a more nested directory to test this case. Along the way: * Change relative path computation to compute a minimum relative path. * Fix the internals to pass a runfiles-root relative path, not main-repo relative path, for the actual interpreter, as intended. Fixes #2169 --------- Co-authored-by: Richard Levasseur <[email protected]> Co-authored-by: Richard Levasseur <[email protected]> Co-authored-by: Ignas Anikevicius <[email protected]>

aignas mentioned this issue Aug 31, 2024

fix: Use -P to enable safe path semantics instead of PYTHONSAFEPATH #2122

Open

rickeylev mentioned this issue Nov 14, 2024

fix: make sys.executable work with script bootstrap #2409

Merged

rickeylev closed this as completed in #2409 Nov 21, 2024

rickeylev reopened this Nov 23, 2024

chowder mentioned this issue Nov 23, 2024

fix: correctly obtain relative path required for the venv created by --bootstrap_impl=script #2439

Merged

rickeylev closed this as completed in #2439 Nov 25, 2024

aignas mentioned this issue Dec 26, 2024

pip_parse() does not correctly read *.pth files? #2071

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

plain invocation of python subprocess doesn't inherit sys.path for bootstrap_impl=script #2169

plain invocation of python subprocess doesn't inherit sys.path for bootstrap_impl=script #2169

rickeylev commented Aug 31, 2024

aignas commented Aug 31, 2024

rickeylev commented Sep 9, 2024

ewianda commented Nov 1, 2024

ewianda commented Nov 1, 2024 •

edited

Loading

rickeylev commented Nov 1, 2024 •

edited

Loading

ewianda commented Nov 1, 2024

rickeylev commented Nov 23, 2024

chowder commented Nov 23, 2024 •

edited

Loading

ewianda commented Nov 23, 2024 •

edited

Loading

ewianda commented Nov 23, 2024

ewianda commented Nov 23, 2024 •

edited

Loading

chowder commented Nov 23, 2024 •

edited

Loading

ewianda commented Nov 23, 2024 •

edited

Loading

chowder commented Nov 23, 2024

ewianda commented Nov 23, 2024

chowder commented Nov 24, 2024

plain invocation of python subprocess doesn't inherit sys.path for bootstrap_impl=script #2169

plain invocation of python subprocess doesn't inherit sys.path for bootstrap_impl=script #2169

Comments

rickeylev commented Aug 31, 2024

aignas commented Aug 31, 2024

rickeylev commented Sep 9, 2024

ewianda commented Nov 1, 2024

ewianda commented Nov 1, 2024 • edited Loading

rickeylev commented Nov 1, 2024 • edited Loading

ewianda commented Nov 1, 2024

rickeylev commented Nov 23, 2024

chowder commented Nov 23, 2024 • edited Loading

ewianda commented Nov 23, 2024 • edited Loading

ewianda commented Nov 23, 2024

ewianda commented Nov 23, 2024 • edited Loading

chowder commented Nov 23, 2024 • edited Loading

ewianda commented Nov 23, 2024 • edited Loading

chowder commented Nov 23, 2024

ewianda commented Nov 23, 2024

chowder commented Nov 24, 2024

ewianda commented Nov 1, 2024 •

edited

Loading

rickeylev commented Nov 1, 2024 •

edited

Loading

chowder commented Nov 23, 2024 •

edited

Loading

ewianda commented Nov 23, 2024 •

edited

Loading

ewianda commented Nov 23, 2024 •

edited

Loading

chowder commented Nov 23, 2024 •

edited

Loading

ewianda commented Nov 23, 2024 •

edited

Loading