Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide on semantics for workspace environment syncing (with --package, --extra, and --dev) #4730

Open
charliermarsh opened this issue Jul 2, 2024 · 4 comments
Labels
preview Experimental behavior

Comments

@charliermarsh
Copy link
Member

We had an extensive discussion about this in Discord. I'll try to summarize some of it here, but we still need to make some important decisions about how virtualenv syncing should behave for workspaces. Most of these changes aren't hard to implement, but they do have a lot of follow-on impact.

Today, we use a single virtualenv at the workspace root. When you do uv run -p package, we bring the environment in-sync with the necessary dependencies for package. That is, we don't install the entire workspace -- only the dependencies that are required for package. This has the benefit that you can't accidentally import some other dependency from the workspace within the package code (it enforces some amount of isolation).

However, this syncing is purely additive. So if you run uv run -p foo then uv run -p bar, we'll install any packages necessary for bar, but we won't uninstall the packages that we needed for foo (but not bar).

Extras behave similarly. So if you run uv run -p bar --extra foo, we'll install the extras activated by foo, and then leave them activated if you run uv run -p bar.

This strategy has some desirable properties (we enforce isolation on initial uv run -p, we only sync the necessary packages rather than building the entire workspace) but it's sort of a compromise (we don't enforce strict isolation on subsequent uv run -p).

One alternative we discussed was: creating a virtualenv per package in the workspace, on-demand. So if you did uv run -p package, we'd create a virtualenv in package with that package's dependencies. We'd also have a virtualenv in the workspace root.

This gives us the same properties that we have today, with the added benefit that it does enforce strict isolation. But there are two downsides: (1) you now have multiple virtualenvs in your project, which could confuse editors (I think this is totally fine); (2) you don't have a single virtualenv with the entire workspace installed, which means your editor could lack completions and Go-To definition for packages in the workspace (this problem also exists today); and (3) it doesn't solve the extras problem.

(3) seems to be the critical one. Extras are kind of the same problem (stateful environments) as -p, but don't have a natural analog to "store the virtualenv in the package". In short, we leave extras enabled after a uv run, and this could be problematic for some packages.

One solution to the "extras" problem would be to do a strict sync on every uv run. So if you did uv run -p foo --extra bar and then uv run -p foo, we'd remove the bar extra on that second invocation. The main downside here seems to be that you uv sync -p foo --extra bar and then uv run -p foo would remove the bar extra, which some have argued is unintuitive. (N.B. uv sync -p foo does not exist today.)

@charliermarsh
Copy link
Member Author

charliermarsh commented Jul 2, 2024

One proposal:

  • uv run --package creates a venv in --package
  • uv run operates with pip install semantics (doesn't remove extraneous packages)
  • uv sync operates with pip install semantics
  • uv sync --locked operates with pip sync semantics
  • uv run --locked operates with pip sync semantics

Unclear if uv run --package --extra would modify the base environment or ephemerally layer on the extra.

I think this proposal is reasonable as "I need extras removed when they're not activated" is sort of a niche requirement, and so making it a non-default isn't... that bad?

@charliermarsh charliermarsh added the preview Experimental behavior label Jul 2, 2024
@charliermarsh
Copy link
Member Author

Another proposal from @konstin:


Alright here's the pitch:

  • uv sync installs everything in .venv. That gives your IDE (and type checkers) all packages for analysis. uv sync retains --package and --extra, but they are when you need that level of control, most users should use uv run. If you use uv sync with parameters, you effectively opt in to managing venv state yourself.
  • uv run runs in a separate environment by default. We place that venv under .uv/venvs, so you can clean it up with rm -rf .uv and inspect it if needs be. This means there are only two folder in your project with uv state (.venv and .uv; that's one too many for me already, but we don't get out of .venv)
  • If you need a per-project env, you can cd packages/bert && uv venv && uv sync -p bert. It's a regular venv still so IDEs will run regularly just on this package.
  • When you do uv run -p bert --extra torch, we create something like .uv/venvs/bert-torch with exactly that set of dependencies. This means that you get both a strict invocation and a cached environment on the next run. (ofc if your lockfile changed, we update the env)

In a way this gives us three levels in the uv CLI:

  • The pip mode, everything is manual, there are the known quirks. Interesting for users coming from pip and pip-tools
  • Managing your venv manually with uv sync --package ... --extra .... Similar to poetry or pdm, that's probably what i'd use a lot cause that's where my level of expertise is
  • The managed uv run mode. You shouldn't know about venvs at all beyond "my IDE see .venv" to use this. This is the easiest, most user friendly way, but also the one with the least control when you're used to pip installing to your hand-curated venvs.

The huge drawback here: Modulo default-{packages,extras,command}, you have to specify all arguments in every uv run every time.

@konstin
Copy link
Member

konstin commented Jul 2, 2024

I want to add another perspective to this: It says the problem we're looking at is venv state. There are several things that influence the state of the venv:

  • Additional packages, both workspace packages (-p) and extras (--extra). To me, a workspace is one big virtual package where the members are optional dependencies like extras, so i've put them here with the extras.
  • Removing packages, such as --no-dev
  • Changing the python version. Changing the python version has the effect that we have to change many packages and that the meaning of extras can change (through markers with both python version and extra condition).

Different use cases want different venvs:

  • IDEs, type checker and other linters want to have a .venv with all dependencies installed (for slightly different definitions of all) so they can inspect your dependencies.
  • When running pytest on a specific of package on a specific python version with specific extras i'd like to test only that configuration. In this case, i'm ready to specify what exactly i need and generally accept that the venv is temporary,
  • When i run e.g. the backend entrypoint, it would be nice if only the backend dependencies were available to check they are correct (nice to have).
  • When deploying (mainly docker containers), i want to sync only a specific set of packages, extras and no dev deps.

Contrary to the pitch in the comment above, i've realized that for my use cases, isolation generally isn't that important, i'm fine with a single "everything venv", as long as i can run with isolation when i need it. Caching isolated temporary venvs created e.g. running pytest with isolation (ideally in a single gitignored folder in the project root where i can clear them) would be nice.

@potiuk
Copy link

potiuk commented Jul 2, 2024

I'd agree with @konstin - and that's something that might be useful for Airflow.

For development, even for regular running of the tests, having single .venv where you can install all packages is ideal, while you have a way to run tests or scripts in separate, isolated venvs. You should be able to choose whether you want to install all packages or just one or few packages there.

There are cases where you have "main" package and "extra" packages and some of the extra packages have (sometimes optional) dependencies on each other - and in this case isolated environments will not work - because even for tests you need the "other" packages as well - at least for some of the tests.

So natural behaviour would be:

a) run your development, autocomplete, tests in your "potentially-all-combined" environment.
b) cd to your package and run your tests (or subset of those that do not require other packages) there in an "isolated environment).
c) (optional) allow those "isolated" environments to install other packages if you really want (to be able to test those cross-package tests)

I think choosing which packages to install and syncing the "main" or "isolated" (if c) is supported) venvs should be purely additive - UNLESS you deliberately choose to cleanup the venv before. That should simplify all the "remove package from spec" cases - you allow the venv to have more things than needed, and if. you really want a clean version, run the installation with --clean switch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Experimental behavior
Projects
None yet
Development

No branches or pull requests

3 participants