Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CEP for specifying the location of site-packages in repodata #90

Merged
merged 23 commits into from
Oct 22, 2024

Conversation

jjhelmus
Copy link
Contributor

Description

This CEP proposed to introduce an optional python_site_packages_path field in repodata that python packages can use to explicitly specify the location of the site-packages directory. This allows for noarch: python packages to be linked into the correct location when this path does not match the on used by standard builds of CPython. This includes alternative implementation of the Python programming langauge (PyPy, GraalPy) and free-threading builds of CPython.

The motivation and request for this CEP came from conda/conda#14053, which discusses noarch: python support in conda in environments with the experimental free-threading configuration of CPython 3.13.

cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated Show resolved Hide resolved
@dholth
Copy link
Contributor

dholth commented Sep 16, 2024

https://github.com/conda/ceps/pull/86/files is working in the same space, to allow ABI3 packages

Copy link
Contributor

@dholth dholth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to be in favor of putting the site-packages path in a separate file in python's info/, for example in link.json or paths.json where it would be possible to increment the version, and make it non-installable by incompatible condas.

cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated
"subdir": osx-arm64,
"version": "3.13.0rc1",
"sha256": "fa0ae22c13450fe6c30c754ee5efbd7fe7e7533b878d7be96e74de56211d19df",
"python_site_packages_path": "lib/python3.13t/site-packages"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"python_site_packages_path": "lib/python3.13t/site-packages"
"python_site_packages": "lib/python3.13t/site-packages"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field specified in the CEP is python_site_packages_path with _path

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, suggestion is to rename without _path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to either name. Does anyone have a strong opinion one way or another?

jjhelmus and others added 2 commits September 20, 2024 11:30
Co-authored-by: Daniel Holth <[email protected]>
Co-authored-by: Daniel Holth <[email protected]>
@jjhelmus
Copy link
Contributor Author

I have put together a draft PR for conda-build that implements support for specifying this field in the build section of a recipe, see conda/conda-build#5502.

With this I created sample packages that specify this metadata field for testing with downstream tools. They are available on anaconda.org in the jjhelmus/label/sp_path channel.

There are three noarch packages with the following paths specified

  • python 3.99.99 : lib/python3.99t/site-packages - A "free-threading" like path
  • python 3.99.90 : sample_path/from_the_file/index_json - Something that stands out
  • python 3.99.50 : ../../../../../invalid_path : Should be rejected by tooling as being invalid

@jjhelmus
Copy link
Contributor Author

conda/conda#14256 contains an rough draft of the change needed for conda to make use of this field.

@isuruf
Copy link

isuruf commented Sep 29, 2024

Should we fix the name for python in the shebang of entry-points in this CEP or a separate one? Currently there's a python3.13t->python3.13 symlink

@jezdez
Copy link
Member

jezdez commented Oct 4, 2024

Should we fix the name for python in the shebang of entry-points in this CEP or a separate one? Currently there's a python3.13t->python3.13 symlink

I'd suggest keeping it here TBH, since a whole CEP for a simple change like this may be overkill.

Copy link
Member

@jezdez jezdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, mostly just tweaks for readability

cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated Show resolved Hide resolved
cep-999.md Outdated
Comment on lines 58 to 62
def is_valid(target_prefix: str, python_site_packages_path: str) -> bool:
target_prefix = os.path.realpath(target_prefix)
full_path = os.path.realpath(os.path.join(target_prefix, python_site_packages_path))
test_prefix = os.path.commonpath((target_prefix, full_path))
return test_prefix == target_prefix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def is_valid(target_prefix: str, python_site_packages_path: str) -> bool:
target_prefix = os.path.realpath(target_prefix)
full_path = os.path.realpath(os.path.join(target_prefix, python_site_packages_path))
test_prefix = os.path.commonpath((target_prefix, full_path))
return test_prefix == target_prefix
from pathlib import Path
def is_valid(target_prefix: str, python_site_packages_path: str) -> bool:
target_prefix = Path(target_prefix).resolve()
full_path = (target_prefix / python_site_packages_path).resolve()
return target_prefix in full_path.parents

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This style of check would require resolving target_prefix as well as it needs to take into account symbolic links and various re-directs on Windows file systems.
The function as written in the CEP closely matches the filter used by by tarfile module in the CPython standard library for validating paths, https://github.com/python/cpython/blob/v3.13.0/Lib/tarfile.py#L816-L818

@isuruf
Copy link

isuruf commented Oct 4, 2024

A bit of context about the python executable name.

On Unix, CPython (upstream and conda) has python3.13 for freethreading and default builds. For freethreading python3.13 is a symlink to python 3.13t. On Unix, PyPy does the same

On Windows, CPython upstream has python.exe for default build and python3.13t.exe for freethreading builds as there's only one installer. On conda-forge, python3.13.exe and python.exe is a copy of python3.13t.exe for freethreading builds.
On PyPy, they have python3.10.exe copy for python.exe

For noarch: packages, the entry-points right now uses python3.13 for unix and python.exe for windows. Do we want to keep using these or have a way for specifying the executable name? Note that it's straightforward to change the name in Unix as it's about changing the shebang, but it's hard on windows due to the need to create an executable.

I'm okay with keeping the current status quo and requiring python<major>.<minor> on Unix and python.exe on Windows for all python flavours.

@jjhelmus
Copy link
Contributor Author

jjhelmus commented Oct 4, 2024

At least on macOS, Python virtual environment created using venv from the standard library provide executables named python, python3 and python3.x with the second two being symlinks to the first. A t executable is not provided for freethreading environments which is surprising to me.

By default the path of the executable used to run pip is used when creating console entry-points in these environments. The pip, pip3 and pip3.x command all use the path to python for the executable so this path will be used unless pip is started using python3{.X} -m pip.

Based on this I would be in favor of conda using python for the shebang in entry-points.

@jjhelmus
Copy link
Contributor Author

jjhelmus commented Oct 4, 2024

I tend to be in favor of putting the site-packages path in a separate file in python's info/, for example in link.json or paths.json where it would be possible to increment the version, and make it non-installable by incompatible condas.

The benefit of including the site-packages path in repodata is the ability to correct it with hotfixing after the package is built. This can be used to add this field to existing packages, like the PyPy and GraalPy packages in conda-forge, or correct mistakes in future packages. If functionality is not needed then the field could be moved to link.json or paths.json.

Another benefit of including this is repodata is discoverability. Repodata can be examined without downloading and extracting the package itself.

Incrementing the version would prevent the installation with incompatible version of condas but the user experience is not great. Specifically the environment would solve and the user would not be aware of a problem until the packages are downloaded, extracted and staged for linking.

jjhelmus and others added 9 commits October 4, 2024 13:57
Co-authored-by: Jannis Leidel <[email protected]>
Co-authored-by: Jannis Leidel <[email protected]>
Co-authored-by: Jannis Leidel <[email protected]>
Co-authored-by: Jannis Leidel <[email protected]>
Co-authored-by: Jannis Leidel <[email protected]>
Co-authored-by: Jannis Leidel <[email protected]>
Co-authored-by: Jannis Leidel <[email protected]>
Co-authored-by: Jannis Leidel <[email protected]>
Co-authored-by: Jannis Leidel <[email protected]>
Co-authored-by: Jannis Leidel <[email protected]>
Copy link

@mariusvniekerk mariusvniekerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what should the behavior be if more than one package in installed with this field set?

I assume failure of some kind?

@baszalmstra
Copy link
Contributor

Another advantage of storing this information in the repodata is that that information is always readily available during installation, even for already installed records (even if the cache is cleared).

@baszalmstra
Copy link
Contributor

So what should the behavior be if more than one package in installed with this field set?

Good question! I assumed it would be ignored except for the package that provides python. In rattler we currently assume that is the package called “python”. I wonder how conda does this.

@jjhelmus
Copy link
Contributor Author

jjhelmus commented Oct 8, 2024

So what should the behavior be if more than one package in installed with this field set?

I assume failure of some kind?

This field is only allowed on the conda package named python, if packages with other names include this field it is ignored as mentioned in the CEP. There can only be one python package installed in an environment so there will never be a case where there is more that one package installed with this field.

Another way of thinking about this change is that instead of using the package with name python to determine the major and minor version that are used in the site-packages path, the python package can also use this field to declare the path. The sample conda PR shows how this could be implemented.

@jezdez
Copy link
Member

jezdez commented Oct 8, 2024

In coordination with @jjhelmus, I post this request for a vote:

This vote falls under the "Enhancement Proposal Approval" policy of the conda governance policy,
please vote and/or comment on this proposal at your earliest convenience.

It needs 60% of the Steering Council to vote yes to pass.

To vote, please use the following form to vote. If you have questions concerning the proposal, you may also leave a comment or code review.

This vote will end in one week on 2024-10-15, End of Day, Anywhere on Earth (AoE).

@xhochy (Uwe Korn)

  • yes
  • no
  • abstain

@CJ-Wright (Christopher J. 'CJ' Wright)

  • yes
  • no
  • abstain

@mariusvniekerk (Marius van Niekerk)

  • yes
  • no
  • abstain

@chenghlee (Cheng H. Lee)

  • yes
  • no
  • abstain

@ocefpaf (Filipe Fernandes)

  • yes
  • no
  • abstain

@marcelotrevisani (Marcelo Duarte Trevisani)

  • yes
  • no
  • abstain

@msarahan (Michael Sarahan)

  • yes
  • no
  • abstain

@mbargull (Marcel Bargull)

  • yes
  • no
  • abstain

@jakirkham (John Kirkham)

  • yes
  • no
  • abstain

@jezdez (Jannis Leidel)

  • yes
  • no
  • abstain

@wolfv (Wolf Vollprecht)

  • yes
  • no
  • abstain

@jaimergp (Jaime Rodríguez-Guerra)

  • yes
  • no
  • abstain

@kkraus14 (Keith Kraus)

  • yes
  • no
  • abstain

@baszalmstra (Bas Zalmstra)

  • yes
  • no
  • abstain

@beckermr (Matthew R. Becker)

  • yes
  • no
  • abstain

@Hind-M (Hind Montassif)

  • yes
  • no
  • abstain

@trallard (Tania Allard)

  • yes
  • no
  • abstain

@dholth
Copy link
Contributor

dholth commented Oct 8, 2024

When conda doesn't understand this field, does that Python package work?

@jjhelmus
Copy link
Contributor Author

jjhelmus commented Oct 8, 2024

When conda doesn't understand this field, does that Python package work?

Yes, the current GraalPy, PyPy and free-threading CPython python packages in conda-forge include work arounds like symlinks or sitecustomize.py files that so that noarch: python packages work even though conda installs these packages into the incorrect path. The CEP mentions these work arounds the recommends that they be maintained for some time:

Because existing releases of conda, mamba and pixi do not support this field, python packages should continue to provide work around to allow files to be installed into an incorrect site-packages directory until such time as this proposal has been implemented and available for a sufficient amount of time.

@jezdez jezdez added the vote Voting following governance policy label Oct 9, 2024
cep-999.md Outdated Show resolved Hide resolved
Co-authored-by: jaimergp <[email protected]>
@wolfv
Copy link
Contributor

wolfv commented Oct 13, 2024

I just had one shower-thought: what if we would store a map for this, and could that help with noarch: ruby / noarch: php etc. packages?

It's been a long standing issue that we cannot support other languages with noarch packages as easily as Python for which each package manager needs to add some specific code.

If we had a noarch_map: { ... } field of some sort that would map directories, such as:

"noarch_map": {
   "site-packages": "lib/python3.10/site-packages"
}

And for Ruby it could read something like:

"noarch_map": {
   "gems": "lib/ruby5.3/gemfiles/gems"
}

This would still leave open the question on pre-compiling py / rb / ... files but it would solve one of the noarch-ruby problems and would not introduce more python special handling.

The noarch_map could be consulted for any noarch package (generic or not) that contains a folder of the sort.

I think this might be worth it to revisit if we can afford it.

I was trying to find some old references / issues, could only find one so far.

@isuruf
Copy link

isuruf commented Oct 14, 2024

I don't think we should introduce any other noarch types. noarch: generic is enough for most use-cases and we should just use a version independent path for other languages. For python, it's historic debt.

@mcg1969
Copy link

mcg1969 commented Oct 14, 2024

But is that possible?

Certainly, one interpretation of noarch-Python is that it's historic debt. But eliminating it would require patching the default path search behavior of python; and there is version-dependent byte compilation as well.

In contrast, for R we are fortunate that the search path is platform and version-independent, but the byte compilation issue remains, which means that currently we have to rebuild many noarch packages with each major R release.

I think it would be good to enumerate the additional languages being considered and the potential obstacles to the use of noarch-generic. If there are none, great, but at least for me I'm speaking from ignorance of these alternatives.

@wolfv
Copy link
Contributor

wolfv commented Oct 14, 2024

@isuruf we wouldn't have to introduce any new noarch types. Just move the files in the right places and the noarch-map would take care of it :)
I am just advocating to produce less technical debt for python specifics.

@isuruf
Copy link

isuruf commented Oct 14, 2024

Certainly, one interpretation of noarch-Python is that it's historic debt. But eliminating it would require patching the default path search behavior of python; and there is version-dependent byte compilation as well.

Right, it would require a .pth file to change the search behaviour. noarch: python should be used only for byte compilation, the moving of files shouldn't be necessary IMO, but unrelated to this CEP.

@wolfv, yes, but it's adding another layer of unnecessary behaviour when all we need to do is to make sure the packages end up in a version-independent path in the first place. You are producing more general technical debt in order to produce less technical debt for python specifics.

@baszalmstra
Copy link
Contributor

@jezdez What is the status of this CEP? Did it pass or are there still discussions to be had?

@jezdez
Copy link
Member

jezdez commented Oct 20, 2024

@jezdez What is the status of this CEP? Did it pass or are there still discussions to be had?

I'll update the PR in the coming week, currently out of office.

@jjhelmus
Copy link
Contributor Author

If we had a noarch_map: { ... } field of some sort that would map directories, such as:

Why make this a nested field when any individual package should only ever define one entry. A python package can declare the location of the site-packages directory, it would not declare a R or Ruby location. In addition "Flat is better than nested".

@wolfv
Copy link
Contributor

wolfv commented Oct 21, 2024

One could imagine multiple entries, such as:

noarch_map: {
  "r_bin": "lib/r/5.0/bin",
  "r_lib": "lib/r/lib",
}

and noarch packages that would have r_bin and r_lib at the root would get mapped.

But yeah, idk, maybe not useful enough :)

@jjhelmus
Copy link
Contributor Author

conda/rattler#909 has the beginnings of an implementation of this CEP in rattler.

@wolfv
Copy link
Contributor

wolfv commented Oct 21, 2024

Awesome, that's great! :)

@jezdez
Copy link
Member

jezdez commented Oct 21, 2024

The vote is closed, and we have the following result:

Total voters: 17 (valid: 13 = 76.47%)

Yes votes (13 / 100.00%):

No votes (0 / 0.00%)):

Abstain votes (0 / 0.00%):

Not voted (4):

Invalid votes (0):

Thus we reached quorum and enough YES votes to mark this as accepted. 🎉

@jezdez jezdez merged commit 982868f into conda:main Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
vote Voting following governance policy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants