Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and allow some support of extra-index-urls again #5028

Closed
wants to merge 1 commit into from

Conversation

matteius
Copy link
Member

@matteius matteius commented Apr 2, 2022

This PR tries to strike a balance between enforcing security in the new package restrictions, while still allowing extra indexes be specified loosely. I am not convinced that we should be doing this longer term -- in other words I think we should be aiming to pin our packages to specifically which index the package should need to be pulled from, a specific index if not the primary index, which is usually pypi.

As the pip documentation points out:

Using this option to search for packages which are not in the main repository (such as private packages) is unsafe, per a security vulnerability called dependency confusion: an attacker can claim the package on the public repository in a way that will ensure it gets chosen over the private package.

There is another caveat to not pinning your index explicitly, in the case of Torch. The caveat is that in this current incantation of the pipenv code, the pinned index correctly picks up all 8 hashes from the pinned index; however, if you rely on the extra index searching to find your dependency in the extra index (without explicitly pinning the requirement to the alternate index) it for some reason only pulls in the 1 hash that matches your system architecture out of the 8. I am not sure that is going to be sorted out readily as the hash aggregation logic is, well complicated and dependent on other factors in the resolution process as well.

I wanted to put this out there as a fix for the reported issues in the near term and gather feedback on what the real use case is for not pinning the index requirement in the first place? Note that the logic implicitly pins all packages with a non-specified index to be the default, which is usually pypi. So you can see how just having a second extra index here with this change makes all unpinned packages less predictable as to where they will be pulled from--it at least becomes not explicit.

The issue

Fixes #5022
Fixes #5021

The fix

@DanielPerezJensen @Bananaman and @masato-yasuda Could you review my concerns and potential change and provide feedback on the issues at hand. This technically fixes the issue being reported of the package not resolving when the index is not explicitly pinned, but I don't love the caveat that the hash collection is now incomplete when relying on this feature (I'm not entirely sure why yet). Plus I don't want to invest a lot of time into supporting it when there exists the security advisory from last year, it feels ultimately safer to be explicit in pinning indexes. Should we accept this patch? Will it really be improving quality of life and a legitimate use case, or will it be weakening the security of the tool? I am willing to consider it, since that was the prior behavior--but also willing to consider longer term deprecating it in favor of always pinning the non primary index packages to a specific index by name and potentially improving what CLI arguments are available to help manage this.

The checklist

  • Associated issue
  • A news fragment in the news/ directory to describe this fix with the extension .bugfix, .feature, .behavior, .doc. .vendor. or .trivial (this will appear in the release changelog). Use semantic line breaks and name the file after the issue number or the PR #.

@DanielPerezJensen
Copy link

@matteius I don't feel like I have the expertise or knowledge to properly evaluate if this is the correct course. To my mind it does make more sense, but any security concerns should have priority I feel.

@matteius
Copy link
Member Author

matteius commented Apr 4, 2022

I think there are too many edge cases of trying to support the --extra-index-url as it was before -- for example, when pypi isn't actually your primary index or pypi mirror this logic breaks down or has potential edge cases I haven't considered. If you look at my other linked PR that improves the documentation and potentially features around named and pinned indexes, that is where I think the real effort should be expended.

@masato-yasuda
Copy link

masato-yasuda commented Apr 5, 2022

@matteius
Thanks very much for your information and PR! I also talked with my team member and I got the following comment :)


Thank you for your information

Let me explain our usecase in detail. We think many users have similar usecase.

We have private index, and this private index cannnot be access without access token.
We have to pass our access token to pipenv via environment variable, because we build container image in CI pipline.
Of course, our token must not be included in Pipfile and Pipfile.lock.

We want to install private packages from our private index and also public packages from pypi.
Our private packages may depends on the other private packages or public packages or both.

After reading your comment and PR https://github.com/pypa/pipenv/pull/5029/files#diff-67eeccaffd4730cc19ea43afd8caafcf84db895d57ac27bf10e0ad1e002ce17c, we have understood how we should use.
And we think the best practice is the following.

Create Pipfile
Run the following command.

pipenv install

Add private index entry to Pipfile manually
Edit Pipvile using text editor as follows.

Pipfile

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[[source]]
url = "https://${MY_CREDENTIAL}@mydomain/simple"
verify_ssl = true
name = "myindexname"

[packages]

[dev-packages]

[requires]
python_version = "3.8"

Install packages

# To install public package, run the following command.
pipenv install requests
# To install private package, run the following command.
pipenv install --index=myindexname private-package

We have tested this sequence using pipenv==2022.3.28 and that works fine!
So, we will change how we use. Thank you very much.


I would like to add the reason why we didn't use this flow and encounterd the issue #5021.
This is simply because we didn't understood how we should do, and after try-and-error we got the following wrong solution about an years ago.

We created Pipfile by running

pipenv install

And edit Pipfile as follows.
Pipfile

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[[source]]
url = "https://${MY_CREDENTIAL}@mydomain/simple"
verify_ssl = true
# We did not specify name to this index.

[packages]

[dev-packages]

[requires]
python_version = "3.8"

Then, we could install our private packages using pipenv==v2020.11.15.

pipenv install private-package

This command succeeded, but recently we noticed that we got broken Pipfile.lock.
The Pipfile.lock says that private-package was installed from pypi, but this package does not exists in pypi.

Pipfile.lock

...
    "default": {
        "private-package": {
            "hashes": [
                "{hash}",
                "{hash}"
            ],
            "index": "pypi",
            "version": "==0.x.y"
        },

...

After updating pipenv, we have encounterd the following issues and reported.
#5002
#5021

I'm wondoring if some users ,like us, misunderstand how to use pipenv with their private index.
How about adding some information to these section in docs?
https://pipenv.pypa.io/en/latest/advanced/#injecting-credentials-into-pipfiles-via-environment-variables
https://pipenv.pypa.io/en/latest/advanced/#specifying-package-indexes

For example, add this statement to https://pipenv.pypa.io/en/latest/advanced/#specifying-package-indexes
This statement help users know how to install via command line.

...

You can install requests from the "home" index by running the following command.
pipenv install --index=home requests

...

And we noticed that the help message of pipenv install --help a little confusing.
It says -i, --index TEXT Target PyPI-compatible package index url., but we can also specify the package index name which is defined in the Pipfile.

I also think improving documentation leads many users the best solution, as you mentioned.

@matteius
Copy link
Member Author

matteius commented Apr 6, 2022

@masato-yasuda Thank you for your useful feedback. I have added the essence of what you have captured in the specifying indexes section in this open PR: #5029

Could you indicate more what additional detail you are looking for in the section on credentials with environment variables?

I will look into what you mean about the install help text.

@oz123 oz123 closed this in #5029 Apr 6, 2022
@masato-yasuda
Copy link

@matteius Thank you for your comment & updates. Let me share my team member's answer about your question.


For the section on credentials, I thought that the similar statement will help users if exists. However, It might not be necessary.
The section on Specifying Package Indexes, which is near the credential section, would guide them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants