Add google-cloud-bigquery as explicit google-provider dependency #38753

Merged
merged 1 commit into from
Apr 4, 2024

Conversation

potiuk
Member

@potiuk potiuk commented Apr 4, 2024

A new google-cloud-bigquery release from a few days ago triggered a strange uv backtracking issue in Python 3.11 builds where Airflow installs providers from PyPI in PROD image builds. uv fails on a very old version of google-cloud-bigquery (1.28.2) that has a bad version specifier for one of its dependencies (de instead of dev).

We should add google-cloud-bigquery as an explicit dependency and limit it to a relatively recent version. Airflow currently uses the latest version, 3.20.1, in constraints, and limiting bigquery to >= 3.0.1 (first non-yanked version released more than 2 years ago, in March 2022) is a good lower limit.
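To illustrate why a lower limit helps, here is a minimal sketch of how a bound like >= 3.0.1 removes the problematic old release from the set of candidates a resolver may visit. The helper names are hypothetical and the version parsing only handles plain dotted releases; it is not how uv or pip implement this.

```python
from __future__ import annotations


def parse_release(version: str) -> tuple[int, ...]:
    """Turn a plain dotted release such as '3.20.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def candidates_at_or_above(versions: list[str], lower_bound: str) -> list[str]:
    """Keep only the versions that satisfy '>= lower_bound'."""
    floor = parse_release(lower_bound)
    return [v for v in versions if parse_release(v) >= floor]


# Only the three versions mentioned in this PR, not the full PyPI history.
available = ["1.28.2", "3.0.1", "3.20.1"]
allowed = candidates_at_or_above(available, "3.0.1")
print(allowed)
```

With the limit in place, 1.28.2 (the release with the broken metadata) is never a candidate, so the resolver has no reason to download its metadata during backtracking.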



@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Apr 4, 2024
@potiuk
Member Author

potiuk commented Apr 4, 2024

Example of the problem on v2-9-test that this solves: https://github.com/apache/airflow/actions/runs/8555464849/job/23446007205#step:10:3693

#64 6.287   Caused by: Couldn't parse metadata of google_cloud_bigquery-1.28.2-py2.py3-none-any.whl from https://files.pythonhosted.org/packages/ce/af/89ccb3dd70a86516cb408dd7b7484d2fdd073bdce6405f722f75e6058e66/google_cloud_bigquery-1.28.2-py2.py3-none-any.whl.metadata
  #64 6.287   Caused by: after parsing 2.0, found "de" after it, which is not part of a valid version

@potiuk potiuk merged commit 5c8510a into apache:main Apr 4, 2024
92 checks passed
@potiuk potiuk deleted the limit-google-cloud-bigquery branch April 4, 2024 22:35
@hussein-awala
Member

I could not reproduce it with uv 0.1.29 using the command you provided in the issue you created in the uv project, but I didn't try with 0.1.28:

(airflow) ➜  airflow git:(uv_0.1.29) ✗ uv pip install 'apache-airflow[aiobotocore,amazon,async,celery,cncf-kubernetes,common-io,docker,elasticsearch,ftp,google,google-auth,graphviz,grpc,hashicorp,http,ldap,microsoft-azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,uv,virtualenv] @ file:///Users/hussein-awala/github/airflow/dist/apache_airflow-2.10.0.dev0-py3-none-any.whl'

Resolved 339 packages in 47.68s
   Built python-ldap==3.4.4
   Built starkbank-ecdsa==2.2.0
   Built mysqlclient==2.2.4
   Built wirerope==0.4.7
   Built methodtools==0.4.7
   Built json-merge-patch==0.2
Downloaded 121 packages in 5.11s
Installed 121 packages in 382ms
 + adal==1.2.7
 + adlfs==2023.10.0
 - apache-airflow==2.8.3
 + apache-airflow==2.10.0.dev0 (from file:///Users/hussein-awala/github/airflow/dist/apache_airflow-2.10.0.dev0-py3-none-any.whl)
 + apache-airflow-providers-celery==3.6.1
 + apache-airflow-providers-docker==3.9.2
 + apache-airflow-providers-elasticsearch==5.3.3
 - apache-airflow-providers-fab==1.0.1
 + apache-airflow-providers-fab==1.0.2
 + apache-airflow-providers-google==8.3.0
 + apache-airflow-providers-grpc==3.4.1
 + apache-airflow-providers-hashicorp==3.6.4
 + apache-airflow-providers-microsoft-azure==9.0.1
 + apache-airflow-providers-mysql==5.5.4
 + apache-airflow-providers-odbc==4.1.0
 + apache-airflow-providers-openlineage==1.6.0
 + apache-airflow-providers-postgres==5.10.2
 + apache-airflow-providers-redis==3.6.0
 + apache-airflow-providers-sendgrid==3.4.0
 + apache-airflow-providers-sftp==4.9.0
 + apache-airflow-providers-slack==8.6.1
 + apache-airflow-providers-snowflake==5.2.0
 + apache-airflow-providers-ssh==3.10.1
 + asyncssh==2.14.2
 + authlib==1.3.0
 + azure-batch==14.2.0
 + azure-datalake-store==0.0.53
 + azure-keyvault-secrets==4.8.0
 + azure-kusto-data==4.3.1
 + azure-mgmt-containerinstance==10.1.0
 + azure-mgmt-containerregistry==10.3.0
 + azure-mgmt-datafactory==6.1.0
 + azure-mgmt-datalake-nspkg==3.0.1
 + azure-mgmt-datalake-store==0.5.0
 + azure-mgmt-nspkg==3.0.2
 + azure-mgmt-resource==23.0.1
 + azure-mgmt-storage==21.1.0
 + azure-nspkg==3.0.2
 + azure-servicebus==7.12.1
 + azure-storage-blob==12.19.1
 + azure-storage-file-datalake==12.14.0
 + azure-storage-file-share==12.15.0
 + azure-synapse-artifacts==0.18.0
 + azure-synapse-spark==0.7.0
 + bcrypt==4.1.2
 - boto3==1.28.85
 + boto3==1.28.64
 - botocore==1.31.85
 + botocore==1.31.64
 + cattrs==23.2.3
 + docstring-parser==0.16
 + elastic-transport==8.13.0
 + elasticsearch==8.13.0
 + eventlet==0.36.1
 - flask-appbuilder==4.3.11
 + flask-appbuilder==4.4.1
 + flower==2.0.1
 + gevent==24.2.1
 + google-ads==23.1.0
 - google-api-python-client==2.108.0
 + google-api-python-client==1.12.11
 + google-cloud-aiplatform==1.46.0
 + google-cloud-appengine-logging==1.4.3
 + google-cloud-audit-log==0.2.5
 + google-cloud-automl==2.13.3
 + google-cloud-bigquery-datatransfer==3.15.1
 + google-cloud-bigtable==1.7.1
 + google-cloud-build==3.24.0
 + google-cloud-datacatalog==3.19.0
 + google-cloud-dataform==0.5.9
 + google-cloud-dataplex==1.13.0
 + google-cloud-dataproc==5.9.3
 + google-cloud-dataproc-metastore==1.15.3
 + google-cloud-dlp==1.0.1
 + google-cloud-kms==2.21.3
 + google-cloud-language==1.3.1
 + google-cloud-logging==3.10.0
 + google-cloud-memcache==1.9.3
 + google-cloud-monitoring==2.19.3
 + google-cloud-orchestration-airflow==1.12.1
 + google-cloud-os-login==2.14.3
 + google-cloud-pubsub==2.21.1
 + google-cloud-redis==2.15.3
 + google-cloud-resource-manager==1.12.3
 - google-cloud-secret-manager==2.16.4
 + google-cloud-secret-manager==1.0.1
 + google-cloud-spanner==1.19.2
 + google-cloud-speech==1.3.3
 - google-cloud-storage==2.13.0
 + google-cloud-storage==1.44.0
 + google-cloud-tasks==2.16.3
 + google-cloud-texttospeech==1.0.2
 + google-cloud-translate==1.7.1
 + google-cloud-videointelligence==1.16.2
 + google-cloud-vision==1.0.1
 + google-cloud-workflows==1.14.3
 + grpcio-gcp==0.2.2
 + humanize==4.9.0
 + ijson==3.2.3
 + json-merge-patch==0.2
 + ldap3==2.9.1
 + looker-sdk==24.2.1
 + methodtools==0.4.7
 + msrestazure==0.6.4
 + mysql-connector-python==8.3.0
 + mysqlclient==2.2.4
 + paramiko==3.4.0
 + prometheus-client==0.20.0
 - protobuf==4.24.4
 + protobuf==4.25.3
 + psycopg2-binary==2.9.9
 + pyodbc==5.1.0
 + pyopenssl==24.1.0
 + python-http-client==3.3.7
 + python-ldap==3.4.4
 + sendgrid==6.11.0
 + shapely==2.0.3
 + slack-sdk==3.27.1
 + snowflake-connector-python==3.8.0
 + snowflake-sqlalchemy==1.5.1
 + sqlalchemy-bigquery==1.10.0
 + sshtunnel==0.4.0
 + starkbank-ecdsa==2.2.0
 + statsd==4.0.1
 + tornado==6.4
 - universal-pathlib==0.1.4
 + universal-pathlib==0.2.2
 - uritemplate==4.1.1
 + uritemplate==3.0.1
 + wirerope==0.4.7
 + zope-event==5.0
 + zope-interface==6.2

Do I need to create a new venv?

@potiuk
Member Author

potiuk commented Apr 5, 2024

It's been very specific, and I am not sure if it is consistently / currently reproducible. This is the nature of dependency resolution: it constantly changes, depending on what is currently available on PyPI, what state your cache is in, and some heuristics that might change the resolution following even the slightest changes. What I know about this case is:

It was (consistently) happening:

  1. In v2-9-test branch during the last 2 days or so
  2. ONLY with Python 3.11 (!). 3.8 - 3.10 and 3.12 were fine (🤯)
  3. In the PROD build step, where we build the airflow package from sources with breeze release-management prepare-airflow-package and use that wheel to install airflow with all PROD extras (so all other providers and deps were supposed to be installed from PyPI).
  4. UV cache should be clean and disabled (export UV_NO_CACHE="true") - UV cache increases size of the image almost 2x so we disable it.
  5. The only packages installed in the venv were pip==24.0 and uv==0.1.28
  6. This was the command that failed:

uv pip install 'apache-airflow[aiobotocore,amazon,async,celery,cncf-kubernetes,common-io,
docker,elasticsearch,ftp,google,google-auth,graphviz,grpc,hashicorp,http,ldap,
microsoft-azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,
slack,snowflake,ssh,statsd,uv,virtualenv] @ file:///docker-context-files/apache_airflow-2.9.0-py3-none-any.whl'

And even for our builds this is a very unusual step: usually we install airflow with constraints generated by the CI build. But this one does not use constraints, because this is a CACHE build, one that produces a base PROD image that we use to build subsequent PROD images. In this case it almost does not matter what resolution we arrive at, because that particular step is going to be invalidated anyway (we will build a different airflow package next time). It only matters that this step is fast and succeeds, so that all the previous layers can be reused to build the next PROD image from a subsequent v2-9-test build faster.

And the error was:

#64 6.287 error: Failed to download: google-cloud-bigquery==1.28.2
  #64 6.287   Caused by: Couldn't parse metadata of google_cloud_bigquery-1.28.2-py2.py3-none-any.whl from https://files.pythonhosted.org/packages/ce/af/89ccb3dd70a86516cb408dd7b7484d2fdd073bdce6405f722f75e6058e66/google_cloud_bigquery-1.28.2-py2.py3-none-any.whl.metadata
  #64 6.287   Caused by: after parsing 2.0, found "de" after it, which is not part of a valid version
  #64 6.287 pyarrow (<2.0de,>=1.0.0) ; (python_version >= "3.5") and extra == 'all'
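The parse failure above can be sketched in a few lines. This is a minimal illustration using a deliberately simplified subset of the PEP 440 version grammar (plain releases with optional pre, post and dev suffixes); the function name is hypothetical and this is not the parser uv or pip actually use.

```python
from __future__ import annotations

import re

# Simplified subset of PEP 440: release digits plus optional suffixes.
_VERSION = re.compile(
    r"^\d+(\.\d+)*"      # release segment, e.g. 2.0
    r"((a|b|rc)\d+)?"    # optional pre-release
    r"(\.?post\d*)?"     # optional post-release
    r"(\.?dev\d*)?$"     # optional dev release
)
# Comparison operator followed by the version text.
_CLAUSE = re.compile(r"^(<=|>=|==|!=|~=|<|>)\s*(.+)$")


def invalid_clauses(specifier: str) -> list[str]:
    """Return the comma-separated clauses whose version part does not parse."""
    bad = []
    for clause in specifier.split(","):
        clause = clause.strip()
        match = _CLAUSE.match(clause)
        if match is None or _VERSION.match(match.group(2)) is None:
            bad.append(clause)
    return bad


print(invalid_clauses("<2.0de,>=1.0.0"))   # specifier from the failing wheel
print(invalid_clauses("<2.0dev,>=1.0.0"))  # what was presumably intended
```

The first call reports the `<2.0de` clause, matching the "found "de" after it" message in the log, while the second specifier passes this simplified check.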

You can see one of the failing builds here: https://github.com/apache/airflow/actions/runs/8555464849/job/23446007205#step:10:3677

Corresponding builds for other Python versions resulted in:

  #60 1.344 + uv pip install 'apache-airflow[aiobotocore,amazon,async,celery,cncf-kubernetes,common-io,docker,elasticsearch,ftp,google,google-auth,graphviz,grpc,hashicorp,http,ldap,microsoft-azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,uv,virtualenv] @ file:///docker-context-files/apache_airflow-2.9.0-py3-none-any.whl'
  #60 9.585 Resolved 339 packages in 8.23s
  #60 16.38 Downloaded 336 packages in 6.78s
  #60 17.14 Installed 336 packages in 761ms
  #60 17.14  + adal==1.2.7
  #60 17.14  + adlfs==2024.2.0
...

You can see more failed runs here: https://github.com/apache/airflow/actions?query=branch%3Av2-9-test - it WAS consistently happening until I switched the builds to use pip. For example:

https://github.com/apache/airflow/actions/runs/8559085301/job/23458003710#step:10:3718

This one takes a bit longer as expected (145 s) - but works.

 #64 2.532 + pip install --root-user-action ignore 'apache-airflow[aiobotocore,amazon,async,celery,cncf-kubernetes,common-io,docker,elasticsearch,ftp,google,google-auth,graphviz,grpc,hashicorp,http,ldap,microsoft-azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,uv,virtualenv] @ file:///docker-context-files/apache_airflow-2.9.0-py3-none-any.whl'
  #64 3.259 Processing /docker-context-files/apache_airflow-2.9.0-py3-none-any.whl (from apache-airflow[aiobotocore,amazon,async,celery,cncf-kubernetes,common-io,docker,elasticsearch,ftp,google,google-auth,graphviz,grpc,hashicorp,http,ldap,microsoft-azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,uv,virtualenv]@ file:///docker-context-files/apache_airflow-2.9.0-py3-none-any.whl)
  #64 3.627 Collecting alembic<2.0,>=1.13.1 (from apache-airflow@ file:///docker-context-files/apache_airflow-2.9.0-py3-none-any.whl->apache-airflow[aiobotocore,amazon,async,celery,cncf-kubernetes,common-io,docker,elasticsearch,ftp,google,google-auth,graphviz,grpc,hashicorp,http,ldap,microsoft-azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,uv,virtualenv]@ file:///docker-context-files/apache_airflow-2.9.0-py3-none-any.whl)
...
 #64 145.0 Successfully installed Babel-2.14.0 Flask-Babel-2.0.0 Flask-JWT-Extended-4.6.0 Flask-Limiter-3.5.1 Flask-SQLAlchemy-2.5.1 Mako-1.3.2 PyAthena-3.6.0 PyOpenSSL-24.1.0 PyYAML-6.0.1 WTForms-3.1.2 adal-1.2.7 adlfs-2024.2.0 aiobotocore-2.12.2 aiofiles-23.2.1 aiohttp-3.9.3 aioitertools-0.11.0 aiosignal-1.3.1 alembic-1.13.1 amqp-5.2.0 annotated-types-0.6.0 anyio-4.3.0 apache-airflow-2.9.0 apache-airflow-providers-amazon-8.19.0 apache-airflow-providers-celery-3.6.1 apache-airflow-providers-cncf-kubernetes-8.0.1 apache-airflow-providers-common-io-1.3.0 apache-airflow-providers-common-sql-1.11.1 apache-airflow-providers-docker-3.9.2 apache-airflow-providers-elasticsearch-5.3.3 apache-airflow-providers-fab-1.0.2 apache-airflow-providers-ftp-3.7.0 apache-airflow-providers-google-10.16.0 apache-airflow-providers-grpc-3.4.1 apache-airflow-providers-hashicorp-3.6.4 apache-airflow-providers-http-4.10.0 apache-airflow-providers-imap-3.5.0 apache-airflow-providers-microsoft-azure-9.0.1 apache-airflow-providers-mysql-5.5.4 apache-airflow-providers-odbc-4.4.1 apache-airflow-providers-openlineage-1.6.0 apache-airflow-providers-postgres-5.10.2 apache-airflow-providers-redis-3.6.0 apache-airflow-providers-sendgrid-3.4.0 apache-airflow-providers-sftp-4.9.0 apache-airflow-providers-slack-8.6.1 apache-airflow-providers-smtp-1.6.1 apache-airflow-providers-snowflake-5.3.1 apache-airflow-providers-sqlite-3.7.1 apache-airflow-providers-ssh-3.10.1 apispec-6.6.0 argcomplete-3.2.3 asgiref-3.8.1 asn1crypto-1.5.1 asyncssh-2.14.2 attrs-23.2.0 authlib-1.3.0 azure-batch-14.2.0 azure-common-1.1.28 azure-core-1.30.1 azure-cosmos-4.6.0 azure-datalake-store-0.0.53 azure-identity-1.15.0 azure-keyvault-secrets-4.8.0 azure-kusto-data-4.3.1 azure-mgmt-containerinstance-10.1.0 azure-mgmt-containerregistry-10.3.0 azure-mgmt-core-1.4.0 azure-mgmt-cosmosdb-9.4.0 azure-mgmt-datafactory-6.1.0 azure-mgmt-datalake-nspkg-3.0.1 azure-mgmt-datalake-store-0.5.0 azure-mgmt-nspkg-3.0.2 azure-mgmt-resource-23.0.1 
azure-mgmt-storage-21.1.0 azure-nspkg-3.0.2 azure-servicebus-7.12.1 azure-storage-blob-12.19.1 azure-storage-file-datalake-12.14.0 azure-storage-file-share-12.15.0 azure-synapse-artifacts-0.18.0 azure-synapse-spark-0.7.0 backoff-2.2.1 bcrypt-4.1.2 beautifulsoup4-4.12.3 billiard-4.2.0 blinker-1.7.0 boto3-1.34.51 botocore-1.34.51 cachelib-0.9.0 cachetools-5.3.3 cattrs-23.2.3 celery-5.3.6 certifi-2024.2.2 cffi-1.16.0 chardet-5.2.0 charset-normalizer-3.3.2 click-8.1.7 click-didyoumean-0.3.1 click-plugins-1.1.1 click-repl-0.3.0 clickclick-20.10.2 colorama-0.4.6 colorlog-4.8.0 configupdater-3.2 connexion-2.14.2 cron-descriptor-1.4.3 croniter-2.0.3 cryptography-41.0.7 db-dtypes-1.2.0 decorator-5.1.1 deprecated-1.2.14 dill-0.3.8 distlib-0.3.8 dnspython-2.6.1 docker-7.0.0 docstring-parser-0.16 docutils-0.20.1 elastic-transport-8.13.0 elasticsearch-8.13.0 email-validator-2.1.1 eventlet-0.36.1 filelock-3.13.3 flask-2.2.5 flask-appbuilder-4.4.1 flask-caching-2.1.0 flask-login-0.6.3 flask-session-0.5.0 flask-wtf-1.2.1 flower-2.0.1 frozenlist-1.4.1 fsspec-2024.3.1 gcloud-aio-auth-4.2.3 gcloud-aio-bigquery-7.1.0 gcloud-aio-storage-9.2.0 gcsfs-2024.3.1 gevent-24.2.1 google-ads-23.1.0 google-analytics-admin-0.22.7 google-api-core-2.18.0 google-api-python-client-2.125.0 google-auth-2.29.0 google-auth-httplib2-0.2.0 google-auth-oauthlib-1.2.0 google-cloud-aiplatform-1.46.0 google-cloud-appengine-logging-1.4.3 google-cloud-audit-log-0.2.5 google-cloud-automl-2.13.3 google-cloud-batch-0.17.17 google-cloud-bigquery-3.20.1 google-cloud-bigquery-datatransfer-3.15.1 google-cloud-bigtable-2.23.0 google-cloud-build-3.24.0 google-cloud-compute-1.18.0 google-cloud-container-2.45.0 google-cloud-core-2.4.1 google-cloud-datacatalog-3.19.0 google-cloud-dataflow-client-0.8.10 google-cloud-dataform-0.5.9 google-cloud-dataplex-1.13.0 google-cloud-dataproc-5.9.3 google-cloud-dataproc-metastore-1.15.3 google-cloud-dlp-3.16.0 google-cloud-kms-2.21.3 google-cloud-language-2.13.3 
google-cloud-logging-3.10.0 google-cloud-memcache-1.9.3 google-cloud-monitoring-2.19.3 google-cloud-orchestration-airflow-1.12.1 google-cloud-os-login-2.14.3 google-cloud-pubsub-2.21.1 google-cloud-redis-2.15.3 google-cloud-resource-manager-1.12.3 google-cloud-run-0.10.5 google-cloud-secret-manager-2.19.0 google-cloud-spanner-3.44.0 google-cloud-speech-2.26.0 google-cloud-storage-2.16.0 google-cloud-storage-transfer-1.11.3 google-cloud-tasks-2.16.3 google-cloud-texttospeech-2.16.3 google-cloud-translate-3.15.3 google-cloud-videointelligence-2.13.3 google-cloud-vision-3.7.2 google-cloud-workflows-1.14.3 google-crc32c-1.5.0 google-re2-1.1 google-resumable-media-2.7.0 googleapis-common-protos-1.63.0 graphviz-0.20.3 greenlet-3.0.3 grpc-google-iam-v1-0.13.0 grpc-interceptor-0.15.4 grpcio-1.62.1 grpcio-gcp-0.2.2 grpcio-status-1.62.1 gunicorn-21.2.0 h11-0.14.0 httpcore-1.0.5 httplib2-0.22.0 httpx-0.27.0 humanize-4.9.0 hvac-2.1.0 idna-3.6 ijson-3.2.3 importlib-resources-6.4.0 importlib_metadata-7.0.0 inflection-0.5.1 isodate-0.6.1 itsdangerous-2.1.2 jinja2-3.1.3 jmespath-1.0.1 json-merge-patch-0.2 jsonpath_ng-1.6.1 jsonschema-4.21.1 jsonschema-specifications-2023.12.1 kombu-5.3.6 kubernetes-29.0.0 kubernetes_asyncio-29.0.0 lazy-object-proxy-1.10.0 ldap3-2.9.1 limits-3.10.1 linkify-it-py-2.0.3 lockfile-0.12.2 looker-sdk-24.2.1 lxml-5.2.1 markdown-it-py-3.0.0 markupsafe-2.1.5 marshmallow-3.21.1 marshmallow-oneofschema-3.1.1 marshmallow-sqlalchemy-0.28.2 mdit-py-plugins-0.4.0 mdurl-0.1.2 more-itertools-10.2.0 msal-1.28.0 msal-extensions-1.1.0 msrest-0.7.1 msrestazure-0.6.4 multidict-6.0.5 mysql-connector-python-8.3.0 mysqlclient-2.2.4 numpy-1.26.4 oauthlib-3.2.2 openlineage-integration-common-1.11.1 openlineage-python-1.11.1 openlineage-sql-1.11.1 opentelemetry-api-1.24.0 opentelemetry-exporter-otlp-1.24.0 opentelemetry-exporter-otlp-proto-common-1.24.0 opentelemetry-exporter-otlp-proto-grpc-1.24.0 opentelemetry-exporter-otlp-proto-http-1.24.0 opentelemetry-proto-1.24.0 
opentelemetry-sdk-1.24.0 opentelemetry-semantic-conventions-0.45b0 ordered-set-4.1.0 pandas-2.1.4 pandas-gbq-0.22.0 paramiko-3.4.0 pathspec-0.12.1 pendulum-3.0.0 platformdirs-3.11.0 pluggy-1.4.0 ply-3.11 portalocker-2.8.2 prison-0.2.1 prometheus-client-0.20.0 prompt-toolkit-3.0.43 proto-plus-1.23.0 protobuf-4.25.3 psutil-5.9.8 psycopg2-binary-2.9.9 pyarrow-15.0.2 pyasn1-0.5.1 pyasn1-modules-0.3.0 pycparser-2.22 pydantic-2.6.4 pydantic-core-2.16.3 pydata-google-auth-1.8.2 pygments-2.17.2 pyjwt-2.8.0 pynacl-1.5.0 pyodbc-5.1.0 pyparsing-3.1.2 python-daemon-3.0.1 python-dateutil-2.9.0.post0 python-dotenv-1.0.1 python-http-client-3.3.7 python-ldap-3.4.4 python-nvd3-0.15.0 python-slugify-8.0.4 pytz-2024.1 redis-4.6.0 redshift_connector-2.1.0 referencing-0.34.0 requests-2.31.0 requests-oauthlib-2.0.0 requests_toolbelt-1.0.0 rfc3339-validator-0.1.4 rich-13.7.1 rich-argparse-1.4.0 rpds-py-0.18.0 rsa-4.9 s3transfer-0.10.1 scramp-1.4.4 sendgrid-6.11.0 setproctitle-1.3.3 setuptools-66.1.1 shapely-2.0.3 six-1.16.0 slack_sdk-3.27.1 sniffio-1.3.1 snowflake-connector-python-3.7.1 snowflake-sqlalchemy-1.5.1 sortedcontainers-2.4.0 soupsieve-2.5 sqlalchemy-1.4.52 sqlalchemy-bigquery-1.10.0 sqlalchemy-jsonfield-1.0.2 sqlalchemy-spanner-1.6.2 sqlalchemy-utils-0.41.2 sqlalchemy_redshift-0.8.14 sqlparse-0.4.4 sshtunnel-0.4.0 starkbank-ecdsa-2.2.0 statsd-4.0.1 tabulate-0.9.0 tenacity-8.2.3 termcolor-2.4.0 text-unidecode-1.3 time-machine-2.14.1 tomlkit-0.12.4 tornado-6.4 typing-extensions-4.10.0 tzdata-2024.1 uc-micro-py-1.0.3 unicodecsv-0.14.1 universal-pathlib-0.2.2 uritemplate-4.1.1 urllib3-2.0.7 vine-5.1.0 virtualenv-20.25.1 watchtower-3.1.0 wcwidth-0.2.13 websocket-client-1.7.0 werkzeug-2.2.3 wrapt-1.16.0 yarl-1.9.4 zipp-3.18.1 zope.event-5.0 zope.interface-6.2

There is an issue I created about it, astral-sh/uv#2821, that the uv maintainers seem eager to fix quite soon. There is also a somewhat similar issue (at least with very strange backtracking and failing on some old versions of transitive packages), astral-sh/uv#1560, created by @notatallshaw, who actively works on testing and verifying pip and uv resolution algorithms and uses airflow as quite a testing ground. There, the heuristic of uv gives different results than pip, which is pretty expected, as in many cases (especially apache-airflow[all]) there are multiple matching solutions.

What I have found so far is that our CI builds (where we use devel-all and --resolution highest) usually give very close results with uv and pip's --eager-upgrade, so I continue generating constraints with uv as it is way faster.

But installing just airflow[some deps] without --resolution highest or --eager-upgrade often gives quite different results for uv and pip. In this case, for example, uv installed airflow with a not-the-latest google provider. That's why the switch of PROD runs in release branches to pip is likely to stay.
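A quick way to see how far two resolutions diverge is to diff their pins. The sketch below is only an illustration with hypothetical helper names. The sample pins are copied from the two install logs earlier in this thread, which come from different runs, so they are not a controlled comparison.

```python
from __future__ import annotations


def parse_pins(lines: list[str]) -> dict[str, str]:
    """Parse 'name==version' lines into a {name: version} mapping."""
    pins = {}
    for line in lines:
        name, _, version = line.strip().partition("==")
        if name and version:
            pins[name.lower()] = version
    return pins


def diff_pins(left: dict[str, str], right: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return packages present in both mappings but pinned to different versions."""
    shared = sorted(left.keys() & right.keys())
    return {name: (left[name], right[name]) for name in shared if left[name] != right[name]}


# A few pins taken from the uv and pip logs quoted above.
uv_pins = parse_pins([
    "apache-airflow-providers-google==8.3.0",
    "google-cloud-storage==1.44.0",
    "protobuf==4.25.3",
])
pip_pins = parse_pins([
    "apache-airflow-providers-google==10.16.0",
    "google-cloud-storage==2.16.0",
    "protobuf==4.25.3",
])

for name, (uv_version, pip_version) in diff_pins(uv_pins, pip_pins).items():
    print(f"{name}: uv={uv_version} pip={pip_version}")
```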

@notatallshaw
Contributor

notatallshaw commented Apr 5, 2024

I wasn't able to quickly reproduce this, but some flags that help with reproducing these issues for uv in the future are --dry-run to not pollute your environment, --reinstall to take versions from the index even if you have installed versions that satisfy your requirements, and --exclude-newer to fix a point in time against the index (e.g. --exclude-newer 2024-04-04T17:00:00Z).
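Put together, a reproduction command using those three flags might be assembled like this. It is a sketch only: the helper name is hypothetical, the requirement is shortened to a single extra for readability, and the command is printed rather than executed.

```python
from __future__ import annotations

import shlex


def uv_repro_command(requirement: str, exclude_newer: str) -> list[str]:
    """Build a uv invocation that resolves against a fixed point in time."""
    return [
        "uv", "pip", "install",
        "--dry-run",                       # do not modify the environment
        "--reinstall",                     # ignore already-installed versions
        "--exclude-newer", exclude_newer,  # pin the index to a timestamp
        requirement,
    ]


# Shortened requirement for illustration; the failing build used the full extras list.
cmd = uv_repro_command("apache-airflow[google]", "2024-04-04T17:00:00Z")
print(shlex.join(cmd))
```

Building the argument list first and quoting it with shlex.join keeps the brackets in the extras specifier safe to paste into a shell.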

In general, uv's resolution algorithm (powered by pubgrub-rs) is more likely than pip's resolution algorithm (powered by resolvelib) to want to exhaust all possible versions of a particular package, though pip does sometimes produce the same behavior under certain circumstances.

Part of the problem is that no one designing resolution algorithms is thinking of the idea that the source data is messy, and visiting one node over another node on the resolution graph could be significantly more expensive (including causing resolution to fail).

For this issue of a bad specifier: pip currently supports a very wide range of specifiers and versions because it uses the legacy specifier and version handling from an older version of packaging. However, pip is likely going to drop legacy specifiers and versions in its next version. So pip is going to become less lenient than uv, and will therefore skip over bad specifiers and versions. uv should do the same (and in fact I think uv should eventually drop support for any kind of bad versions or specifiers).

@potiuk
Member Author

potiuk commented Apr 5, 2024

Good input. Thanks @notatallshaw :)

In general, uv's resolution algorithm (powered by pubgrub-rs) is more likely than pip's resolution algorithm (powered by resolvelib) to want to exhaust all possible versions of a particular package, though pip does sometimes produce the same behavior under certain circumstances.

I suggested in my issue that uv could (at least initially) limit the number of versions considered. For example, in certain cases, like installing the most recent packages, limit all transitive dependencies to, say, max. 2 years old or 10 versions. Even if that fails, you can repeat it with relaxed limits. This kind of heuristic could save a lot of hassle.
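A rough sketch of that heuristic, to make the idea concrete. Everything here is hypothetical: the function names, the relaxed second-pass limits, and the sample release dates are made up for illustration, and this sketch applies the age and count limits together. It is not how uv or pip select candidates.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Callable, Optional


def limit_candidates(
    releases: list[tuple[str, date]],
    today: date,
    max_age_days: int = 730,
    max_versions: int = 10,
) -> list[tuple[str, date]]:
    """Keep releases (newest first) that are recent enough, capped at max_versions."""
    cutoff = today - timedelta(days=max_age_days)
    recent = [(version, released) for version, released in releases if released >= cutoff]
    return recent[:max_versions]


def resolve_with_relaxation(
    releases: list[tuple[str, date]],
    today: date,
    is_acceptable: Callable[[str], bool],
) -> Optional[str]:
    """Try strict limits first, then retry once with relaxed limits."""
    for max_age_days, max_versions in [(730, 10), (3650, 100)]:
        for version, _ in limit_candidates(releases, today, max_age_days, max_versions):
            if is_acceptable(version):
                return version
    return None


# Made-up release history for a hypothetical package, newest first.
history = [("3.0", date(2024, 3, 1)), ("2.0", date(2023, 1, 1)), ("1.0", date(2019, 1, 1))]
today = date(2024, 4, 4)

print(resolve_with_relaxation(history, today, lambda v: True))
print(resolve_with_relaxation(history, today, lambda v: v == "1.0"))
```

In the common case the strict pass finds a recent version without ever touching old releases. Only when nothing recent fits does the second pass look further back.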

For this issue of a bad specifier: pip currently supports a very wide range of specifiers and versions because it uses the legacy specifier and version handling from an older version of packaging. However, pip is likely going to drop legacy specifiers and versions in its next version. So pip is going to become less lenient than uv, and will therefore skip over bad specifiers and versions. uv should do the same (and in fact I think uv should eventually drop support for any kind of bad versions or specifiers).

Agree. uv has the big advantage for now that it does not have to support all those legacy cases - far fewer people will complain if something doesn't work :)

@notatallshaw
Contributor

notatallshaw commented Apr 5, 2024

Agree. uv has the big advantage for now that it does not have to support all those legacy cases - far fewer people will complain if something doesn't work :)

Well, uv is trying to capture the existing Python ecosystem, which means they are making choices to support a wider range of things rather than follow the spec. You can see here all the "fixups" they do to specifiers: https://github.com/astral-sh/uv/blob/c4107f9c40b58cea629b6a7b6e59663d30f96f41/crates/pypi-types/src/lenient_requirement.rs#L11-L57

And since they've been public I've seen users report several times on bad specifiers they rely on and I've seen uv add them to the list of fixups. It's a bit of a shame, but I understand their reasoning. I think it will be up to pip to release a breaking change and force the ecosystem to follow the spec.

@potiuk
Member Author

potiuk commented Apr 5, 2024

🙃

4 participants