Add google-cloud-bigquery as explicit google-provider dependency #38753
Conversation
The new Google Cloud BigQuery release from a few days ago caused a weird UV backtracking issue for Python 3.11 builds where Airflow uses PyPI providers in PROD image builds. UV seems to fail on a really old version of google-cloud-bigquery (1.28.2) that has a bad version specifier for one of its dependencies ("de" instead of "dev"). We should add google-cloud-bigquery explicitly and limit it to a relatively recent version. Airflow currently uses the latest 3.20.1 version in constraints, and limiting bigquery to >= 3.0.1 (the first non-yanked version, released more than two years ago in March 2022) is a good lower limit.
Example problem from v2-9-test it solves: https://github.com/apache/airflow/actions/runs/8555464849/job/23446007205#step:10:3693
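The failure mode can be illustrated with a simplified, hypothetical PEP 440 version check (the function name and regex below are illustrative only; the real grammar, implemented by the packaging library, is richer): a dev-release suffix must be spelled "dev", so a version or specifier segment ending in "de" does not parse as a valid version at all.

```python
import re

# Simplified sketch of PEP 440 version syntax (illustrative only; the real
# grammar also covers epochs, local versions, flexible separators, etc.).
# Pre-release tags are a/b/rc and the optional suffixes are .postN and .devN --
# "de" is not one of them, which is why the old metadata is unparseable.
_VERSION_RE = re.compile(
    r"^\d+(\.\d+)*"      # release segment, e.g. 1.28.2
    r"((a|b|rc)\d+)?"    # optional pre-release: a1, b2, rc3
    r"(\.post\d+)?"      # optional post-release
    r"(\.dev\d+)?$"      # optional dev-release
)

def is_valid_version(version: str) -> bool:
    return _VERSION_RE.match(version) is not None

print(is_valid_version("1.28.2.dev1"))  # True: "dev" is a valid suffix
print(is_valid_version("1.28.2.de1"))   # False: "de" is not
```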
I could not reproduce it with UV 0.1.29 and the command you provided in the issue created in the UV project, but I didn't try with 0.1.28:

(airflow) ➜ airflow git:(uv_0.1.29) ✗ uv pip install 'apache-airflow[aiobotocore,amazon,async,celery,cncf-kubernetes,common-io,docker,elasticsearch,ftp,google,google-auth,graphviz,grpc,hashicorp,http,ldap,microsoft-azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,uv,virtualenv] @ file:///Users/hussein-awala/github/airflow/dist/apache_airflow-2.10.0.dev0-py3-none-any.whl'
Resolved 339 packages in 47.68s
Built python-ldap==3.4.4
Built starkbank-ecdsa==2.2.0
Built mysqlclient==2.2.4
Built wirerope==0.4.7
Built methodtools==0.4.7
Built json-merge-patch==0.2
Downloaded 121 packages in 5.11s
Installed 121 packages in 382ms
+ adal==1.2.7
+ adlfs==2023.10.0
- apache-airflow==2.8.3
+ apache-airflow==2.10.0.dev0 (from file:///Users/hussein-awala/github/airflow/dist/apache_airflow-2.10.0.dev0-py3-none-any.whl)
+ apache-airflow-providers-celery==3.6.1
+ apache-airflow-providers-docker==3.9.2
+ apache-airflow-providers-elasticsearch==5.3.3
- apache-airflow-providers-fab==1.0.1
+ apache-airflow-providers-fab==1.0.2
+ apache-airflow-providers-google==8.3.0
+ apache-airflow-providers-grpc==3.4.1
+ apache-airflow-providers-hashicorp==3.6.4
+ apache-airflow-providers-microsoft-azure==9.0.1
+ apache-airflow-providers-mysql==5.5.4
+ apache-airflow-providers-odbc==4.1.0
+ apache-airflow-providers-openlineage==1.6.0
+ apache-airflow-providers-postgres==5.10.2
+ apache-airflow-providers-redis==3.6.0
+ apache-airflow-providers-sendgrid==3.4.0
+ apache-airflow-providers-sftp==4.9.0
+ apache-airflow-providers-slack==8.6.1
+ apache-airflow-providers-snowflake==5.2.0
+ apache-airflow-providers-ssh==3.10.1
+ asyncssh==2.14.2
+ authlib==1.3.0
+ azure-batch==14.2.0
+ azure-datalake-store==0.0.53
+ azure-keyvault-secrets==4.8.0
+ azure-kusto-data==4.3.1
+ azure-mgmt-containerinstance==10.1.0
+ azure-mgmt-containerregistry==10.3.0
+ azure-mgmt-datafactory==6.1.0
+ azure-mgmt-datalake-nspkg==3.0.1
+ azure-mgmt-datalake-store==0.5.0
+ azure-mgmt-nspkg==3.0.2
+ azure-mgmt-resource==23.0.1
+ azure-mgmt-storage==21.1.0
+ azure-nspkg==3.0.2
+ azure-servicebus==7.12.1
+ azure-storage-blob==12.19.1
+ azure-storage-file-datalake==12.14.0
+ azure-storage-file-share==12.15.0
+ azure-synapse-artifacts==0.18.0
+ azure-synapse-spark==0.7.0
+ bcrypt==4.1.2
- boto3==1.28.85
+ boto3==1.28.64
- botocore==1.31.85
+ botocore==1.31.64
+ cattrs==23.2.3
+ docstring-parser==0.16
+ elastic-transport==8.13.0
+ elasticsearch==8.13.0
+ eventlet==0.36.1
- flask-appbuilder==4.3.11
+ flask-appbuilder==4.4.1
+ flower==2.0.1
+ gevent==24.2.1
+ google-ads==23.1.0
- google-api-python-client==2.108.0
+ google-api-python-client==1.12.11
+ google-cloud-aiplatform==1.46.0
+ google-cloud-appengine-logging==1.4.3
+ google-cloud-audit-log==0.2.5
+ google-cloud-automl==2.13.3
+ google-cloud-bigquery-datatransfer==3.15.1
+ google-cloud-bigtable==1.7.1
+ google-cloud-build==3.24.0
+ google-cloud-datacatalog==3.19.0
+ google-cloud-dataform==0.5.9
+ google-cloud-dataplex==1.13.0
+ google-cloud-dataproc==5.9.3
+ google-cloud-dataproc-metastore==1.15.3
+ google-cloud-dlp==1.0.1
+ google-cloud-kms==2.21.3
+ google-cloud-language==1.3.1
+ google-cloud-logging==3.10.0
+ google-cloud-memcache==1.9.3
+ google-cloud-monitoring==2.19.3
+ google-cloud-orchestration-airflow==1.12.1
+ google-cloud-os-login==2.14.3
+ google-cloud-pubsub==2.21.1
+ google-cloud-redis==2.15.3
+ google-cloud-resource-manager==1.12.3
- google-cloud-secret-manager==2.16.4
+ google-cloud-secret-manager==1.0.1
+ google-cloud-spanner==1.19.2
+ google-cloud-speech==1.3.3
- google-cloud-storage==2.13.0
+ google-cloud-storage==1.44.0
+ google-cloud-tasks==2.16.3
+ google-cloud-texttospeech==1.0.2
+ google-cloud-translate==1.7.1
+ google-cloud-videointelligence==1.16.2
+ google-cloud-vision==1.0.1
+ google-cloud-workflows==1.14.3
+ grpcio-gcp==0.2.2
+ humanize==4.9.0
+ ijson==3.2.3
+ json-merge-patch==0.2
+ ldap3==2.9.1
+ looker-sdk==24.2.1
+ methodtools==0.4.7
+ msrestazure==0.6.4
+ mysql-connector-python==8.3.0
+ mysqlclient==2.2.4
+ paramiko==3.4.0
+ prometheus-client==0.20.0
- protobuf==4.24.4
+ protobuf==4.25.3
+ psycopg2-binary==2.9.9
+ pyodbc==5.1.0
+ pyopenssl==24.1.0
+ python-http-client==3.3.7
+ python-ldap==3.4.4
+ sendgrid==6.11.0
+ shapely==2.0.3
+ slack-sdk==3.27.1
+ snowflake-connector-python==3.8.0
+ snowflake-sqlalchemy==1.5.1
+ sqlalchemy-bigquery==1.10.0
+ sshtunnel==0.4.0
+ starkbank-ecdsa==2.2.0
+ statsd==4.0.1
+ tornado==6.4
- universal-pathlib==0.1.4
+ universal-pathlib==0.2.2
- uritemplate==4.1.1
+ uritemplate==3.0.1
+ wirerope==0.4.7
+ zope-event==5.0
+ zope-interface==6.2

Do I need to create a new venv?
It's been very specific, and I am not sure if it is constantly / currently reproducible - this is the nature of dependency resolving: it constantly changes depending on what is currently available in PyPI, what state your cache is in, and some heuristics that might change resolutions after even the slightest changes. What I know about this case is: it was (consistently) happening:
And even for our builds this is a very unusual step - usually we install airflow with constraints generated by the CI build. But this one does not use constraints. And the error was:
You can see one of the failing builds here: https://github.com/apache/airflow/actions/runs/8555464849/job/23446007205#step:10:3677 Corresponding builds for other Python versions resulted in:
You can see more failed runs here: https://github.com/apache/airflow/actions?query=branch%3Av2-9-test - it WAS consistently happening until I switched the builds. https://github.com/apache/airflow/actions/runs/8559085301/job/23458003710#step:10:3718 - this one takes a bit longer as expected (145 s), but works.
There is an issue I created about it: astral-sh/uv#2821. What I found so far is that it affects our CI builds. But installing just
I wasn't able to quickly reproduce this, but there are some flags that help with reproducing these issues for uv in the future.

In general, uv's resolution algorithm (powered by pubgrub-rs) has a stronger tendency to exhaust all possible versions of a particular package than pip's resolution algorithm (powered by resolvelib), though pip does sometimes produce the same behavior under certain circumstances. Part of the problem is that no one designing resolution algorithms accounts for the fact that the source data is messy, and that visiting one node over another on the resolution graph can be significantly more expensive (including causing resolution to fail).

For this issue of a bad specifier: pip currently supports a very wide range of specifiers and versions because it uses the legacy specifier and version classes from an older version of packaging. However, pip is going to drop legacy specifiers and versions, likely in its next release. pip will then become less lenient than uv, and will skip over bad specifiers and versions. uv should do the same (and in fact I think uv should eventually drop support for any kind of bad versions or specifiers).
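The skip-versus-fail distinction described above can be sketched with a toy resolver (hypothetical names and data; real resolvers like pubgrub-rs and resolvelib are far more involved): a resolver that skips candidates with unparseable metadata can still finish, while one that treats them as fatal aborts the whole resolution once backtracking reaches the bad release.

```python
import re

# Very rough PEP 440-ish validity check, used only to flag the bad candidate.
_PEP440ISH = re.compile(r"^\d+(\.\d+)*((a|b|rc)\d+)?(\.post\d+)?(\.dev\d+)?$")

def resolve(candidates, acceptable, skip_bad=True):
    """Return the first acceptable candidate, walking newest-to-oldest."""
    for version in candidates:
        if _PEP440ISH.match(version) is None:
            if skip_bad:
                continue  # lenient: ignore the broken candidate, keep going
            raise ValueError(f"unparseable version: {version}")  # strict: abort
        if acceptable(version):
            return version
    return None

# Newest-first candidate list; "1.28.2de1" stands in for the ancient release
# with the bad "de" suffix (hypothetical data, not the real PyPI metadata).
versions = ["3.20.1", "3.0.1", "1.28.2de1", "1.28.0"]

# Suppose backtracking has already ruled out everything in the 3.x line:
old_only = lambda v: not v.startswith("3.")

print(resolve(versions, old_only, skip_bad=True))   # finds "1.28.0"
# resolve(versions, old_only, skip_bad=False) raises ValueError instead
```

Pinning google-cloud-bigquery >= 3.0.1, as this PR does, keeps the resolver from ever backtracking far enough to touch the bad candidate in the first place.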
Good input. Thanks @notatallshaw :)
I suggested in my issue that
Agree.
Well, uv is trying to capture the existing Python ecosystem, which means they are making choices to support a wider range of things rather than strictly follow the spec. You can see here all the "fixups" they do to specifiers: https://github.com/astral-sh/uv/blob/c4107f9c40b58cea629b6a7b6e59663d30f96f41/crates/pypi-types/src/lenient_requirement.rs#L11-L57

Since they've been public, I've seen users report bad specifiers they rely on several times, and I've seen uv add them to the list of fixups. It's a bit of a shame, but I understand their reasoning. I think it will be up to pip to release a breaking change and force the ecosystem to follow the spec.
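As a rough illustration of what such a fixup does (hypothetical Python sketch; uv's actual fixups live in the Rust file linked above and target different concrete patterns), a lenient parser rewrites a known-bad specifier into a spec-compliant form before parsing it:

```python
import re

# Hypothetical "lenient specifier" fixup in the spirit of uv's
# lenient_requirement.rs: repair a trailing "de<N>" (or bare "de") that
# should have been ".dev<N>" before handing the string to a strict parser.
# The pattern and repaired form are illustrative, not uv's actual rules.
def fixup_bad_dev_suffix(spec: str) -> str:
    return re.sub(r"(?<=\d)\.?de(?=\d|$)", ".dev", spec)

print(fixup_bad_dev_suffix("example-package>=0.28.0de"))  # ...>=0.28.0.dev
print(fixup_bad_dev_suffix("example-package>=1.2"))       # unchanged
```

The downside, as noted above, is that every fixup like this rewards non-compliant metadata and has to be maintained forever once users depend on it.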
🙃