
[FLINK-38231][python] Standardise use of uv for PyFlink building, testing & linting #26897


Open: wants to merge 2 commits into base: master


autophagy
Contributor

What is the purpose of the change

With FLINK-37775 and FLINK-36900 we have started to use uv for managing Python testing environments and installing linting tools, as well as defining test/lint/typecheck dependencies in pyproject.toml.

However, the CI/CD scripts and developer documentation are in a bit of an in-between state. For example, we use uv to create Python virtual environments for tool installation, but uv run handles this natively, which removes the need for a custom-made virtual environment.

This PR does the following:

  • Used uv in our CI/CD scripts and developer documentation where possible. For lint stages such as flake8 and mypy, the dependencies for those checks are now resolved by uv via uv run. This means the install steps for those tools in lint-python.sh can be removed.
  • Used the tox-uv extension to tox so that tox uses uv to create the correct python environment for testing against various python versions, rather than relying on premade virtual environments. This also means the old install-command.sh script is no longer needed.
  • Replaced instances where the apache-flink and apache-flink-libraries packages were being built via python setup.py to instead use uv build, taking advantage of build isolation and automatic build dependency management.
  • Migrated static package metadata for apache-flink and apache-flink-libraries into their own pyproject.toml files, so they are viewed as concrete projects by uv.
  • Added ./apache-flink-libraries as a uv source so that, during development, the apache-flink-libraries package is automatically built (for example, when doing uv pip install -e . in the flink-python project). This sidesteps the need for building and installing the apache-flink-libraries dependency manually from source when doing local development.
  • Changed the build-wheels.sh script to build the pyflink wheels using uv build --python <python-version>. This, coupled with the tox changes, means that the py_env step of lint-python.sh (where we create venvs for supported python versions) can be removed.
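
As a rough illustration of the workflow described in the list above, the sketch below wraps the two uv entry points the PR leans on most, uv run and uv build. The run helper, the DRY_RUN variable, the function names and the pyflink path argument are assumptions made for this example; they are not taken from lint-python.sh or build-wheels.sh.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and arguments are hypothetical, not the PR's scripts.

# Print the command instead of executing it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Lint and type-check through uv run, so no hand-made virtualenv or
# separate tool install step is needed. The "pyflink" target is an assumption.
lint_with_uv() {
    run uv run flake8 pyflink
    run uv run mypy pyflink
}

# Build the package with build isolation for one interpreter version,
# e.g. build_with_uv 3.11
build_with_uv() {
    run uv build --python "$1"
}
```

With DRY_RUN=1 set, build_with_uv 3.11 only prints the command it would run (uv build --python 3.11); without it, the command is executed, which requires uv to be installed.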

Brief change log

  • Migrated the lint-python.sh script to use uv run for running lint, testing, typechecking and docs building steps.
  • Added tox-uv and bumped the tox dependency so that tox can use uv to create the virtualenvs it needs for running tests.
  • Updated building and testing scripts to use uv build, uv run and uv pip where possible.
  • Added a section to the developer docs about building the PyFlink project using uv.

Verifying this change

This change is already covered by existing tests, such as PyFlink unit tests, end-to-end tests and running the build-wheels.sh script.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Collaborator

flinkbot commented Aug 12, 2025

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@autophagy
Contributor Author

@dianfu @HuangXingBo While I'm digging around the build/releasing stuff - do you know why we build/publish a wheel and an sdist for apache-flink, but only an sdist for apache-flink-libraries?

@dianfu
Contributor

dianfu commented Aug 13, 2025

@autophagy

  1. The purpose of apache-flink-libraries:
    The purpose of apache-flink-libraries is to split the JAR files (which are huge) into a separate project. Otherwise, every PyFlink release would be very large, about 2 GB or so, because each artifact would contain the JAR files and there are multiple artifacts, one for each supported Python version and platform. PyPI limits the total size each project may use. Before introducing apache-flink-libraries, we had to ask PyPI to increase the project size limit multiple times.

  2. Why there is only sdist for apache-flink-libraries:
    Since it only contains JAR files, an sdist is enough. Wheel packages are usually for Cython files, which are platform-dependent. Besides, the purpose of this project is to reduce the artifact size of each release; if we also published wheel packages, the release would still take up too much space.

@autophagy
Contributor Author

@dianfu Ah, makes sense! Thank you for the context 🙂

@@ -34,10 +34,10 @@ source venv/bin/activate ""
 # install PyFlink dependency
 if [[ $1 = "" ]]; then
 # install the latest version of pyflink
-pip install apache-flink
+uv pip install apache-flink
Contributor


I notice we are using a very old version of uv. Can we use the latest here?

Contributor Author


I'll bump it to the latest - we could also use the https://astral.sh/uv/install.sh script to always get the latest too. Not sure what's the right approach.
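
For reference, the two options discussed here (a pinned release versus always the latest) might be sketched as follows. This assumes the standalone installer also accepts a version segment in its URL; the function names and the version 0.8.9 used in the usage note are illustrative only and are not what the PR actually pins.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Build the installer URL: pinned when a version is given, latest otherwise.
uv_installer_url() {
    if [ -n "${1:-}" ]; then
        echo "https://astral.sh/uv/$1/install.sh"
    else
        echo "https://astral.sh/uv/install.sh"
    fi
}

# Download and run the installer (needs curl and network access).
install_uv() {
    curl -LsSf "$(uv_installer_url "${1:-}")" | sh
}
```

Calling install_uv 0.8.9 would give reproducible CI runs at the cost of periodic manual bumps, whereas install_uv with no argument always tracks the newest release and may pick up behaviour changes unannounced.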

@github-actions github-actions bot added the community-reviewed (PR has been reviewed by the community.) label Aug 13, 2025
@github-actions github-actions bot added and removed the community-reviewed (PR has been reviewed by the community.) label Aug 15, 2025
Labels
community-reviewed PR has been reviewed by the community.

4 participants