@ashb ashb commented Apr 16, 2021

DO NOT MERGE

For approval only.

kaxil and others added 30 commits February 5, 2021 12:47
…s. (#14107)

Revert "Fix Commands to install Airflow in docker/install_airflow.sh (#14099)"

This reverts commit 68758b8.

Also fixes the docker build script that was the reason for the original
attempt to fix it.

(cherry picked from commit 212d5cd)
`attachable` is only a property of compose version 3.1 files, but we are
on 2.2 still.

This was failing on self-hosted runners with an error
`networks.example.com value Additional properties are not allowed
('attachable' was unexpected)`

(cherry picked from commit 5dbabdd)
* Run CI (Build Image and committer PRs) on self-hosted runner

The queue for GitHub actions is starting to get painful.

This PR changes it to run any build of main (push or scheduled), or a PR
from a committer on the self-hosted runner.

However since this setting is configured _in_ the repo, it could be
changed by a PR, so we can't rely on that; we also use a custom build of
the self-hosted runner binary that enforces PRs are from contributors,
otherwise it will fail the build.

* Don't free space on self-hosted runners

It deletes the images we have pre-cached (defeating the point of
pre-loading them) and we manage freeing space via another process.

The exception to this is the K8s tests, where we don't need the images
we pre-cache, but we do want the space in our tmpfs.

* Some jobs take longer on self-hosted runners

* When running KinD tests on CI, pass through the host docker creds to KinD

This enables self-hosted runners (which are logged in to a Docker Hub
user) to pull images without running into pull limits.

* Work around spurious blocking IO failures

This seems to be an occasional problem on the build provider package job
which produces a lot of output all at once.

The error appears as the job just failing with an exit code (sometimes 1,
other times 120) but no error message.

(cherry picked from commit 4816707)
The change to this file was untestable until it was merged to master :(

(cherry picked from commit 9005f28)
When the image is prepared, PIP installation produces progress
bars which are annoying - especially in the CI environment.

This PR adds an argument to control the progress bar and sets it to "off"
for CI builds.

(cherry picked from commit 9b7852e)
When the user wants to test pull requests or - especially - master
builds in their own fork, they will likely not have self-hosted
runners configured and the builds will fail.

This change uses self-hosted runners only if the build is run in
the context of the `apache/airflow` repository.

(cherry picked from commit 14890f0)
This change attempts to stabilize pylint checks. Since we have
recently added self-hosted runners with multiple CPUs, re-enabling
parallel mode for pylint makes perfect sense, as we will finally be
able to use the parallelism and speed up static checks significantly.

Previously the tests were run in single-processor mode in an attempt
to avoid random mistakes where different files were processed in
different processes. This led to random pylint or mypy problems -
either false positives or false negatives - especially when it came
to circular dependencies. But since we are now past the heavy
refactoring, this should be no problem for future changes, and an
occasional false positive/negative is less disruptive than long
checks.

An attempt is made to apply a sorting order to the files passed
to pylint. This should provide more stability in the results
of running the tests in PRs and in master.

We had some custom pylint plugins that prevented using pylint
parallelism. For now we are giving up on one of the plugins
(no use of asserts) and we rely on committer review for that (we
have a rule in place to only use asserts in tests). The other
plugin was replaced by coming back to a separation of "main code"
and "test code" and applying different rules to those - we now
have two different configuration files: pylintrc and
pylintrc-tests to control settings for those two different cases.

Mypy and flake8 have been parallelized at the level of pre-commits.

By implementing those changes we are likely to speed up the
tests on self-hosted runners 6x-8x.
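The two stabilizing ideas above - a sorted file order for reproducible results, plus pylint's real `-j 0` flag to auto-detect available CPUs - can be sketched with a hypothetical helper (this is not Breeze's actual script, just an illustration):

```python
import glob


def build_pylint_command(rcfile, pattern):
    """Build a deterministic, parallel pylint invocation.

    Sorting the file list keeps results stable between runs;
    `-j 0` makes pylint auto-detect the number of CPUs to use.
    """
    files = sorted(glob.glob(pattern, recursive=True))
    return ["pylint", "-j", "0", f"--rcfile={rcfile}", *files]
```

The same helper could be called once with `pylintrc` for main code and once with `pylintrc-tests` for test code, matching the two-config split described above.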

Also caching of pre-commit environments (including the
pre-commit installation) is improved. Previously we only cached the
environment created by pre-commit, but with this change we
also cache the pre-commit installation in the .local directory - this
should save 1-2 minutes per job.

(cherry picked from commit 82cb041)
…14227)

There are two types of constraints now:

* default constraints that contain all dependencies of airflow,
  all the provider packages released at the time of the release
  of that version, as well as all transitive dependencies. Following
  those constraints, you can be sure Airflow's installation is
  repeatable

* no-providers constraints - containing only the dependencies needed
  for the core airflow installation. This allows installing/upgrading
  airflow without also forcing the providers to be installed at
  the versions pinned for that specific version of Airflow.

This allows for flexible management of Airflow and Provider
packages separately. Documentation about it has been added.

Also the provider 'extras' for apache airflow no longer keep direct
dependencies on the packages needed by the provider. Those
dependencies are now transitive only - a 'provider' extra only
depends on the 'apache-airflow-provider-EXTRA' package and all
the dependencies are transitive. This will help in the future
to avoid conflicts when installing newer providers using extras.

(cherry picked from commit d524cec)
* Fix caching of python images during builds

After GHCR change, github image caching for python images was
broken. GHCR requires each image that we push to it to be
tagged with "org.opencontainers.image.source" label to point
to the right project repository. Otherwise it will not appear
in the repository images and we will not be able to manage
permissions of the image.

So we've been extending the python base image with the appropriate
label, but this was not done in all the places necessary. As a result,
the builds in CI were using different images than the cache
image in DockerHub, which in turn always caused full image rebuilds.
By implementing this change we make sure that all builds use the
"airflow-tainted" python images - including the label.

All images (when using breeze) are now built using such an
airflow-tainted image, which in turn should save up to 5-8
minutes on the image-building job per each job (CI
and production)

The change implements the following behaviours:

1) When the image is built on CI or locally without the
   UPGRADE_TO_NEWER_DEPENDENCIES flag and without the
   FORCE_PULL_IMAGES flag - the latest image from the
   GitHub Registry or DockerHub registry is used.

2) When both FORCE_PULL_IMAGES and UPGRADE_TO_NEWER_DEPENDENCIES
   are set to something different than "false" - the latest Python
   images are used - they are pulled first and tagged appropriately.

This way - whenever UPGRADE_TO_NEWER_DEPENDENCIES is set (also
when master pushes are done) - such builds will use the latest version
of the python base images published on DockerHub.

* Update _push_pull_remove_images.sh

(cherry picked from commit b7d9c34)
This change enables easy switching between GitHub Package Registry
and GitHub Container Registry by simply setting the GITHUB_REGISTRY
secret to either `docker.package.github.com` or `ghcr.io`.

The switch is easy only for builds run by the Apache Airflow
repository, as it requires preparation of the images (making them
public and adding permissions to manage them) after they get
created for the first time. GitHub Package Registry works
out-of-the-box but it is less stable and considered legacy;
it also does not allow image retention.

Documentation has been updated to reflect the reasoning behind choosing
this solution, as well as describing the maintenance processes around
the images (including adding a new Python version)

(cherry picked from commit 25fa309)
The pre-commit cache did not work when the host python version
changed. When the python version changed from 3.6.12 to 3.6.13,
the cache was rendered useless.

This change adds the python version to the cache key specification,
with a fallback value that also includes the python version, so
that even after changing pre-commit, the cache can be rebuilt
from the last-good cache for the same python version.

(cherry picked from commit f3bc8ab)
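The cache-key idea above can be sketched as follows (hypothetical helper names - the real keys are composed in the GitHub Actions workflow, not in Python):

```python
import hashlib
import platform


def precommit_cache_key(config_path):
    """Primary cache key: includes the host Python version, so a
    Python upgrade (e.g. 3.6.12 -> 3.6.13) invalidates the cache."""
    with open(config_path, "rb") as f:
        config_hash = hashlib.sha256(f.read()).hexdigest()[:12]
    return f"pre-commit-{platform.python_version()}-{config_hash}"


def precommit_fallback_key():
    """Fallback restore key: still pinned to the Python version, so a
    last-good cache for the same interpreter can be reused even after
    the pre-commit config changes."""
    return f"pre-commit-{platform.python_version()}-"
```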

Upgrade to newer dependencies is now only set to something other
than false in two cases:

* when setup.py or setup.cfg change
* when we are running push build

Previously - mistakenly - it was also set when "full_tests_needed"
label was set on a PR.

This caused for example the #19628 PR to fail before
moto was limited to <2 in #14433.

(cherry picked from commit fe31137)
The #14332 change introduced --local flag for static checks and
caching of the pre-commit virtualenv but the necessary PATH change
was not added to `basic_static_checks.sh`. This PR fixes it.

(cherry picked from commit c4da66c)
Because we specified `--rcfile` as an argument to the script, the check
for "no files" was failing and pylint was being run with no files,
resulting in an error.

(cherry picked from commit ab72b05)
The `date` command on macOS does not have the --rfc* flags, so we have
to replace the flag with a manually crafted RFC-compliant date.

Fixes #14393

(cherry picked from commit 64cf2ae)
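The actual fix lives in the shell scripts, but the idea - craft the RFC-compliant timestamp manually instead of relying on GNU date's flags - can be sketched portably in Python (hypothetical helper, shown only to illustrate the format):

```python
from datetime import datetime, timezone


def rfc3339_now():
    """Craft an RFC-3339 UTC timestamp by hand, avoiding GNU date's
    --rfc-3339 flag, which the BSD date shipped with macOS lacks."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```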
Whenever docker commands should be used the --dry-run-docker
flag will print the command rather than execute it.

Closes: #14460
(cherry picked from commit aa28e4e)
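The flag's behaviour follows a standard dry-run pattern, sketched below with a hypothetical wrapper (not the actual Breeze code): print the command instead of executing it when the dry-run switch is on.

```python
import shlex
import subprocess


def run_docker(*args, dry_run=False):
    """Run a docker command, or (when dry_run is set, mimicking
    --dry-run-docker) just print it without executing."""
    cmd = ["docker", *args]
    if dry_run:
        print(" ".join(shlex.quote(c) for c in cmd))
    else:
        subprocess.run(cmd, check=True)
```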
It takes an insane 9:30 on the self-hosted runner and likely incurs
a lot of cost on pushing the docs (and its value is limited)

(cherry picked from commit 20f391e)
* Update hadolint from v1.22.1 to v1.18.0

* fixup! Update hadolint from v1.22.1 to v1.18.0

* fixup! fixup! Update hadolint from v1.22.1 to v1.18.0

Co-authored-by: Kamil Breguła <[email protected]>
(cherry picked from commit cc7260a)
* Production image can be run as root

* fixup! Production image can be run as root

* fixup! fixup! Production image can be run as root

Co-authored-by: Kamil Bregula <[email protected]>
Co-authored-by: Kamil Breguła <[email protected]>
(cherry picked from commit 7979b75)
The asset recompilation message did not work well in the case of the
tests - where we did not mount local sources. It was always
showing the instructions to recompile the assets.

This is now fixed and the "OK" message is green.

(cherry picked from commit 9f35cff)
wolfier and others added 15 commits April 15, 2021 14:00
Fixes: #14675

Instead of building the relative url manually, we can leverage flask's url generation to account for differing airflow base URL and HTML base URL.

(cherry picked from commit aaa3bf6)
Currently, as long as the argument '-p' is present, the code tries to mask it.

However, '-p' may mean something else (not a password), like a boolean flag. Such cases may result in an exception

(cherry picked from commit 486b764)
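The distinction can be sketched like this (hypothetical flag set and helper - not Airflow's actual masking code): only mask when a value actually follows the flag, so a bare boolean `-p` is left alone.

```python
# Hypothetical set of flags whose *values* should be hidden.
SENSITIVE_FLAGS = {"-p", "--password"}


def mask_cmd(args):
    """Mask the value following a sensitive flag, but only when a
    value is actually present; '-p' used as a boolean flag survives."""
    masked = []
    skip_next = False
    for i, arg in enumerate(args):
        if skip_next:
            masked.append("***")
            skip_next = False
        elif arg in SENSITIVE_FLAGS:
            masked.append(arg)
            # Only mask if the next token exists and is not another flag.
            nxt = args[i + 1] if i + 1 < len(args) else None
            skip_next = nxt is not None and not nxt.startswith("-")
        else:
            masked.append(arg)
    return masked
```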
…eated Pods (#15204)

closes: #15193

Currently the condition whether the pod was created by Airflow is not considered. This commit fixes that.

We decide if the Pod was created by Airflow by checking if it has all the labels added in PodGenerator.construct_pod() or KubernetesPodOperator.create_labels_for_pod().

(cherry picked from commit c594d9c)
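The check reduces to a label-subset test, sketched below (the label names are assumptions for illustration, not the exact set added by PodGenerator.construct_pod()):

```python
# Assumed label set for illustration only - the real check uses the
# labels added by PodGenerator.construct_pod() or
# KubernetesPodOperator.create_labels_for_pod().
AIRFLOW_MANAGED_LABELS = {"dag_id", "task_id", "try_number", "airflow_version"}


def is_created_by_airflow(pod_labels):
    """Consider a pod Airflow-managed only if every expected label
    is present on it."""
    return AIRFLOW_MANAGED_LABELS.issubset(pod_labels)
```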
We've seen instances of connection resets happening, particularly in
Azure, that are remedied by enabling tcp_keepalive. Enabling it by
default should be safe and sane regardless of where we are running.

(cherry picked from commit 6e31465)
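At the socket level, enabling keepalive looks like the sketch below (the timing knobs are illustrative values, and the idle/interval options are platform-specific, hence the guards):

```python
import socket


def make_keepalive_socket():
    """Create a TCP socket with keepalive enabled, so long-idle
    connections are probed instead of being silently reset."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The idle/interval knobs are platform-specific (Linux names shown),
    # so guard each one; the values here are illustrative only.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 120)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)
    return sock
```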
The current implementation imports Connection on import time, which
causes a circular import when a model class needs to reference a hook
class.

By applying this fix, the airflow.hooks package is completely decoupled
from airflow.models at import time, so model code can reference hooks.
Hooks, on the other hand, generally don't reference model classes.

Fix #15325.

(cherry picked from commit 7560316)
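The fix pattern - moving the import from module level into the function body - can be demonstrated with a stand-in stdlib module (`colorsys` plays the role of airflow.models.connection here; this is a sketch of the technique, not Airflow's code):

```python
def load_connection_model():
    """Deferred (function-local) import: the dependency is resolved the
    first time this function is called, not when the defining module is
    imported - so a models<->hooks cycle cannot trip at import time."""
    import colorsys  # stand-in for the real, cycle-prone import

    return colorsys
```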
This module was removed/renamed in mysqlclient 2.0 -- this new name
works for both

(cherry picked from commit a162f2b)
(cherry picked from commit f69bb82)
There is some problem where the stackdriver logging "leaks" out of this
test and remains configured (whereas the S3 one doesn't), but given
this test is not actually testing logs, a quick fix is just to change
which remote logging handler we use.
After stabilizing the builds on CI, the master builds finally started
to get green - except pushing to the prod image cache, which
continued to fail as it was missing the python image to push.

This PR fixes it.

(cherry picked from commit 1dfbb8d)
This bug was introduced in #14909. Removed sorting from lists and tuples, as lists & tuples preserve order, unlike sets.

The following DAG errors with: `TypeError: '<' not supported between instances of 'dict' and 'dict'`

```python
from airflow import models
from airflow.operators.dummy import DummyOperator
from datetime import datetime, timedelta
params = {
    "staging_schema": [{"key:":"foo","value":"bar"},
                       {"key:":"this","value":"that"}]
}

with models.DAG(dag_id='test-dag',
                start_date=datetime(2019, 2, 14),
                schedule_interval='30 13 * * *',
                catchup=False,
                max_active_runs=1,
                params=params
                ) as dag:
    my_task = DummyOperator(
        task_id='task1'
    )
```

Full Error:

```
  File "/usr/local/lib/python3.7/site-packages/airflow/serialization/serialized_objects.py", line 210, in <dictcomp>
    return cls._encode({str(k): cls._serialize(v) for k, v in var.items()}, type_=DAT.DICT)
  File "/usr/local/lib/python3.7/site-packages/airflow/serialization/serialized_objects.py", line 212, in _serialize
    return sorted(cls._serialize(v) for v in var)
TypeError: '<' not supported between instances of 'dict' and 'dict'
During handling of the above exception, another exception occurred:
...
```

This is because `sorted()` does not work with dicts, as they can't be compared. Removing the sorting from lists & tuples fixes it.
It also fails when we have a set with multiple types.

(cherry picked from commit d115040)
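The before/after behaviour can be reduced to a minimal sketch (hypothetical helpers mirroring the serializer's list/tuple branch, not the actual serialized_objects.py code):

```python
def serialize_iterable_old(var):
    """Pre-fix behaviour (sketch): sorting blows up when the items,
    such as dicts, are not orderable."""
    return sorted(v for v in var)


def serialize_iterable_fixed(var):
    """Post-fix behaviour (sketch): lists and tuples already preserve
    order, so no sorting is needed."""
    return [v for v in var]
```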
…nches (#15394)

They use the same python image as master (as already mentioned in the
comments in ci_prepare_prod_image_on_ci.sh) so we don't want to try
and push the python image when we aren't building the main branch.

(cherry picked from commit f94effe)
After merging the constraints, the 'recursive' mode was not added
to checkout, resulting in a non-checked-out github push action.

This commit fixes it and adds color to diff output in commit
to better show differences when pushing.

(cherry picked from commit 6b78394)
@ashb ashb requested a review from kaxil April 16, 2021 15:43
@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:dev-tools labels Apr 16, 2021
@ashb ashb changed the title Prepare Prepare 2.0.2 Apr 16, 2021
@ashb ashb requested a review from ephraimbuddy April 16, 2021 15:52
@kaxil kaxil marked this pull request as draft April 16, 2021 16:07
@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Apr 16, 2021
@ashb ashb merged commit e494306 into v2-0-stable Apr 16, 2021