Run some example in Kubernetes execution mode in CI (#1127)
## Description

### Migrate example from
[cosmos-example](https://github.com/astronomer/cosmos-example/)

The [cosmos-example](https://github.com/astronomer/cosmos-example/)
repository currently contains several examples, including some that run in
Kubernetes execution mode. This setup has made it hard to test local changes
in Kubernetes execution mode and to keep the documentation up to date, so it
makes sense to migrate the Kubernetes examples from
[cosmos-example](https://github.com/astronomer/cosmos-example/) to this
repository. As part of that, this PR:

- Migrates the
[jaffle_shop_kubernetes](https://github.com/astronomer/cosmos-example/blob/main/dags/jaffle_shop_kubernetes.py)
example DAG to this repository.
- Moves the Dockerfile from
[cosmos-example](https://github.com/astronomer/cosmos-example/blob/main/Dockerfile.postgres_profile_docker_k8s)
to this repository so the image can be built with the necessary DAGs and dbt
projects.

Both the example DAG and the Dockerfile were adjusted to work within this
repository.

### Automate running locally

I introduced some scripts to make it easy to run the Kubernetes DAG locally.

**postgres-deployment.yaml:** Kubernetes resource file for spinning up
PostgreSQL and creating Kubernetes secrets.
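As a rough illustration — the authoritative manifest is `scripts/test/postgres-deployment.yaml`, and the image tag and object names below are assumptions, apart from the `app=postgres` label and the `postgres-postgresql` service host that other scripts in this PR rely on — such a resource file might look like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:13  # assumed image version
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              value: postgres
---
apiVersion: v1
kind: Service
metadata:
  # matches the host used elsewhere: postgres-postgresql.default.svc.cluster.local
  name: postgres-postgresql
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432
```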

**integration-kubernetes.sh:** Runs the Kubernetes DAG using pytest.

**kubernetes-setup.sh:**

- Builds the Docker image with the Jaffle Shop dbt project and DAG, and
loads the Docker image into the local registry.
- Creates Kubernetes resources such as PostgreSQL deployment, service,
and secret.

**Run DAG locally**
Prerequisites:

- Docker Desktop
- KinD (Kubernetes in Docker)
- kubectl

Steps:
1. Create cluster:  `kind create cluster`
2. Create resources: `scripts/test/kubernetes-setup.sh` (this sets up
PostgreSQL and loads the dbt project image into the local KinD cluster)
3. Run the DAG: `cd dev && sh ../scripts/test/integration-kubernetes.sh`. This
executes the DAG via pytest. You can also run it directly with the `airflow`
CLI, provided the project is installed in your virtual environment:
```
time AIRFLOW__COSMOS__PROPAGATE_LOGS=0 \
  AIRFLOW__COSMOS__ENABLE_CACHE=1 \
  AIRFLOW__COSMOS__CACHE_DIR=/tmp/ \
  AIRFLOW_CONN_EXAMPLE_CONN="postgres://postgres:[email protected]:5432/postgres" \
  PYTHONPATH=`pwd` \
  AIRFLOW_HOME=`pwd` \
  AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000 \
  AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000 \
  airflow dags test jaffle_shop_kubernetes `date -Iseconds`
```
### Run jaffle_shop_kubernetes in CI
To avoid regressions, we have automated running `jaffle_shop_kubernetes` in CI:

- Set up the GitHub Actions infrastructure to run DAGs in Kubernetes
execution mode.
- Used
[container-tools/kind-action@v1](https://github.com/container-tools/kind-action)
to create a KinD cluster.
- Used the bash scripts to streamline the creation of Kubernetes
resources, build and load the image into the cluster, and execute the
tests.
- At the moment, pytest runs from a virtual environment.


### Documentation changes
Given that the DAG
[jaffle_shop_kubernetes](https://github.com/astronomer/cosmos-example/blob/main/dags/jaffle_shop_kubernetes.py)
is now part of this repository, I have automated the example rendering
for Kubernetes execution mode. This ensures that we avoid displaying
outdated example code.
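Concretely, the docs now pull these snippets straight from the DAG file using Sphinx `literalinclude` directives anchored on the `[START ...]`/`[END ...]` markers in `jaffle_shop_kubernetes.py`, for example:

```rst
.. literalinclude:: ../../dev/dags/jaffle_shop_kubernetes.py
   :language: python
   :start-after: [START kubernetes_seed_example]
   :end-before: [END kubernetes_seed_example]
```

Because the same file is exercised by the CI test, the rendered docs track tested DAG code.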


https://astronomer.github.io/astronomer-cosmos/getting_started/execution-modes.html#kubernetes
<img width="822" alt="Screenshot 2024-08-15 at 8 03 59 PM"
src="https://github.com/user-attachments/assets/1eadad09-9b7c-43e1-bcd8-b08dd21e3878">


https://astronomer.github.io/astronomer-cosmos/getting_started/kubernetes.html#kubernetes

<img width="812" alt="Screenshot 2024-08-15 at 8 04 22 PM"
src="https://github.com/user-attachments/assets/7161fa9b-e5c1-44d8-8702-b2c583dee236">

### Future work

- Use the hatch target to run the test. I introduced a hatch target to run
the Kubernetes example with hatch, but it is currently not working due to a
mismatch between the local and container dbt project paths. This requires a
bit more work.
- Remove the virtual environment step (Install packages and
dependencies) in the CI configuration for Run-Kubernetes-Tests and use
hatch instead.
- Update the profile YAML to use an environment variable for the port, as
it is currently hardcoded.
- Remove the host from the Kubernetes secret, replace it with the username,
and make the corresponding change in the DAG.
- Currently, we need to export both POSTGRES_DATABASE and POSTGRES_DB in
the Dockerfile because both are used in the project. To ensure
consistency, avoid exporting both and instead make the environment
variables consistent across the repository.
- Not a big deal in this context, but we have some hardcoded values for
secrets. It would be better to parameterize them.
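For the hardcoded-port item, the likely fix is already hinted at by the commented-out expression next to `port: 5432` in the committed `profiles.yml`, i.e. something like:

```yaml
# assumes POSTGRES_PORT is exported in the execution environment
port: "{{ env_var('POSTGRES_PORT') | as_number }}"
```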

GH issue for future improvement:
#1160

### Example CI Run

- https://github.com/astronomer/astronomer-cosmos/actions/runs/10405590862

## Related Issue(s)

closes: #535

## Breaking Change?

No

<!-- If this introduces a breaking change, specify that here. -->

## Checklist

- [x] I have made corresponding changes to the documentation (if
required)
- [x] I have added tests that prove my fix is effective or that my
feature works
pankajastro committed Aug 15, 2024
1 parent 89f5999 commit e1ff924
Showing 21 changed files with 379 additions and 55 deletions.
73 changes: 70 additions & 3 deletions .github/workflows/test.yml
@@ -163,7 +163,6 @@ jobs:
POSTGRES_DB: postgres
POSTGRES_SCHEMA: public
POSTGRES_PORT: 5432
SOURCE_RENDERING_BEHAVIOR: all

- name: Upload coverage to Github
uses: actions/upload-artifact@v2
@@ -235,7 +234,6 @@ jobs:
POSTGRES_DB: postgres
POSTGRES_SCHEMA: public
POSTGRES_PORT: 5432
SOURCE_RENDERING_BEHAVIOR: all

- name: Upload coverage to Github
uses: actions/upload-artifact@v2
@@ -379,7 +377,6 @@ jobs:
POSTGRES_DB: postgres
POSTGRES_SCHEMA: public
POSTGRES_PORT: 5432
SOURCE_RENDERING_BEHAVIOR: all

- name: Upload coverage to Github
uses: actions/upload-artifact@v2
@@ -461,12 +458,82 @@ jobs:
AIRFLOW_CONN_EXAMPLE_CONN: postgres://postgres:[email protected]:5432/postgres
PYTHONPATH: /home/runner/work/astronomer-cosmos/astronomer-cosmos/:$PYTHONPATH

Run-Kubernetes-Tests:
needs: Authorize
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [ "3.11" ]
airflow-version: [ "2.9" ]
steps:
- uses: actions/checkout@v3
with:
ref: ${{ github.event.pull_request.head.sha || github.ref }}
- uses: actions/cache@v3
with:
path: |
~/.cache/pip
.local/share/hatch/
key: coverage-integration-kubernetes-test-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.airflow-version }}-${{ hashFiles('pyproject.toml') }}-${{ hashFiles('cosmos/__init__.py') }}

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}

- name: Create KinD cluster
uses: container-tools/kind-action@v1

- name: Install packages and dependencies
run: |
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -e ".[tests]"
pip install apache-airflow-providers-cncf-kubernetes
pip install dbt-postgres==1.8.2 psycopg2==2.9.3 pytz
pip install apache-airflow==${{ matrix.airflow-version }}
# hatch -e tests.py${{ matrix.python-version }}-${{ matrix.airflow-version }} run pip freeze
- name: Run kubernetes tests
run: |
source venv/bin/activate
sh ./scripts/test/kubernetes-setup.sh
cd dev && sh ../scripts/test/integration-kubernetes.sh
# hatch run tests.py${{ matrix.python-version }}-${{ matrix.airflow-version }}:test-kubernetes
env:
AIRFLOW_HOME: /home/runner/work/astronomer-cosmos/astronomer-cosmos/
AIRFLOW_CONN_EXAMPLE_CONN: postgres://postgres:[email protected]:5432/postgres
AIRFLOW_CONN_AWS_S3_CONN: ${{ secrets.AIRFLOW_CONN_AWS_S3_CONN }}
AIRFLOW_CONN_GCP_GS_CONN: ${{ secrets.AIRFLOW_CONN_GCP_GS_CONN }}
AIRFLOW_CONN_AZURE_ABFS_CONN: ${{ secrets.AIRFLOW_CONN_AZURE_ABFS_CONN }}
AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT: 90.0
PYTHONPATH: /home/runner/work/astronomer-cosmos/astronomer-cosmos/:$PYTHONPATH
COSMOS_CONN_POSTGRES_PASSWORD: ${{ secrets.COSMOS_CONN_POSTGRES_PASSWORD }}
DATABRICKS_CLUSTER_ID: mock
DATABRICKS_HOST: mock
DATABRICKS_WAREHOUSE_ID: mock
DATABRICKS_TOKEN: mock
POSTGRES_HOST: localhost
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
POSTGRES_SCHEMA: public
POSTGRES_PORT: 5432

- name: Upload coverage to Github
uses: actions/upload-artifact@v2
with:
name: coverage-integration-kubernetes-test-${{ matrix.python-version }}-${{ matrix.airflow-version }}
path: .coverage

Code-Coverage:
if: github.event.action != 'labeled'
needs:
- Run-Unit-Tests
- Run-Integration-Tests
- Run-Integration-Tests-Expensive
- Run-Kubernetes-Tests
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
18 changes: 18 additions & 0 deletions dev/Dockerfile.postgres_profile_docker_k8s
@@ -0,0 +1,18 @@
FROM python:3.11

RUN pip install dbt-postgres==1.8.2 psycopg2==2.9.3 pytz

ENV POSTGRES_DATABASE=postgres
ENV POSTGRES_DB=postgres
ENV POSTGRES_HOST=postgres.default.svc.cluster.local
ENV POSTGRES_PASSWORD=postgres
ENV POSTGRES_PORT=5432
ENV POSTGRES_SCHEMA=public
ENV POSTGRES_USER=postgres

RUN mkdir /root/.dbt
COPY dags/dbt/jaffle_shop/profiles.yml /root/.dbt/profiles.yml

RUN mkdir dags
COPY dags dags
RUN rm dags/dbt/jaffle_shop/packages.yml
12 changes: 12 additions & 0 deletions dev/dags/dbt/jaffle_shop/profiles.yml
@@ -10,3 +10,15 @@ default:
dbname: "{{ env_var('POSTGRES_DB') }}"
schema: "{{ env_var('POSTGRES_SCHEMA') }}"
threads: 4

postgres_profile:
target: dev
outputs:
dev:
type: postgres
dbname: "{{ env_var('POSTGRES_DATABASE') }}"
host: "{{ env_var('POSTGRES_HOST') }}"
pass: "{{ env_var('POSTGRES_PASSWORD') }}"
port: 5432 # "{{ env_var('POSTGRES_PORT') | as_number }}"
schema: "{{ env_var('POSTGRES_SCHEMA') }}"
user: "{{ env_var('POSTGRES_USER') }}"
98 changes: 98 additions & 0 deletions dev/dags/jaffle_shop_kubernetes.py
@@ -0,0 +1,98 @@
"""
## Jaffle Shop DAG
[Jaffle Shop](https://github.com/dbt-labs/jaffle_shop) is a fictional eCommerce store. This dbt project originates from
dbt labs as an example project with dummy data to demonstrate a working dbt core project. This DAG uses the cosmos dbt
parser to generate an Airflow TaskGroup from the dbt project folder.
The step-by-step guide to running this DAG is described at:
https://astronomer.github.io/astronomer-cosmos/getting_started/kubernetes.html#kubernetes
"""

from airflow import DAG
from airflow.providers.cncf.kubernetes.secret import Secret
from pendulum import datetime

from cosmos import (
DbtSeedKubernetesOperator,
DbtTaskGroup,
ExecutionConfig,
ExecutionMode,
ProfileConfig,
ProjectConfig,
)
from cosmos.profiles import PostgresUserPasswordProfileMapping

DBT_IMAGE = "dbt-jaffle-shop:1.0.0"

project_seeds = [{"project": "jaffle_shop", "seeds": ["raw_customers", "raw_payments", "raw_orders"]}]

postgres_password_secret = Secret(
deploy_type="env",
deploy_target="POSTGRES_PASSWORD",
secret="postgres-secrets",
key="password",
)

postgres_host_secret = Secret(
deploy_type="env",
deploy_target="POSTGRES_HOST",
secret="postgres-secrets",
key="host",
)

with DAG(
dag_id="jaffle_shop_kubernetes",
start_date=datetime(2022, 11, 27),
doc_md=__doc__,
catchup=False,
) as dag:
# [START kubernetes_seed_example]
load_seeds = DbtSeedKubernetesOperator(
task_id="load_seeds",
project_dir="dags/dbt/jaffle_shop",
get_logs=True,
schema="public",
image=DBT_IMAGE,
is_delete_operator_pod=False,
secrets=[postgres_password_secret, postgres_host_secret],
profile_config=ProfileConfig(
profile_name="postgres_profile",
target_name="dev",
profile_mapping=PostgresUserPasswordProfileMapping(
conn_id="postgres_default",
profile_args={
"schema": "public",
},
),
),
)
# [END kubernetes_seed_example]

# [START kubernetes_tg_example]
run_models = DbtTaskGroup(
profile_config=ProfileConfig(
profile_name="postgres_profile",
target_name="dev",
profile_mapping=PostgresUserPasswordProfileMapping(
conn_id="postgres_default",
profile_args={
"schema": "public",
},
),
),
project_config=ProjectConfig(dbt_project_path="dags/dbt/jaffle_shop"),
execution_config=ExecutionConfig(
execution_mode=ExecutionMode.KUBERNETES,
),
operator_args={
"image": DBT_IMAGE,
"get_logs": True,
"is_delete_operator_pod": False,
"secrets": [postgres_password_secret, postgres_host_secret],
},
)
# [END kubernetes_tg_example]

load_seeds >> run_models
24 changes: 4 additions & 20 deletions docs/getting_started/execution-modes.rst
@@ -144,27 +144,11 @@ Check the step-by-step guide on using the ``kubernetes`` execution mode at :ref:

Example DAG:

.. code-block:: python
postgres_password_secret = Secret(
deploy_type="env",
deploy_target="POSTGRES_PASSWORD",
secret="postgres-secrets",
key="password",
)
.. literalinclude:: ../../dev/dags/jaffle_shop_kubernetes.py
:language: python
:start-after: [START kubernetes_seed_example]
:end-before: [END kubernetes_seed_example]

docker_cosmos_dag = DbtDag(
# ...
execution_config=ExecutionConfig(
execution_mode=ExecutionMode.KUBERNETES,
),
operator_args={
"image": "dbt-jaffle-shop:1.0.0",
"get_logs": True,
"is_delete_operator_pod": False,
"secrets": [postgres_password_secret],
},
)
AWS_EKS
----------

28 changes: 4 additions & 24 deletions docs/getting_started/kubernetes.rst
@@ -28,30 +28,10 @@ Additional KubernetesPodOperator parameters can be added on the operator_args pa

For instance,

.. code-block:: python
run_models = DbtTaskGroup(
profile_config=ProfileConfig(
profile_name="postgres_profile",
target_name="dev",
profile_mapping=PostgresUserPasswordProfileMapping(
conn_id="postgres_default",
profile_args={
"schema": "public",
},
),
),
project_config=ProjectConfig(PROJECT_DIR),
execution_config=ExecutionConfig(
execution_mode=ExecutionMode.KUBERNETES,
),
operator_args={
"image": DBT_IMAGE,
"get_logs": True,
"is_delete_operator_pod": False,
"secrets": [postgres_password_secret, postgres_host_secret],
},
)
.. literalinclude:: ../../dev/dags/jaffle_shop_kubernetes.py
:language: python
:start-after: [START kubernetes_tg_example]
:end-before: [END kubernetes_tg_example]

Step-by-step instructions
+++++++++++++++++++++++++
1 change: 1 addition & 0 deletions pyproject.toml
@@ -168,6 +168,7 @@ freeze = "pip freeze"
test = 'sh scripts/test/unit.sh'
test-cov = 'sh scripts/test/unit-cov.sh'
test-integration = 'sh scripts/test/integration.sh'
test-kubernetes = "sh scripts/test/integration-kubernetes.sh"
test-integration-dbt-1-5-4 = 'sh scripts/test/integration-dbt-1-5-4.sh'
test-integration-expensive = 'sh scripts/test/integration-expensive.sh'
test-integration-setup = 'sh scripts/test/integration-setup.sh'
1 change: 1 addition & 0 deletions scripts/test/integration-dbt-1-5-4.sh
@@ -10,4 +10,5 @@ pytest -vv \
--durations=0 \
-m integration \
--ignore=tests/perf \
--ignore=tests/test_example_k8s_dags.py \
-k 'basic_cosmos_task_group'
1 change: 1 addition & 0 deletions scripts/test/integration-expensive.sh
@@ -6,4 +6,5 @@ pytest -vv \
--durations=0 \
-m integration \
--ignore=tests/perf \
--ignore=tests/test_example_k8s_dags.py \
-k 'example_cosmos_python_models or example_virtualenv'
16 changes: 16 additions & 0 deletions scripts/test/integration-kubernetes.sh
@@ -0,0 +1,16 @@
#!/bin/bash

set -x
set -e

# Reset the Airflow database to its initial state
airflow db reset -y

# Run tests using pytest
pytest -vv \
--cov=cosmos \
--cov-report=term-missing \
--cov-report=xml \
--durations=0 \
-m integration \
../tests/test_example_k8s_dags.py
1 change: 1 addition & 0 deletions scripts/test/integration-sqlite.sh
@@ -5,4 +5,5 @@ pytest -vv \
--durations=0 \
-m integration \
--ignore=tests/perf \
--ignore=tests/test_example_k8s_dags.py \
-k 'example_cosmos_sources or sqlite'
3 changes: 2 additions & 1 deletion scripts/test/integration.sh
@@ -20,4 +20,5 @@ pytest -vv \
--durations=0 \
-m integration \
--ignore=tests/perf \
-k 'not (sqlite or example_cosmos_sources or example_cosmos_python_models or example_virtualenv)'
--ignore=tests/test_example_k8s_dags.py \
-k 'not (sqlite or example_cosmos_sources or example_cosmos_python_models or example_virtualenv or jaffle_shop_kubernetes)'
34 changes: 34 additions & 0 deletions scripts/test/kubernetes-setup.sh
@@ -0,0 +1,34 @@
#!/bin/bash

# Print each command before executing it
# Exit the script immediately if any command exits with a non-zero status (for debugging purposes)
set -x
set -e

# Create a Kubernetes secret named 'postgres-secrets' with the specified literals for host and password
kubectl create secret generic postgres-secrets \
--from-literal=host=postgres-postgresql.default.svc.cluster.local \
--from-literal=password=postgres

# Apply the PostgreSQL deployment configuration from the specified YAML file
kubectl apply -f scripts/test/postgres-deployment.yaml

# Build the Docker image with tag 'dbt-jaffle-shop:1.0.0' using the specified Dockerfile
cd dev && docker build --progress=plain --no-cache -t dbt-jaffle-shop:1.0.0 -f Dockerfile.postgres_profile_docker_k8s .

# Load the Docker image into the local KIND cluster
kind load docker-image dbt-jaffle-shop:1.0.0

# Retrieve the name of the PostgreSQL pod using the label selector 'app=postgres'
# The output is filtered to get the first pod's name
POD_NAME=$(kubectl get pods -n default -l app=postgres -o jsonpath='{.items[0].metadata.name}')

# Print the name of the PostgreSQL pod
echo "$POD_NAME"

# Forward port 5432 from the PostgreSQL pod to the local machine's port 5432
# This allows local access to the PostgreSQL instance running in the pod
kubectl port-forward --namespace default "$POD_NAME" 5432:5432 &

# List all pods in the default namespace to verify the status of pods
kubectl get pod
3 changes: 2 additions & 1 deletion scripts/test/performance.sh
@@ -2,4 +2,5 @@ pytest -vv \
-s \
-m 'perf' \
--ignore=tests/test_example_dags.py \
--ignore=tests/test_example_dags_no_connections.py
--ignore=tests/test_example_dags_no_connections.py \
--ignore=tests/test_example_k8s_dags.py