[MAINTENANCE] Add extras to setup.cfg for optional deps #17

Merged (9 commits) on Jan 14, 2025

15 changes: 3 additions & 12 deletions .github/workflows/integration-test.yml
@@ -41,10 +41,7 @@ jobs:
         run: python -m pip install --upgrade pip

       - name: Install Library
-        run: pip install .
-
-      - name: Install Dependencies
-        run: pip install -r test-requirements.txt
+        run: pip install .[postgresql,tests]

       - name: Setup
         run: |
@@ -78,10 +75,7 @@ jobs:
         run: python -m pip install --upgrade pip

       - name: Install Library
-        run: pip install .
-
-      - name: Install Dependencies
-        run: pip install -r test-requirements.txt
+        run: pip install .[tests,spark]

       - name: Setup
         run: |
@@ -115,10 +109,7 @@ jobs:
         run: python -m pip install --upgrade pip

       - name: Install Library
-        run: pip install .
-
-      - name: Install Dependencies
-        run: pip install -r test-requirements.txt
+        run: pip install .[tests,spark]

       - name: Setup
         run: |
5 changes: 1 addition & 4 deletions .github/workflows/lint.yml
@@ -20,10 +20,7 @@ jobs:
           python-version: 3.12

       - name: Install Library
-        run: pip install .
-
-      - name: Install Dependencies
-        run: pip install -r test-requirements.txt
+        run: pip install .[lint]

       - name: Ruff Formatter
         run: ruff format --check .
5 changes: 1 addition & 4 deletions .github/workflows/unit-test.yml
@@ -27,10 +27,7 @@ jobs:
         run: python -m pip install --upgrade pip

       - name: Install Library
-        run: pip install .
-
-      - name: Install Dependencies
-        run: pip install -r test-requirements.txt
+        run: pip install .[tests]

       - name: Run Unit Tests
         run: pytest -vvv -m unit tests/unit
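
The workflow edits above all make the same swap: the separate test-requirements.txt install step is folded into a single extras-based install. For a local checkout, the equivalent commands would presumably be (inferred from the diffs, not taken from project docs; the quotes keep the brackets safe under shells like zsh):

    pip install ".[tests]"             # unit tests
    pip install ".[postgresql,tests]"  # Postgres integration tests
    pip install ".[tests,spark]"       # Spark and Spark Connect integration tests
    pip install ".[lint]"              # ruff format/check plus mypy
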
31 changes: 29 additions & 2 deletions setup.cfg
@@ -6,14 +6,41 @@ python_requires = >=3.9
 packages = find_namespace:
 include_package_data = true
 install_requires =
-    great-expectations>=1.3.1
     apache-airflow>=2.1
+    great-expectations[snowflake,postgresql,mssql,bigquery,athena,spark,gcp,azure,s3]>=1.3.1
     setuptools>=41.0.0

 [options.extras_require]
+athena =
+    great-expectations[athena]>=1.3.1
+azure =
+    great-expectations[azure]>=1.3.1
+bigquery =
+    great-expectations[bigquery]>=1.3.1
+lint =
+    mypy==1.14.1
+    ruff==0.8.3
+    pytest==8.3.4
+    pytest-mock==3.14.0
+    great-expectations[spark, spark-connect]>=1.3.1

A review thread on this hunk:

Reviewer: Do we really need spark in lint?

Contributor Author: Good question. We do because spark dataframes are in the type signature of at least one of the operators. I don't see a way around it, but LMK if you do.

Reviewer: Ok... that leads me to ask the question: do we not support pandas for this operator?

Contributor Author: We do support pandas. And that is part of the base gx install, so it's already included. I'll update the description on this PR, since these are good things to be clear about.

+gcp =
+    great-expectations[gcp]>=1.3.1
+mssql =
+    great-expectations[mssql]>=1.3.1
+postgresql =
+    great-expectations[postgresql]>=1.3.1
+s3 =
+    great-expectations[s3]>=1.3.1
+snowflake =
+    great-expectations[snowflake]>=1.3.1
+spark =
+    great-expectations[spark, spark-connect]>=1.3.1
+    pyarrow>=4.0.0
 tests =
-    pytest
+    pytest==8.3.4
+    pytest-mock==3.14.0

 [options.entry_points]
 apache_airflow_provider=
 provider_info=great_expectations_provider.__init__:get_provider_info

5 changes: 0 additions & 5 deletions test-requirements.txt

This file was deleted.
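
With the requirements file gone, downstream installs select optional dependencies through extras instead. Assuming the provider is published under its usual distribution name (an assumption; the name never appears in this diff), a deployment install might look like:

    # hypothetical: distribution name assumed, not shown in this PR
    pip install "airflow-provider-great-expectations[postgresql]"
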

24 changes: 18 additions & 6 deletions tests/integration/test_validate_dataframe_operator.py
@@ -1,18 +1,21 @@
-from typing import Callable
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Callable

 import pandas as pd
-import pyspark.sql as pyspark
 import pytest
 from great_expectations import ExpectationSuite
 from great_expectations.expectations import ExpectColumnValuesToBeInSet
-from pyspark.sql.connect.dataframe import DataFrame as SparkConnectDataFrame
-from pyspark.sql.connect.session import SparkSession as SparkConnectSession

 from great_expectations_provider.operators.validate_dataframe import (
     GXValidateDataFrameOperator,
 )
 from integration.conftest import is_valid_gx_cloud_url, rand_name

+if TYPE_CHECKING:
+    from pyspark.sql import SparkSession
+    from pyspark.sql.connect.session import SparkSession as SparkConnectSession
+

 class TestGXValidateDataFrameOperator:
     @pytest.mark.integration
@@ -57,7 +60,9 @@ def configure_dataframe() -> pd.DataFrame:
         assert is_valid_gx_cloud_url(result["result_url"])

     @pytest.mark.spark_integration
-    def test_spark(self, spark_session: pyspark.SparkSession) -> None:
+    def test_spark(self, spark_session: SparkSession) -> None:
+        import pyspark.sql as pyspark
+
         column_name = "col_A"
         task_id = f"test_spark_{rand_name()}"
@@ -85,6 +90,8 @@ def configure_dataframe() -> pyspark.DataFrame:

     @pytest.mark.spark_connect_integration
     def test_spark_connect(self, spark_connect_session: SparkConnectSession) -> None:
+        from pyspark.sql.connect.dataframe import DataFrame as SparkConnectDataFrame
+
         column_name = "col_A"
         task_id = f"test_spark_{rand_name()}"
@@ -112,14 +119,19 @@ def configure_dataframe() -> SparkConnectDataFrame:


 @pytest.fixture
-def spark_session() -> pyspark.SparkSession:
+def spark_session() -> SparkSession:
+    import pyspark.sql as pyspark
+
     session = pyspark.SparkSession.builder.getOrCreate()
     assert isinstance(session, pyspark.SparkSession)
     return session


 @pytest.fixture
 def spark_connect_session() -> SparkConnectSession:
+    import pyspark.sql as pyspark
+    from pyspark.sql.connect.session import SparkSession as SparkConnectSession
+
     session = pyspark.SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
     assert isinstance(session, SparkConnectSession)
     return session
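
These hunks move every pyspark import either behind TYPE_CHECKING or into the function that needs it, which is also why the review thread on setup.cfg concluded the lint extra must pull in spark: mypy still has to resolve the Spark types in the signatures, even though runtime imports stay optional. A minimal sketch of the pattern, using a hypothetical operator rather than the provider's actual code:

    from __future__ import annotations  # annotations stay strings at runtime (PEP 563)

    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        # Resolved only by type checkers, so mypy needs pyspark installed
        # (hence the lint extra), but plain runtime installs do not.
        from pyspark.sql import DataFrame as SparkDataFrame


    class HypotheticalOperator:
        """Illustrative stand-in for an operator with Spark in its signature."""

        def execute(self, df: SparkDataFrame) -> None:
            # Deferred import: users without the spark extra can import this
            # module freely as long as they never pass a Spark DataFrame.
            import pyspark.sql

            assert isinstance(df, pyspark.sql.DataFrame)
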
1 change: 0 additions & 1 deletion tests/unit/test_validate_batch_operator.py
@@ -13,7 +13,6 @@
 from great_expectations.expectations import (
     ExpectColumnValuesToBeInSet,
 )
-from pytest_mock import MockerFixture

 from great_expectations_provider.operators.validate_batch import GXValidateBatchOperator
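
With the extras wired up, the suites map onto the pytest markers visible in these diffs. Plausible local invocations (only the unit-test command appears verbatim in the workflows above; the rest are assumptions from the marker names):

    pytest -vvv -m unit tests/unit                         # from unit-test.yml
    pytest -m integration tests/integration                # assumed
    pytest -m spark_integration tests/integration          # assumed
    pytest -m spark_connect_integration tests/integration  # assumed; the fixture
        # expects a Spark Connect server at sc://localhost:15002
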