Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pytest with f2s ops #3099

Closed
wants to merge 85 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
85 commits
Select commit Hold shift + click to select a range
6bf1115
Try etl/integraiton separation in tests.
rousik Sep 2, 2023
d89e1f4
fix some formatting in the python thingies
rousik Sep 3, 2023
3b43bef
Minor changes to the tox-pytest setup
rousik Sep 4, 2023
a8fb7dc
Comment out most things to isolate ci-integration-etl
rousik Sep 6, 2023
8587dc7
fix few things
rousik Sep 6, 2023
a5b2867
fix condarc
rousik Sep 6, 2023
4916320
install snappy before pudl
rousik Sep 6, 2023
2b1a116
Fix up integration tests, run on the large machine.
rousik Sep 6, 2023
bb623af
add alembic setup
rousik Sep 6, 2023
da07819
run conda info ahead of running the etl
rousik Sep 6, 2023
7dc86aa
Log conda env
rousik Sep 6, 2023
162f62a
Disable large runners for now
rousik Sep 6, 2023
68c6dee
fix exec shell for the action
rousik Sep 6, 2023
09fb56d
fix exec shell for the action
rousik Sep 6, 2023
f05fcd3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 6, 2023
173ad8c
Fix path to integration tests.
zaneselvans Sep 6, 2023
c9b83ab
Add concurrency to cancel previous runs
rousik Sep 6, 2023
e35efc9
Merge branch 'parallel-pytest' of github.com:catalyst-cooperative/pud…
rousik Sep 6, 2023
a81ebf0
Fix path to integration tests
rousik Sep 6, 2023
364b053
fix pytest paths
rousik Sep 6, 2023
d4b46fc
restructure etl steps
rousik Sep 6, 2023
ffd56e0
fix typo in alembic setup
rousik Sep 6, 2023
2edea32
uncomment steps and fix mamba setup
rousik Sep 6, 2023
d5fcc73
Install test dependencies with pip
zaneselvans Sep 6, 2023
4f42efa
resolve conflicts with dev branch
zaneselvans Sep 6, 2023
ac95389
Merge branch 'dev' into parallel-pytest
zaneselvans Sep 6, 2023
61bb7b6
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Sep 11, 2023
b1f3b95
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Sep 25, 2023
fb2892f
Add test deps to ci steps that use coverage.
rousik Sep 25, 2023
678d54e
I don't understand tox substitution :-/
rousik Sep 26, 2023
6ec7d99
Fix ETL command to run. Ugh.
rousik Sep 26, 2023
2f43405
Include datasette as tox dependency.
rousik Sep 26, 2023
66f5f54
Fix the input/output path override behavior when running in gha context.
rousik Sep 26, 2023
3ac2879
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Sep 26, 2023
12903a0
Merge branch 'dev' into parallel-pytest
rousik Oct 3, 2023
28af787
Rename github_ tox steps, make them fail unless GITHUB_ACTIONS env ex…
rousik Oct 18, 2023
a725c07
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Oct 18, 2023
00e41fb
Merge branch 'dev' into parallel-pytest
zaneselvans Oct 19, 2023
645ae4f
Merge branch 'dev' into parallel-pytest
zaneselvans Oct 19, 2023
051d2b1
Explicitly demand tox>=4.3.3 for the env reuse.
rousik Oct 21, 2023
58597e4
Drop usage of tox in the new ci-integration test.
rousik Oct 25, 2023
5b162bd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 25, 2023
b7afb1e
Put back diagnostic steps and install PUDL dependencies.
rousik Oct 25, 2023
e17f35c
Few cleanup tasks.
rousik Oct 25, 2023
64500b7
Fix up the coverage commands.
rousik Oct 25, 2023
3a7b30a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 25, 2023
2a9f5a2
Fix the path to etl.py.
rousik Oct 25, 2023
f226f38
Run alembic before ETL.
rousik Oct 25, 2023
583f7f8
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Oct 25, 2023
0590f29
Revert changes to other files.
rousik Oct 25, 2023
fa814ea
Also install datasette dependencies.
rousik Oct 28, 2023
c3da09e
Revert changes to .coveragerc
rousik Oct 29, 2023
0fdd73d
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Oct 29, 2023
171dab5
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 1, 2023
2c96289
Merge branch 'dev' into parallel-pytest
rousik Nov 6, 2023
666f1e9
add option to pick runner when running from dispatch
rousik Nov 6, 2023
31dde43
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 6, 2023
297eca0
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 6, 2023
fb30bd5
Fix workflow for runner size determination.
rousik Nov 7, 2023
6ff74f5
Run matrix of integration tests to check scaling.
rousik Nov 7, 2023
567da80
Fix integration workflow file.
rousik Nov 7, 2023
ca9854e
Drop support for matrix evaluation of ci-integration
rousik Nov 7, 2023
b08ead3
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 14, 2023
a23f7eb
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 27, 2023
e834f03
Update conda-lock.yml and rendered conda environment files.
rousik Nov 27, 2023
6d14c83
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 27, 2023
8414ce3
Merge branch 'dev' into parallel-pytest
zaneselvans Nov 28, 2023
65aaa60
Update conda-lock.yml and rendered conda environment files.
zaneselvans Nov 28, 2023
48726c0
Replace ls -R with find -type f.
rousik Nov 27, 2023
cc1f6a9
Revert renames to be more friendly.
rousik Nov 29, 2023
14d9a4a
Fix some more code review comments.
rousik Nov 29, 2023
1c8dd9c
Fix error in call to pytest -n auto
zaneselvans Nov 29, 2023
6e4770c
Merge branch 'dev' into parallel-pytest
zaneselvans Nov 29, 2023
04812b5
Update conda-lock.yml and rendered conda environment files.
zaneselvans Nov 29, 2023
fc023db
Fail gha step if coverage fails to upload.
rousik Nov 29, 2023
7ac4fbb
Merge remote-tracking branch 'origin/dev' into parallel-pytest
rousik Nov 29, 2023
1ea46f5
Use specific coverage file names to aid the process.
rousik Nov 29, 2023
c4f3d98
Split xbrl2sqlite into smaller ops.
rousik Nov 30, 2023
428d04f
Break ferc_to_sqlite op monoliths.
rousik Nov 30, 2023
f6f8d97
Merge branch 'split-ferc2sqlite-ops' into pytest-with-f2s-ops
rousik Nov 30, 2023
d79930b
Update conda-lock.yml and rendered conda environment files.
rousik Nov 30, 2023
1bd80f6
Update pytest.yml
rousik Nov 30, 2023
3bfce3b
Merge remote-tracking branch 'origin/dev' into pytest-with-f2s-ops
rousik Dec 1, 2023
60e6ab5
TESTONLY: Run matrix of concurrency options.
rousik Dec 1, 2023
b5c4f71
Update conda-lock.yml and rendered conda environment files.
rousik Dec 1, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
125 changes: 80 additions & 45 deletions .github/workflows/pytest.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,11 +8,18 @@ on:
- opened
- synchronize
- ready_for_review
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}
cancel-in-progress: true

env:
PUDL_OUTPUT: /home/runner/pudl-work/output/
PUDL_INPUT: /home/runner/pudl-work/input/
DAGSTER_HOME: /home/runner/pudl-work/dagster_home/
ETL_CONFIG: src/pudl/package_data/settings/etl_fast.yml
ETL_COMMANDLINE_OPTIONS: --gcs-cache-path=gs://zenodo-cache.catalyst.coop
COVERAGE_OPTIONS: --concurrency=multiprocessing
XBRL_WORKERS: 2

jobs:
ci-docs:
Expand Down Expand Up @@ -64,7 +71,6 @@ jobs:
defaults:
run:
shell: bash -l {0}

steps:
- uses: actions/checkout@v4
with:
Expand Down Expand Up @@ -105,31 +111,36 @@ jobs:
path: coverage.xml

ci-integration:
runs-on:
group: large-runner-group
labels: ubuntu-22.04-4core
needs:
- ci-unit
runs-on: ubuntu-22.04-4core
if: github.event.pull_request.draft == false
permissions:
contents: read
id-token: write
strategy:
fail-fast: false
defaults:
run:
shell: bash -l {0}

strategy:
fail-fast: false
matrix:
dagster_workers: [0, 1, 2, 4]
xbrl_workers: [1, 2, 3, 4, 8]
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 2

- name: Install Conda environment using mamba
- name: Install conda-lock environment with micromamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: environments/conda-lock.yml
environment-name: pudl-dev
cache-environment: true

- name: Install PUDL and its dependencies
run: pip install --no-deps --no-cache-dir .

- name: Log environment details
run: |
conda info
Expand Down Expand Up @@ -169,21 +180,42 @@ jobs:
workload_identity_provider: "projects/345950277072/locations/global/workloadIdentityPools/gh-actions-pool/providers/gh-actions-provider"
service_account: "tox-pytest-github-action@catalyst-cooperative-pudl.iam.gserviceaccount.com"

- name: Run integration tests, trying to use GCS cache if possible
- name: Run ferc_to_sqlite
env:
COVERAGE_FILE: .coverage.ferc_to_sqlite
run: |
pip install --no-deps --editable .
make pytest-integration

coverage run ${{ env.COVERAGE_OPTIONS }} \
src/pudl/ferc_to_sqlite/cli.py \
--dagster-workers ${{ matrix.dagster_workers }} \
--workers ${{ matrix.xbrl_workers }} \
--clobber ${{ env.ETL_COMMANDLINE_OPTIONS }} ${{ env.ETL_CONFIG }}
- name: Fail pipeline early
run: exit 1
- name: Run pudl_etl
env:
COVERAGE_FILE: .coverage.pudl_etl
run: |
alembic upgrade head
coverage run ${{ env.COVERAGE_OPTIONS }} \
src/pudl/cli/etl.py ${{ env.ETL_COMMANDLINE_OPTIONS }} ${{ env.ETL_CONFIG }}
- name: Run integration tests
env:
COVERAGE_FILE: .coverage.pytest
run: |
coverage run ${{ env.COVERAGE_OPTIONS }} \
-m pytest -n auto --live-dbs test/integration
- name: Generate coverage
run: |
coverage combine
coverage xml
- name: Upload coverage
uses: actions/upload-artifact@v3
with:
name: coverage-integration
path: coverage.xml

- name: Log post-test Zenodo datastore contents
run: find ${{ env.PUDL_INPUT }}

ci-coverage:
name: Upload coverage to CodeCov
runs-on: ubuntu-latest
needs:
- ci-docs
Expand All @@ -197,37 +229,40 @@ jobs:
with:
path: coverage
- name: List downloaded files
run: |
ls -R
run: find -type f
- name: Upload test coverage report to CodeCov
uses: codecov/codecov-action@v3
with:
directory: coverage

ci-notify:
runs-on: ubuntu-latest
if: ${{ always() }}
needs:
- ci-docs
- ci-unit
- ci-integration
steps:
- name: Inform the Codemonkeys
uses: 8398a7/action-slack@v3
continue-on-error: true
with:
status: custom
fields: workflow,job,commit,repo,ref,author,took
custom_payload: |
{
username: 'action-slack',
icon_emoji: ':octocat:',
attachments: [{
color: '${{ needs.ci-test.result }}' === 'success' ? 'good' : '${{ needs.ci-test.result }}' === 'failure' ? 'danger' : 'warning',
text: `${process.env.AS_REPO}@${process.env.AS_REF}\n ${process.env.AS_WORKFLOW} (${process.env.AS_COMMIT})\n by ${process.env.AS_AUTHOR}\n Status: ${{ needs.ci-test.result }}`,
}]
}
env:
GITHUB_TOKEN: ${{ github.token }} # required
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }} # required
MATRIX_CONTEXT: ${{ toJson(matrix) }} # required
fail_ci_if_error: true

# TODO(rousik): The following slack notification gives no value and
# needs to be fixed. Until then, it might be better to do nothing
# at all.
# ci-notify:
# name: Notify slack
# runs-on: ubuntu-latest
# if: ${{ always() }}
# needs:
# - ci-unit
# - ci-integration
# steps:
# - name: Inform the Codemonkeys
# uses: 8398a7/action-slack@v3
# continue-on-error: true
# with:
# status: custom
# fields: workflow,job,commit,repo,ref,author,took
# custom_payload: |
# {
# username: 'action-slack',
# icon_emoji: ':octocat:',
# attachments: [{
# color: '${{ needs.ci-test.result }}' === 'success' ? 'good' : '${{ needs.ci-test.result }}' === 'failure' ? 'danger' : 'warning',
# text: `${process.env.AS_REPO}@${process.env.AS_REF}\n ${process.env.AS_WORKFLOW} (${process.env.AS_COMMIT})\n by ${process.env.AS_AUTHOR}\n Status: ${{ needs.ci-test.result }}`,
# }]
# }
# env:
# GITHUB_TOKEN: ${{ github.token }} # required
# SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }} # required
# MATRIX_CONTEXT: ${{ toJson(matrix) }} # required
28 changes: 27 additions & 1 deletion src/pudl/extract/dbf.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,20 +5,22 @@
import warnings
import zipfile
from collections import defaultdict
from collections.abc import Iterator
from collections.abc import Callable, Iterator
from functools import lru_cache
from pathlib import Path
from typing import IO, Any, Protocol, Self

import pandas as pd
import sqlalchemy as sa
from dagster import op
from dbfread import DBF, FieldParser

import pudl
import pudl.logging_helpers
from pudl.metadata.classes import DataSource
from pudl.settings import FercToSqliteSettings, GenericDatasetSettings
from pudl.workspace.datastore import Datastore
from pudl.workspace.setup import PudlPaths

logger = pudl.logging_helpers.get_logger(__name__)

Expand Down Expand Up @@ -465,6 +467,30 @@ def get_db_path(self) -> str:
db_path = str(Path(self.output_path) / self.DATABASE_NAME)
return f"sqlite:///{db_path}"

@classmethod
def get_dagster_op(cls) -> Callable:
"""Returns dagstger op that runs this extractor."""

@op(
name=f"dbf_{cls.DATASET}",
required_resource_keys={
"ferc_to_sqlite_settings",
"datastore",
"runtime_settings",
},
)
def inner_method(context) -> None:
"""Instantiates dbf extractor and runs it."""
dbf_extractor = cls(
datastore=context.resources.datastore,
settings=context.resources.ferc_to_sqlite_settings,
clobber=context.resources.runtime_settings.clobber,
output_path=PudlPaths().output_dir,
)
dbf_extractor.execute()

return inner_method

def execute(self):
"""Runs the extraction of the data from dbf to sqlite."""
logger.info(
Expand Down
37 changes: 6 additions & 31 deletions src/pudl/extract/ferc.py
Original file line number Diff line number Diff line change
@@ -1,42 +1,17 @@
"""Hooks to integrate ferc to sqlite functionality into dagster graph."""


from dagster import Field, op

import pudl
from pudl.extract.ferc1 import Ferc1DbfExtractor
from pudl.extract.ferc2 import Ferc2DbfExtractor
from pudl.extract.ferc6 import Ferc6DbfExtractor
from pudl.extract.ferc60 import Ferc60DbfExtractor
from pudl.workspace.setup import PudlPaths

logger = pudl.logging_helpers.get_logger(__name__)


@op(
config_schema={
"clobber": Field(
bool, description="Clobber existing ferc1 database.", default_value=False
),
},
required_resource_keys={"ferc_to_sqlite_settings", "datastore"},
)
def dbf2sqlite(context) -> None:
"""Clone the FERC Form 1 Visual FoxPro databases into SQLite."""
# TODO(rousik): this thin wrapper seems to be somewhat quirky. Maybe there's a way
# to make the integration # between the class and dagster better? Investigate.
logger.info(f"dbf2sqlite settings: {context.resources.ferc_to_sqlite_settings}")

extractors = [
Ferc1DbfExtractor,
Ferc2DbfExtractor,
Ferc6DbfExtractor,
Ferc60DbfExtractor,
]
for xclass in extractors:
xclass(
datastore=context.resources.datastore,
settings=context.resources.ferc_to_sqlite_settings,
clobber=context.op_config["clobber"],
output_path=PudlPaths().output_dir,
).execute()
ALL_DBF_EXTRACTORS = [
Ferc1DbfExtractor,
Ferc2DbfExtractor,
Ferc6DbfExtractor,
Ferc60DbfExtractor,
]
Loading
Loading