CI maintenance updates and adding CI Unit Tests (NOAA-EMC#2740)
This PR makes several maintenance updates to the CI pipeline and adds a test
directory with unit tests.
**Major Maintenance updates:**
- Added try blocks that report failures to the GitHub PR for:
  - **scm** checkout
  - build failures (with error logs sent as gists)
  - create experiment failures, with `stderr` sent to the GitHub PR
- Pre-stage failures from the steps above are now captured, which allows
FINALIZE to update the label to FAIL (i.e. no more "hanging" CI state
labels in GitHub - see image below)
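The capture-and-report pattern described above can be sketched in Python. This is only an illustration of the idea, not the Jenkinsfile logic itself: `run_step`, `finalize_label`, the machine name `MyMachine`, and the `Passed` status are hypothetical names chosen for the example, and the comment body is only built, not posted.

```python
import subprocess
import sys


def run_step(name, command, machine):
    """Run one pipeline step and return (ok, pr_comment_body).

    On failure, the captured stderr is folded into a Markdown comment body
    that a caller could post to the PR (for example with `gh pr comment`).
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return True, ""
    fence = "`" * 3  # Markdown code fence, built here to keep this example readable
    body = (
        f"{name} **FAILED** on {machine} with the error:\n"
        f"{fence}\n{result.stderr.strip()}\n{fence}"
    )
    return False, body


def finalize_label(machine, step_results):
    """Choose the final label so a pre-stage failure never leaves a Running label behind."""
    status = "Passed" if all(step_results) else "Failed"  # "Passed" is an assumed status name
    return f"CI-{machine}-{status}"


# A stand-in for a failing pre-stage command (hypothetical).
failing = [sys.executable, "-c", "import sys; sys.stderr.write('no such case'); sys.exit(1)"]
ok, comment = run_step("create_experiment", failing, "MyMachine")
label = finalize_label("MyMachine", [ok])
print(label)
print(comment)
```

Because every step reports a result instead of aborting silently, the final stage always has enough information to replace the Running label.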
**Minor Maintenance updates:**
- Fix for STALLED cases revealed by PR 2700 (just needed a lambda
specifier)
- Fixed path to experiment directory in PR message (had dropped EXPDIR
in path)
- Log files are now read with the `latin-1` decoder before publishing
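As a minimal sketch of why the `latin-1` decoder helps when reading log files: it maps every possible byte value to a character, so decoding cannot raise `UnicodeDecodeError` even when a log contains bytes that are not valid UTF-8. The helper name `read_log_text` and the sample log line are hypothetical.

```python
import os
import tempfile


def read_log_text(path):
    """Read a log file as text without ever raising UnicodeDecodeError."""
    # latin-1 maps each of the 256 byte values to one code point, so decoding cannot fail.
    with open(path, "r", encoding="latin-1") as handle:
        return handle.read()


# A log line containing a byte (0xE9) that is not valid UTF-8 in this position.
raw = b"task failed: caf\xe9 not found\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(raw)
    log_path = tmp.name

try:
    text = read_log_text(log_path)
finally:
    os.remove(log_path)

print(text)
```

The trade-off is that bytes from other encodings may display as the wrong characters, which is usually acceptable for publishing error logs.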
**Added Python Unit Tests for CI functionality:**
- Installed **Rocoto** and **wxflow** in GitHub Runner for testing key CI
utility codes
- Cached the Rocoto install in the GitHub Runners to greatly reduce
setup time for running the unit tests
- Unit test Python scripts added:
  - `test_rocotostat.py`: rocoto_statcount(), rocoto_summary(), rocoto_stalled()
  - `test_setup.py`: setup_expt(), test_setup_xml()
  - `test_create_experiment.py`: test_create_experiment()
    - Runs all PR cases that do not have ICs in the GitHub Runner
- Reporting mechanism in the Actions tab for Python Unit Testing results
- Test case data for STALLED and RUNNING stored on S3 and pulled using
wget during runtime of tests
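The download-at-runtime approach for test data can be sketched as a small helper that fetches a file only when it is not already on disk. This is an assumption-laden illustration: it uses the standard library's `urllib.request.urlretrieve` instead of the `wget` package the tests use, and `fetch_once`, `fake_download`, and the `example.invalid` URL are hypothetical. The downloader is injectable so the example runs without network access.

```python
import os
import tempfile
import urllib.request


def fetch_once(base_url, relative_dir, filenames, dest_root, download=urllib.request.urlretrieve):
    """Download each file into dest_root/relative_dir unless it is already there."""
    dest_dir = os.path.join(dest_root, relative_dir)
    os.makedirs(dest_dir, exist_ok=True)
    fetched = []
    for name in filenames:
        destination = os.path.join(dest_dir, name)
        if not os.path.isfile(destination):
            download(f"{base_url}{relative_dir}/{name}", destination)
            fetched.append(name)
    return fetched


# Offline demonstration: a stand-in downloader that records the URL instead of using the network.
requested = []


def fake_download(url, destination):
    requested.append(url)
    with open(destination, "w") as handle:
        handle.write("placeholder")


with tempfile.TemporaryDirectory() as root:
    base = "https://example.invalid/data/CI/"  # hypothetical URL
    files = ["workflow.xml", "database.db"]
    first = fetch_once(base, "testdata/rocotostat", files, root, fake_download)
    second = fetch_once(base, "testdata/rocotostat", files, root, fake_download)

print(first, second)
```

The second call downloads nothing because both files already exist, which mirrors the existence check the tests perform before pulling data.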
TerrenceMcGuinness-NOAA authored Jul 11, 2024
1 parent 5ef4db7 commit 4968f3a
Showing 9 changed files with 327 additions and 15 deletions.
64 changes: 64 additions & 0 deletions .github/workflows/ci_unit_tests.yaml
@@ -0,0 +1,64 @@
name: CI Unit Tests
on: [pull_request, push, workflow_dispatch]

jobs:

  ci_pytest:
    runs-on: ubuntu-latest
    name: Run unit tests on CI system
    permissions:
      checks: write

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: 3.11.8

      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y perl libxml-libxml-perl libxml-libxslt-perl libdatetime-perl
          python -m pip install --upgrade pip
          pip install pytest
          pip install wxflow
          pip install wget
      - name: Cache Rocoto Install
        uses: actions/cache@v4
        with:
          path: ~/rocoto
          key: ${{ runner.os }}-rocoto-${{ hashFiles('**/ci-unit_tests.yaml') }}

      - name: Install Rocoto
        run: |
          if [ ! -d "$HOME/rocoto/bin" ]; then
            git clone https://github.com/christopherwharrop/rocoto.git $HOME/rocoto
            cd $HOME/rocoto
            ./INSTALL
          fi
          echo "$HOME/rocoto/bin" >> $GITHUB_PATH
      - name: Run tests
        shell: bash
        run: |
          sudo mkdir -p /scratch1/NCEPDEV
          cd $GITHUB_WORKSPACE/sorc
          git submodule update --init --recursive
          ./link_workflow.sh
          cd $GITHUB_WORKSPACE/ci/scripts/tests
          ln -s ../wxflow
          pytest -v --junitxml $GITHUB_WORKSPACE/ci/scripts/tests/test-results.xml
      - name: Publish Test Results
        if: always()
        uses: EnricoMi/publish-unit-test-result-action@v2
        with:
          files: ci/scripts/tests/test-results.xml
          job_summary: true
          comment_mode: off
29 changes: 19 additions & 10 deletions ci/Jenkinsfile
@@ -14,7 +14,7 @@ pipeline {

options {
skipDefaultCheckout()
//parallelsAlwaysFailFast()
parallelsAlwaysFailFast()
}

stages { // This initial stage is used to get the Machine name from the GitHub labels on the PR
@@ -90,9 +90,6 @@ pipeline {
stage('3. Build System') {
matrix {
agent { label NodeName[machine].toLowerCase() }
//options {
// throttle(['global_matrix_build'])
//}
axes {
axis {
name 'system'
@@ -102,6 +99,7 @@ pipeline {
stages {
stage('build system') {
steps {
catchError(buildResult: 'UNSTABLE', stageResult: 'FAILURE') {
script {
def HOMEgfs = "${CUSTOM_WORKSPACE}/${system}" // local HOMEgfs is used to build the system on a per-system basis under the custom workspace for each build system
sh(script: "mkdir -p ${HOMEgfs}")
@@ -120,8 +118,8 @@ pipeline {
if (env.CHANGE_ID) {
sh(script: """${GH} pr comment ${env.CHANGE_ID} --repo ${repo_url} --body "Checkout **Failed** on ${Machine}: ${e.getMessage()}" """)
}
echo "Failed to checkout: ${e.getMessage()}"
STATUS = 'Failed'
error("Failed to checkout: ${e.getMessage()}")
}
def gist_url = ""
def error_logs = ""
@@ -155,6 +153,7 @@ pipeline {
} catch (Exception error_comment) {
echo "Failed to comment on PR: ${error_comment.getMessage()}"
}
STATUS = 'Failed'
error("Failed to build system on ${Machine}")
}
}
@@ -174,14 +173,17 @@ pipeline {
}
}
}
}
}
}
}
}
}

stage('4. Run Tests') {
failFast false
when {
expression { STATUS != 'Failed' }
}
matrix {
agent { label NodeName[machine].toLowerCase() }
axes {
@@ -198,22 +200,28 @@ pipeline {
expression { return caseList.contains(Case) }
}
steps {
catchError(buildResult: 'UNSTABLE', stageResult: 'FAILURE') {
script {
sh(script: "sed -n '/{.*}/!p' ${CUSTOM_WORKSPACE}/gfs/ci/cases/pr/${Case}.yaml > ${CUSTOM_WORKSPACE}/gfs/ci/cases/pr/${Case}.yaml.tmp")
def yaml_case = readYaml file: "${CUSTOM_WORKSPACE}/gfs/ci/cases/pr/${Case}.yaml.tmp"
system = yaml_case.experiment.system
def HOMEgfs = "${CUSTOM_WORKSPACE}/${system}" // local HOMEgfs is used to populate the XML on per system basis
env.RUNTESTS = "${CUSTOM_WORKSPACE}/RUNTESTS"
sh(script: "${HOMEgfs}/ci/scripts/utils/ci_utils_wrapper.sh create_experiment ${HOMEgfs}/ci/cases/pr/${Case}.yaml")
try {
error_output = sh(script: "${HOMEgfs}/ci/scripts/utils/ci_utils_wrapper.sh create_experiment ${HOMEgfs}/ci/cases/pr/${Case}.yaml", returnStdout: true).trim()
} catch (Exception error_create) {
sh(script: """${GH} pr comment ${env.CHANGE_ID} --repo ${repo_url} --body "${Case} **FAILED** to create experiment on ${Machine}\n with the error:\n\\`\\`\\`\n${error_output}\\`\\`\\`" """)
error("Case ${Case} failed to create experiment directory")
}
}
}
}
}

stage('Run Experiments') {
when {
expression { return caseList.contains(Case) }
}
failFast false
steps {
script {
HOMEgfs = "${CUSTOM_WORKSPACE}/gfs" // common HOMEgfs is used to launch the scripts that run the experiments
@@ -255,11 +263,11 @@ pipeline {
STATUS = 'Failed'
try {
sh(script: """${GH} pr edit ${env.CHANGE_ID} --repo ${repo_url} --remove-label "CI-${Machine}-Running" --add-label "CI-${Machine}-${STATUS}" """, returnStatus: true)
sh(script: """${GH} pr comment ${env.CHANGE_ID} --repo ${repo_url} --body "Experiment ${Case} **FAILED** on ${Machine} in\n\\`${CUSTOM_WORKSPACE}/RUNTESTS/${pslot}\\`" """)
sh(script: """${GH} pr comment ${env.CHANGE_ID} --repo ${repo_url} --body "Experiment ${Case} **FAILED** on ${Machine} in\n\\`${CUSTOM_WORKSPACE}/RUNTESTS/EXPDIR/${pslot}\\`" """)
} catch (Exception e) {
echo "Failed to update label from Running to ${STATUS}: ${e.getMessage()}"
}
error("Failed to run experiments ${Case} on ${Machine}")
echo "Failed to run experiments ${Case} on ${Machine}"
}
}
}
@@ -268,6 +276,7 @@ pipeline {
}
}
}

stage( '5. FINALIZE' ) {
agent { label NodeName[machine].toLowerCase() }
steps {
2 changes: 1 addition & 1 deletion ci/cases/yamls/gefs_ci_defaults.yaml
@@ -1,4 +1,4 @@
 defaults:
   !INC {{ HOMEgfs }}/parm/config/gefs/yaml/defaults.yaml
 base:
-  HPC_ACCOUNT: {{ 'HPC_ACCOUNT' | getenv }}
+  ACCOUNT: {{ 'HPC_ACCOUNT' | getenv }}
29 changes: 29 additions & 0 deletions ci/scripts/tests/test_create_experiment.py
@@ -0,0 +1,29 @@
from wxflow import Executable
from shutil import rmtree
import os
import copy

_here = os.path.dirname(__file__)
HOMEgfs = os.sep.join(_here.split(os.sep)[:-3])
RUNDIR = os.path.join(_here, 'testdata/RUNDIR')


def test_create_experiment():

    create_experiment_script = Executable(f'{HOMEgfs}/workflow/create_experiment.py')
    yaml_dir = os.path.join(HOMEgfs, 'ci/cases/pr')
    env = os.environ.copy()
    env['RUNTESTS'] = RUNDIR

    for case in os.listdir(yaml_dir):
        if case.endswith('.yaml'):
            with open(os.path.join(yaml_dir, case), 'r') as file:
                file_contents = file.read()
            if 'ICSDIR_ROOT' not in file_contents:
                create_experiment = copy.deepcopy(create_experiment_script)
                create_experiment.add_default_arg(['-y', f'../../cases/pr/{case}', '--overwrite'])
                env['pslot'] = os.path.splitext(case)[0]
                create_experiment(env=env)
                assert (create_experiment.returncode == 0)

    rmtree(RUNDIR)
90 changes: 90 additions & 0 deletions ci/scripts/tests/test_rocotostat.py
@@ -0,0 +1,90 @@
import sys
import os
from shutil import rmtree
import wget

script_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(os.path.dirname(script_dir), 'utils'))

from rocotostat import rocoto_statcount, rocotostat_summary, is_done, is_stalled, CommandNotFoundError
from wxflow import which

test_data_url = 'https://noaa-nws-global-pds.s3.amazonaws.com/data/CI/'

testdata_path = 'testdata/rocotostat'
testdata_full_path = os.path.join(script_dir, testdata_path)


if not os.path.isfile(os.path.join(testdata_full_path, 'database.db')):
    os.makedirs(testdata_full_path, exist_ok=True)
    workflow_url = test_data_url + str(testdata_path) + '/workflow.xml'
    workflow_destination = os.path.join(testdata_full_path, 'workflow.xml')
    wget.download(workflow_url, workflow_destination)

    database_url = test_data_url + str(testdata_path) + '/database.db'
    database_destination = os.path.join(testdata_full_path, 'database.db')
    wget.download(database_url, database_destination)

try:
    rocotostat = which('rocotostat')
except CommandNotFoundError:
    raise CommandNotFoundError("rocotostat not found in PATH")

rocotostat.add_default_arg(['-w', os.path.join(testdata_path, 'workflow.xml'), '-d', os.path.join(testdata_path, 'database.db')])


def test_rocoto_statcount():

    result = rocoto_statcount(rocotostat)

    assert result['SUCCEEDED'] == 20
    assert result['FAIL'] == 0
    assert result['DEAD'] == 0
    assert result['RUNNING'] == 0
    assert result['SUBMITTING'] == 0
    assert result['QUEUED'] == 0


def test_rocoto_summary():

    result = rocotostat_summary(rocotostat)

    assert result['CYCLES_TOTAL'] == 1
    assert result['CYCLES_DONE'] == 1


def test_rocoto_done():

    result = rocotostat_summary(rocotostat)

    assert is_done(result)

    rmtree(testdata_full_path)


def test_rocoto_stalled():
    testdata_path = 'testdata/rocotostat_stalled'
    testdata_full_path = os.path.join(script_dir, testdata_path)
    xml = os.path.join(testdata_full_path, 'stalled.xml')
    db = os.path.join(testdata_full_path, 'stalled.db')

    if not os.path.isfile(os.path.join(testdata_full_path, 'stalled.db')):
        os.makedirs(testdata_full_path, exist_ok=True)
        workflow_url = test_data_url + str(testdata_path) + '/stalled.xml'
        database_url = test_data_url + str(testdata_path) + '/stalled.db'

        workflow_destination = os.path.join(testdata_full_path, 'stalled.xml')
        wget.download(workflow_url, workflow_destination)

        database_destination = os.path.join(testdata_full_path, 'stalled.db')
        wget.download(database_url, database_destination)

    rocotostat = which('rocotostat')
    rocotostat.add_default_arg(['-w', xml, '-d', db])

    result = rocoto_statcount(rocotostat)

    assert result['SUCCEEDED'] == 11
    assert is_stalled(result)

    rmtree(testdata_full_path)
89 changes: 89 additions & 0 deletions ci/scripts/tests/test_setup.py
@@ -0,0 +1,89 @@
from wxflow import Executable, Configuration, ProcessError
from shutil import rmtree
import pytest
import os

_here = os.path.dirname(__file__)
HOMEgfs = os.sep.join(_here.split(os.sep)[:-3])
RUNDIR = os.path.join(_here, 'testdata/RUNDIR')
pslot = "C48_ATM"
account = "fv3-cpu"
foobar = "foobar"


def test_setup_expt():

    arguments = [
        "gfs", "forecast-only",
        "--pslot", pslot, "--app", "ATM", "--resdetatmos", "48",
        "--comroot", f"{RUNDIR}", "--expdir", f"{RUNDIR}",
        "--idate", "2021032312", "--edate", "2021032312", "--overwrite"
    ]
    setup_expt_script = Executable(os.path.join(HOMEgfs, "workflow", "setup_expt.py"))
    setup_expt_script.add_default_arg(arguments)
    setup_expt_script()
    assert (setup_expt_script.returncode == 0)


def test_setup_xml():

    setup_xml_script = Executable(os.path.join(HOMEgfs, "workflow/setup_xml.py"))
    setup_xml_script.add_default_arg(f"{RUNDIR}/{pslot}")
    setup_xml_script()
    assert (setup_xml_script.returncode == 0)

    cfg = Configuration(f"{RUNDIR}/{pslot}")
    base = cfg.parse_config('config.base')
    assert base.ACCOUNT == account

    assert "UNKNOWN" not in base.values()

    with open(f"{RUNDIR}/{pslot}/{pslot}.xml", 'r') as file:
        contents = file.read()
    assert contents.count(account) > 5

    rmtree(RUNDIR)


def test_setup_xml_fail_config_env_cornercase():

    script_content = ('''#!/usr/bin/env bash
export HOMEgfs=foobar
../../../workflow/setup_xml.py "${1}"\n
''')

    with open('run_setup_xml.sh', 'w') as file:
        file.write(script_content)
    os.chmod('run_setup_xml.sh', 0o755)

    try:
        setup_xml_script = Executable(os.path.join(HOMEgfs, "ci", "scripts", "tests", "run_setup_xml.sh"))
        setup_xml_script.add_default_arg(f"{RUNDIR}/{pslot}")
        setup_xml_script()
        assert (setup_xml_script.returncode == 0)

        cfg = Configuration(f"{RUNDIR}/{pslot}")
        base = cfg.parse_config('config.base')
        assert base.ACCOUNT == account

        assert foobar not in base.values()
        assert "UNKNOWN" not in base.values()

        with open(f"{RUNDIR}/{pslot}/{pslot}.xml", 'r') as file:
            contents = file.read()
        assert contents.count(account) > 5

    except ProcessError as e:
        # We expect this to fail because ACCOUNT=fv3-cpu in config.base and environment
        pass
    except Exception as e:
        # Any other exception is unexpected, so fail the test with a custom message
        pytest.fail(f"Unexpected exception occurred: {e}")

    finally:
        # Cleanup code to ensure it runs regardless of test outcome
        os.remove('run_setup_xml.sh')
        try:
            rmtree(RUNDIR)
        except FileNotFoundError:
            pass