
Set up GitHub Actions Workflow for Testing Parsl with Flux #3159

Merged: 73 commits into Parsl:master on Jun 10, 2024

Conversation

@mercybassey (Contributor) commented Mar 6, 2024

Description

This pull request introduces a new GitHub Actions workflow aimed at testing Parsl's integration with Flux.

Changed Behaviour

The primary focus of this PR is to enhance Parsl's testing infrastructure by integrating a new GitHub Actions workflow aimed at validating Parsl's compatibility and functionality with Flux.

Fixes

Fixes #2713

Type of change

Choose which options apply, and delete the ones which do not apply.

  • Enhancement

@mercybassey (Contributor, Author) commented Mar 6, 2024

Hi @benclifford, I've created the CI workflow for testing Flux with Parsl. Initially, the CI failed due to issues with bootstrap.sh and the Reframe tests. I simplified the workflow by removing those elements and focused on setting up Flux. After these adjustments, the CI ran successfully with just Flux.

I then added Parsl installation with pip3 install parsl, which ran successfully. I believe this will lay the groundwork for future Parsl and Flux integration tests.

Kindly review. I am open to any feedback or suggestions for further improvements.

@mercybassey marked this pull request as ready for review on March 6, 2024 09:23
@benclifford (Collaborator) commented Mar 6, 2024

ok, great.

here are some things that I think should happen next in this PR:

i) Install the "right" parsl version

These GitHub Actions are run to test specific new versions of parsl that only exist in pull requests. This step in your newly added Actions workflow checks out the correct version of Parsl (with whatever changes the submitter has made):

      - name: Checkout
        uses: actions/checkout@v3

but later on you install the latest official release of Parsl (as published on PyPI.org), not the version that was checked out from the pull request:

      - name: Install Parsl
        run: |
          pip3 install parsl

To install the version that is checked out by the checkout step (instead of the version from PyPI), I think it will be enough to call pip3 install . instead; in that command, . means "install whatever is in the current directory", which we hope is parsl. If you look in .github/workflows/ci.yaml you should see both a checkout action and then, later on, some pip install . commands, and basically I think you should do something like that (a rough sketch follows).
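For concreteness, here is a minimal sketch of what that install step could look like (the step name is illustrative, and the exact placement in the workflow may differ):

      # Hedged sketch: install the Parsl tree checked out by actions/checkout,
      # rather than the latest release published on PyPI.
      - name: Install Parsl from the checked-out source
        run: |
          pip3 install .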

ii) Run some tests to check you installed parsl right.

You can check that the basic parsl install worked, before running anything with parsl+flux, with make local_thread_test (which actually just runs: pytest parsl/tests/ -k "not cleannet" --config parsl/tests/configs/local_threads.py --random-order --durations 10) - that should run for about 10 seconds, and end like this:

=============================================== 148 passed, 222 skipped, 4 warnings in 5.94s ================================================

You might encounter some problems here with stuff not being installed right (missing dependencies, for example).
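As a rough sketch (assuming make and the test dependencies are available inside the job), that check could become its own workflow step:

      # Hedged sketch: run the quick local-threads test suite to confirm the
      # Parsl install works before touching Flux. Step name is illustrative.
      - name: Verify Parsl installation
        run: |
          make local_thread_test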

iii) test parsl+flux

There's one basic flux test file here parsl/tests/test_flux.py so I think you should be able to run:

pytest parsl/tests/test_flux.py
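In workflow terms, that might look something like the sketch below (assuming the job is already running inside the Flux container set up earlier in the workflow; the step name is illustrative):

      # Hedged sketch: run the basic Flux integration test file. As noted in
      # point iv below, this currently skips rather than fails if Flux is
      # detected as unavailable.
      - name: Test Parsl with Flux
        run: |
          pytest parsl/tests/test_flux.py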

Extras to think about after the above is working, but don't attempt until the basics above work:

iv) These tests have a problem that I dislike: if they detect flux is not working, they will not fail (and so CI will not fail, and so developers will not know that flux is no longer being tested). Probably in a separate PR after this PR is done, we can work on changing that behaviour.

v) For our other executors, we try to run a big range of general Parsl tests to check that lots of different parsl usage patterns and options work with that executor - that would be a useful thing to add in here too. If you look in the Makefile, you will see various executors tested by running pytest with --config <some config> for different kinds of executors, and it would be good to make that work for a flux configuration too (a sketch of what such a config module might look like follows).
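As a hedged sketch of such a config module (modeled on the existing files under parsl/tests/configs/, with a hypothetical filename; the real FluxExecutor setup may need options that are not shown here):

# parsl/tests/configs/flux_local.py (hypothetical filename)
from parsl.config import Config
from parsl.executors import FluxExecutor


def fresh_config():
    # Minimal Flux-backed config for running the general test suite with
    # pytest --config; all FluxExecutor options are left at their defaults.
    return Config(executors=[FluxExecutor()])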

@mercybassey (Contributor, Author) commented Mar 6, 2024

Hi @benclifford, I've implemented the pip3 install parsl . change and integrated the installation of dependencies from test-requirements.txt into the CI pipeline, so all the libraries needed for testing are installed; that worked well.

Following that, I attempted to verify the Parsl installation by running a subset of the test suite as suggested (pytest parsl/tests/ -k "not cleannet" --config parsl/tests/configs/local_threads.py --random-order --durations 10). However, I've encountered a persistent issue where the tests fail due to an unrecognized argument, --random-order. Adjusting the test command to remove potentially problematic flags did not resolve the issue.

I'll be needing your suggestions on potential adjustments or alternative approaches to verify the Parsl installation. Your guidance will be greatly appreciated.

@benclifford (Collaborator) commented:

The implementation for --random-order is installed from an extra pytest plugin module: https://pypi.org/project/pytest-random-order/

that's usually installed as part of the testing requirements... in the main Parsl dev instructions, that happens as part of "make deps", or you could change your test install to pip install . -r test-requirements.txt, which will install both . (the parsl code under test) and all the dependencies in test-requirements.txt (a sketch follows).
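A hedged sketch of what that adjusted install step could look like (the step name is illustrative):

      # Hedged sketch: install the checked-out Parsl plus the test plugins
      # (such as pytest-random-order) listed in test-requirements.txt.
      - name: Install Parsl and test dependencies
        run: |
          pip3 install . -r test-requirements.txt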

@mercybassey (Contributor, Author) commented Mar 6, 2024

@benclifford During the CI run, I encountered a test failure in parsl/tests/test_bash_apps/test_stdout.py, which asserts a failure condition related to handling stdout and stderr paths for bash apps. This is happening in the Verify Parsl Installation step.
=================================== FAILURES ===================================
_____________________________ test_bad_stderr_file _____________________________

    @pytest.mark.issue363
    def test_bad_stderr_file():
        """Testing bad stderr file"""

        err = "/bad/dir/t2.err"

        fn = echo_to_streams("Hello world", stderr=err)

        try:
            fn.result()
        except perror.BadStdStreamFile:
            pass
        else:
>           assert False, "Did not raise expected exception BadStdStreamFile"
E           AssertionError: Did not raise expected exception BadStdStreamFile
E           assert False

parsl/tests/test_bash_apps/test_stdout.py:70: AssertionError
----------------------------- Captured stdout call -----------------------------
Hello world

Upon investigating, I identified that the failure was due to the test expecting an exception when attempting to write to a non-existent directory, but the exception was not raised as expected.

Could you please provide some guidance on how you would like me to address this test failure? Should I attempt to modify the test or the underlying functionality to resolve the issue, or is there a different approach you'd prefer?

@benclifford (Collaborator) commented:

that test looks like the problem some people have had on WSL2: #3160

in that test, the assumption is that / will not be writable, which is true on a traditional multi-user unix system, but apparently not true for those WSL users in #3160, and it looks like it's not true in the flux container that you're using either.

I think seeing this test break in two different places is good motivation to fix the test - maybe you're interested in a side-quest to fix #3160 (as a separate PR), which would help both the WSL users and this flux testing.
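To illustrate the assumption the test relies on (a shell sketch, not part of the test itself):

# On a traditional multi-user unix system this fails with "Permission denied",
# so the test's deliberately bad stderr path really is unusable; as root (for
# example inside the flux container) it succeeds, and the path is no longer "bad".
mkdir -p /bad/dir && touch /bad/dir/t2.err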

@mercybassey (Contributor, Author) commented Mar 7, 2024

Okay, I'll take a look.

@benclifford (Collaborator) commented:

looks like recent tests are hitting that hang-at-exit problem again. I'll have a look at that right now.

@mercybassey (Contributor, Author) commented:

@benclifford the failing test passed this time.

@benclifford (Collaborator) commented:

yes, annoying. but we can merge this PR and just accept that these tests will sometimes fail this way until someone diagnoses and fixes whatever is broken. so I think if you tidy up the messy files in this PR, it would then be ready to merge.

@mercybassey (Contributor, Author) commented:

@benclifford I have removed all the messy files.

@benclifford (Collaborator) commented:

While trying to understand what's happening with hung runs, I realised that this PR does not record test artefacts in the same way that the main github actions workflow does. I put this commit onto my testing branch to capture a test artefact when a run finishes, with the hope that when I later see a hanging run it will give some more clues.

Artefacts like this have been really useful for debugging tests in general.

You can look in ci.yaml to see what I copied this from.

a41b732
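For reference, a hedged sketch of what an artefact-upload step of that kind typically looks like (modeled on the general pattern in ci.yaml; the step name, action version, artefact name and paths in the actual commit a41b732 may differ):

      # Hedged sketch: upload Parsl's runinfo logs even when the job fails or
      # is cancelled, so there is something to inspect after a hung run.
      - name: Archive test artefacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: runinfo-${{ github.sha }}
          path: runinfo/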

@mercybassey (Contributor, Author) commented:

Hi @benclifford. Are there any updates on this issue? I'd like to begin working on this one.

@benclifford (Collaborator) commented:

@mercybassey you can work on the os x model before this PR #3159 is merged, on another branch - I need to write up an issue for what I discovered with flux.

Comment on lines 6 to 8
  build:
    runs-on: ubuntu-20.04
    permissions:
Collaborator commented:

This is now 4 years out of date. Are we interested in using a more recent distro for the test and CI framework?

Collaborator commented:

The flux container image is tagged for jammy, which is the subsequent ubuntu LTS release (22.04). The most recent ubuntu LTS is 24.04, which was released just over a month ago, and it looks like there isn't a flux container for that (on https://hub.docker.com/r/fluxrm/flux-sched/tags).

So I'll upgrade this to 22.04 and see what happens.
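Roughly, the intended change to the lines quoted above (a sketch; only the runner label changes):

  build:
    # bumped from ubuntu-20.04; jammy is the 22.04 LTS, matching the flux container tag
    runs-on: ubuntu-22.04
    permissions: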

@benclifford (Collaborator) commented:

I pulled some of the test changes I made here into PR #3483 for merge first separately.

@benclifford merged commit 5973f39 into Parsl:master on Jun 10, 2024
7 checks passed
@benclifford mentioned this pull request on Jun 24, 2024