Changes for finer granularity #13

Open
wants to merge 83 commits into
base: leave_docker_images

davegill
Owner

  1. We are back to chem tests 1, 2, 5. There was no timing performance advantage to splitting
    the tests apart. However, using a BIGGER machine, with more cores for parallel compilation,
    was important.
  2. Zapped NMM.
  3. Split apart the 10, 13, 14, 15 tests. These all have serial components that run quite
    slowly. They may need to be further split.
  4. To support this whole splitting thing, remove the hard-coded "19", and actually count
    how many tests are being conducted.
  5. Introduce the FEATURE flag, in anticipation of usage with Kelly's feature (restart) tests.
  6. With Varun, rewrote the output "part.sh" file. This allows the code to be less
    hard-coded in the groovy scripts. This required calling the last_only_once script.
  7. The "last_only_once.csh" script now takes an argument: the directory where the
    various SUCCESS* files are held.
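
For item 4, here is a minimal sketch of counting the tests instead of hard-coding the total. It is written in POSIX sh rather than csh, and the test names and the `count_tests` helper are made up for illustration; the real list lives in build.csh.

```shell
#!/bin/sh
# Hypothetical list of test groups (assumption: the real names differ).
TESTS="chem_1_2_5 em_real_10 em_real_13 em_real_14 em_real_15"

# count_tests LIST: print how many whitespace-separated entries LIST holds.
count_tests() {
    # "set --" splits the unquoted argument into positional parameters,
    # so $# becomes the number of entries.
    set -- $1
    echo $#
}

NUM_TESTS=$(count_tests "$TESTS")
echo "Number of tests: $NUM_TESTS"
```

Adding or splitting a test then only means editing the list; the total follows automatically.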

modified: build.csh

1. Zap a debugging "HI"
2. Indent the last_only_once.csh call
3. Delineate stanzas with additional line break.

modified:   build.csh
@davegill
Owner Author

@vlakshmanan-scala
Varun,
Here are the proposed changes to build.csh.

  1. Note that now the last_only_once.csh file has this line:
cd $1 >& /dev/null
  1. The part.sh looks like this (for test 1 and 2). MUCH MUCH shorter. Good idea!
#########   Comparison.sh - autogenerated    ##############

        # This compares both serial vs openmp and serial vs mpi results


        sudo -S unzip /tmp/raw_output/OUTPUT_output_24.zip -d /tmp/raw_output/OUTPUT_1
        sudo -S unzip /tmp/raw_output/OUTPUT_output_25.zip -d /tmp/raw_output/OUTPUT_1
        sudo -S unzip /tmp/raw_output/OUTPUT_output_26.zip -d /tmp/raw_output/OUTPUT_1
        sudo -S cat /tmp/raw_output/output_24 /tmp/raw_output/output_25 /tmp/raw_output/output_26 | sudo tee -a /tmp/raw_output/final_output/output_1

        sudo -S ls -l /tmp/raw_output/OUTPUT_output_24/home/ubuntu/wrf-stuff/wrf-coop/OUTPUT | sudo tee -a /tmp/raw_output/final_output/output_1
        sudo -S ls -l /tmp/raw_output/OUTPUT_output_25/home/ubuntu/wrf-stuff/wrf-coop/OUTPUT | sudo tee -a /tmp/raw_output/final_output/output_1
        sudo -S ls -l /tmp/raw_output/OUTPUT_output_26/home/ubuntu/wrf-stuff/wrf-coop/OUTPUT | sudo tee -a /tmp/raw_output/final_output/output_1   

        ./last_only_once.csh /tmp/raw_output/OUTPUT_1 | sudo tee -a /tmp/raw_output/final_output/output_1

        sudo -S unzip /tmp/raw_output/OUTPUT_output_27.zip -d /tmp/raw_output/OUTPUT_2
        sudo -S unzip /tmp/raw_output/OUTPUT_output_28.zip -d /tmp/raw_output/OUTPUT_2
        sudo -S cat /tmp/raw_output/output_27 /tmp/raw_output/output_28 | sudo tee -a /tmp/raw_output/final_output/output_2

        sudo -S ls -l /tmp/raw_output/OUTPUT_output_27/home/ubuntu/wrf-stuff/wrf-coop/OUTPUT | sudo tee -a /tmp/raw_output/final_output/output_2
        sudo -S ls -l /tmp/raw_output/OUTPUT_output_28/home/ubuntu/wrf-stuff/wrf-coop/OUTPUT | sudo tee -a /tmp/raw_output/final_output/output_2   

        ./last_only_once.csh /tmp/raw_output/OUTPUT_2 | sudo tee -a /tmp/raw_output/final_output/output_2
  1. The chemistry tests are not numbered consistently with your re-ordering. I should probably handle that internally, rather than in the groovy scripts.
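
To show why the new part.sh is so much shorter, here is a rough sketch of how one stanza could be produced from a test number and its job numbers. The `emit_stanza` function is hypothetical (the real file is autogenerated elsewhere), it only prints the commands rather than running them, and it leaves out the `ls -l` lines for brevity. The paths are copied from the listing above.

```shell
#!/bin/sh
# emit_stanza TEST_NUMBER JOB_NUMBER...
# Prints (does not run) one comparison stanza in the style of part.sh.
emit_stanza() {
    test_num=$1
    shift
    raw=/tmp/raw_output
    final=$raw/final_output/output_$test_num

    # One unzip per job, all into the directory for this test.
    for job in "$@"; do
        echo "sudo -S unzip $raw/OUTPUT_output_$job.zip -d $raw/OUTPUT_$test_num"
    done

    # Concatenate the per-job logs into the final output for this test.
    logs=""
    for job in "$@"; do
        logs="$logs $raw/output_$job"
    done
    echo "sudo -S cat$logs | sudo tee -a $final"

    # Check the SUCCESS* files once, after every job has been listed.
    echo "./last_only_once.csh $raw/OUTPUT_$test_num | sudo tee -a $final"
}

emit_stanza 2 27 28
```

With this shape, only the test number and the job numbers vary from stanza to stanza.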

modified:   build.csh
This re-ordering is driven by time constraints. The chemistry
tests take more than 3x as long to build as the other ARW tests. For cost
reasons, all tests were moved to a cheaper (smaller) AWS instance. This
smaller machine type had fewer processors, and therefore compiled the chem
tests too slowly. The decision was to bump up the machine type, but only
for the chemistry job. The easiest way to handle the split machine assignment in
the groovy scripting is to make the chemistry tests the first job.

modified:   build.csh
This is MPI only, now. The first case is "basic".

modified:   build.csh
We need to have either ( SERIAL && OPENMP ) || ( SERIAL && MPI ) to call
the last_only_once script.
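
A minimal sketch of that condition, in POSIX sh rather than csh. The function name and the "true"/"false" string flags are assumptions for illustration, not the variables build.csh actually uses.

```shell
#!/bin/sh
# needs_comparison SERIAL OPENMP MPI
# Each argument is the string "true" or "false".
# Succeeds when ( SERIAL && OPENMP ) || ( SERIAL && MPI ),
# which is the same as SERIAL && ( OPENMP || MPI ).
needs_comparison() {
    serial=$1
    openmp=$2
    mpi=$3
    if [ "$serial" = true ] && { [ "$openmp" = true ] || [ "$mpi" = true ]; }; then
        return 0
    fi
    return 1
}

if needs_comparison true false true; then
    echo "serial + mpi built: call last_only_once.csh"
fi
```

A serial build alone, or a parallel build without serial, gives nothing to compare against, so the script is skipped in those cases.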

modified:   build.csh
To test this on the MMM Classroom machines, I need the feature test to be
one of the early tests, since the scripts are broken apart. The actual order
does not matter, as long as the first test is chemistry (which requires a
larger machine on AWS).

modified:   build.csh
Only one proc is supposed to change the permissions in the OUTPUT directory. So
put the chmod within an if test. Then, when the code is run on the MMM classroom,
all is OK, and also when run on AWS.

modified:   build.csh
In single_end.csh, make sure to remove the image for wrf_regtest.
If it is left lying around on the MMM classroom machines, we can't then
do another test.

modified:   build.csh
modified:   build.csh
The call to the feature_test.csh script now has an argument for the
RUNDIR, which is used to output the correct name of the SUCCESS message.

modified:   build.csh
Fumble-fingered the quote for the feature_testing.csh call.

modified:   build.csh
modified:   build.csh

On branch regression+feature
Your branch is up to date with 'origin/regression+feature'.
You are currently cherry-picking commit 4a9d582.
Changes to be committed:
	modified:   build.csh
modified:   build.csh
This call has a single argument. That means - clean things up.

modified:   build.csh
1. It needs to only be handled when this is a feature test
2. The feature_testing.csh needs to be prefaced by the docker commands

modified:   build.csh
davegill and others added 30 commits November 9, 2021 20:20
modified:   README_add_feature_test.md
modified:   README_add_feature_test.md
Pull in the da_builds.csh file from the SCRIPTS directory.
This is used by test_000m.csh

modified:   Dockerfile
modified:   Dockerfile-second_part
modified:   Dockerfile-sed
Where are those SUCCESS files???

modified:   build.csh
Need to append home/ubuntu/wrf-stuff/wrf-coop/OUTPUT to the directory
location used to find the SUCCESS files.
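
A small sketch of the idea, in POSIX sh. The path suffix comes from the message above; the helper names `success_dir` and `count_success` are hypothetical and are not taken from build.csh or last_only_once.csh.

```shell
#!/bin/sh
# Path that each unzipped archive carries inside it.
SUFFIX=home/ubuntu/wrf-stuff/wrf-coop/OUTPUT

# success_dir TOP: print the directory that should hold the SUCCESS* files.
success_dir() {
    echo "$1/$SUFFIX"
}

# count_success TOP: print how many SUCCESS* entries exist there (0 if none).
count_success() {
    dir=$(success_dir "$1")
    n=0
    for f in "$dir"/SUCCESS*; do
        # An unmatched glob stays literal, so confirm the entry exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

echo "Looking in: $(success_dir /tmp/raw_output/OUTPUT_1)"
echo "SUCCESS files found: $(count_success /tmp/raw_output/OUTPUT_1)"
```

Without the suffix, the search looks at the top of the unzipped tree and finds nothing.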

modified:   build.csh
fix test_000m.csh location of the famous "foo" file

modified:   build.csh
Orig New
--------
 1    1
 2    2
11    3
13    4
21    5
22    6

This allows a grouping of jobs at the head of the line that all can benefit from using
the more expensive AWS machines (more processors - but apparently more memory is the
big deal).
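
The renumbering above can be written as a lookup, sketched here in POSIX sh. The function name is hypothetical, and only the six pairs from the table are covered; any other number is reported as unmapped rather than guessed.

```shell
#!/bin/sh
# new_test_number ORIG: print the new number for an original test number.
new_test_number() {
    case $1 in
        1)  echo 1 ;;
        2)  echo 2 ;;
        11) echo 3 ;;
        13) echo 4 ;;
        21) echo 5 ;;
        22) echo 6 ;;
        *)  echo "unmapped test number: $1" >&2
            return 1 ;;
    esac
}

for orig in 1 2 11 13 21 22; do
    echo "$orig -> $(new_test_number "$orig")"
done
```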

modified:   build.csh
modified:   build.csh
17AD done in real*8
No real need for specific move tests - no one does this EVER.

modified:   build.csh
For use with jenkins bringing back all SUCCESS_RUN_WRF files for comparison locally.

modified:   build.csh
This tests the combo of all_wrfvar and
./configure wrfplus
./compile wrfplus

and

./configure 4dvar
./compile all_wrfvar

modified:   build.csh
…csh (#14)

Added to the build.csh script to include new feature tests for diff_opt=2, km_opt=1,2,3, and nest starts later. 
These coincide with [PR#3 in wrf_feature_testing](davegill/wrf_feature_testing#3).
Already have nested cases, want a stochastic case.

modified:   build.csh
Move the 03ST case to a non-OpenMP suite

modified:   build.csh
We want to see a failure message if DA does not compile.

modified:   build.csh
The unique build number will allow multiple regtests to be run
at the same time
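
One way this could work, as a sketch only: tag every name that two concurrent runs might share with the build number. This assumes the number comes from the `BUILD_NUMBER` environment variable that Jenkins sets for each build; the `tagged_name` helper and the choice of which names to tag are illustrative and not taken from build.csh.

```shell
#!/bin/sh
# tagged_name BASE BUILD: append the build number to a base name.
tagged_name() {
    echo "${1}_${2}"
}

# BUILD_NUMBER is set by Jenkins; fall back to "local" when run by hand.
build=${BUILD_NUMBER:-local}

# Assumption: the image name and output directory are what would collide.
image=$(tagged_name wrf_regtest "$build")
outdir=$(tagged_name OUTPUT "$build")

echo "image: $image"
echo "output directory: $outdir"
```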

modified:   build.csh

2 participants