Use install option in GDASApp build #1302
This issue is an EE2 compliance issue being tracked on GDASApp issue #1254.
As a test, make the following modification to
Currently executing
@RussTreadon-NOAA in my test I modified the build script, as I believe the logic is flawed when you choose install. Here is the logic:

```bash
# Build
echo "Building ..."
set -x
if [[ $BUILD_JCSDA == 'YES' ]]; then
  make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
else
  builddirs="gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query"
  for b in $builddirs; do
    cd $b
    make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
    cd ../
  done
fi

# Install
if [[ -n ${INSTALL_PREFIX:-} ]]; then
  echo "Installing ..."
  set -x
  make install
  set +x
fi
```

Note that the script loops over the `builddirs`. For a quicker build with install it needs to be something like:

```bash
# Build
echo "Building ..."
set -x
if [[ -z ${INSTALL_PREFIX:-} ]]; then
  if [[ $BUILD_JCSDA == 'YES' ]]; then
    make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
  else
    builddirs="gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query"
    for b in $builddirs; do
      cd $b
      make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
      cd ../
    done
  fi
fi
set +x

# Install
if [[ -n ${INSTALL_PREFIX:-} ]]; then
  echo "Installing ..."
  set -x
  make -j ${BUILD_JOBS:-6} install
  set +x
fi
```

I.e. skip the builddirs since they will be built anyway and let CMake figure out an optimal parallel strategy for building them. However, note that it will always be a lot slower than the normal way of building because 1000s of tests will be built.
Relooking at it you could combine BUILD_JCSDA with INSTALL_PREFIX. Perhaps that was the original intention?
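For illustration, a minimal sketch of what that combined check might look like in build.sh, assuming the same variables used in the snippets above; this is one possible reading of the suggestion, not the actual change:

```bash
# Sketch only: treat an install request like a full build, and keep the
# selective per-directory loop for plain developer builds.
if [[ $BUILD_JCSDA == 'YES' || -n ${INSTALL_PREFIX:-} ]]; then
  make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
else
  builddirs="gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query"
  for b in $builddirs; do
    cd $b
    make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
    cd ../
  done
fi
```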
Thank you @danholdaway for explaining why install is taking so much time. I'm executing faulty logic. Let me try your suggestions. Another item on our TODO list is turning off all the JEDI tests (GDASApp issue #1269).
Tested the suggested change to `build.sh`.
In that case it might be a prerequisite that the building of the JEDI tests can be turned off. We could proceed with this relatively easily, but it's a feature that has met resistance in the past. I think some of that resistance comes from a difference in working styles. Those opposed argue that the cost of building JEDI is in the noise of the time it takes to run an experiment, so why introduce an extra flag and further complicate a build infrastructure that is already complicated. One idea I was playing around with before I went on leave is whether we could try to eliminate the need to build JEDI for most people. Then the issue might go away. I think it would have to be done in tandem with implementing more of a process for updating the JEDI hashes. Perhaps we could brainstorm this week?
The Hercules build with
File First,
The two other files that reference
If the install option for the GDASApp build satisfies the EE2 executable requirement, then we should try to find a way to speed up the build with install. We don't need JEDI ctests when building and installing GDASApp for use in operations. If turning off JEDI ctests speeds up the build and install for operations, we should figure out a way to make this happen.
@danholdaway, I agree with your brainstorming idea. The core GDASApp infrastructure team needs to develop a plan to work through the various items from the bi-weekly JEDI workflow sync meeting with EIB.
Work for this issue will be done in feature/install.
@danholdaway recommended making the following change in
The default behavior is for JEDI ctests to be active (
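As a purely hypothetical illustration (the flag name `BUILD_JEDI_TESTS` and the variable `GDAS_SRC_DIR` are assumptions, not taken from the actual recommendation), a guard of that kind could be exposed from build.sh at configure time roughly like this:

```bash
# Hypothetical sketch: BUILD_JEDI_TESTS is an assumed option name, ON by default
# to preserve current behavior; GDAS_SRC_DIR is a placeholder for the source tree.
CMAKE_OPTS="${CMAKE_OPTS:-} -DBUILD_JEDI_TESTS=${BUILD_JEDI_TESTS:-ON}"
cmake ${CMAKE_OPTS} "${GDAS_SRC_DIR:?set to the GDASApp source directory}"
```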
Just chiming in here since I saw this thread, I think we should basically require JCSDA core to accept PRs that make building the
Add
Trying to figure out the source for the following tests
The usual command for adding a test is `ecbuild_add_test`.
Tedious process, but down to 104 ctests returned by
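For reference, a generic way to list and count the configured ctests from a build directory (not necessarily the exact command used above):

```bash
# List the configured tests without running them; the last line reports the total.
cd build
ctest -N | tail -n 1   # e.g. "Total Tests: 104"
```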
What's the make time at this point? Perhaps we can have a few tests being built. If the changes become convoluted, tests are likely to creep back in with future code changes anyway.
The most recent build (configure & compile) on Hercules with 104 ctests took 36:50 (minutes:seconds).
All the executables for the tests can be built in parallel (and many tests rely on executables that are built anyway), so it's possible that what you're seeing is correct. Yes, there are a lot of tests, but they are possibly dwarfed by the number of source files at this point.
From what I can remember, the tests themselves are usually trivial to build; what can take some time are the executables that are only used for testing (mostly in UFO), but I think by just building
Yes, even though the number of ctests has been drastically reduced, the
@RussTreadon-NOAA perhaps we try this a slightly different way. Inverting and renaming the flag we would have:

```cmake
option( LIBRARY_ONLY_BUILD "Only build JEDI libraries and skip tests and executables" OFF )
```

Then switch what you've already done to be instead:

```cmake
if(NOT LIBRARY_ONLY_BUILD)
  add_subdirectory( test )
endif()
```

Then (for example) the following file: https://github.com/JCSDA-internal/fv3-jedi/blob/develop/src/CMakeLists.txt could be:

```cmake
add_subdirectory( fv3jedi )

if( NOT LIBRARY_ONLY_BUILD )
  add_subdirectory( mains )

  ecbuild_add_test( TARGET fv3jedi_test_tier1_coding_norms
                    TYPE SCRIPT
                    COMMAND ${CMAKE_BINARY_DIR}/bin/cpplint.py
                    ARGS --quiet --recursive ${CMAKE_CURRENT_SOURCE_DIR}
                    WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/bin )
endif()
```

You can grep on '
Sorry to ask for additional work, but this might improve build time and should drain out the bin directory. One caveat is that I would expect JCSDA to be more resistant to this approach since it may have limited use outside our group. We have the special case of building gdas.x, whereas everyone else relies on the executables that we would be turning off.
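A hedged sketch of how the proposed flag might be exercised once such guards exist; `LIBRARY_ONLY_BUILD` is the option proposed above and does not exist yet, and the grep is only one plausible way to locate hooks that would still need fencing (paths are illustrative):

```bash
# Configure with the proposed option switched on (flag and paths are illustrative).
cmake -DLIBRARY_ONLY_BUILD=ON ../GDASApp

# One plausible way to find test/executable hooks that may still need the guard.
grep -rn --include=CMakeLists.txt -e 'ecbuild_add_test' -e 'ecbuild_add_executable' sorc/
```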
Thank you @danholdaway for the suggestion. I'll make a new clone of feature/install and give this a try.
Complete the following in a clone of feature/install at f49e2e6.
Build completed with the following timestamps.
The timestamp on the log file is Thu Oct 3 20:51. Configuring took about 4 minutes. Building took around 36 minutes. Installing ran 74 minutes before hitting an error. An
The directories have executables, libraries, module files, etc. I need to
@RussTreadon-NOAA this might help for the ioda-converter issue: JCSDA-internal/ioda-converters#1549
Thanks @CoryMartin-NOAA |
Manually added the path changes in JCSDA-internal/ioda-converters#1549 into the working copy of
Install shouldn't take 74 minutes as it's usually just copying all the files from build to install path. It makes it sound like more code is being built at that time. Can build and install be one step?

```bash
cd build
ecbuild ../
make -j6 install
```
@danholdaway, I didn't know we could specify parallel streams on the install. Let me add
The key also is to not issue make more than once. If doing install, it should only be done once, from the top level.
Timings with
This translates to the approximate timings below
It may be possible to reduce the build time by being more aggressive with the
The above work is being done in
In your build.sh I don't think you need to be running the code in this block:

```bash
# Build
echo "Building ... `date`"
set -x
if [[ $BUILD_JCSDA == 'YES' ]]; then
  make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
else
  builddirs="gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query da-utils"
  for b in $builddirs; do
    cd $b
    make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
    cd ../
  done
fi
set +x
```

This will make all the packages
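For reference, a hedged sketch of what the trimmed install path through build.sh could look like under that suggestion: drop the per-directory loop when installing and rely on a single top-level make install (variable names follow the snippets above):

```bash
# Sketch only: when an install prefix is requested, build and install in one
# pass from the top of the build tree instead of looping over builddirs first.
if [[ -n ${INSTALL_PREFIX:-} ]]; then
  echo "Building and installing ... `date`"
  set -x
  make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE install
  set +x
fi
```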
Refactored as @danholdaway suggested, then rebuilt and reinstalled.
The configure took 3 minutes. The build took 40 minutes. The install took 1 minute, 17 seconds.
That sounds reasonable, Russ. If you have time, it would be good to know if the library-only build makes a difference in this mode of running. And perhaps whether increasing the number of cores makes much difference.
Below are timings for
Notes:
Attempts using
Why are the build times always longer in the feature branch than in develop? Is it because it is building everything vs just some things?
@CoryMartin-NOAA, I think you are right. The flip side of the faster selective build with
I replaced three references to
We're caught between
Option 2 comes at the cost of modifying
I could test option 3 - fast selective build using
My hunch is that the slow install is because of the selective build; can you do a
Made the following local modifications in
Execute
12:34 to run configure? wow!
Strange. Is that consistent or was the machine just struggling at that moment? Does the feature branch have a difference in the configure?
Thanks so much for going through the pains of testing and comparing all these ways of building and installing @RussTreadon-NOAA, tremendously helpful to see all this. What seems to pop out to me is that unfortunately a library-only/no-tests build doesn't really save all that much time in installing JEDI. JEDI has just become a behemoth of source code that takes an age to compile, and doesn't scale particularly well with processors. Note that the shared drives of HPCs may also not be the best place to see the fastest make times and may explain why the time even started to increase with more processors. And even with quite a bit of work it wasn't possible to turn off all the tests or prevent the bin directory from filling up with things. So ultimately this may not even really satisfy NCO's wishes to have empty or at least clean directories. It seems we would need (possibly a lot) more work to fully eliminate all tests and bin directory copies. It also may be never-ending because little would prevent folks from putting tests/executables outside of the fences that we'd create in the CMake files. What do you all think, is this a fair assessment of what we're seeing?
The |
@danholdaway, I agree with your assessment. Your last point is a major concern. Even if we complete the task of trimming down the configure, compile, and install to satisfy EE2 requirements, maintenance of this setup requires constant vigilance. It's not hard to imagine developers adding new tests or executables outside the blocked sections we added for EE2 compliance. The module approach gets us closer to EE2 compliance. It does so, however, at the cost of not being development friendly. Assuming JEDI modules are released via spack-stack, developers are limited to what's in the stack unless they install their own JEDI modules and adjust the GDASApp build accordingly.
Even though configure, compile, and install take more than 30 minutes, does the final
We do this with the operational GSI build. Should we develop a similar
Having the install is definitely needed as that eliminates the need to link the executables. We can just install to the GFSHOME directory and point to the executable there. I'm fine with adding to the script to keep only bin/gdas* files.
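A minimal sketch of the clean-up step mentioned above, assuming `INSTALL_PREFIX` points at the install tree; this is illustrative, not an agreed-upon script change:

```bash
# Keep only the gdas* executables in the installed bin directory.
find "${INSTALL_PREFIX:?}/bin" -maxdepth 1 -type f ! -name 'gdas*' -delete
```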
Thank you @danholdaway for your comment. Enabling install requires minor updates to GDASApp, ioda-converters, and soca. I'll open issues to work on these updates.
Thanks @RussTreadon-NOAA |
EE2 does not require executables to be copied to the job run directory `$DATA`. Executables can be referenced from `$HOMEmodel/exec`.

This issue is opened to use the cmake install option to copy GDASApp executables, modules, and libraries into directories more aligned with the EE2 requirement.