--------------------------------------------------------------
Backlog issues for the Trilinos CMake Kitware Contract Work
--------------------------------------------------------------
This is a list of prioritized and unprioritized backlog issues for the
contract with Kitware to improve the CMake suite of tools as needed by
Trilinos. Items are prioritized by category (P1 to P5) and, within a
category, by position (most urgent/important on top). The items are also
segregated according to whether they require explicit changes to
CMake/CTest/CDash or can be addressed reasonably well without any changes
to those tools.
A) Prioritized Backlog Items:
------------------------------
A.1) Prioritized Backlog Items that require changes to CMake/CTest/CDash:
-------------------------------------------------------------------------
(*) [CDash] [P3] Separate out coverage results for library code and
for test/example code: Currently, our coverage appears lower than it
should because some test code logic is not always exercised (like
printing when a test fails). What would be needed in CMake/CTest is the
ability to tag source files as 'TEST' code so that coverage statistics
for those files are gathered separately from the remaining code, which is
assumed to be production code (library code in the case of Trilinos).
CDash would then need to show two columns in the coverage results, one
for 'Production' code and one for 'Test' code.
(*) [CMake] [P3] Install path is being searched for TPLs: The Trilinos install
path (specified by CMAKE_INSTALL_PREFIX) is searched when the include/library
paths for TPLs are searched; moreover, this path is searched before other
system paths. One would rarely expect the Trilinos install tree to contain a
TPL, but in fact it does when TriKota is enabled.
https://software.sandia.gov/bugzilla/show_bug.cgi?id=4627
(*) [CTest] [P4] Support an EXEC_DEPENDENCIES property for tests to deal with
'Not Run' when running driver scripts: Currently, ctest only looks at the
direct command that is invoked to determine whether the test can be run. This
is not good because even MPI tests are shown to fail when the executable is
not built but mpiexec is of course present. User scripts that call built
executables (like cmake -P scripts) likewise return 'Fail' when they should
be 'Not Run'. One way to address this is to add a new test property, called
something like EXEC_DEPENDENCIES, that can be set to a list of commands that
must all exist when the test is run, or the test will be marked as 'Not Run'
... This should already be done (see the test property ???); a sketch of one
candidate property is given below.
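The unnamed test property alluded to above may be REQUIRED_FILES, which
marks a test 'Not Run' when any listed file is missing. A minimal sketch
(the test name and executable path are illustrative):

  # Hypothetical MPI driver test; names and paths are illustrative.
  ADD_TEST(MyDriverTest ${MPIEXEC} -np 4 ${CMAKE_CURRENT_BINARY_DIR}/my_exec)
  # If the executable was never built, report 'Not Run' instead of 'Fail'.
  SET_TESTS_PROPERTIES(MyDriverTest PROPERTIES
    REQUIRED_FILES ${CMAKE_CURRENT_BINARY_DIR}/my_exec)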
(*) [CDash] [P4] Implement a tool that will go back in time and delete
older test results from the SQL database and/or reduce the size of
files referenced in the SQL database: However, we would want to save
all data for important events like a test going from passing to
failing or failing to passing. You can get quite sophisticated with
this, but implementing the basic types of logic is not hard.
Workaround: We can implement this all ourselves independent of
CTest/CDash by working directly with the MySQL database, but that will
be complex and fragile as the CDash MySQL table structure changes with
upgrades. We also want to have different criteria for cleaning out
Nightly versus Continuous versus Experimental data. For example, all
Experimental data would be wiped out after two days, while Continuous data
would be thinned out after 3 days and removed altogether after two
weeks.
(*) [CDash] [P4] Get CDash to periodically clean out older log
history.
(*) [CDash] [P4] Get CDash to deal correctly with subprojects when considering
the "last build" w.r.t. the number of warnings. Note that CDash has already
been fixed in this regard w.r.t. the number of passing and failing tests,
which is the more important metric.
(*) [CMake] [P3] Add support for drop-down 'ON', 'OFF', '' for the function
SET_CACHE_ON_OFF_EMPTY(...) in all the GUI interfaces (e.g. curses). This
item has been brought up independently by multiple users. Note that this is
already supported by the Qt GUI (see the sketch below).
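For reference, drop-downs in the Qt GUI are driven by the STRINGS property
on a cache variable; a minimal sketch of what SET_CACHE_ON_OFF_EMPTY(...)
presumably sets up (the variable name is illustrative):

  # Tri-state cache variable; empty means "let the system decide".
  SET(Trilinos_ENABLE_SomePkg "" CACHE STRING "Enable SomePkg (ON/OFF/empty)")
  # GUI front ends read the STRINGS property to populate a drop-down.
  SET_PROPERTY(CACHE Trilinos_ENABLE_SomePkg PROPERTY STRINGS "ON" "OFF" "")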
(*) [CDash] [P5] Support for user-defined configure, build, and test status:
Currently, CTest/CDash only supports the test statuses of 'Passed', 'Failed',
and 'Not Run'. There are many projects, however, that would like to support
finer-grained test statuses like 'Diff', 'Segfault', and 'Timeout'. It would
be good if CTest/CDash would allow you to define new categories of test
statuses and then display them on the CDash dashboard as new columns and in
each test. Also, when a package fails to configure because it was disabled,
it would be nice if this had a special status. It would also be good if a
user-defined status could define a color, with a 'Legend' button on CDash
giving the mapping of status to color. Good colors to choose from would be
'green', 'red', 'yellow', 'orange', 'pink', and perhaps other shades from
'green' to 'red'.
(*) [CTest, CDash] [Epic] Implement strong automated testing of CTest/CDash:
Currently, most testing of CTest, and especially CDash, is done manually.
This results in a lot of extra manual work and allows defects to creep in
very easily. Some type of unit testing infrastructure for CTest/CDash needs
to be devised that can automatically and strongly test important CTest/CDash
behaviors and features. We would like all new features implemented to be
strongly tested with automated unit tests where feasible.
- (2010/06/15) Kitware has now implemented a coverage tool for PHP that they
are using to measure the coverage of CDash's tests. The coverage is
currently very low, but at least now they can add more tests and bring this
coverage up. The assumption is that all new CDash capabilities would be
unit tested as (or before) they are implemented.
A.2) Prioritized Backlog Items that do *not* need changes in CMake/CTest/CDash:
---------------------------------------------------------------------------------
(*) [CTest] [P2] Add a --rerun-failed option to 'ctest' to make CTest only
rerun failing tests and then create a new output file that contains all of the
previously run passing tests as well as the newly run test results. When you
have a very large test suite you sometimes need to rerun a few failing tests
but want to assume that the others will still be okay. This happens, for
instance, when there are some random annoying MPI errors that go away when you
run the tests again. This will become more critical as more people do proper
pre-checkin testing with the checkin-test.py script. It would be nice if
Kitware could add this to CTest proper. Workaround: We could write our own
code to read the LastTestsFailed.log file, re-run just those tests, and then
write a new LastTest.log file replacing the tests that got rerun. Such an
external tool would not actually be all that hard to write (see the sketch
below), but again, this is a feature that should go into CTest proper.
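A minimal sketch of such an external driver as a 'cmake -P' script, run from
the top of the build tree and assuming the usual 'number:testname' format of
LastTestsFailed.log (merging results back into LastTest.log is omitted):

  # rerun_failed.cmake: rerun only the tests listed in LastTestsFailed.log.
  FILE(STRINGS "Testing/Temporary/LastTestsFailed.log" FAILED_LINES)
  SET(TEST_REGEX "")
  FOREACH(LINE ${FAILED_LINES})
    # Each line looks like "12:SomeTestName"; strip the leading number.
    STRING(REGEX REPLACE "^[0-9]+:" "" TEST_NAME "${LINE}")
    IF(TEST_REGEX)
      SET(TEST_REGEX "${TEST_REGEX}|^${TEST_NAME}$")
    ELSE()
      SET(TEST_REGEX "^${TEST_NAME}$")
    ENDIF()
  ENDFOREACH()
  # Rerun just those tests by name.
  EXECUTE_PROCESS(COMMAND ctest -R "${TEST_REGEX}")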
(*) [CDash] [P3] General handling of data files in different test directories:
Right now CTest/CDash only handles STDOUT for tests and not any other files
that might get written. Therefore, currently, we would have to manually
handle copying test directories and data over to a central server where they
can be accessed from CDash for each test. It would make a lot of sense if
CTest/CDash could be extended to directly deal with test data related to
specific tests, handle copying it over to some system where it would be
stored, and insert links to it from the dashboard test results pages
automatically. It would also be useful if CTest could optionally create
sub-directories for tests to run in so that the output files would be kept
separate for copying to the CDash server and for later inspection.
Workaround: Write scripts and handle everything on our own. However, CDash
would not display the list of files without some work, although you might be
able to use the idea of "measures" to list the files that need to be listed
(see the sketch below).
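One possible building block is the ATTACHED_FILES test property, which asks
CTest to upload the listed files to CDash alongside the test result; a
minimal sketch with illustrative names:

  ADD_TEST(MyDataTest ${CMAKE_CURRENT_BINARY_DIR}/my_exec)
  # Ship the test's output data file up to CDash with the test result.
  SET_TESTS_PROPERTIES(MyDataTest PROPERTIES
    ATTACHED_FILES ${CMAKE_CURRENT_BINARY_DIR}/output.dat)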
(*) [CTest] [P4] Support of batch-style job queuing systems: With large-scale
MPPs and even modest Linux clusters, a batch queuing system like PBS needs to
be used to schedule and launch jobs. It would be very useful if ctest could
somehow support tests that run in batch mode, where a bunch of tests are
queued up and ctest then polls for their completion and even times them out
(e.g. by calling 'qdel') if they are taking too long. Workaround: This could
be scripted on top of CTest by creating two tests for each submission, one
that launches the test with qsub and one that waits for the job to finish
(with an appropriate timeout); see the sketch below. You could run these with
a very large -jN to make all the tests run and then wait. The only thing we
would lose is the ability to set the correct run time, but that is a minor
issue. Also, we might be able to manipulate the submission XML to set the
correct time. This would be easier if some basic changes were made to CTest,
but we can do this on our own.
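A minimal sketch of that two-test workaround, assuming hypothetical
submit_job.sh and wait_for_job.sh wrapper scripts around qsub and qstat:

  # Test 1: submit the real test to the PBS queue and return immediately.
  ADD_TEST(MyTest_submit ${CMAKE_CURRENT_SOURCE_DIR}/submit_job.sh my_test.pbs)
  # Test 2: poll for completion of the queued job, with a generous timeout.
  ADD_TEST(MyTest_wait ${CMAKE_CURRENT_SOURCE_DIR}/wait_for_job.sh my_test)
  SET_TESTS_PROPERTIES(MyTest_wait PROPERTIES
    DEPENDS MyTest_submit TIMEOUT 3600)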
(*) [CMake] [P4] Improve BLAS library detection. Brad said there is a BLAS
finding macro that we are currently not using (see the sketch below). It
would also be nice if the BLAS finding capability could handle finding
library dependencies of BLAS (e.g. -lgfortran for static ATLAS
installations). It would also be nice if we could automatically detect the
name mangling for the BLAS and LAPACK libraries independently from the
detection of name mangling for the configured Fortran compiler. Basically,
you would use the same strategy as with the Fortran compiler name mangling
detection, which is to just try out a bunch of combinations until you find
something that works. To be really safe, you should also try to run a test
problem that calls a BLAS and LAPACK routine to see if it worked. Also, BLAS
and LAPACK can really be separate libraries, so we should have separate
logic for each. What a pain.
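The macro Brad likely means is CMake's bundled FindBLAS.cmake module (with
FindLAPACK.cmake as the separate LAPACK counterpart); basic usage:

  # Optionally steer the search first, e.g. SET(BLA_VENDOR "ATLAS").
  FIND_PACKAGE(BLAS REQUIRED)
  FIND_PACKAGE(LAPACK REQUIRED)
  # BLAS_LIBRARIES and LAPACK_LIBRARIES then carry the link lines, including
  # whatever extra dependencies the modules manage to detect.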
B) Unprioritized items:
-----------------------
B.1) Unprioritized Backlog Items that require changes to CMake/CTest/CDash:
---------------------------------------------------------------------------
(*) [CMake] When cross compiling it would be nice to have the ability to
specify absolute directories that contain libraries and headers instead of
having to rely on the prefix-only system. This is a problem for many of our
TPLs, and since the "PATHS" variable that is passed into find commands is
considered part of the host environment, we can't override it without using
the "BOTH" option for libraries and headers. One potential solution would be
to add counterparts to the CMAKE_INCLUDE_PATH and CMAKE_LIBRARY_PATH
variables that are only used when cross compiling. That would give us the
safety of "ONLY" using the target environment for libraries and headers while
allowing us to have directory structures that do not match what cmake
expects.
(*) [CMake] Strong checking for user input misspelling CMake cache variables:
Currently, if a user misspells the name of a defined user cache variable, the
default for that variable will be used instead and the misspelled user
variable will be ignored. This is very bad behavior that is carried over from
the make and autotools systems and should not be repeated with the CMake
system. It would be very useful if the cmake executable could take a new
option (e.g. --validate-cache-variables) or if an internal cache variable
could be set (like CMAKE_VALIDATE_INPUT_CACHE_VARIABLES) that would force the
validation of all user-set cache variables to make sure that they have a
matching internally defined cache variable. At the very least, CMake could
just print out all of the user-set cache variables that never got used. This
would be easy to implement and would go a long way toward helping users catch
spelling errors. Workaround: I am not sure there is any robust workaround for
this problem (though a partial project-local check is sketched below); only a
built-in capability in CMake can address this issue fully.
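As a partial, project-local illustration (not the robust built-in check
argued for above), a project that keeps its own registry of known option
names could diff that registry against the cache; KNOWN_CACHE_VARS is
hypothetical and would be populated by the project's own option macros:

  GET_CMAKE_PROPERTY(ALL_CACHE_VARS CACHE_VARIABLES)
  FOREACH(VAR ${ALL_CACHE_VARS})
    # Real usage would also skip CMAKE_* and other internal variables.
    LIST(FIND KNOWN_CACHE_VARS "${VAR}" IDX)
    IF(IDX EQUAL -1)
      MESSAGE(WARNING "Cache variable '${VAR}' is not recognized; misspelled?")
    ENDIF()
  ENDFOREACH()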
(*) [CDash] Have CDash log each call to each PHP script, storing a) what the
arguments were, b) when the script was called, c) who called the script, and
d) how long the script took to complete. From these logs, we can do simple
queries and compute simple statistics for response time (e.g. min time, max
time, mean time, standard deviation, etc.) for each of the PHP scripts. We
can also use this information to see if the server running CDash is having
problems with responsiveness (i.e. where the same query takes radically
different amounts of time). We would also be able to use this info to see if
there are certain CDash queries that are habitually taking too long.
(*) [CDash] Add a general query filter for the main
project/subprojects page: Currently, the main project/subprojects page
shows all results for all builds for the current day for all nightly
builds (and optionally other builds). When a single build on one
machine fails in a catastrophic way, it pollutes the entire list of
subproject results. If you could filter out bad builds, then you could
get the real picture of how the package builds are doing. Also,
querying over multiple days or for specific machines would be very
useful.
(*) [CTest, CDash] Show the total run-time and timeouts for "Dynamic Analysis"
and "Coverage" tests on the dashboard in some way: Right now we can only see
the total run-time for the build and the regular tests on the CDash dashboard.
These tests take a long time to run and we need to see that. We also need to
see every 'Dynamic Analysis' test that times out on the dashboard. Currently,
if a MEMCHECK test times out, there is no record in CDash that it was ever
even run. If a dynamic analysis test times out, we need to know it, and it
needs to be recorded as a failing MEMCHECK test (and we need to get an error
email from CDash).
(*) [CTest, CDash] Send out CDash error notification emails for failed
coverage tests and failed Dynamic Analysis tests: We need to be
notified when 'Dynamic Analysis' tests don't run or don't finish.
B.2) Unprioritized Backlog Items that do *not* need changes in CMake/CTest/CDash:
---------------------------------------------------------------------------------
(*) [CMake] Come up with a reasonable documentation system that can
externalize and organize all of the basic Trilinos and package options like
you get with autotools. Workaround: We could do this externally by
configuring Trilinos and then reading the CMakeCache.txt file and extracting
all of the various options and documentation (which are also in the file).
We would just need to ignore internal cache variables and would be able to
separate regular from advanced options. This would not take very long to
create, and we could generate this documentation as part of a nightly testing
process and provide a doxygen document that would then be compiled into
various forms (along with the existing TrilinosCMakeQuickstart.txt document).
(*) [CTest, CDash] Overall time budgets for running package tests: Currently,
you can only put time limits on individual tests. What would be more useful
would be to put a time budget on a whole set of package tests. All of the
tests that would otherwise run after the overall time limit was reached would
be listed as 'Not Run'. This would allow package developers to group their
tests into different executables any way they would like, with only the
overall time usage being an issue ... This can be approximated by running a
package's whole test suite as a single CTest test and setting a time limit on
it (see the sketch below).
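A minimal sketch of that approximation: wrap a whole package's test suite in
one outer CTest test and give the wrapper the budget (the package name and
30-minute budget are illustrative):

  # Run all of PackageX's tests as a single test with an overall budget.
  ADD_TEST(PackageX_all_tests ctest -R "^PackageX_")
  SET_TESTS_PROPERTIES(PackageX_all_tests PROPERTIES TIMEOUT 1800)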
(*) [CTest] Support for multi-computer running of tests: CTest has the -jN
option that allows the parallel running of tests on a single machine, but
there are some testing tools that can run tests in parallel over multiple
similar machines and therefore achieve much higher levels of parallelism.
Workaround: We could just use remote SSH calls to run tests on remote
machines. With a little CMake code, we could set things up to run different
tests on different machines without much trouble (see the sketch below).
However, getting accurate timings for each test would be a problem. We would
need the extension to handle batch-style tests to really handle this well.
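A minimal sketch of the SSH workaround, assuming passwordless SSH and an
identical build tree visible on the remote machine (host name illustrative):

  # Run this test's executable on a remote machine instead of locally.
  ADD_TEST(MyTest_on_node02 ssh testnode02 ${CMAKE_CURRENT_BINARY_DIR}/my_exec)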
C) Wish list (nice to have but we can live without)
---------------------------------------------------
(*) [CMake] Strong internal checking for variables that are not defined: Just
treating undefined variables as being empty is a bad practice (used by Make
and bad Fortran 77). This practice is well known to result in higher rates of
software defects. Turning on strong checking for all of CMake may not be
possible because a large number of existing modules may rely on the
undefined-is-empty-and-is-false behavior. Therefore, a more local solution
like turning on strong checking in individual functions and macros may be
more realistic. The behavior of a command like
CMAKE_ENABLE_STRONG_CHECKING(TRUE/FALSE) would have to be carefully designed,
but it would make CMake code much more solid. In cases where we wanted to
allow a variable to be undefined, we would need to call a function like
MAKE_UNDEFINED_EMPTY(<VAR_NAME>) that would define the variable and make it
empty if it was not already set. Workaround: Currently, we employ a
user-defined function ASSERT_DEFINED(...) which is called right before the
variable is used (see the sketch below). This is not ideal because it is
verbose, and you can misspell the name of the variable in the two places,
which defeats the purpose. This should be a very low priority item given all
of the other issues that we need to address. We are not going to fix the
CMake language.
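For reference, a minimal sketch of what the ASSERT_DEFINED(...) workaround
looks like (the actual Trilinos implementation may differ):

  MACRO(ASSERT_DEFINED VARNAME)
    IF(NOT DEFINED ${VARNAME})
      MESSAGE(FATAL_ERROR "Error: the variable ${VARNAME} is not defined!")
    ENDIF()
  ENDMACRO()

  # Usage: the variable name is repeated, which is where misspellings slip in.
  ASSERT_DEFINED(SOME_VAR)
  SET(OTHER_VAR ${SOME_VAR})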
(*) [CMake] Support for true function return values: CMake code would
be much less verbose and there would be less duplication if CMake
added support for true function return values. For example, a
variable could be set from a function with:
SET(SOME_VAR ${SOME_USER_FUNC(...)})
or do an if statement like:
IF (${SOME_BOOL_FUNC(...)})
...
That would allow for much cleaner code and less duplication (the verbose
idiom CMake forces today is sketched below). This should be a very low
priority given all of the other items to address. We are not going to fix
the CMake language.
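For contrast, the verbose pattern CMake forces today: results come back
through an output-variable name passed in by the caller (names match the
hypothetical example above):

  FUNCTION(SOME_BOOL_FUNC RESULT_VAR)
    # ... compute something ...
    SET(${RESULT_VAR} TRUE PARENT_SCOPE)
  ENDFUNCTION()

  # Two lines and a temporary where one expression would do:
  SOME_BOOL_FUNC(SOME_VAR)
  IF(SOME_VAR)
    # ...
  ENDIF()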
(*) [CMake] Support multi-computer builds: Currently, CMake can only
support parallel builds on a single machine through 'make -jN'.
However, there are build tools that can use an informal cluster of
similar (i.e. identical) computers (with different IP addresses) to
compile object files in parallel. Workaround: Use 'distcc'.
(*) [CTest] Add an optional summary section that will show the number of tests
that passed and failed for each Trilinos package (i.e. subproject in current
CTest lingo). This output might look like:
=========================================================
Start processing tests
...
100% tests passed, 0 tests failed out of 113
Total CPU time = 1050 sec
Summary of subproject tests:
Teuchos: passed = 25, failed = 1, CPU time = 26 sec
Epetra: passed = 48, failed = 0, CPU time = 150 sec
...
=========================================================
Workaround: We could write our own post-processor to take the console output
from ctest and then write the summary results like this.
(*) [CDash] Support for selectively deleting whole sets of builds:
Currently CDash only supports removing one build at a time or all
builds for consecutive days. The problem is that a single Trilinos
build case results in 40+ individual builds. It can therefore take 15
minutes of clicking to delete a single bad build case. It
would be very useful if every build view, both the collapsed and
non-collapsed views, supported a "Delete All Shown Builds" button.
That way we could query the builds we want to delete and delete
them in one shot.
(*) [CTest] Echo the console output to a summary file
Testing/Temporary/LastTestSummary.log: That way, we can get the same
information later from what was printed to the console. In C++, this can be
implemented with a splitting stream. We don't have one in Teuchos yet, but we
could create one and then copy it over to the CTest sources. Actually, I
think Boost has one that could be extracted, renamed, simplified, and then
moved into the CTest sources. Workaround: People can just run ctest with
'2>&1 | tee ctest.out' appended; no big deal.
(*) [CTest, CDash] Support the notion of "features" and tracking of
tests that exercise feature sets (from Martin Pilch): While code
coverage provides an important metric for software quality, a more
critical metric is coverage of the features and capabilities used in a
specific simulation by an analyst. Capability is the physics invoked
in the particular simulation, e.g., non-linear heat conduction with a
radiation boundary condition and a source term. Features are things
that enable that simulation, e.g., tet elements, running the
simulation in parallel, adaptive mesh refinement. We are interested in
knowing whether the features and capabilities used in a specific
simulation are verified and what the gaps are. Problems are often
associated with the interactions of features and capabilities, e.g.,
adaptivity works just fine except when it is combined with element
death. Consequently, we also need a measure of the verification (and
gaps) of two-way interactions of features and capabilities. It would
be really cool if a verification report could be standardized and made
available for every simulation an analyst runs.
(*) [CTest, CDash] Find out why gcov and/or CTest/CDash is reporting false
line misses in the coverage testing: The coverage testing is reporting all
kinds of false line coverage misses. See the coverage results of
OptiPack and GlobiPack, for instance. This has nothing to do with CTest/CDash
but instead is an issue with the gcov tool. We should look at Bullseye as an
alternative.
(*) [CTest, CDash] Add graphs showing build and test times over
multiple days for collapsed and non-collapsed builds: This would
be driven by the query that is currently being used. With this, you
could see how the build and run times for particular packages and entire
builds are changing over time.
(*) Show aggregate Trilinos coverage stats on the front results page, rather
than the stats for just one package per coverage run. Right now, for each
coverage build, one Trilinos package's coverage stats are displayed on the
front page. A summary of all Trilinos coverage, with the ability to click
through to the package breakdown, would be more useful. Right now, we have to
find the coverage build in the nightly category, click into that, and scroll
to the bottom for the package coverage numbers.