Diagnostics Flowchart

Andrew Lee edited this page Jun 16, 2023 · 9 revisions

Introduction

The diagnostics working group is building towards a new set of model diagnostics APIs and tools in IDAES. After numerous discussions with developers with a range of experience and backgrounds about how they diagnose their models, we have created a flowchart that summarizes the overall process.

This flowchart will be the basis for a new API (and eventually UI additions) in IDAES core.

Your feedback is needed! Please look at this flowchart and explanation (below it) and use this issue (#1208) to add questions and comments.

Flowchart

[Flowchart image]

Explanation

Key Points:

  • Diagnostics is not something to do only when you have a problem that fails to solve. Diagnostics should be built into your model development workflow: check for issues after each change to the model to ensure that you did not introduce any new issues.

  • You should build your model up gradually from smaller parts and check for modeling issues at each stage. It is always easier to debug a small model, and by making gradual changes you will know exactly which constraint introduced an issue, which helps guide you to the root cause.

  • You should always start with building and solving a square, steady-state problem even if you intend to eventually do optimization or dynamics. If the square, steady-state problem is poorly formulated and not robust, then there is no point trying to solve anything more complicated.

    • Never go straight to modeling a dynamic system. All dynamic systems are based on an underlying steady-state system, but are much more complicated. If you cannot get the steady-state system set up correctly first, there is no way you will get the dynamic model working.
  • We strongly recommend keeping a “modeling logbook”, similar to an experimental lab notebook, which records what you did in each step and links this to a git commit (or other version control checkpoint). Linking each change to a commit allows you to revert to a known point if things go wrong, or to replicate an analysis you performed on an earlier version of the model. You might not need this very often, but you will be very grateful for the log when you do.

  • Model diagnosis and debugging is a highly iterative process; you will see that the workflow frequently returns to the “Revise Model” and “Commit Changes and Record in Logbook” steps near the start of the flowchart.

Explanation of Workflow

  1. Construct Simple Model
  • Build your IDAES/Pyomo model
  • It is always easier to diagnose smaller problems and to gradually add complexity so that you can easily identify which change introduced an issue.
  • Go to Step 2
  2. Set Degrees of Freedom = 0
  • Determine which variables you wish to fix in order to have zero degrees of freedom in the model
  • Building your model gradually also makes it easier to identify degrees of freedom: you can see when a change adds a degree of freedom and decide what is best to fix.
  • Model development and testing should always start with a square, steady-state model.
  • If the steady-state model does not solve or is poorly formulated, then there is no point in trying a more complex type of problem.
  • You should try to set the degrees of freedom to zero first (as best you can); there are tools later in the workflow that can help identify whether there are better choices you could make.
  • Go to Step 3
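The arithmetic behind a degrees-of-freedom check can be sketched in plain Python. This is a simplified illustration of the counting only; IDAES provides a `degrees_of_freedom` utility (in `idaes.core.util.model_statistics`) that performs this analysis on an actual model.

```python
def degrees_of_freedom(n_variables, n_fixed, n_equality_constraints):
    """Degrees of freedom = free (unfixed) variables minus equality constraints."""
    return (n_variables - n_fixed) - n_equality_constraints

# Toy flowsheet: 10 variables and 7 equality constraints.
# Fixing 3 variables (e.g. the feed conditions) squares the problem.
print(degrees_of_freedom(10, 3, 7))  # → 0
```

When the result is positive, more variables need to be fixed; when negative, the model is over-specified and some fixed variables or constraints must be removed.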
  3. Create Instance of Diagnostics Toolbox
  • Create an instance of the DiagnosticsToolbox class with your model as the model argument.
  • The IDAES Diagnostics Toolbox is a centralized location for accessing all the diagnostics and debugging tools.
  • Go to Step 4
  4. Commit Changes and Record in Logbook
  • Commit your changes so far using git (or your preferred version control tool)

  • Maintain a logbook of model development and debugging so that you can experiment and revert changes if necessary. We recommend you record the following:

    1. Time and date.
    2. Git hash (or equivalent) – this will let you rewind time to this point in future, and compare changes between this point and any other point.
    3. A record of the changes you made to the model in this commit.
    4. A record of the reasons why you made the changes, including any output from the diagnostics tools.
    5. If applicable, a summary of key results you saw using this version of the model. This will be useful for reference as you continue to develop the model to ensure that changes can be explained.
  • Go to Step 5
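A minimal helper for formatting such a logbook entry might look like the following. This is only a sketch: the function name and fields are hypothetical, and a plain text file or spreadsheet works just as well.

```python
from datetime import datetime

def logbook_entry(git_hash, changes, reasons, key_results=None):
    """Format a model-development logbook entry (hypothetical helper)."""
    lines = [
        f"Date: {datetime.now().isoformat(timespec='minutes')}",
        f"Commit: {git_hash}",
        f"Changes: {changes}",
        f"Reasons: {reasons}",
    ]
    if key_results:
        lines.append(f"Key results: {key_results}")
    return "\n".join(lines)

print(logbook_entry("3f9c2ab", "Added heater unit", "Extending flowsheet scope"))
```

The git hash can be obtained with `git rev-parse --short HEAD` after committing.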

  5. Run Structural Analysis
  • Run the DiagnosticsToolbox.run_structural_analysis() method
  • The structural analysis tools look at only the model structure and thus do not depend on having a solution yet. Fixing these issues before trying to initialize and solve the model will greatly increase your chances of success.
  • Go to Step 6
  6. Did Structural Analysis return any Warnings?
  • Inspect the reports generated by the run_structural_analysis method and determine whether any warnings or cautions were identified.
  • The tools separate issues into “warnings”, which are critical issues that need to be fixed, and “cautions”, which are not necessarily critical but might cause issues or that users should be aware of. We suggest that users take the time to consider all “cautions” and determine whether they need to be addressed or whether it is safe to leave them for now.
  • Yes: Go to Step 7
  • No: Go to Step 8
  7. Revise model to try to address warnings
  • Whilst many issues are easy to identify, determining how to fix them often is not and generally requires a combination of knowledge of numerical techniques and familiarity with the model. Thus, it is largely in the hands of the modeler to identify the correct course of action to correct any issues.
  • Considering the totality of the information available (diagnostics tools, solver logs, user knowledge), determine a list of possible causes and solutions.
  • Users should go through all the information available before deciding on a course of action; often the first issue identified is not the sole cause of the problem.
  • It is often necessary to experiment with multiple possible fixes to a model.
  • Sometimes, fixing one issue may in fact make the overall problem behavior appear worse. Do not discard a fix immediately just because it did not fix the immediate problem or if things appear to get worse; this often indicates a number of interacting issues and it is necessary to fix all of these in order to see an overall improvement.
  • Go to Step 4
  8. Initialize and Solve Model
  • This should only be done once there are no structural issues in the model; even if you can solve the model for a specific case with structural issues, it will likely give rise to poor numerical behavior which will complicate debugging numerical issues.
  • Go to Step 9
  9. Look at Solver Logs
  • Inspect all the output from the solver: check that the problem size appears correct and look for indications of poor behavior.
  • The final solver state only tells a fraction of the story; you should always look at all of the solver output to get the full story.
  • The signs to look for will depend on the solver used; documentation on what to look for in IPOPT logs will be made available in the future.
  • Go to Step 10
  10. Was an “Optimal” solution found?
  • Did the solver (or final initialization step) return an “optimal” solution?
  • Yes: Go to Step 11
  • No: Go to Step 12
  11. Is the solution correct?
  • An “optimal” solution does not guarantee the solution is correct; you should also take a moment to check the solution and determine if it makes sense.
  • You should check to make sure that the outlet states make sense and are not taking extreme values.
  • If you had a previous solution to the model, check to make sure that any differences can be explained by the changes you made to the model. The modeling logbook will be useful for recording key states for future reference.
  • It is also a good idea to do a quick check for material and energy conservation.
  • Yes: Go to Step 14
  • No: Go to Step 7
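The conservation check in this step can be as simple as comparing total inlet and outlet flows (or enthalpies) within a relative tolerance. A sketch, with purely illustrative stream values:

```python
def balances_close(inlet_flows, outlet_flows, rel_tol=1e-6):
    """Return True if total flow in and total flow out agree within rel_tol."""
    total_in, total_out = sum(inlet_flows), sum(outlet_flows)
    return abs(total_in - total_out) / max(abs(total_in), 1e-12) <= rel_tol

# Two feeds into a mixer, one product stream (values are illustrative):
print(balances_close([100.0, 50.0], [150.0]))  # → True
```

A residual larger than solver tolerance usually indicates a constraint was dropped, double-counted, or written with inconsistent units.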
  12. Was a solution returned at all?
  • Often, even if the solver fails to find an optimal solution it will return the last feasible solution it found. In these cases, the last feasible solution can be used for numerical analysis to determine why it might have failed to find an optimal solution.
  • In some cases however, the solver will terminate with a critical error and not return a solution. In these cases, we need to try to get a partial solution to the problem in order to try to debug any numerical issues.
  • Yes: Go to Step 14
  • No: Go to Step 13
  13. Get a Partial Solution to model
  • For cases where a solver has encountered a critical error and failed to return a solution, there are a few techniques we can use to try to get a partial solution.

    1. Review the solver output logs and identify the last iteration with a feasible solution (generally the one immediately before the solver terminated). Set the maximum number of iterations option for the solver to this number (exact argument will depend on the solver) and re-run the solve. Hopefully it will terminate at the maximum number of iterations and return the current solution.
    2. Alternatively, you can use the strongly connected component solver (scc_solver) in pyomo.contrib.incidence_analysis. This tool uses block decomposition techniques to decompose a larger problem into the maximum number of sub-problems that can be solved sequentially. It can be useful for initializing models and may be able to get a solution where other solvers fail; it is also useful for getting partial solutions and narrowing down where issues exist, as it will solve as much of the problem sequentially as possible before failing on the first problematic sub-block.
  • Go to Step 14
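For the first technique, IPOPT reads an `ipopt.opt` options file from the working directory (the same `max_iter` option can also be passed through Pyomo's solver options). The iteration count below is purely illustrative; take it from the last feasible iteration in your own solver log:

```
max_iter 42
```

When the cap is hit, IPOPT terminates with a "maximum iterations exceeded" status but still loads the current iterate, which can then be analyzed.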

  14. Run Numerical Analysis for nominal case
  • Run the DiagnosticsToolbox.run_numerical_analysis() method
  • This method will run a set of checks for possible numerical issues in the model.
  • There are many types of modeling issues that only occur at specific values of variables within a model, such as division-by-zero errors or attempts to violate bounds.
  • Go to Step 15
  15. Did Numerical Analysis return any Warnings?
  • Inspect the reports generated by the run_numerical_analysis method and determine whether any warnings or cautions were identified.
  • Similar to the structural analysis tools, issues will be classified into warnings and cautions.
  • Yes: Go to Step 7
  • No: Go to Step 16
  16. Generate Robustness Samples
  • Use the model convergence analysis tools (included in the Diagnostics Toolbox) to generate a set of samples covering the full variable space of interest for robustness checking.
  • As numerical issues depend on the value of variables, they can generally only be detected when a solution is close to the problematic value. Thus, it is important to test your models across the full parameter space to ensure there are no hidden issues.
  • Go to Step 17
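A full-factorial grid is one simple way to cover the space. A sketch in plain Python (the convergence analysis tools provide their own sampling utilities, so treat this only as an illustration of the idea; the parameter names are hypothetical):

```python
from itertools import product

def grid_samples(param_ranges, n_points=3):
    """Full-factorial sample grid over {name: (low, high)} parameter ranges."""
    axes = {
        name: [lo + i * (hi - lo) / (n_points - 1) for i in range(n_points)]
        for name, (lo, hi) in param_ranges.items()
    }
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]

# Hypothetical ranges for two fixed inputs:
samples = grid_samples({"feed_temp": (300.0, 400.0), "pressure": (1e5, 2e5)})
print(len(samples))  # → 9 (3 points per axis, 2 axes)
```

Grids grow exponentially with the number of parameters, which is why sampling schemes such as Latin hypercube are often preferred for larger spaces.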
  17. Solve All Samples
  • Use the model convergence analysis tools to try to solve all the robustness samples.
  • Go to Step 18
  18. Look at Solver Logs
  • As before, you should always inspect the full output of the solver logs (in this case for all the samples tested).
  • Go to Step 19
  19. Were there any Failed Cases?
  • As before, we need to check whether any of the robustness samples failed to solve and, if so, whether partial solutions were returned for those cases.
  • Yes: Go to Step 20
  • No: Go to Step 22
  20. Did All Failed Cases return some solution?
  • See Step 12
  • Yes: Go to Step 22
  • No: Go to Step 21
  21. Get Partial Solutions to failed cases
  • See Step 13
  • Go to Step 22
  22. Run Numerical Analysis for All Cases
  • Run the run_numerical_analysis() method for all the robustness samples. The method will accept an argument to indicate which samples should be run so that you can do them all in one method call.
  • Go to Step 23
  23. Did Numerical Analysis return any Warnings?
  • See Step 15
  • Yes: Go to Step 7
  • No: Go to Step 24
  24. Did any solver log show poor behavior?
  • Even if no issues have been identified by the diagnostics tools, some issues are difficult to detect, so we should check the solver logs again to make sure there are no indications of poor behavior.
  • Documentation for interpreting IPOPT solver logs will be made available in the future.
  • Yes: Go to Step 25
  • No: Go to Step 28
  25. Run Singular Value Decomposition (SVD) tool
  • Call the DiagnosticsToolbox.run_svd_analysis() method.
  • If we are still seeing poor solver behavior, then we need to start taking a deeper look into the model. The first step is to inspect the model Jacobian and look for problematic values.
  • Go to Step 26
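To see why small singular values matter: for a 2×2 Jacobian the singular values have a closed form, and nearly-redundant constraint rows drive the smallest one toward zero. A self-contained illustration (for real model Jacobians use scipy.linalg.svd or the toolbox itself; the matrix entries here are made up):

```python
from math import sqrt

def singular_values_2x2(a, b, c, d):
    """Singular values of [[a, b], [c, d]], largest first (closed form)."""
    t = a * a + b * b + c * c + d * d      # trace of A^T A
    det = a * d - b * c
    disc = sqrt(max(t * t - 4.0 * det * det, 0.0))
    return sqrt((t + disc) / 2.0), sqrt(max((t - disc) / 2.0, 0.0))

# Two nearly-identical constraint rows give a tiny smallest singular value:
s_max, s_min = singular_values_2x2(1.0, 1.0, 1.0, 1.001)
print(s_min / s_max)  # tiny ratio => nearly singular, ill-conditioned Jacobian
```

The ratio of largest to smallest singular value is the condition number of the Jacobian; very large values signal that small changes in one constraint can swing the solution wildly.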
  26. Did Singular Value Decomposition reveal any issues?
  • If the Singular Value Decomposition method revealed any issues then you should address these and repeat the diagnostics cycle.
  • If the cause of the poor solver behavior is still not clear, we need to look deeper again.
  • Yes: Go to Step 7
  • No: Go to Step 27
  27. Run Degeneracy Hunter Tool
  • Call the DiagnosticsToolbox.degeneracy_hunter() method.
  • This method tries to identify sets of redundant constraints (degeneracies) by solving a MIP on the model incidence matrix.
  • If any degeneracies are found, it will try to return the smallest set of constraints that contain the degeneracy. Users will then need to examine these constraints and determine a) which constraint is redundant and should be removed and b) what constraint should be added in its place.
  • If the cause of the issue is still not found, then the problem lies beyond our current capabilities to detect and users will need to diagnose the issue themselves.
  • Go to Step 7
  28. Write Robustness Tests for model
  • Once the model appears to be behaving well, users should write some tests to ensure that any future changes (either by the user or by changes in the upstream tools) do not negatively impact the performance of the model.
  • Using the robustness samples and solutions gathered in Step 22, users should write some tests which repeat the model convergence analysis and compare the results to those from Step 22 to ensure that performance is not negatively impacted.
  • Go to Step 29
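One simple form such a test can take is a convergence-rate regression check. This is only a sketch: how you obtain the per-sample convergence results depends on the convergence analysis tooling, and the function name is hypothetical.

```python
def robustness_ok(sample_converged, baseline_success_rate):
    """True if the fraction of converged samples meets the recorded baseline."""
    return sum(sample_converged) / len(sample_converged) >= baseline_success_rate

# Hypothetical re-run of four robustness samples against a recorded baseline:
assert robustness_ok([True, True, True, False], baseline_success_rate=0.75)
```

Stricter variants could also compare iteration counts or solve times against the values recorded in the logbook.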
  29. Commit Changes and Record in Logbook
  • See Step 4
  • Go to Step 30
  30. Is the Model Finished and ready for optimization?
  • At this point, the current model has been debugged and is ready for further development or use.
  • Yes: Go to Step 32
  • No: Go to Step 31
  31. Add Model Complexity
  • Add one (or at most a few) additional constraints to increase model complexity and repeat the model diagnosis workflow.
  • Users should always try to make additions in the smallest possible increments. This might seem like a slow way to work (especially with repeating the diagnosis workflow at each step), but it will save you time when you catch an error early in the development cycle, before it gets buried deep inside a complex model.
  • Go to Step 4
  32. Set Up Optimization Problem
  • Once the model is ready for optimization, you can begin unfixing degrees of freedom.
  • Note that Objectives (and any associated constraints) should be added and tested in the previous steps to ensure there are no structural or numerical issues. Even though there are no degrees of freedom in a square problem, the Objective can still be evaluated and checked for issues before you move to optimization.
  • Go to Step 33
  33. Solve Model
  • Call your preferred solver to try to solve the optimization problem.
  • Go to Step 34
  34. Look at Solver Logs
  • As before, you should always fully inspect the solver logs to see if there are any signs of numerical issues.
  • Go to Step 35
  35. Was a Solution Returned?
  • Check to see if the solver returned a solution. If it did not then you need to get a partial solution to the problem before continuing.
  • Yes: Go to Step 37
  • No: Go to Step 36
  36. Get Partial Solution for Problem
  • See Step 13
  • Go to Step 38
  37. Did the solver log show poor behavior?
  • Check the solver logs for any signs of poor behavior. If there are any, they should be investigated and resolved.
  • Yes: Go to Step 38
  • No: Go to Step 40
  38. Add Current Case to Robustness Cases
  • Use the model convergence analysis tools to add the current model state to the robustness samples.
  • If additional numerical issues have been identified, this indicates that this point needs to be included in the robustness checks.
  • Go to Step 39
  39. Run Numerical Analysis for current case
  • Call the run_numerical_analysis() method for the current case and diagnose any issues identified.
  • Go to Step 7
  40. Is the solution sane?
  • Check the solution returned by the solver to make sure it makes sense.
  • Even more so than for square problems, an “optimal” solution does not mean a correct solution.
  • Check that all outputs make sense and that mass and energy are conserved.
  • Check to see whether any decision variables are at or near their bounds. If they are, ask yourself why, and whether this is expected and reasonable.
  • Yes: Go to Step 41
  • No: Go to Step 7
  41. Save Solution and Record in Logbook
  • If the solution appears to be correct, you should save this solution in some form and record this in your logbook for future reference (and so that you know which git hash to go back to in case you cannot replicate this result in the future).
  • Go to Step 42
  42. Are you finished with optimization?
  • If you have finished gathering results, then you are done. Otherwise, go back and set up the next optimization case or add new complexity to the model.
  • Yes: Finished!
  • No: Go to Step 31 or Step 32