
Diagnostics Working Group

  • Andrew Lee
  • John Siirola
  • Dan Gunter
  • Sarah Poon
  • Robby Parker
  • Michael Bynum
  • Bethany Nicholson
  • Soraya Rawlings
  • Ben Knueven
  • Adam Atia
  • Carl Laird
  • Brandon Paul

Documents

Interview takeaways (google slides)

Meeting Agenda and Notes

12 September 2023

Attending: Andrew, Bethany, Adam A., Soraya, Miranda, Brandon, Ben K., Michael Bynum

23rd May 2023

  • What do we do with all of these notes, and how do we aggregate them?
    • Start with the takeaways
    • Do we have people with the time to review the interview notes?
      • Bethany, Adam, Dan
      • Recordings are currently hosted on Drive
      • Suggestions: email notes, transcripts to reviewer on a case-by-case basis
  • Need a high level summary of results
    • A list naming what needs to be done, but not the details of how
    • A list of common issues, with the number of times each comes up in interviews, linked back to the interviews in which they appear
  • Robby and Ben K. do not have access to Box
    • Get them access, and then work from Box for all editing.

25th April 2023

  • Include a suggestion that interviewees prepare any examples they have and want to show during the interview, including sharing their screen if they wish - emphasize that this is optional, but nice if they have something ready.

  • Need to remember to request permission to record in advance.

  • How to generate transcripts - they can be generated from the Zoom recording, but they are not linked to the recording (i.e., no timestamp for each comment).

  • Distill the results before sending out the draft workflow to interviewees.

  • Draft of email for next meeting.

  • Initial interviews:

    • First test interviewees: Robby, Hunter Barber, John Eslick (if needed)
    • Who should be the interviewers: Sarah, Andrew, Dan
    • Doodle poll between Sarah, Dan and Andrew to find candidate times, then reach out to targets.
  • Hold next meeting once first interviews are done - Andrew to cancel meetings once we know when this will be.

17th April 2023 – First Working Group Meeting

Attendance: Andrew, Bethany, Brandon, Robby, Soraya, Adam, Sarah, John S., Dan, Ben

  1. What do we mean by model diagnostics/troubleshooting?

    1. How do I know if I might have a problem (what symptoms to look for)?

      • An “optimal” solver status does not guarantee a correct solution; a flawed or poorly posed model can still return an “optimal” result.
      • This should include model construction errors (Pyomo and IDAES exceptions) and warning messages (loggers).
      • Problem-specific (e.g., MIP vs. NLP) and solver-specific symptoms
      • Python-specific issues, notably when combining packages.
      • At what point does the user reach out for help (i.e., when do they assume it is a bug and not user error)? Be more intentional about warnings/errors indicating where users should report issues.
      • Will need to prioritize – this is too much to address in one go
    2. How do I narrow down possible causes of problem (common causes of symptoms)?

    3. What tools should I use to identify root cause of problems (tools to use)?

    4. How do I fix a given problem once I have found it?

  2. What does the ultimate end goal/result look like?

    • End of EY23:

      • documented general workflow for diagnostics

        • Should do some research on the “right” way to do this
        • How do you document troubleshooting?
        • Examples of good troubleshooting: git?
        • Example of good troubleshooting: I ordered a refurbished cell phone recently and it came with a "receipt" of diagnostic results, basically just a laundry list of the tests that were run and their status, e.g. "Wifi: Successful, Speaker: Successful, ..." -Robby
        • Example of good troubleshooting: Diagnostics that a technician runs on a car battery. -Robby
      • an initial diagnostics toolbox to support workflow

    • Scope is very broad, need to identify and prioritize areas to focus on

      • Documentation

        • A broad workflow is probably less useful than targeted solutions to common problems.
      • Examples

        • Good Practices for New Users
      • API and UI

      • Exceptions and warnings

    • What level of user should we target?

      • Computerize John Eslick
      • Start with “most advanced users who struggle”, then slide downwards
      • Need to test solution – target user and target mentor for each item/tool
        • Each test subject becomes the next mentor
  3. How to go about building workflow and toolbox?

    • Large group of experienced users, but no centralized collection of knowledge
      • How do we go about collecting knowledge from these users?
      • Surveys, interviews?
      • What questions should we be asking?
        • Suggestion: develop an initial draft workflow to walk through with users to prompt conversation.
        • What questions to start with (start broad)?
  4. Future meetings

    • Cadence, agenda
    • Move agenda and draft workflows to IDAES wiki

Draft Interview Procedure

Draft Email

Subject: Invitation to provide input on IDAES diagnostics tools

Dear [Insert Name],

As you may be aware, this year the IDAES development team is focusing on developing an example workflow for diagnosing model issues and improving the associated diagnostics tools within IDAES. As part of this effort, we would like to draw on the knowledge and experience of our user base to identify common workflows and the areas where improvement is needed.

We would like to interview you, as an experienced user of the IDAES tools, about your experiences troubleshooting issues with IDAES (and Pyomo) models. We would welcome any examples or case studies you think would be valuable to share; however, do not feel that you need to prepare anything for this interview. Please note that we plan to record these interviews, where possible, for future reference, so please do not include any proprietary examples. A list of interview questions will be provided in advance.

We would appreciate it if you could fill out the Doodle poll below indicating your availability to talk to the diagnostics team:

[Insert Doodle Link]

Regards,

Andrew Lee on behalf of the IDAES diagnostics team

Questions

  1. Do we have permission to record this meeting?
  2. What level of user would you consider yourself? Do other users come to you for help on troubleshooting?
  3. What problems have you encountered when building and solving IDAES/Pyomo models (or have been asked to help troubleshoot)?
  4. What are the most common issues you encounter (or get asked for help with)?
  5. How did you know you had a problem?
  6. How did you work out where to start with finding and solving the problem (is it construction, initialization, final solve)?
  7. Did you manage to resolve the problem? If not, where did you get stuck?
  8. What steps did you take to resolve the problem?
  9. What were the resources and tools that you used to diagnose and resolve the problem?
    1. Where did you find the tool?
    2. How did you learn to use the tool?
    3. How easy was the tool to use?
    4. How easy was it to interpret the tool output?
    5. Any other comments on the tool?
    6. Any suggestions on how the documentation of the tool could be improved?
  10. Are there any things you would like to have, or have written yourself, to display or visualize information to help diagnose problems?
  11. When you do get a feasible/optimal solution, what do you do to verify/validate it?
  12. What do you do to avoid problems occurring in the first place?
  13. What would have helped you most as a new user learning how to debug and troubleshoot IDAES models (and EO models in general)?
  14. What do you think are the most confusing/difficult aspects of building and debugging IDAES models (and EO models in general)?

Draft Workflow

Key Questions a User Asks

  1. What are the things (symptoms) I should be looking for to know if I might have an issue?
  2. I am seeing [Symptom X], how do I determine what type of issue I might have?
  3. I think I have [Issue Y], what tool should I use to confirm this and trace the root cause?
  4. I have found the root cause of [Issue Y], what do I do to fix it?

Troubleshooting Workflow/Checklist

  • Assessing the outcome of a run should always start at the very beginning of the output. The final message is important, but so is everything that leads up to it.
  • This workflow should be applied even if the model appears to solve – getting “Optimal Solution Found” does not guarantee the result is correct, and there may be warning signs of underlying issues that should be addressed.
  • For each symptom/check in this workflow, we will eventually want to have some guidance on what tools to apply and further documentation/examples on how to address these. We won't be able to address all of these in the first year, but we can hopefully build up a list over time starting with the most common issues.
  1. Model Construction

    1. Did an error/exception occur?
      • Look at the error type and output message – these tell you where to start.
      • Look at the full traceback – this shows the series of events leading up to the error. When reporting an error, always include the full traceback – even if you cannot understand all of it, it will help others trace the issue.
      • Think carefully about what the error message is telling you, and try to set aside what you think is happening. The error message often tells you exactly what went wrong at the given step; you do not need to work out WHAT happened but WHY it happened.
    2. Are there any warning messages? IDAES and Pyomo have logger systems that report potential problems during model construction.
      • You might be inclined to ignore these (and in some cases these might get overwhelming), but they are telling you about potential issues you might need to fix.
      • You can control the level of output using the logger tools to show more or less information (see the logging sketch after this checklist).
  2. Initialization

    • Inspect the initialization output, looking for warnings or failures. Output level can be controlled by the logger.
    • IDAES tries to check for obvious issues and raise an exception if something is found, but older or user-written models may not be as thorough.
    • If a step fails during initialization, it is generally a sign that you need to look at what is happening, even if the routine appears to recover. You should not assume recovery means the result is correct; always check to be sure there is no underlying error first.
    • You should always check degrees of freedom during initialization; never initialize a model with non-zero degrees of freedom, as you cannot guarantee the result. IDAES generally checks this for you, but you should double-check (see the model-statistics sketch after this checklist).
    • If an initialization step fails, isolate the step and look at it in more detail; each initialization step is essentially a sub-problem solve (see below for debugging solve steps).
  3. Solver Calls

    1. Check the final solver state (see the solver-call sketch after this checklist)

    2. Check degrees of freedom, number of variables, and number of constraints

    3. Check the solver options

      • Especially the linear solver being used for IPOPT
    4. Check solver logs (solver specific)

      • e.g., for IPOPT, look for a larger than expected number of iterations, iterations with regularization (especially on the final step) or restoration, small alpha values with a large number of line searches, and solver-reported scaling.
      • Need to have sub-checklists for each solver we use
    5. Check Model Results

      • Results should always be sanity checked, even if the solver reports an optimal solution (see the stream-table sketch after this checklist).
      • Results can also help diagnose issues in the case of a solver failure
      1. Are flows, concentrations, temperatures, and pressures in expected ranges?
      2. Are key performance variables (e.g., conversion/recovery/selectivity/purity) in the expected range?
      3. Do states trend in the right direction (e.g., compressors show a pressure increase)?
      4. Are mass and energy conserved? Energy in particular can be hard to verify, but this is a key check – even if each unit model conserves mass and energy in isolation, it is possible that the flowsheet is degenerate.
      5. Are any state, design or rating variables close to their bounds? If so, make sure these are expected.
      6. In dynamic or spatially discretized models, is the trend of states over the domain smooth and continuous? Are there any unexpected deviations or oscillations?
      7. In optimization cases, ensure that optimized values are sane – e.g., look for extreme values (high or low).
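
As referenced in the Model Construction step, logger output can be dialed up or down. Below is a minimal logging sketch using Python's standard logging module; the logger names ("idaes", "pyomo") follow the standard naming convention these packages use, and the IDAES-specific logger utilities should be checked against the installed IDAES version.

```python
import logging

# IDAES and Pyomo report construction and initialization warnings through
# standard Python loggers; changing the level changes how much detail is shown.
logging.getLogger("pyomo").setLevel(logging.WARNING)  # warnings and errors only
logging.getLogger("idaes").setLevel(logging.DEBUG)    # maximum detail while debugging
```

IDAES also provides a thin wrapper around logging (idaes.logger) for the same purpose; see the IDAES logging documentation for the current interface.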
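For the degrees-of-freedom and model-size checks in the Initialization and Solver Calls steps, a minimal model-statistics sketch using helpers from idaes.core.util.model_statistics is shown below; it assumes m is an already constructed Pyomo/IDAES model, and the function names should be verified against your IDAES version.

```python
from idaes.core.util.model_statistics import (
    degrees_of_freedom,
    number_variables,
    number_total_constraints,
)

# m is assumed to be an existing, constructed Pyomo/IDAES model
print("Degrees of freedom:", degrees_of_freedom(m))
print("Variables:", number_variables(m))
print("Constraints:", number_total_constraints(m))

# Never initialize or square-solve a model with non-zero degrees of freedom
assert degrees_of_freedom(m) == 0, "Fix additional variables before initializing"
```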
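A minimal solver-call sketch covering the Solver Calls checks (final solver state, solver options, and the IPOPT linear solver) using standard Pyomo calls; the model m and the availability of the ma57 linear solver are assumptions for illustration.

```python
import pyomo.environ as pyo

solver = pyo.SolverFactory("ipopt")
solver.options["linear_solver"] = "ma57"  # confirm which linear solver is actually used
solver.options["max_iter"] = 500

# tee=True echoes the full IPOPT log so iteration counts, regularization,
# restoration, line searches and scaling messages can be inspected
results = solver.solve(m, tee=True)

# Check the final solver state before trusting any values held in the model
print(results.solver.status, results.solver.termination_condition)
if not pyo.check_optimal_termination(results):
    raise RuntimeError("Solver did not report an optimal solution; inspect the log above")
```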
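For the Check Model Results step, stream tables and unit reports make the sanity checks easier. A minimal stream-table sketch is shown below; the flowsheet, unit, and stream names (m.fs.heater, m.fs.feed) are hypothetical, and create_stream_table_dataframe comes from idaes.core.util.tables (verify against your IDAES version).

```python
from idaes.core.util.tables import create_stream_table_dataframe

# Most IDAES unit models provide a report() summarizing key performance variables
m.fs.heater.report()  # hypothetical unit name

# A stream table gives a quick overview of flows, temperatures and pressures
# so they can be compared against expected ranges
stream_table = create_stream_table_dataframe(
    {"Feed": m.fs.feed.outlet, "Product": m.fs.heater.outlet},  # hypothetical streams
    time_point=0,
)
print(stream_table)
```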

Interview List

We should consider prioritizing this list.

Third Round:

  • Xiangyu Yin

Possible Others:

  • Mayo
  • Xiangyu
  • Jason Sherman
  • Radhakrishna
  • Alejandro
  • Anca
  • Anuja
  • Jinliang
  • Daison
  • Markus Droeuven
  • Tim Batholomew

Scheduled:

  • Ben K.
  • Ilayda
  • Miguel Zamarripa
  • Chenyu Wang
  • Marcus Holly

Completed:

  • Robby Parker (reviewed by Bethany)
  • Hunter Barber
  • Kanishka Ghosh
  • John Eslick
  • Alex Dudchenko
  • Alex Dowling
  • Damien Agi
  • Doug Allan