Fix _masterFunc2 fail flag caching and add fail flag identification to IPOPT #407

Merged: 15 commits merged into mdolab:main from fix_fail on Jun 23, 2024

Conversation

@eytanadler (Contributor) commented on Jun 21, 2024

Purpose

Some optimizers do not call _masterFunc with the objective and constraint evaluations simultaneously. In these cases, if the primal fails, the fail flag is accurate only for the call that actually evaluates the objective/constraint function; when the cache is used, the fail flag is always returned as 0. The same behavior occurs with the gradient evaluation. This PR caches the fail flag to fix this edge case.
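
A minimal sketch of the fix, using hypothetical names rather than the actual _masterFunc2 internals: the fail flag is stored in the cache alongside the function values, so a cache hit returns the flag from the evaluation that populated it.

import numpy as np

class MasterFuncSketch:
    def __init__(self, objcon):
        self.objcon = objcon  # user function returning (funcs, fail)
        self.cache = {"x": None, "funcs": None, "fail": None}

    def __call__(self, x):
        if self.cache["x"] is not None and np.array_equal(x, self.cache["x"]):
            # Cache hit: before the fix, this path always reported fail = 0
            return self.cache["funcs"], self.cache["fail"]
        funcs, fail = self.objcon(x)  # the call that actually evaluates
        self.cache = {"x": x.copy(), "funcs": funcs, "fail": fail}
        return funcs, fail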

Secondly, the pyIPOPT wrapper does nothing if _masterFunc returns a True fail flag (1): even if a function fails, IPOPT just pushes on as if nothing happened. In one CFD-based optimization case I observed, the optimizer hit a mesh failure, immediately evaluated the gradients, went straight back to a mesh failure, and so on, while the output file looked as if everything were normal. This is clearly not what should happen. I solved it by returning np.array(np.NaN) in the callback functions if fail == 1.
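
A minimal sketch of the callback-side change, with a hypothetical signature and helper (the real wrapper is in pyoptsparse/pyIPOPT/pyIPOPT.py):

import numpy as np

def eval_f(x, master_func):
    # master_func is a stand-in for the pyOptSparse _masterFunc call
    fobj, fail = master_func(x)
    if fail == 1:
        # Returning np.array(np.NaN) signals an evaluation error to IPOPT,
        # which cuts back its line-search step instead of accepting the
        # failed result
        return np.array(np.NaN)
    return fobj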

There are two changes in particular that I'd like feedback on:

  1. There is a rare logic branch in which the gradients are requested at a design variable vector for which the primal has not been evaluated. In this case, _masterFunc2 calls itself to evaluate the primal first. In the current pyOptSparse implementation, if this primal fails, that failure is ignored entirely. I modified it so that if the primal fails here, the gradient _masterFunc2 evaluation returns that failure value even if the gradient evaluation itself succeeds (a sketch of this logic follows this list). See lines 452-458 and 511-517 of the new code. Is this the right thing to do?
  2. The only way I could find to tell IPOPT that an evaluation failed is to return np.array(np.NaN) from whichever callback function is called. I don't see a more elegant approach based on their docs. Although the shape isn't guaranteed to match the function's usual array shape, it works in my test cases. Does this seem like a reasonable way to solve the problem?
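
The sketch referenced in item 1, again with hypothetical names standing in for the real _masterFunc2 logic:

import numpy as np

def master_func_grad(x, cache, master_func, user_sens):
    # If the primal was never evaluated at this x, evaluate it first and
    # remember its fail flag (master_func is assumed to refill the cache)
    fail_primal = 0
    if cache["x"] is None or not np.array_equal(x, cache["x"]):
        _, fail_primal = master_func(x)
    sens, fail_grad = user_sens(x)
    # Propagate the primal failure even when the gradient call succeeds
    return sens, max(fail_primal, fail_grad)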

P.S. Should I update the patch version?

Expected time until merged

A week

Type of change

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (non-backwards-compatible fix or feature)
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Documentation update
  • Maintenance update
  • Other (please describe)

Testing

Run the tests I added. Also try running a simple IPOPT optimization case in which some of the functions periodically fail. It should backtrack properly (previously, it would just ignore the fail flag). For example:

import numpy as np
from pyoptsparse import Optimization, OPT

iters = 0
def objfunc(xdict):
    x = xdict["xvars"]
    funcs = {"obj": x[0]**2 + np.sin(x[0] - 0.5)}

    # Fail every fourth iteration
    global iters
    fail = iters % 4 == 2
    iters += 1

    return funcs, fail

def sensfunc(xdict, funcsDict):
    x = xdict["xvars"]  # Extract array
    funcsSens = {"obj": {"xvars": [2 * x[0] + np.cos(x[0] - 0.5)]}}
    fail = False
    return funcsSens, fail

# Instantiate Optimization Problem
optProb = Optimization("Optimization problem", objfunc)
optProb.addVarGroup("xvars", 1, "c", value=3, scale=1.0)
optProb.addObj("obj")

# Create optimizer
opt = OPT("IPOPT", options={})
sol = opt(optProb, sens=sensfunc)

The IPOPT output should look something like this (note the alpha cutback warnings, which do not appear without this fix):

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  9.5984721e+00 0.00e+00 5.20e+00   0.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  4.4065559e+00 0.00e+00 5.30e+00 -11.0 5.20e+00    -  1.00e+00 1.00e+00f  1
Warning: Cutting back alpha due to evaluation error
   2 -1.9724310e-01 0.00e+00 1.59e+00 -11.0 2.62e+00    -  1.00e+00 5.00e-01f  2
   3 -6.2890484e-01 0.00e+00 3.02e-02 -11.0 5.62e-01    -  1.00e+00 1.00e+00f  1
   4 -6.2907135e-01 0.00e+00 1.51e-03 -11.0 1.05e-02    -  1.00e+00 1.00e+00f  1
Warning: Cutting back alpha due to evaluation error
   5 -6.2907166e-01 0.00e+00 7.55e-04 -11.0 5.53e-04    -  1.00e+00 5.00e-01f  2
   6 -6.2907176e-01 0.00e+00 5.10e-08 -11.0 2.76e-04    -  1.00e+00 1.00e+00f  1
   7 -6.2907176e-01 0.00e+00 1.72e-12 -11.0 1.86e-08    -  1.00e+00 1.00e+00f  1

Checklist

  • I have run flake8 and black to make sure the Python code adheres to PEP-8 and is consistently formatted
  • I have formatted the Fortran code with fprettify or C/C++ code with clang-format as applicable
  • I have run unit and regression tests which pass locally with my changes
  • I have added new tests that prove my fix is effective or that my feature works
  • I have added necessary documentation

@eytanadler eytanadler added the bug Something isn't working label Jun 21, 2024
@eytanadler eytanadler requested a review from a team as a code owner June 21, 2024 01:52
@eytanadler eytanadler requested review from lamkina, ArshSaja, ewu63 and marcomangano and removed request for lamkina and ArshSaja June 21, 2024 01:52

codecov bot commented Jun 21, 2024

Codecov Report

Attention: Patch coverage is 84.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 62.97%. Comparing base (da0077a) to head (fc15ad5).
Report is 4 commits behind head on main.

Files with missing lines          Patch %   Lines
pyoptsparse/pyIPOPT/pyIPOPT.py    66.66%    4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #407       +/-   ##
===========================================
- Coverage   74.33%   62.97%   -11.36%     
===========================================
  Files          22       22               
  Lines        3300     3317       +17     
===========================================
- Hits         2453     2089      -364     
- Misses        847     1228      +381     


@eytanadler (Contributor, Author) commented:

Does the Windows action failure have something to do with my changes? It doesn't seem like it, but I'm not very familiar with it.

@ewu63 (Collaborator) left a comment:


Good stuff. I knew that the fail flag is not passed to IPOPT correctly (in fact, I think SNOPT is the only optimizer that supports it). From my cursory read of the docs, passing NaN seems reasonable (and is in fact how many other optimizers prefer to handle failed evals).
As to the implementation, it looks good; I just left a minor comment about whether we can deal with the cache once instead of running the same line in multiple blocks.

As for the Windows failure, I suspect it's related to the recent release of numpy 2.0 (which broke our meson build in multiple ways; I haven't had time to investigate). See here for a similar log. I would suggest pinning numpy<2 in the env file first.

Inline review comments (resolved) on pyoptsparse/pyOpt_optimizer.py, tests/test_optimizer.py, and pyoptsparse/pyIPOPT/pyIPOPT.py.
@eytanadler (Contributor, Author) replied:

> Good stuff. I knew that the fail flag is not passed to IPOPT correctly (in fact, I think SNOPT is the only optimizer that supports it). From my cursory read of the docs, passing NaN seems reasonable (and is in fact how many other optimizers prefer to handle failed evals). As to the implementation, it looks good; I just left a minor comment about whether we can deal with the cache once instead of running the same line in multiple blocks.
>
> As for the Windows failure, I suspect it's related to the recent release of numpy 2.0 (which broke our meson build in multiple ways; I haven't had time to investigate). See here for a similar log. I would suggest pinning numpy<2 in the env file first.

Thanks for the speedy review! I pinned the version of NumPy in the setup.py (I assume this is what you meant by env file?) to <2, but the Windows action still seems to fail. I also bumped pyOptSparse's patch version because I think this is a substantial enough change that it'd be worth versioning. @ewu63, let me know if you think otherwise.

@eytanadler eytanadler requested a review from ewu63 June 21, 2024 11:22
@ewu63 (Collaborator) commented on Jun 21, 2024:

> Thanks for the speedy review! I pinned the version of NumPy in the setup.py (I assume this is what you meant by env file?) to <2, but the Windows action still seems to fail. I also bumped pyOptSparse's patch version because I think this is a substantial enough change that it'd be worth versioning. @ewu63, let me know if you think otherwise.

Windows builds do not use setuptools but instead use conda. The env file used by GHA is here. Also fine with the patch version bump.

@eytanadler (Contributor, Author) replied:

> > Thanks for the speedy review! I pinned the version of NumPy in the setup.py (I assume this is what you meant by env file?) to <2, but the Windows action still seems to fail. I also bumped pyOptSparse's patch version because I think this is a substantial enough change that it'd be worth versioning. @ewu63, let me know if you think otherwise.
>
> Windows builds do not use setuptools but instead use conda. The env file used by GHA is here. Also fine with the patch version bump.

Yay, it worked!

@ewu63 (Collaborator) left a comment:


Good work!

@marcomangano (Contributor) left a comment:


Looking great! Thanks for adding the tests; you went beyond what I had in mind. Very elegant setup.

We should double check failure handling with other optimizers and maybe add a test, but that should be a separate PR. The changes in pyIPOPT make sense.

@marcomangano marcomangano merged commit 7376d71 into mdolab:main Jun 23, 2024
12 of 13 checks passed
@eytanadler eytanadler deleted the fix_fail branch June 23, 2024 18:58
@eytanadler (Contributor, Author) replied:

> Looking great! Thanks for adding the tests; you went beyond what I had in mind. Very elegant setup.
>
> We should double check failure handling with other optimizers and maybe add a test, but that should be a separate PR. The changes in pyIPOPT make sense.

Thanks! I agree failure handling should be checked, but I didn’t have an immediate good idea for how to test that. I’m sure there’s a way.
