SEGV on Deception for ExaGO+IPOPT+ma57 #154

Open
2 of 13 tasks
ovasios opened this issue Aug 8, 2024 · 9 comments
Labels
bug (Something isn't working), opflow (Concerning the OPFLOW application)

Comments

@ovasios
Collaborator

ovasios commented Aug 8, 2024

Issue type

  • New feature
  • Bug
  • Discussion
  • Other

Relates to

  • OPFLOW
  • SOPFLOW
  • SCOPFLOW
  • TCOPFLOW
  • CMake build system
  • Spack configuration
  • Manual
  • Web docs
  • Other

Summary

Running OPFLOW with ma57 instead of ma27 on Deception results in a SEGV. This may be related to #152.

Branch: develop
Machine: deception
Optimizer: IPOPT
System models: case_ACTIVSg25k.m, tgo30K_JUN_13_2018.m

Command:
./opflow -netfile case_ACTIVSg25k.m

Result:

[ExaGO] Creating OPFlow


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.12.10, running with linear solver ma57.

Number of nonzeros in equality constraint Jacobian...:   348438
Number of nonzeros in inequality constraint Jacobian.:   186640
Number of nonzeros in Lagrangian Hessian.............:   250917

Total number of variables............................:    57558
                     variables with only lower bounds:        0
                variables with lower and upper bounds:    32559
                     variables with only upper bounds:        0
Total number of equality constraints.................:    50000
Total number of inequality constraints...............:    46660
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:    46660
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  5.0088244e+06 2.23e+04 3.61e+03  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
Input Error: Incorrect objective type.
Input Error: Incorrect objective type.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
@cameronrutherford
Contributor

I think this is completely distinct from #152. These are separate platforms and different runs.

cc @nychiang @pelesh @cnpetra as we have discussed offline

cc @abhyshr

@pelesh
Collaborator

pelesh commented Aug 8, 2024

I came across a similar issue. It seems that ExaGO does not allocate sufficient space for large problems.

@ovasios, if you substitute the 10k system for the 25k one in the input, does your simulation run correctly?

@pelesh
Collaborator

pelesh commented Aug 8, 2024

One more thing I noticed: why is Ipopt reporting "Input Error: Incorrect objective type."?

@pelesh
Collaborator

pelesh commented Aug 8, 2024

Quoting @cameronrutherford: "I think this is completely distinct from #152. These are separate platforms and different runs."

I also think this is a different issue. @ovasios, can you backtrace the segfault?
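One way to capture such a backtrace non-interactively is to run the executable under gdb in batch mode. The sketch below only assembles the command line; the helper name `gdb_backtrace_command` is hypothetical, and the `opflow` arguments are the ones from the issue summary. It assumes gdb is available on the machine and that the binary was built with debug symbols.

```python
import shlex


def gdb_backtrace_command(program_args):
    """Build a gdb batch-mode command that runs the program and prints a backtrace.

    -batch exits gdb after the listed commands have run,
    -ex queues a gdb command ("run", then "bt" once the program stops),
    --args passes everything that follows to the debugged program.
    """
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args", *program_args]


# Command taken from the issue summary; the helper itself is illustrative.
cmd = gdb_backtrace_command(["./opflow", "-netfile", "case_ACTIVSg25k.m"])
print(shlex.join(cmd))
```

The printed line can be pasted into a shell on the affected machine; the backtrace appears after the program stops on the signal.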

pelesh added the bug and opflow labels on Jan 13, 2025
@pelesh
Collaborator

pelesh commented Jan 13, 2025

@ovasios, any follow-up on this issue?

@ovasios
Collaborator Author

ovasios commented Jan 14, 2025

@pelesh:

Here is the error again:

[ExaGO] Creating OPFlow


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.12.10, running with linear solver ma57.

Number of nonzeros in equality constraint Jacobian...:   347979
Number of nonzeros in inequality constraint Jacobian.:   186638
Number of nonzeros in Lagrangian Hessian.............:   247225

 Runtime parameters:
   Objective type: Unknown!
   Coarsening type: Unknown!
   Initial partitioning type: Unknown!
   Refinement type: Unknown!
   Perform a 2-hop matching: No
   Number of balancing constraints: 1
   Number of refinement iterations: 328601616
   Random number seed: 550201216
   Number of separators: 0
   Compress graph prior to ordering: Yes
   Detect & order connected components separately: No
   Prunning factor for high degree vertices: 0.000000
   Allowed maximum load imbalance: 8232.408

Input Error: Incorrect objective type.
 nbrpool statistics
        nbrpoolsize:            0   nbrpoolcpos:            0
    nbrpoolreallocs:            0

 Runtime parameters:
   Objective type: Unknown!
   Coarsening type: Unknown!
   Initial partitioning type: Unknown!
   Refinement type: Unknown!
   Perform a 2-hop matching: Yes
   Number of balancing constraints: 1
   Number of refinement iterations: 328601616
   Random number seed: 6
   Number of separators: 0
   Compress graph prior to ordering: Yes
   Detect & order connected components separately: No
   Prunning factor for high degree vertices: 0.000000
   Allowed maximum load imbalance: 1.000

Input Error: Incorrect objective type.
 nbrpool statistics
        nbrpoolsize:            0   nbrpoolcpos:            0
    nbrpoolreallocs:            0

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

And here is the backtrace:

#0  0x00002aaaab991266 in ma57kd (n=200424, ipe=..., iw=..., lw=2258276, iwfr=1172247, perm=..., ips=..., nv=..., flag=..., ncmpa=0) at ma57/ma57d.f:3842
#1  0x00002aaaab992c84 in ma57ad (n=<optimized out>, ne=1028926, irn=..., jcn=..., lkeep=3060014, keep=..., iwork=..., icntl=..., info=..., rinfo=...) at ma57/ma57d.f:752
#2  0x00002aaacd46f1ed in Ipopt::Ma57TSolverInterface::SymbolicFactorization(int const*, int const*) ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#3  0x00002aaacd470428 in Ipopt::Ma57TSolverInterface::InitializeStructure(int, int, int const*, int const*) ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#4  0x00002aaacd46a2dd in Ipopt::TSymLinearSolver::InitializeStructure(Ipopt::SymMatrix const&) ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#5  0x00002aaacd46ad15 in Ipopt::TSymLinearSolver::MultiSolve(Ipopt::SymMatrix const&, std::vector<Ipopt::SmartPtr<Ipopt::Vector const>, std::allocator<Ipopt::SmartPtr<Ipopt::Vector const> > >&, std::vector<Ipopt::SmartPtr<Ipopt::Vector>, std::allocator<Ipopt::SmartPtr<Ipopt::Vector> > >&, bool, int) ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#6  0x00002aaacd45c554 in Ipopt::StdAugSystemSolver::MultiSolve(Ipopt::SymMatrix const*, double, Ipopt::Vector const*, double, Ipopt::Vector const*, double, Ipopt::Matrix const*, Ipopt::Vector const*, double, Ipopt::Matrix const*, Ipopt::Vector const*, double, std::vector<Ipopt::SmartPtr<Ipopt::Vector const>, std::allocator<Ipopt::SmartPtr<Ipopt::Vector const> > >&, std::vector<Ipopt::SmartPtr<Ipopt::Vector const>, std::allocator<Ipopt::SmartPtr<Ipopt::Vector const> > >&, std::vector<Ipopt::SmartPtr<Ipopt::Vector const>, std::allocator<Ipopt::SmartPtr<Ipopt::Vector const> > >&, std::vector<Ipopt::SmartPtr<Ipopt::Vector const>, std::allocator<Ipopt::SmartPtr<Ipopt::Vector const> > >&, std::vector<Ipopt::SmartPtr<Ipopt::Vector>, std::allocator<Ipopt::SmartPtr<Ipopt::Vector> > >&, std::vector<Ipopt::SmartPtr<Ipopt::Vector>, std::allocator<Ipopt::SmartPtr<Ipopt::Vector> > >&, std::vector<Ipopt::SmartPtr<Ipopt::Vector>, std::allocator<Ipopt::SmartPtr<Ipopt::Vector> > >&, std::vector<Ipopt::SmartPtr<Ipopt::Vector>, std::allocator<Ipopt::SmartPtr<Ipopt::Vector> > >&, bool, int) () from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#7  0x00002aaacd3aeac9 in Ipopt::AugSystemSolver::Solve(Ipopt::SymMatrix const*, double, Ipopt::Vector const*, double, Ipopt::Vector const*, double, Ipopt::Matrix const*, Ipopt::Vector const*, double, Ipopt::Matrix const*, Ipopt::Vector const*, double, Ipopt::Vector const&, Ipopt::Vector const&, Ipopt::Vector const&, Ipopt::Vector const&, Ipopt::Vector&, Ipopt::Vector&, Ipopt::Vector&, Ipopt::Vector&, bool, int)
    () from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#8  0x00002aaacd3f8732 in Ipopt::LeastSquareMultipliers::CalculateMultipliers(Ipopt::Vector&, Ipopt::Vector&) ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#9  0x00002aaacd3bbaa9 in Ipopt::DefaultIterateInitializer::least_square_mults(Ipopt::Journalist const&, Ipopt::IpoptNLP&, Ipopt::IpoptData&, Ipopt::IpoptCalculatedQuantities&, Ipopt::SmartPtr<Ipopt::EqMultiplierCalculator> const&, double) () from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#10 0x00002aaacd3be800 in Ipopt::DefaultIterateInitializer::SetInitialIterates() ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#11 0x00002aaacd3cf2d6 in Ipopt::IpoptAlgorithm::InitializeIterates() ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#12 0x00002aaacd3d36c6 in Ipopt::IpoptAlgorithm::Optimize(bool) ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#13 0x00002aaacd35b07e in Ipopt::IpoptApplication::call_optimize() ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#14 0x00002aaacd35d4f3 in Ipopt::IpoptApplication::OptimizeNLP(Ipopt::SmartPtr<Ipopt::NLP> const&, Ipopt::SmartPtr<Ipopt::AlgorithmBuilder>&) ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#15 0x00002aaacd3576bb in Ipopt::IpoptApplication::OptimizeNLP(Ipopt::SmartPtr<Ipopt::NLP> const&) ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#16 0x00002aaacd35801b in Ipopt::IpoptApplication::OptimizeTNLP(Ipopt::SmartPtr<Ipopt::TNLP> const&) ()
   from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#17 0x00002aaacd3659ed in IpoptSolve () from /qfs/projects/earthshot/src/deception-ci/install/linux-centos7-zen2/gcc-9.1.0/ipopt-3.12.10-zdw2vxvsvydsuwpa3uwjwdtj3unwgg2n/lib/libipopt.so.1
#18 0x00000000005de9ce in OPFLOWSolverSolve_IPOPT (opflow=0x1d3b8170) at /people/vasi/exagoFolder/ExaGO/src/opflow/solver/ipopt/opflow_ipopt.cpp:391
#19 0x000000000050a985 in OPFLOWSolve (opflow=0x1d3b8170) at /people/vasi/exagoFolder/ExaGO/src/opflow/interface/opflow.cpp:2098
#20 0x00000000004e6cca in main (argc=3, argv=0x7fffffff6128) at /people/vasi/exagoFolder/ExaGO/applications/opflow_main.cpp:92

PS: Case case_ACTIVSg10k.m successfully converges with the same settings.
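
The "Runtime parameters" blocks in the log above contain entries such as "Unknown!" and a refinement iteration count of 328601616. A minimal sketch for flagging entries like these automatically follows. The function name and both thresholds are illustrative assumptions, not limits defined by any library, and a flagged value is only a hint that the parameters may not have been set as intended.

```python
import re


def flag_suspicious_parameters(log_text, max_int=1_000_000, max_imbalance=100.0):
    """Return (name, value) pairs from a parameter dump that look implausible.

    The thresholds are arbitrary illustrative cut-offs, not library limits.
    """
    flagged = []
    for line in log_text.splitlines():
        # Match "name: value" lines; lines with nothing after the colon are skipped.
        match = re.match(r"\s*([^:]+):\s*(\S.*?)\s*$", line)
        if not match:
            continue
        name, value = match.group(1).strip(), match.group(2)
        if value == "Unknown!":
            flagged.append((name, value))
        elif re.fullmatch(r"\d+", value) and int(value) > max_int:
            flagged.append((name, value))
        elif ("imbalance" in name and re.fullmatch(r"\d+\.\d+", value)
              and float(value) > max_imbalance):
            flagged.append((name, value))
    return flagged


# Lines copied from the first "Runtime parameters" block in the log.
sample = """ Runtime parameters:
   Objective type: Unknown!
   Perform a 2-hop matching: No
   Number of balancing constraints: 1
   Number of refinement iterations: 328601616
   Number of separators: 0
   Allowed maximum load imbalance: 8232.408
"""
for name, value in flag_suspicious_parameters(sample):
    print(f"{name}: {value}")
```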

@abhyshr
Collaborator

abhyshr commented Jan 14, 2025

The 25k case also converges using ma27, right?

@ovasios
Collaborator Author

ovasios commented Jan 14, 2025

Yes, the 25k case converges using ma27.

@pelesh
Collaborator

pelesh commented Jan 14, 2025

MA57 chokes at symbolic factorization, as far as I can tell. Are you sure MA57 is built with METIS support? In your config.log file for COINHSL, you should see lines like this:

configure:22207: checking for library Metis with combined link and compile check
configure:22357: clang -c -O2 -DNDEBUG  -I/opt/local/include conftest.c >&5
configure:22357: $? = 0
configure:22414: clang -o conftest -O2 -DNDEBUG  -I/opt/local/include  conftest.c -L/opt/local/lib -lmetis >&5
configure:22414: $? = 0
configure:22457: result: yes
configure:22521: checking whether Metis library is linkable with Fortran compiler
configure:22535: gfortran -c -O2   conftest.f >&5
configure:22535: $? = 0
configure:22537: result: yes
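
A check like the one described above can be scripted. The following is a minimal sketch that pairs each Metis-related "checking ..." line in a config.log with the "result:" line that follows it. The function name is hypothetical, and the line format is assumed to match the excerpt shown here.

```python
import re


def metis_check_results(config_log_text):
    """Map each Metis-related 'checking ...' message to the result that follows it."""
    results = {}
    pending = None
    for line in config_log_text.splitlines():
        check = re.match(r"configure:\d+: checking (.*)", line)
        if check:
            # Track only checks that mention Metis; any other check resets the state.
            message = check.group(1).strip()
            pending = message if "metis" in message.lower() else None
            continue
        result = re.match(r"configure:\d+: result: (.*)", line)
        if result and pending is not None:
            results[pending] = result.group(1).strip()
            pending = None
    return results


# Excerpt taken from the comment above (compiler lines shortened).
excerpt = """configure:22207: checking for library Metis with combined link and compile check
configure:22357: $? = 0
configure:22414: $? = 0
configure:22457: result: yes
configure:22521: checking whether Metis library is linkable with Fortran compiler
configure:22535: $? = 0
configure:22537: result: yes
"""
for message, outcome in metis_check_results(excerpt).items():
    print(f"{outcome:>3}  {message}")
```

To apply it to a real build, read the file first, for example `metis_check_results(open("config.log").read())`. An empty dictionary would suggest that configure never ran the Metis checks.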

In your Ipopt options file, do you use any nondefault options for MA57?
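
To answer that question quickly, the MA57-related entries can be pulled out of an options file. The sketch below assumes the usual Ipopt options-file layout of one `name value` pair per line with `#` starting a comment. The function name and the sample file contents are hypothetical and are not taken from the reporter's setup.

```python
def ma57_related_options(options_text):
    """Return the linear_solver and ma57_* settings found in an Ipopt options file."""
    found = {}
    for raw_line in options_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        name, value = parts[0], parts[1].strip()
        if name == "linear_solver" or name.startswith("ma57_"):
            found[name] = value
    return found


# Hypothetical options file contents, for illustration only.
sample_options = """# solver selection
linear_solver ma57
ma57_pivot_order 2   # hypothetical non-default value
tol 1e-6
"""
print(ma57_related_options(sample_options))
```

Any `ma57_*` entry that shows up is a setting that differs from what Ipopt would pick on its own and is worth reporting in the thread.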

CC: @cnpetra @nychiang @maksud
