Trouble solving unsymmetric system in serial #54
Does the same happen if you don't store the matrix out of core?
Yes it does. When it does run (on another computer) I get the following output:
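Coming back to this, let us consider this MWE (just the README example with a different matrix, run without mpirun):
using MUMPS
using MPI
using SparseArrays, LinearAlgebra # sprand and I live in these stdlibs since Julia 0.7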
MPI.Init()
mumps = Mumps{Float64}(mumps_unsymmetric, default_icntl, default_cntl64) # Real, general unsymmetric
A = sprand(570000,570000,0.00019); rhs = rand(570000) # Happens on all cores
A=A+A'
A+=5I
associate_matrix!(mumps, A)
factorize!(mumps)
associate_rhs!(mumps, rhs)
solve!(mumps)
x = get_solution(mumps)
finalize(mumps)
MPI.Finalize()
which, upon reaching factorize!, gives:
Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 4 570000 124018036
executing #MPI = 1, without OMP
=================================================
MUMPS compiled with option -Dparmetis
This MUMPS version includes code for SAVE_RESTORE
=================================================
L U Solver for unsymmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
signal (11): Segmentation fault: 11
in expression starting at no file:0
dmumps_ana_gnew_ at /usr/local/Cellar/brewsci-mumps/5.2.1/lib/libdmumps.dylib (unknown line)
__dmumps_ana_aux_m_MOD_dmumps_ana_f at /usr/local/Cellar/brewsci-mumps/5.2.1/lib/libdmumps.dylib (unknown line)
dmumps_ana_driver_ at /usr/local/Cellar/brewsci-mumps/5.2.1/lib/libdmumps.dylib (unknown line)
dmumps_ at /usr/local/Cellar/brewsci-mumps/5.2.1/lib/libdmumps.dylib (unknown line)
dmumps_f77_ at /usr/local/Cellar/brewsci-mumps/5.2.1/lib/libdmumps.dylib (unknown line)
dmumps_c at /usr/local/Cellar/brewsci-mumps/5.2.1/lib/libdmumps.dylib (unknown line)
macro expansion at /Volumes/rgates/.julia/dev/MUMPS/src/MUMPS.jl:23 [inlined]
factorize! at /Volumes/rgates/.julia/dev/MUMPS/src/MUMPS_lib.jl:75
jl_fptr_trampoline at /Users/osx/buildbot/slave/package_osx64/build/src/gf.c:1829
do_call at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:324
eval_stmt_value at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:363 [inlined]
eval_body at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:686
jl_interpret_toplevel_thunk_callback at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:799
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x11a4d235f)
unknown function (ip: 0xffffffffffffffff)
jl_interpret_toplevel_thunk at /Users/osx/buildbot/slave/package_osx64/build/src/interpreter.c:808
jl_toplevel_eval_flex at /Users/osx/buildbot/slave/package_osx64/build/src/toplevel.c:831
jl_toplevel_eval_in at /Users/osx/buildbot/slave/package_osx64/build/src/builtins.c:633
eval at ./boot.jl:319
eval_user_input at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:85
macro expansion at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v0.7/REPL/src/REPL.jl:117 [inlined]
#28 at ./task.jl:262
jl_apply at /Users/osx/buildbot/slave/package_osx64/build/src/./julia.h:1538 [inlined]
start_task at /Users/osx/buildbot/slave/package_osx64/build/src/task.c:268
Allocations: 9498111 (Pool: 9496305; Big: 1806); GC: 29
[Coruscant:03325] *** Process received signal ***
[Coruscant:03325] Signal: Segmentation fault: 11 (11)
[Coruscant:03325] Signal code: Address not mapped (1)
[Coruscant:03325] Failing at address: 0x1437bd000
[Coruscant:03325] [ 0] 0 libsystem_platform.dylib 0x00007fff5d233f5a _sigtramp + 26
[Coruscant:03325] *** End of error message ***
Segmentation fault: 11
Can you reproduce the issue? Any ideas?
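As a quick plausibility check of the first line of that output (a sketch, assuming sprand stores each entry independently with probability p), the expected number of stored entries of A + A' + 5I can be estimated by hand. It comes out at roughly 1.24e8, in line with the NNZ = 124018036 that MUMPS prints, and far below typemax(Int32). So N and NNZ appear to reach MUMPS correctly, and the entry count alone does not overflow a 32-bit integer. This says nothing about the sizes of MUMPS's internal workspaces.

```julia
# Back-of-the-envelope estimate of the NNZ that MUMPS reports for the MWE.
# Assumption: sprand(n, n, p) picks each entry independently with probability p,
# so A and A' coincide in about n^2 * p^2 positions. The handful of diagonal
# entries already present in A (about n * p of them) is ignored here.
n = 570_000
p = 0.00019

nnz_A   = n^2 * p               # expected stored entries of sprand(n, n, p)
overlap = n^2 * p^2             # expected positions where both A and A' are nonzero
nnz_sym = 2 * nnz_A - overlap   # expected stored entries of A + A'
nnz_tot = nnz_sym + n           # adding 5I fills the diagonal

println(round(Int, nnz_tot))        # close to the NNZ = 124018036 printed by MUMPS
println(nnz_tot < typemax(Int32))   # the entry count fits in a 32-bit integer
```

The remaining gap of a few thousand entries is well within the random fluctuation of sprand, whose entry count has a standard deviation of several thousand at this size.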
Yes, we also got the symmetric case to work; however, we could make that fail too by varying the matrix size. MUMPS3.jl solved the system reliably, so we assume it is an interface problem.
On 28 Nov 2019, at 11:53, Max Bittens wrote:
I am able to reproduce the error on my MacBook Pro with macOS Mojave 10.14.5, DMUMPS 5.2.1 and Julia 0.7.0 installed.
However, the error vanishes and MUMPS does return a solution vector if I change mumps_unsymmetric to mumps_symmetric.
I'll try to look into the merger with MUMPS3.jl in the next few days.
Are you using the same MUMPS library with this package and MUMPS3.jl?
Yes, I installed brewsci-mumps 5.2.1 via Homebrew. Both packages reference the same installation.
I have some trouble solving a moderately large, numerically unsymmetric but structurally symmetric system without mpirun on OS X. The solve is part of an iterative procedure that works fine most of the time but fails nondeterministically (exactly when it fails depends on whether the compiled Julia code changes). Before the problem occurs, I check whether A*b can be computed, in order to verify that the matrix is structurally sound. Then, after entering the MUMPS analysis step, MUMPS reports zero density and zero structural symmetry. On some computers it doesn't even get that far and segfaults immediately; on others it continues and reports INFO(1) = 1, while reporting the number of nonzeros correctly. According to the MUMPS manual, this means that some I, J indices are out of range. However, if the matrix were broken, I would expect A*b to fail.
Here is my solver code:
The segmentation fault looks like this:
All those barriers were inserted in an attempt to fix the problem. Any ideas? My next step would be to extract the matrix and rhs, save them to HDF5, and test this matrix outside the main code, perhaps producing an MWE. However, I thought the error might be obvious, so I decided to ask first.
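The extract-and-test step can be sketched in Julia as follows. This sketch swaps Julia's built-in Serialization stdlib in for HDF5 so it runs without extra packages, and dump_system, load_system and check_system are hypothetical helper names, not part of MUMPS.jl. check_system inspects the 1-based coordinate triplets (what a MUMPS interface hands over as row indices, column indices and values) more directly than computing A*b does. A valid SparseMatrixCSC always has in-range indices, so if every check passes, the input itself looks well-formed; that is consistent with, but does not prove, the suspicion in this thread that the problem lies in the interface. The fits_int32 field assumes MUMPS was built with 32-bit integer indices.

```julia
using SparseArrays, Serialization

# Hypothetical helpers (not part of MUMPS.jl). Serialization is used here in
# place of HDF5 so that no extra packages are needed.

# Write the matrix and right-hand side to `path` as a single tuple.
function dump_system(path::AbstractString, A::SparseMatrixCSC, rhs::AbstractVector)
    open(io -> serialize(io, (A, rhs)), path, "w")
end

# Read back the (A, rhs) tuple written by dump_system.
function load_system(path::AbstractString)
    open(deserialize, path)
end

# Inspect the 1-based coordinate (COO) triplets a MUMPS interface would pass on.
function check_system(A::SparseMatrixCSC, rhs::AbstractVector)
    n = size(A, 1)
    I, J, V = findnz(A)
    return (square     = size(A, 2) == n,
            rhs_length = length(rhs) == n,
            in_range   = all(i -> 1 <= i <= n, I) && all(j -> 1 <= j <= n, J),
            fits_int32 = n <= typemax(Int32),   # assumes 32-bit index type in MUMPS
            finite     = all(isfinite, V) && all(isfinite, rhs))
end
```

For example, calling dump_system("system.jls", A, rhs) just before the failing solve and then load_system("system.jls") in a fresh session would let the solve be repeated in isolation, which is also a natural starting point for an MWE.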