Multiple runs using the boutpp module #2855

Open
totork opened this issue Jan 30, 2024 · 3 comments
totork commented Jan 30, 2024

If you create a PhysicsModel using boutpp, the module does not allow you to run multiple objects of the class.

The first run works. If you create a second object with the same attributes, the code crashes, and the only error is:


Fatal Python error: Aborted

Thread 0x00007f3278aba700 (most recent call first):
File "/home/IPP-HGW/toto/anaconda3/envs/boutpp/lib/python3.10/site-packages/zmq/utils/garbage.py", line 47 in run
File "/home/IPP-HGW/toto/anaconda3/envs/boutpp/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/home/IPP-HGW/toto/anaconda3/envs/boutpp/lib/python3.10/threading.py", line 973 in _bootstrap

Main thread:
Current thread 0x00007f34ee6af4c0 (most recent call first):
File "/tmp/ipykernel_1206591/1743057888.py", line 1 in

Restarting kernel...


Calling solve() on the same object twice does not work either. @dschwoerer

dschwoerer (Contributor) commented

Here is an error I get with a new model:

  what():  Permission denied
file: ../../cxx4/ncFile.cpp  line:94
[raven02:110478] *** Process received signal ***
[raven02:110478] Signal: Aborted (6)
[raven02:110478] Signal code:  (-6)
[raven02:110478] [ 0] /lib64/libc.so.6(+0x40710)[0x14caf8924710]
[raven02:110478] [ 1] /lib64/libc.so.6(+0x98184)[0x14caf897c184]
[raven02:110478] [ 2] /lib64/libc.so.6(gsignal+0x1e)[0x14caf892465e]
[raven02:110478] [ 3] /lib64/libc.so.6(abort+0xdf)[0x14caf890c902]
[raven02:110478] [ 4] /lib64/libstdc++.so.6(+0xa5d79)[0x14caf737cd79]
[raven02:110478] [ 5] /lib64/libstdc++.so.6(+0xb7bdc)[0x14caf738ebdc]
[raven02:110478] [ 6] /lib64/libstdc++.so.6(_ZSt10unexpectedv+0x0)[0x14caf737c921]
[raven02:110478] [ 7] /lib64/libstdc++.so.6(+0xb7e68)[0x14caf738ee68]
[raven02:110478] [ 8] /lib64/libnetcdf_c++4.so.1(+0x1db48)[0x14caf7a3bb48]
[raven02:110478] [ 9] /lib64/libnetcdf_c++4.so.1(_ZN6netCDF6NcFile4openERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_8FileModeE+0xe4)[0x14caf7a43bf4]
[raven02:110478] [10] /lib64/libnetcdf_c++4.so.1(_ZN6netCDF6NcFileC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_8FileModeE+0x3b)[0x14caf7a43f6b]
[raven02:110478] [11] /u/dave/soft/BOUT-dev/v4.4.0/build/lib/libbout++.so.5.1.0(_ZN4bout13OptionsNetCDF5writeERK7OptionsRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2ac)[0x14caf820d9dc]
[raven02:110478] [12] /u/dave/soft/BOUT-dev/v4.4.0/build/lib/libbout++.so.5.1.0(_ZN12PhysicsModel8postInitEb+0x310)[0x14caf8192dd0]
[raven02:110478] [13] /u/dave/soft/BOUT-dev/v4.4.0/build/lib/libbout++.so.5.1.0(_ZN12PhysicsModel10initialiseEP6Solver+0xd5)[0x14caf8191d95]
[raven02:110478] [14] /u/dave/soft/BOUT-dev/v4.4.0/build/lib/libbout++.so.5.1.0(_ZN6Solver8setModelEP12PhysicsModel+0x34)[0x14caf81c3774]
[raven02:110478] [15] /u/dave/soft/BOUT-dev/v4.4.0/build/tools/pylib/boutpp/libboutpp.cpython-312-x86_64-linux-gnu.so(+0x5fab1)[0x14caf838eab1]
[raven02:110478] [16] /u/dave/soft/BOUT-dev/v4.4.0/build/tools/pylib/boutpp/libboutpp.cpython-312-x86_64-linux-gnu.so(+0xa1bb8)[0x14caf83d0bb8]
[raven02:110478] [17] /lib64/libpython3.12.so.1.0(PyObject_Vectorcall+0x5c)[0x14caf8cdbd0c]
[raven02:110478] [18] /lib64/libpython3.12.so.1.0(+0x10f78a)[0x14caf8be078a]
[raven02:110478] [19] /lib64/libpython3.12.so.1.0(PyEval_EvalCode+0xb6)[0x14caf8d5c246]
[raven02:110478] [20] /lib64/libpython3.12.so.1.0(+0x2ade7a)[0x14caf8d7ee7a]
[raven02:110478] [21] /lib64/libpython3.12.so.1.0(+0x2a8d8e)[0x14caf8d79d8e]
[raven02:110478] [22] /lib64/libpython3.12.so.1.0(+0x2c98d3)[0x14caf8d9a8d3]
[raven02:110478] [23] /lib64/libpython3.12.so.1.0(_PyRun_SimpleFileObject+0x1ca)[0x14caf8d99c1a]
[raven02:110478] [24] /lib64/libpython3.12.so.1.0(_PyRun_AnyFileObject+0x4f)[0x14caf8d999af]
[raven02:110478] [25] /lib64/libpython3.12.so.1.0(Py_RunMain+0x352)[0x14caf8d8a752]
[raven02:110478] [26] /lib64/libpython3.12.so.1.0(Py_BytesMain+0x3c)[0x14caf8d4603c]
[raven02:110478] [27] /lib64/libc.so.6(+0x2a088)[0x14caf890e088]
[raven02:110478] [28] /lib64/libc.so.6(__libc_start_main+0x8b)[0x14caf890e14b]
[raven02:110478] [29] python3(_start+0x25)[0x55ee2a2b2095]

As a workaround, this seems to work for me:

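import gc

model = MyModel()
model.solve()
model = MyModel()  # rebinding the name drops the reference to the first model
gc.collect()  # make sure the first model is deleted and its dump files are closed
model.solve()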

Unfortunately, the gc.collect() is needed because the dump files are re-used: it ensures that the first model's files are closed before they are opened again.
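To illustrate why an explicit collection can matter here, the following is a minimal sketch that does not use boutpp at all. `FakeModel`, its `open_files` set, the file name and `run_sequentially` are all hypothetical stand-ins, and the self-reference is an assumption made so that reference counting alone cannot free the object; the thread does not say what actually keeps the real model alive.

```python
import gc


class FakeModel:
    """Hypothetical stand-in for a boutpp model (not the real API)."""

    open_files = set()  # paths currently held open by live models

    def __init__(self, path="BOUT.dmp.0.nc"):
        self.path = path
        self._opened = False
        self._self_ref = self  # assumed reference cycle: refcounting cannot free this

    def solve(self):
        # Refuse to open a dump file that an earlier model still holds.
        if self.path in FakeModel.open_files:
            raise PermissionError(f"{self.path} is still open")
        FakeModel.open_files.add(self.path)
        self._opened = True

    def __del__(self):
        # Release the file when the object is finalised.
        if self._opened:
            FakeModel.open_files.discard(self.path)


def run_sequentially(factory, n_runs):
    """Create and solve n_runs models one after another."""
    for _ in range(n_runs):
        model = factory()
        model.solve()
        del model     # drop our reference to the finished model
        gc.collect()  # collect the cycle so the finaliser releases the file


run_sequentially(FakeModel, 3)
```

With a real model class, the same loop shape would apply under the assumption that deleting the model is what closes its dump files, as described above.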

Concerning calling solve() again:

model.solve()
model.solve()

This fails with:

Run time : 1 s
Solver running for 10 outputs with output timestep of 1.000000e-01
terminate called after throwing an instance of 'BoutException'
  what():  ERROR: Solver is already initialised

[raven02:129947] *** Process received signal ***
[raven02:129947] Signal: Aborted (6)
[raven02:129947] Signal code:  (-6)
[raven02:129947] [ 0] /lib64/libc.so.6(+0x40710)[0x14fd82c89710]
[raven02:129947] [ 1] /lib64/libc.so.6(+0x98184)[0x14fd82ce1184]
[raven02:129947] [ 2] /lib64/libc.so.6(gsignal+0x1e)[0x14fd82c8965e]
[raven02:129947] [ 3] /lib64/libc.so.6(abort+0xdf)[0x14fd82c71902]
[raven02:129947] [ 4] /lib64/libstdc++.so.6(+0xa5d79)[0x14fd816e1d79]
[raven02:129947] [ 5] /lib64/libstdc++.so.6(+0xb7bdc)[0x14fd816f3bdc]
[raven02:129947] [ 6] /lib64/libstdc++.so.6(_ZSt10unexpectedv+0x0)[0x14fd816e1921]
[raven02:129947] [ 7] /lib64/libstdc++.so.6(+0xb7e68)[0x14fd816f3e68]
[raven02:129947] [ 8] /u/dave/soft/BOUT-dev/v4.4.0/build/lib/libbout++.so.5.1.0(+0x1236e9)[0x14fd822d06e9]
[raven02:129947] [ 9] /u/dave/soft/BOUT-dev/v4.4.0/build/lib/libbout++.so.5.1.0(_ZN11PvodeSolver4initEv+0x1bf)[0x14fd82511e9f]
[raven02:129947] [10] /u/dave/soft/BOUT-dev/v4.4.0/build/lib/libbout++.so.5.1.0(_ZN6Solver5solveEid+0xab)[0x14fd8253547b]
[raven02:129947] [11] /u/dave/soft/BOUT-dev/v4.4.0/build/tools/pylib/boutpp/libboutpp.cpython-312-x86_64-linux-gnu.so(+0x5fa2e)[0x14fd826f3a2e]
[raven02:129947] [12] /u/dave/soft/BOUT-dev/v4.4.0/build/tools/pylib/boutpp/libboutpp.cpython-312-x86_64-linux-gnu.so(+0xa1bb8)[0x14fd82735bb8]
[raven02:129947] [13] /lib64/libpython3.12.so.1.0(PyObject_Vectorcall+0x5c)[0x14fd83040d0c]
[raven02:129947] [14] /lib64/libpython3.12.so.1.0(+0x10f78a)[0x14fd82f4578a]
[raven02:129947] [15] /lib64/libpython3.12.so.1.0(PyEval_EvalCode+0xb6)[0x14fd830c1246]
[raven02:129947] [16] /lib64/libpython3.12.so.1.0(+0x2ade7a)[0x14fd830e3e7a]
[raven02:129947] [17] /lib64/libpython3.12.so.1.0(+0x2a8d8e)[0x14fd830ded8e]
[raven02:129947] [18] /lib64/libpython3.12.so.1.0(+0x2c98d3)[0x14fd830ff8d3]
[raven02:129947] [19] /lib64/libpython3.12.so.1.0(_PyRun_SimpleFileObject+0x1ca)[0x14fd830fec1a]
[raven02:129947] [20] /lib64/libpython3.12.so.1.0(_PyRun_AnyFileObject+0x4f)[0x14fd830fe9af]
[raven02:129947] [21] /lib64/libpython3.12.so.1.0(Py_RunMain+0x352)[0x14fd830ef752]
[raven02:129947] [22] /lib64/libpython3.12.so.1.0(Py_BytesMain+0x3c)[0x14fd830ab03c]
[raven02:129947] [23] /lib64/libc.so.6(+0x2a088)[0x14fd82c73088]
[raven02:129947] [24] /lib64/libc.so.6(__libc_start_main+0x8b)[0x14fd82c7314b]
[raven02:129947] [25] python3(_start+0x25)[0x559fa8844095]

I guess it should be possible to fix this in BOUT++, so that solve() can be called again.
I am not sure what the best way forward is, though: changing this might break the code in subtle ways, because the assumption so far was that BOUT++ is used for a one-shot run, after which everything gets deleted.

Is the workaround for now sufficient?
How much time do you spend in the setup of the simulation, and how much actually running it?

dschwoerer (Contributor) commented

gc.collect(0) seems to be sufficient to actually delete the just-freed model in my case.
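As a rough illustration of what `gc.collect(0)` does, the sketch below uses a hypothetical `Node` class that refers to itself. It assumes the object is kept alive by a reference cycle and is still in the youngest garbage-collector generation; whether that matches the boutpp model is not confirmed in this thread.

```python
import gc
import weakref


class Node:
    """Hypothetical object that refers to itself, forming a reference cycle."""

    def __init__(self):
        self.me = self


gc.collect()   # start from a clean state
gc.disable()   # only for this demo: keep automatic collection from interfering

node = Node()
probe = weakref.ref(node)  # lets us observe whether the object is still alive

del node  # the cycle keeps the object alive after the name is gone
alive_after_del = probe() is not None

gc.collect(0)  # collect only the youngest generation
alive_after_collect = probe() is not None

gc.enable()
```

Here `alive_after_del` records that deleting the name alone did not free the object, and `alive_after_collect` records whether the generation-0 collection did.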

totork (Author) commented Jan 30, 2024

> Is the workaround for now sufficient?
> How much time do you spend in the setup of the simulation, and how much actually running it?

Yes, it works like this. The setup is not long; most of the time is spent running.
