Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C stress test failed #149

Closed
zaikunzhang opened this issue Jan 29, 2024 · 5 comments · Fixed by #180
Closed

C stress test failed #149

zaikunzhang opened this issue Jan 29, 2024 · 5 comments · Fixed by #180
Labels
c Issues related to the C interface or implementation ci Issues related to CI cmake Issues related to CMake

Comments

@zaikunzhang
Copy link
Member

zaikunzhang commented Jan 29, 2024

See the branch

https://github.com/libprima/prima/tree/c_stress_test_failure

and the workflow run

https://github.com/libprima/prima/actions/runs/7934551296/job/21665774763

@zaikunzhang zaikunzhang added c Issues related to the C interface or implementation cmake Issues related to CMake ci Issues related to CI labels Jan 29, 2024
@zaikunzhang
Copy link
Member Author

zaikunzhang commented Feb 9, 2024

Hi @nbelakovski ,

Could you take a look at this? There is a bug here (I have created a branch to reproduce the bug easily: https://github.com/libprima/prima/tree/c_stress_test_failure).

In the new version (which has other issues as you mentioned; I will fix them), the bug is not triggered anymore but we do not know why. There is nothing more scary than a magically disappeared bug.

Many thanks.

Best regards,
Zaikun

@nbelakovski
Copy link
Contributor

nbelakovski commented Mar 24, 2024

I think I see the issue. result is not initialized, and in C variables that aren't initialized can theoretically have garbage data in them (basically the contents of whatever memory address they were given). In the branch linked, all of the algorithms except cobyla fail prima_check_problem with the return code indicating a problem mismatch (because problem.calcfc is set for all of them), and so prima_init_result is never called. Then stress.c called prima_free_result on uninitialized memory and hence tried to free memory that was not allocated.

I've been unable to reproduce this locally, even when using the same intel compiler, so the above is theoretical but I think it's a compelling case, particularly since the issue does not arise for cobyla (also note the failed windows test is due to the issue with the setup-fortran action linking the wrong libgfortran which we fixed).

Note that calcfc is no longer specified for all algorithms, it appears this was corrected in c85e631.

However the potential issue of the user calling prima_free_result on uninitialized memory still exists. I see two options for fixing this:

  1. We can call prima_init_result before we call prima_check_problem. However if prima_check_problem fails then we should free the memory and set result.x/result.nlconstr to 0 before returning.
  2. We call memset(result, 0, sizeof(prima_result_t)); at the very beginning of prima_minimize. In this option we keep the order of prima_init_result and prima_check_problem as they are (checking the problem before initializing).

Either option allows the user to call prima_free_result afterwards with no ill effects, since it guarantees that result.x and result.nlconstr will be 0 or they will be allocated. I think option 2 is better since it helps to avoid unnecessary calls to malloc and free.

@zaikunzhang
Copy link
Member Author

set result.x/result.nlconstr to 0

We should not set them to an arbitrary normal value if they are not defined yet. (why 0? Why not 1? 100?)

If they are not defined yet, then they should have a velue that indicates "uninitialised" (in the mathematical sense, which is different from "uninitialised" in the programming sense). The ideal value would be NaN.

This is w very important point. Otherwise, the logic is wrong and it may lead to mysterious bugs.

However, here the situation is different. result.x should be x0, and the other should be the constraint value at x0.


BTW, I am against passing the value of x0 to minimize using result.x. The logic is wrong. Before the optimization starts, result.x should be uninitialised (in the mathematical sense).

Thank you.

@zaikunzhang
Copy link
Member Author

zaikunzhang commented Mar 25, 2024

We should not set them to an arbitrary normal value if they are not defined yet. (why 0? Why not 1? 100?)

BTW, I am against passing the value of x0 to minimize using result.x. The logic is wrong. Before the optimization starts, result.x should be uninitialised (in the mathematical sense).

Let me put is in this way: at any moment, the content of "result" should be consistent. It cannot be partially initialised and partially not (unless the initialisation is being done), and it cannot contain values that seems "normal" when it is indeed "uninitialised" (mathematically), giving us the wrong impression that it is initialised when it is indeed not.

nbelakovski added a commit to nbelakovski/prima that referenced this issue Mar 25, 2024
@nbelakovski
Copy link
Contributor

set result.x/result.nlconstr to 0

We should not set them to an arbitrary normal value if they are not defined yet. (why 0? Why not 1? 100?)

I wasn't clear here: result.x and result.nlconstr are pointers, so I meant setting them to NULL (which is also 0), not an arbitrary value.

As for the other points we discussed them in your office and I've now opened #180 to implement the changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c Issues related to the C interface or implementation ci Issues related to CI cmake Issues related to CMake
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants