Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Leak Fixes #819

Open
KD-92 opened this issue Sep 22, 2024 · 7 comments · May be fixed by #918
Open

Memory Leak Fixes #819

KD-92 opened this issue Sep 22, 2024 · 7 comments · May be fixed by #918
Assignees
Labels
bug Something isn't working

Comments

@KD-92
Copy link

KD-92 commented Sep 22, 2024

Describe the bug
When reinitializing the Simulator repeatedly, e.g., in a Reinforcement Learning (RL) setting, memory growth can be observed. Please find different fixes below (this is not a Pull Request due to the last item in the list).

To reproduce

  1. ReactionWheelStateEffector Memory Leak
    In addition to issue #441, the destructor ReactionWheelStateEffector::~ReactionWheelStateEffector() in src/simulation/dynamics/reactionWheels/reactionWheelStateEffector.cpp uses free() on a data structure allocated with new. Additionally, it may be helpful to delete the elements of rwOutMsgs:
ReactionWheelStateEffector::~ReactionWheelStateEffector()
{
    for (long unsigned int c = 0; c < this->rwOutMsgs.size(); c++) {
        delete this->rwOutMsgs.at(c);
    }

    for (long unsigned int c = 0; c < this->ReactionWheelData.size(); c++) {
        delete this->ReactionWheelData.at(c);
    }
}
  1. Missing Py_DECREF in SWIG Eigen Type Map
    There are missing Py_DECREF in src/architecture/_GeneralModuleFiles/swig_eigen.i in the fragment checkPyObjectIsMatrixLike. The changes are annotated with new decrement.
%fragment("checkPyObjectIsMatrixLike", "header", fragment="getInputSize") {

/*
This method returns an empty optional only if the given input is like the template type T.
Otherwise, the returned optional contains a relevant error message.
The size of the input is compared to the expected size of T (where dynamically
sized matrix T allow any number of rows/columns). Moreover, an error is raised for
ragged nested sequences (rows with different numbers of elements).
*/
template<class T>
std::optional<std::string> checkPyObjectIsMatrixLike(PyObject *input)
{
    auto maybeSize = getInputSize(input);

    if (!maybeSize)
    {
        return "Input is not a sequence";
    }

    auto [numberRows, numberColumns] = maybeSize.value(); 

    if (T::RowsAtCompileTime != -1 && T::RowsAtCompileTime != numberRows)
    {
        std::string errorMsg = "Input does not have the correct number of rows. Expected " 
            + std::to_string(T::RowsAtCompileTime) + " but found " + std::to_string(numberRows);
        return errorMsg;
    }

    if (T::ColsAtCompileTime != -1 && T::ColsAtCompileTime != numberColumns)
    {
        std::string errorMsg = "Input does not have the correct number of columns. Expected " 
            + std::to_string(T::ColsAtCompileTime) + " but found " + std::to_string(numberColumns);
        return errorMsg;
    }

    for(Py_ssize_t row=0; row<numberRows; row++)
    {
        PyObject *rowPyObj = PySequence_GetItem(input, row);
        Py_ssize_t localNumberColumns = PySequence_Check(rowPyObj) ? PySequence_Length(rowPyObj) : 1;
        if (localNumberColumns != numberColumns)
        {
            Py_DECREF(rowPyObj); // new decrement
            return "All rows must be the same length! Row " + std::to_string(row) + " is not.";
        }
        Py_DECREF(rowPyObj); // new decrement
    }
    
    return {};
}
}
  1. Check for Msg disown()
    It may be of interest to verify whether <type>Msg.this.disown() is necessary in:
  • src/utilities/fswSetupRW.py (writeConfigMessage(), rwConfigMsg.this.disown(), line 87)
  • src/utilities/simIncludeRW.py (getConfigMessage(self), rwConfigMsg.this.disown(), line 379)

Expected behavior
With these changes, memory growth became unnoticeable for us even after millions of iterations.

@schaubh
Copy link
Contributor

schaubh commented Sep 27, 2024

Howdy @KD-92 , thanks for the feedback. Something to look at in more detail. Regarding item 3., the reason we use disown() in the factory classes to ensure that the message memory is not released after being created in a sub-routine. There are likely better ways to do this that avoids creating msg objects which never get released

@sassy-asjp
Copy link
Contributor

@schaubh @KD-92 @juan-g-bonilla

I think the conflict between to disown() or not to disown() is related to #676 . Developing a proper fix for that bug would allow the lifetime of the msg objects to be properly handled, instead of having to choose between mistakenly garbage collected in some situations vs intentionally leaked in all situations.

@Natsoulas Natsoulas self-assigned this Dec 10, 2024
@Natsoulas Natsoulas added the bug Something isn't working label Dec 10, 2024
@Natsoulas Natsoulas moved this to 🏗 In progress in Basilisk Dec 10, 2024
@Natsoulas
Copy link
Contributor

Natsoulas commented Jan 17, 2025

@KD-92 Your exact changes don't seem to resolve RW memory leaks, and unit tests don't pass (python crashes) if I follow your fixes. I'm curious if your local basilisk repo has any other modifications? Or perhaps there's something left out of your code blocks etc?

@Natsoulas
Copy link
Contributor

Natsoulas commented Jan 17, 2025

Another finding:

The .disown()'s in the factories are necessary for the RW Monte Carlo set up to function properly. If it is removed, python crashes.

@sassy-asjp
Copy link
Contributor

Another finding:

The .disown()'s in the factories are necessary for the RW Monte Carlo set up to function properly. If it is removed, python crashes.

That's almost certainly because the C/C++ side is unable to keep the Python side object alive by itself.

@schaubh
Copy link
Contributor

schaubh commented Jan 17, 2025

@Natsoulas , let’s look at this more. I bet we can rewrite these scenarios to ensure the RW in the factory class remain in memory. Can we make the factory class instance a class variable of BskSim?

@Natsoulas Natsoulas linked a pull request Feb 5, 2025 that will close this issue
@Natsoulas Natsoulas linked a pull request Feb 5, 2025 that will close this issue
@schaubh
Copy link
Contributor

schaubh commented Feb 18, 2025

The latest branch for these memory issues now avoids the use of disown() by ensuring variables created inside sub-routines are properly retain by parent functions if needed. All Monte Carlo runs work and the memory stability is looking much better. This PR #918 is now being reviewed.

@schaubh schaubh moved this from 🏗 In progress to 👀 In review in Basilisk Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: 👀 In review
Development

Successfully merging a pull request may close this issue.

4 participants