Skip to content

Fix memory leak in conversion of symbolic expressions #40003

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

r-mb
Copy link
Contributor

@r-mb r-mb commented Apr 23, 2025

I have looked into bug #27536.
When constants are converted, for instance when calling RDF(pi), the method _eval_self is called. In the process, Py_INCREF is called twice in numeric::to_pyobject (in src/sage/symbolic/ginac/numeric.cpp) and basic & ex::construct_from_pyobject (in src/sage/symbolic/ginac/ex.cpp).
To avoid this, I propose to inline the code instead of calling it implicitly when returning. In the inlined code, of course, remove the Py_INCREF.

Another leak was due to PyDict_New being called without a corresponding Py_DECREF, this is also fixed.

I have also noticed that some computations and memory usage were due to default overall coefficients being copied every time. I added another commit to return pointers to these coefficients instead.

It turns out this was the problem in #27185.

It seems to me that this fixes the problems mentioned in the bug tickets. But this is my first time pushing C++ code into production so there might be some mistakes.
I am not sure whether a numeric object keeps some memory after being deleted, but at least the Python objects do not remain, and this frees gigabytes of memory when running the codes in the tickets.
Doctesting Sage runs fine on my device.

Fixes #27185.
Fixes #27536.

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

Copy link

github-actions bot commented Apr 23, 2025

Documentation preview for this PR (built with commit e7de1e3; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@user202729
Copy link
Contributor

In theory should be fine, but in practice I'd rather finding a more sane way to handle this e.g. by making lifetime ownership more explicit / following rule of 3/5/0 / etc. We're using C++ here, not C.

@r-mb
Copy link
Contributor Author

r-mb commented Apr 26, 2025

In theory should be fine, but in practice I'd rather finding a more sane way to handle this e.g. by making lifetime ownership more explicit / following rule of 3/5/0 / etc. We're using C++ here, not C.

Thank you very much for looking into this!
I am not totally sure I understood which lifetime ownership you were referring to, but I have looked into your suggestions.
I first tried to apply the rule of 3 to ex, but it turned out it was not the solution. In fact, the Py_INCREF(o) in ex::construct_from_pyobject had nothing to do there in the first place: it was the only Python reference counting function in the whole ex.cpp file.

So I have another fix which goes as follows: move the above Py_INCREF(o) to numeric::numeric, whose
associated Py_DECREF are already implemented in numeric's destructor, move constructor etc.
Instead of the inlinings in my fix, change (for instance here in ConstantEvalf) with:

        ex res = x;
        Py_DECREF(x);
        return res;

This has to be done in various places in numeric.cpp, but it seems more satisfying to me.

This still fixes both bugs on my device, and all doctests run fine. But before pushing a commit, I want to make sure that this is the kind of fix you had in mind? (yes, you have unmasked me as a C user!)

@user202729
Copy link
Contributor

user202729 commented Apr 26, 2025

I… will need to look at the code and decide how best to encapsulate the lifetime. If you have to change every call sites, this is pretty much a breaking API change.

P/s: I wonder how much we have diverged from ginac. There are issues like #38868 which is already fixed upstream.

@user202729
Copy link
Contributor

My conclusion:

  • there's extensive use the ptr<T> template and raw pointer and new and manually keeping track of flags::dynallocated, it would probably be very difficult to change this completely. As such we try to avoid changing the API.
  • the numeric constructor from PyObject* moves the ownership of the shared reference to the numeric object. (which is what the comment means by "numeric steals a reference". Anyway "steal" is in official documentation as well)
  • On the other hand, inexplicably, the ex::ex constructor chooses to copy the ownership of the shared reference to the PyObject*.

I don't know why the difference exists. But I think the best option is to either

  • destruct the reference on caller (which is what you suggested with Py_DECREF), or
  • create a new API ex::construct_from_pyobject_steal_reference() or something, and use that. That way the code duplication can be contained in ex (you can either make the two constructors with duplicated code standing next to each other, or just make a helper function)

Side note: browsing through the occurrences of PyObject (there are 508 (!!) of them), I notice what appears to be another memory leak

numeric numeric::conj() const {
        switch (t) {
        case LONG:
        case MPZ:
        case MPQ: return *this;
        case PYOBJECT: {
                PyObject *obj = PyObject_GetAttrString(v._pyobject,
                "conjugate");
                if (obj == nullptr)
                        return *this;
                obj = PyObject_CallObject(obj, NULL);
                if (obj == nullptr)
                        py_error("Error calling Python conjugate");
                return obj;
        }

here PyObject_GetAttrString creates a new reference, but PyObject_CallObject doesn't steal the reference, so it's lost.

@user202729
Copy link
Contributor

I have also noticed that some computations and memory usage were due to default overall coefficients being copied every time. I added another commit to return pointers to these coefficients instead.

Just return a const numeric&, that way call sites doesn't need to be called. Although looking at how upstream ginac implements it, they now return an ex instead, which is just ptr<basic>, and copying that one is cheap.

@r-mb
Copy link
Contributor Author

r-mb commented Apr 28, 2025

P/s: I wonder how much we have diverged from ginac. There are issues like #38868 which is already fixed upstream.

This seems to be a big issue actually. I feel that the code has not been much updated with upstream patches, and making the code up to date would be a major effort.

Just return a const numeric&, that way call sites doesn't need to be called. Although looking at how upstream ginac implements it, they now return an ex instead, which is just ptr<basic>, and copying that one is cheap.

I have done it with the reference, thanks!

I don't know why the difference exists. But I think the best option is to either

* destruct the reference on caller (which is what you suggested with `Py_DECREF`), or

* create a new API `ex::construct_from_pyobject_steal_reference()` or something, and use that. That way the code duplication can be contained in `ex` (you can either make the two constructors with duplicated code standing next to each other, or just make a helper function)

So I arrived to the same conclusion for the difference between the constructors of ex and numeric. While I still don't understand why it was implemented that way, it turns out that your idea for a new API is more or less already implemented with ex::construct_from_basic. It turns out that in the end, the fix was as simple as changing the returned PyObject* as a numeric. This is how it was already done in numeric::power for instance.
I have pushed the commit, doctests run fine on my device and the bugs are fixed.

Side note: browsing through the occurrences of PyObject (there are 508 (!!) of them), I notice what appears to be another memory leak

Thanks for catching that one, I did take care of it as well!

@user202729
Copy link
Contributor

I suggest adding a comment to ex::construct_from_pyobject or the ex(PyObject*) constructor

note: unlike `numeric(o)`, `ex(o)` does not steal a reference to `o`. You can explicitly move the reference to the newly-constructed `ex` object by `ex(numeric(o))`.

@r-mb
Copy link
Contributor Author

r-mb commented Apr 28, 2025

Good idea! Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conversion of mathematical constant such as pi to RDF leaks memory defect: sqrt memory leak
2 participants