Get rid of eval #59

cstjean · 2020-10-28T17:34:48Z

Fix #48, at the cost of putting the cache in a poorly-performing global.

Not sure if this is acceptable. It's frustrating that Julia seemingly lacks the tools to deal with this elegantly.

If const worked in a local function, we'd just put const and be done with it.
If typeconst existed for global variables, that would work too.
If there was a way to generate a different expansion at the local and global scope, we could use that.

Memoization.jl uses generated functions, which causes other problems. And it feels like the wrong solution too.

Maybe we should put the const, and have a separate @memoize_local for local functions? Feels like an acceptable trade-off.

Fix #48, at the cost of putting the variable in a poorly-performing global. Not sure if this is acceptable. It's frustrating that Julia seemingly lacks the tools to deal with this elegantly. - If `const` worked in a local function, we'd just put `const` and be done with it. - If `typeconst` existed for global variables, that would work too. Memoization.jl uses generated functions, which causes other problems. And it feels like the wrong solution too.

cstjean · 2020-11-07T10:27:41Z

Maybe

begin
function foo end
let memo=...
    foo(x) = get(memo, x) do ... end
end
end

EDIT: Nope, doesn't work at global scope.

cstjean · 2020-12-30T08:20:20Z

Ref discussion in https://discourse.julialang.org/t/how-to-write-memoize-that-works-in-both-local-and-global-scope/49727/6

I'm increasingly of the opinion that if we ever do a 2.0 release, we should have a separate @memoize_local, at least until Julia fixes one of the three bullet points above...

at the global scope

cstjean · 2020-12-30T09:10:43Z

I implemented the solution in #48 (comment). It's very clean and elegant, and it allows us to get rid of empty! (see #68). Unfortunately, I can't see how to preserve #60. @rikhuijzer any ideas?

EDIT: figured it out. Now there's only one @test_broken left to fix.

cstjean · 2020-12-30T09:51:52Z

I fixed the last remaining test with

         try
             # So that redefining a function doesn't leak memory through the previous cache.
             empty!(memoize_cache($f))
         catch
         end

but that brings back the empty! issue for #68, kinda. It won't wipe it on the first definition, but it should wipe it on the redefinition, if Revise is enabled (or at least, I hope it does). @peterahrens would that work for you? It makes some sense to do it like that, because presumably the on-disk cache should be wiped if the method definition changes.

rikhuijzer · 2020-12-30T11:07:01Z

This problem is going over my head a bit. What can I look at, specifically? I did notice that the clean cache tests are not in this branch; are you doing those manually?

cstjean · 2020-12-30T12:01:29Z

Maybe you looked before I rebased on top of master. The cache tests should be here now, and I did find a simple solution to the cache-wiping problem.

Now everything works, but I have to admit that with all the bugfixes, this PR has lost a lot of its appeal.

rikhuijzer · 2020-12-30T13:40:44Z

Now everything works, but I have to admit that with all the bugfixes, this PR has lost a lot of its appeal.

Hmm, life of a software engineer.

willow-ahrens · 2020-12-30T15:43:13Z

For correctness, I don't think we need to empty the cache if we simply formalize the expectation that the cache passed to memoize only contains correct values (this would be easier to state if the syntax was to simply supply an expression which evaluated to an AbstractDict rather than a no-argument function that returned a dictionary. But I digress...). Note that an empty dictionary only contains correct values. If a method is overwritten, the new method will have a new cache of correct values. If the new method is just a specialization of a previous, then the cache of the old method will only get called with arguments that correspond to the still correct elements of the old cache.

While I understand why we may want to invalidate caches for garbage collection, I'd like to post an example where this seems too eager.

julia> @memoize fib(n::Int32) = (println("fibbing Int32"); if n < 2 1 else fib(n - Int32(1)) + fib(n - Int32(2)) end)
fib (generic function with 1 method)

julia> fib(Int32(4))
fibbing Int32
fibbing Int32
fibbing Int32
fibbing Int32
fibbing Int32
5

julia> @memoize fib(n::Int64) = (println("fibbing Int64"); if n < 2 1 else fib(n - Int64(1)) + fib(n - Int64(2)) end)
fib (generic function with 2 methods)

julia> fib(Int32(4))
fibbing Int32
fibbing Int32
fibbing Int32
fibbing Int32
fibbing Int32
5

In this example, we end up recalculating the old fib, even though the values in the cache of fib(::Int32) should always be valid whenever the method is called.

Furthermore, while emptying the old cache gets rid of most of its memory impact, the old dictionary itself is never erased. The old dictionary itself is still there, referenced by the const gensym() variable that refers to it, and unreachable by standard memoize calls like memoize_cache since the $(f)_memoized_cache variable has since been overwritten.

What if instead of emptying the cache every time, we change memoize_cache to return a list of all of the dictionaries used by a particular function, and provide a convenience function to empty them all when the user wants to save on memory by deleting caches from out-of-date functions? If this is agreeable to everyone, I would be happy to implement the changes.

willow-ahrens · 2020-12-30T16:24:40Z

Reflecting a bit, I realize now that my previous comment was written under the assumption that only the memoized function gets redefined, and that it doesn't reference any changing global state. I suspect it would be tricky (and likely out of scope) to invalidate the cache whenever functions called by the memoized function are redefined, or when relevant global state is changed.

cstjean · 2020-12-30T18:49:35Z

Since this is a per-method cache, it is breaking. It's also incorrect when clearing the cache of a function with multiple cached methods.

cstjean · 2021-01-01T11:10:03Z

In this example, we end up recalculating the old fib, even though the values in the cache of fib(::Int32) should always be valid whenever the method is called.

Yeah, this is the bug I describe just above.

Furthermore, while emptying the old cache gets rid of most of its memory impact, the old dictionary itself is never erased. The old dictionary itself is still there, referenced by the const gensym() variable that refers to it, and unreachable by standard memoize calls like memoize_cache since the $(f)_memoized_cache variable has since been overwritten.

If clearing up the cache is about freeing memory after a Revision, then leaking a few empty Dicts' memory is inconsequential...

What if instead of emptying the cache every time, we change memoize_cache to return a list of all of the dictionaries used by a particular function, and provide a convenience function to empty them all when the user wants to save on memory by deleting caches from out-of-date functions? If this is agreeable to everyone, I would be happy to implement the changes.

Returning a list of all dictionaries would fix to the bug described. I don't know if we should stop emptying the cache on redefinition. Before this PR, emptying the Dict was more critical, because otherwise bad results could be returned. With this PR, it's "just" a memory concern during interactive development.

Come to think of it, is there an argument against emptying the (now unreachable) Dict on method redefinition? It makes the code a bit uglier, but that's it, no?

Reflecting a bit, I realize now that my previous comment was written under the assumption that only the memoized function gets redefined, and that it doesn't reference any changing global state. I suspect it would be tricky (and likely out of scope) to invalidate the cache whenever functions called by the memoized function are redefined, or when relevant global state is changed.

Yes, that's all true. Cache invalidation is complex. I use @memoize foo(x; _=world_age()) = ... when I want more conservative invalidation. Maybe we should define world_age() in this package, and document this trick.

It makes the macro expansion much more palatable

cstjean · 2021-01-01T13:48:39Z

Another point in favor of this PR: it's not particularly hard to share the cache between methods if that's the desired behaviour, with

cache = Dict()
@memoize cache foo(x::Int) = ...
@memoize cache foo(x::Vector) = ...

willow-ahrens · 2021-01-01T17:53:37Z

Come to think of it, is there an argument against emptying the (now unreachable) Dict on method redefinition? It makes the code a bit uglier, but that's it, no?

Here are three reasons not to empty the old cache.

While the old cache ends up "unreachable" in this PR in the sense that the global variable no longer points to it and it doesn't get returned by memoized_cache, that cache is still used by the old method if we specialize rather than redefine (memoize currently can't tell the difference). This is the fib example above. Emptying that cache would delete useful, valid calculations. In cases where we specialize the function, this isn't a memory leak.
Emptying the old dict means that users can't pass caches with precomputed values into a function and expect them to survive. Consider issue Saving cache to file and reading from it #68.
In the implementation in this PR, emptying the old method cache only empties the cache of the most recently defined method. The complaint here is not that the code is ugly, but that the behavior is arbitrary. Consider

   @memoize f(x::Int) = (println("hello"); 1)
   @memoize f(y::String) = (println("world"); 1)
    f(4)
    f("foo")
   @memoize f(z::Bool) = 3
    f(4)
    f("foo")

In this PR, the second invocation of f(::String) will recalculate, but f(::Int) will not.

cstjean · 2021-01-01T20:26:01Z

While the old cache ends up "unreachable" in this PR in the sense that the global variable no longer points to it and it doesn't get returned by memoized_cache, that cache is still used by the old method if we specialize rather than redefine (memoize currently can't tell the difference). This is the fib example above. Emptying that cache would delete useful, valid calculations. In cases where we specialize the function, this isn't a memory leak.

Right, but that would be fixed by the list-of-caches approach. We should only wipe the redefined method's cache.

Emptying the old dict means that users can't pass caches with precomputed values into a function and expect them to survive. Consider issue #68.

Only if you redefine the function. At some point, it seems reasonable to carve some use case domain for Memoize.jl. Perhaps we can suggest in the README to use

cache = OnDiskDict("blah.data")
foo(x) = get!(cache, x) do
    ...
end

In the implementation in this PR, emptying the old method cache only empties the cache of the most recently defined method. The complaint here is not that the code is ugly, but that the behavior is arbitrary.

Right, it should be fixed before merging.

willow-ahrens · 2021-01-01T21:30:05Z

Ah, I see. So instead of a list of caches, you're proposing a dictionary of caches mapping each method to its cache. Then, when a method is redefined (in this case, Julia gives a warning regarding method overwriting I think), you would empty only the cache corresponding to the overwritten method to clear up memory. In that case, I agree with you that it's pretty safe to empty the cache. Storing the method associated with each cache has the added bonus of affording the user more granularity when clearing caches manually. We could define functions like function_memories(f) and method_memories(f, argtypes) that return either all the caches for a function or just the specific cache for a specific method.

You're right that this solution would work with #68 assuming the user doesn't overwrite methods, so I'm in favor.

As an aside: It's too bad the gc doesn't see that there's no way to access the overwritten cache and take care of deleting it on its own.

cstjean · 2021-01-04T16:56:06Z

To recap this PR, because I don't see myself working on it anytime soon:

Needs rebasing to run on Github-CI
For multiple cached methods, the remembered cache is wrong (so clearing it won't work). We need to maintain a Dict (?) of method-args -> memo-Dict
This PR is breaking (cache is no longer shared among methods), so it needs a major version change. Right now, everyone seems on board with the changes, but it's probably worth reviewing one more time for impact.
If we're breaking, then we should also remove the special casing of @memoize OrderedDict in favor of @memoize OrderedDict(). Maybe it should be deprecated in a 0.4.x release?

cstjean mentioned this pull request Oct 29, 2020

World age error marius311/Memoization.jl#10

Closed

cstjean added 2 commits December 30, 2020 03:51

Use local directive so that it's also performant

88c8c6f

at the global scope

Merge branch 'master' into global-const-cache

cc4b00a

Support memoize_cache too

72ab0cd

cstjean marked this pull request as draft December 30, 2020 09:29

Fix the broken finalized test

7d5f2f4

cstjean marked this pull request as ready for review December 30, 2020 09:31

Comment

0d703de

This was referenced Dec 30, 2020

Saving cache to file and reading from it #68

Open

Running @memoize a second time silently does nothing, even if the second call refers to nonexistent names #66

Open

cstjean added the breaking label Dec 30, 2020

Factor out try_empty_cache.

c596fdb

It makes the macro expansion much more palatable

willow-ahrens mentioned this pull request Jan 5, 2021

Remove eval, memoize methodwise, traits, callables, typed caches #70

Closed

willow-ahrens mentioned this pull request Jan 9, 2021

Add support for method-wise cache invalidation. #71

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get rid of eval #59

Get rid of eval #59

cstjean commented Oct 28, 2020

cstjean commented Nov 7, 2020 •

edited

Loading

cstjean commented Dec 30, 2020

cstjean commented Dec 30, 2020 •

edited

Loading

cstjean commented Dec 30, 2020 •

edited

Loading

rikhuijzer commented Dec 30, 2020

cstjean commented Dec 30, 2020

rikhuijzer commented Dec 30, 2020

willow-ahrens commented Dec 30, 2020 •

edited

Loading

willow-ahrens commented Dec 30, 2020

cstjean commented Dec 30, 2020

cstjean commented Jan 1, 2021

cstjean commented Jan 1, 2021

willow-ahrens commented Jan 1, 2021 •

edited

Loading

cstjean commented Jan 1, 2021

willow-ahrens commented Jan 1, 2021 •

edited

Loading

cstjean commented Jan 4, 2021

Get rid of eval #59

Are you sure you want to change the base?

Get rid of eval #59

Conversation

cstjean commented Oct 28, 2020

cstjean commented Nov 7, 2020 • edited Loading

cstjean commented Dec 30, 2020

cstjean commented Dec 30, 2020 • edited Loading

cstjean commented Dec 30, 2020 • edited Loading

rikhuijzer commented Dec 30, 2020

cstjean commented Dec 30, 2020

rikhuijzer commented Dec 30, 2020

willow-ahrens commented Dec 30, 2020 • edited Loading

willow-ahrens commented Dec 30, 2020

cstjean commented Dec 30, 2020

cstjean commented Jan 1, 2021

cstjean commented Jan 1, 2021

willow-ahrens commented Jan 1, 2021 • edited Loading

cstjean commented Jan 1, 2021

willow-ahrens commented Jan 1, 2021 • edited Loading

cstjean commented Jan 4, 2021

cstjean commented Nov 7, 2020 •

edited

Loading

cstjean commented Dec 30, 2020 •

edited

Loading

cstjean commented Dec 30, 2020 •

edited

Loading

willow-ahrens commented Dec 30, 2020 •

edited

Loading

willow-ahrens commented Jan 1, 2021 •

edited

Loading

willow-ahrens commented Jan 1, 2021 •

edited

Loading