Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support caching generators #100

Open
rinslow opened this issue Jun 2, 2018 · 6 comments
Open

Support caching generators #100

rinslow opened this issue Jun 2, 2018 · 6 comments

Comments

@rinslow
Copy link

rinslow commented Jun 2, 2018

Many times I'll have a function such as

def x():
    result = some_long_function_that_eventually_calculates_a_list()
    yield from (result.name for result in result)
    yield from (result.path for result in result)

I can't cache this function using regular cached_property because the generator will be exhausted.
I want to calculate it once, then whenever it is being called again return the items I retrieved in the order I retrieved them.

@wizpig64
Copy link

wizpig64 commented Aug 15, 2018

I think it's clear that mixing cached_property with generator functions is a little confusing at first glance. Rather than keeping the data returned by a generator method, using cached_property in this case returns the data once and then immediately loses it, leading one to ask, "why isn't it caching the results?" Of course, it is caching something, just not what you might naturally expect it to cache. It's caching a generator object which can only be exhausted once, rather than caching the results from the generator.

I think the solution here is to insert an intermediate decorator that dumps the generator content into a list. This list is then the thing that becomes cached by cached_property:

from cached_property import cached_property


def yield_into_list(func):
    def wrapped(*args, **kwargs):
        return list(func(*args, **kwargs))
    return wrapped


class IceCreamPuns:
    x = 2

    @cached_property
    def i_scream(self):
        # returns a list of results. a silly example, i admit.
        return [n for n in range(self.x)]

    @cached_property
    def eye_scream(self):
        # with the yield syntax, this code just got much more readable.
        # but cached_property caches the generator, not the results!
        yield from range(self.x)

    @cached_property
    @yield_into_list
    def eyes_cream(self):
        # now we can use the yield syntax and re-use the results.
        yield from range(self.x)


puns = IceCreamPuns()
list(puns.i_scream)  # [0, 1]
list(puns.i_scream)  # [0, 1]
list(puns.eye_scream)  # [0, 1]
list(puns.eye_scream)  # []  <- unexpected if you don't know this is a generator!
list(puns.eyes_cream)  # [0, 1]
list(puns.eyes_cream)  # [0, 1]

@pydanny Given there are now three tickets on this subject (#34, #90), I think it would be worth suggesting a simple solution like this in the readme, and maybe including a decorator like yield_into_list in the package. If not, I hope this comment helps other developers better understand how decorators and generators work. :)

@rinslow
Copy link
Author

rinslow commented Aug 17, 2018

Hi @wizpig64 and thanks for replying!

I think your @yield_int_list decorator will fail on the following example:

from cached_property import cached_property


def yield_into_list(func):
    def wrapped(*args, **kwargs):
        return list(func(*args, **kwargs))
    return wrapped


class ConutriesPuns:
    def Czech_this_out(self):
        a = 0
        while True:
            yield a
            a += 1

    @cached_property
    @yield_into_list
    def i_do_not_Bolivia(self):
        a = 0
        while True:
            yield a
            a += 1


puns = CountriesPuns()

for x in puns.Czech_this_out():
   print x   # Works

# -----------------------------------------------------

for x in puns.i_do_not_Bolivia:
   print x  # Doesn't work

I think I would try and implement a cached generator that handles these cases and is lazily-evaluated, so maybe you guys could point to it as a solution

@wizpig64
Copy link

wizpig64 commented Aug 17, 2018

You're right, my solution would only work on finite generators, not infinite ones. Worse, i_do_not_Bolivia locked up my system!

From a few minutes of googling, it looks like this is the right way to duplicate a generator:

from itertools import tee

def infinite_generator():
    a = 0
    while True:
        yield a
        a += 1

original = infinite_generator()

original, copy = tee(original)

for _ in range(3):
    print('original', next(original))
    print('copy', next(copy))

prints

original 0
copy 0
original 1
copy 1
original 2
copy 2

The docs for itertools.tee mention a few caveats though, which makes me wary:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

I'll try and see if I can work this into a decorator regardless.

@wizpig64
Copy link

wizpig64 commented Aug 17, 2018

@rinslow Okay, here's a decorator that returns a new copy of the original generator every time __get__ is called, using itertools.tee.

Notes:

  1. Rather than storing value in self.__dict__ like cached_property does, I store in a new dictionary, self.generators, as I want to fork and re-save the generator again for every new __get__() call, and don't want __dict__ to bypass it. This could probably be integrated into cached_property in a way that still keeps the __dict__ speedup for non-generators, however.
  2. I have no idea if this is async-friendly.
  3. I think using list() is still going to be the better approach when dealing with finite generators, in most cases.
from itertools import tee

class cached_generator_property:

    def __init__(self, func):
        self.__doc__ = getattr(func, "__doc__")
        self.func = func
        self.generators = {}

    def __get__(self, obj, cls):
        if obj is None:
            return self
        if self.func.__name__ in self.generators:
            generator = self.generators[self.func.__name__]
        else:
            generator = self.func(obj)
        self.generators[self.func.__name__], value = tee(generator)
        return value


class Foo:

    @cached_generator_property
    def finite(self):
        yield from range(3)

    @cached_generator_property
    def infinite(self):
        a = 0
        while True:
            print('debug: you should only see this line once for a =', a)
            yield a
            a += 1

testing it in the console:

>>> foo = Foo()
>>> print(list(foo.finite))
[0, 1, 2]
>>> print(list(foo.finite))
[0, 1, 2]
>>> a = foo.infinite
>>> b = foo.infinite
>>> for _ in range(3):
...     print(next(a))
...     print(next(b))
... 
debug: you should only see this line once for a = 0
0
0
debug: you should only see this line once for a = 1
1
1
debug: you should only see this line once for a = 2
2
2
>>> c = foo.infinite
>>> for _ in range(4):
...     print(next(c))
... 
0
1
2
debug: you should only see this line once for a = 3
3

@rinslow
Copy link
Author

rinslow commented Aug 18, 2018

Hey guys, I implemented a cachedgenerator, it's quite simple, you can read the source code at:
https://github.com/rinslow/cachedgenerator

would you want to integrate it with the cached-property library?

@pydanny
Copy link
Owner

pydanny commented Sep 30, 2018

@rinslow Integration would be a good thing as others have had to struggle through this "edge" case. If you decide to submit, I'll make it a priority to review your pull request(s). 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants