Stream.of(...) consumes the underlying iterable eagerly #94

Open
adrian-herscu opened this issue Aug 25, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@adrian-herscu

Describe the bug
Stream.of(...) consumes the underlying iterable eagerly.
Perhaps there is another way of wrapping an iterable?

To Reproduce

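from typing import Generator

from pystreamapi import Stream


def gen() -> Generator[int, None, None]:
    for i in range(1, 4):
        print(">>")  # marks the moment an element is actually produced
        yield i


def should_stream0():
    # Plain iteration: ">>" and the numbers are printed alternately
    for i in gen():
        print(i)


def should_stream2():
    # Same source wrapped in a Stream: all ">>" lines are printed first
    Stream.of(gen()) \
        .map(lambda i: str(i)) \
        .for_each(print)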

Expected behavior
should_stream2 should behave the same as should_stream0 by printing >> interleaved with numbers.
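
For reference, the interleaving described here is what Python's built-in lazy `map` produces. The snippet below is only a sketch with the standard library, not PyStreamAPI code; the `events` list is an illustrative stand-in for the `print` calls so the order can be inspected.

```python
from typing import Generator, List

events: List[str] = []  # records the order in which things happen


def gen() -> Generator[int, None, None]:
    for i in range(1, 4):
        events.append(">>")  # stands in for print(">>") in the report
        yield i


# Built-in map is lazy: each element is pulled from gen() only when needed.
for s in map(str, gen()):
    events.append(s)

print(events)  # ['>>', '1', '>>', '2', '>>', '3']
```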

Machine (please complete the following information):

  • linux
  • Python Version: 3.12.4
  • PyStreamAPI Version: 1.3.0
@adrian-herscu adrian-herscu added the bug Something isn't working label Aug 25, 2024
@adrian-herscu

This comment was marked as outdated.

@garlontas
Member

garlontas commented Sep 2, 2024

Thanks @adrian-herscu for helping to improve pystreamapi!

You are right that generators are not handled as they should be. The problem is very visible in this snippet:

def infinite_gen():
    yield from iter(int, 1)  # int() always returns 0, never the sentinel 1

(
    Stream.of(infinite_gen())
    .map(lambda x: x * 2)  # 💀 Infinite loop
    .limit(10)
    .for_each(print)
)

Currently, you have to place limit at the beginning of the stream, as the first intermediate operation.
In Java, any order is supported as long as limit appears somewhere in the pipeline.

def infinite_gen():
    yield from iter(int, 1)  # int() always returns 0, never the sentinel 1

(
    Stream.of(infinite_gen())
    .limit(10)
    .map(lambda x: x * 2)
    .for_each(print)  # 👍 Working perfectly
)

Pystreamapi executes all operations lazily but uses lists internally, which is why the whole generator is consumed by the first intermediate operation.

The best solution is to use generators internally, so that the source generator is not completely consumed.
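
A minimal sketch of what "generators internally" could look like. `LazyStream` and `naturals` are hypothetical names made up for illustration; this is not pystreamapi's actual implementation, only the general idea of chaining iterators instead of building lists.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


class LazyStream:
    """Hypothetical minimal stream that wraps an iterator, never a list."""

    def __init__(self, source: Iterable[Any]):
        self._it: Iterator[Any] = iter(source)

    def map(self, fn: Callable[[Any], Any]) -> "LazyStream":
        # Generator expression: fn runs only when an element is requested.
        return LazyStream(fn(x) for x in self._it)

    def limit(self, n: int) -> "LazyStream":
        # islice stops pulling from the source after n elements.
        return LazyStream(islice(self._it, n))

    def for_each(self, action: Callable[[Any], None]) -> None:
        # Terminal operation: this loop is what drives the whole pipeline.
        for x in self._it:
            action(x)


def naturals() -> Iterator[int]:
    """Infinite source used only for this example."""
    n = 0
    while True:
        yield n
        n += 1


out: List[int] = []
# map comes before limit, the ordering that currently loops forever.
LazyStream(naturals()).map(lambda x: x * 2).limit(5).for_each(out.append)
print(out)  # [0, 2, 4, 6, 8]
```

Because every operation only wraps the previous iterator, nothing is read from the source until `for_each` runs, and `limit` can sit anywhere in the chain.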

Note: The missing interleaving in your output is not always an issue, since some intermediate operations require the whole generator to be consumed before they can produce a result (sorted, distinct, etc.)
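
To illustrate the note with plain Python (a standard-library sketch, not PyStreamAPI code): sorting cannot emit its first element until it has seen the last one, so the source is necessarily drained up front. The `events` list is an illustrative stand-in for the `print` calls.

```python
from typing import Iterator, List

events: List[str] = []  # records the order in which things happen


def gen() -> Iterator[int]:
    for i in (3, 1, 2):
        events.append(">>")  # an element is being produced
        yield i


# sorted() has to read every element before it can return the smallest one.
for value in sorted(gen()):
    events.append(str(value))

print(events)  # ['>>', '>>', '>>', '1', '2', '3']
```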

Tasks to complete

This issue can be split into the following three tasks:

  • Adapt the sequential implementation to support infinite sources: 🐛 Fix sequential stream consuming source eager #98
  • Change the parallelization strategy used to support consuming infinite sources
  • Adapt the parallel implementation to use the new parallelizer and (possibly) remove joblib dependency
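
One possible direction for the second task, purely as an assumption and not the strategy the maintainers chose: pull the source in bounded chunks and hand each chunk to a worker pool, so an infinite source is never materialized. `parallel_map_chunked`, `chunk_size` and `workers` are hypothetical names, and the sketch uses the standard library's `ThreadPoolExecutor` rather than joblib.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count, islice
from typing import Any, Callable, Iterable, Iterator


def parallel_map_chunked(
    fn: Callable[[Any], Any],
    source: Iterable[Any],
    chunk_size: int = 4,
    workers: int = 2,
) -> Iterator[Any]:
    """Hypothetical helper: map fn over source one bounded chunk at a time."""
    it = iter(source)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Read at most chunk_size elements; never the whole source.
            chunk = list(islice(it, chunk_size))
            if not chunk:
                return
            # pool.map preserves input order within the chunk.
            yield from pool.map(fn, chunk)


# count(1) is infinite; only as many chunks as needed are ever read.
result = list(islice(parallel_map_chunked(lambda x: x * 2, count(1)), 6))
print(result)  # [2, 4, 6, 8, 10, 12]
```

The trade-off in this sketch is that up to `chunk_size - 1` extra elements may be read and processed beyond what a later `limit` actually needs.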

garlontas added a commit that referenced this issue Nov 25, 2024
…generators-in-sequential-streams' into bugfix/#94/support-for-infinite-generators-in-sequential-streams
garlontas added a commit that referenced this issue Nov 25, 2024
…ite-generators-in-sequential-streams

🐛 Fix sequential stream consuming source eager