Fix nesting iterations over the same Accumulate instance #37

Merged 31 commits into develop from fix-bad-cursor-memory-in-accumulate on Feb 26, 2025

@Baptouuuu Baptouuuu commented Feb 21, 2025

Problem

In a closed-source project, a bug was found with deferred Sets. The scenario is:

  • it starts with a deferred Set A
  • A is composed with maps and filters producing a Set B
  • B is partially loaded via a ->find()->match()
  • B is composed with remove, add and maps producing a Set C
  • C is compared against A via equals

This results in duplicated values inside C. It seems to be linked to the fact that iterating over C moves the cursor on A, while C depends on A's cursor to correctly iterate over its own values.

Attempts to reproduce the bug outside the project with a simplified example failed. Since the Sets are produced via multiple abstractions, it's hard to track down the critical step that is missing from the reproduction.

However, the investigation showed that the cursor inside Accumulate is not correctly handled when multiple loops move the cursor on the same instance.


While working on the problem, the bug was reduced to its smallest form:

$a = [1, 2, 3];

foreach ($a as $i) {
    foreach ($a as $j) {
        var_dump([$i, $j]);
    }
}

As one would expect, it produces: [1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3], [3, 1], [3, 2], [3, 3].

When the inner loop exits, PHP knows where the cursor on $a should be for the next iteration of the outer loop.

However, if $a is wrapped via new ArrayIterator([1, 2, 3]), it produces an infinite loop because PHP doesn't seek the cursor on $a back to the outer loop's position when exiting the inner loop.

The whole problem boils down to this: seeking the correct cursor position when exiting a loop.
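
The shared-cursor effect can be sketched with a hand-written iterator. SingleCursor below is a hypothetical class written for illustration only (it is neither Accumulate nor ArrayIterator), and PHP 8 is assumed. With this particular class the visible symptom is that the outer loop stops after its first pass; the exact symptom may differ for other iterators, but the cause is the same: the inner loop leaves the single cursor somewhere the outer loop doesn't expect.

```php
<?php

/**
 * Hypothetical iterator: every loop over the instance shares one cursor.
 */
final class SingleCursor implements Iterator
{
    private int $position = 0;

    /** @param list<int> $values */
    public function __construct(private array $values)
    {
    }

    public function rewind(): void
    {
        $this->position = 0;
    }

    public function valid(): bool
    {
        return \array_key_exists($this->position, $this->values);
    }

    public function current(): int
    {
        return $this->values[$this->position];
    }

    public function key(): int
    {
        return $this->position;
    }

    public function next(): void
    {
        ++$this->position;
    }
}

$a = new SingleCursor([1, 2, 3]);
$pairs = [];

foreach ($a as $i) {
    // The inner loop rewinds the shared cursor and leaves it past the end.
    foreach ($a as $j) {
        $pairs[] = [$i, $j];
    }
    // Back in the outer loop, next() starts from the inner loop's final
    // position, so valid() is false and the outer loop ends here.
}

// $pairs is [[1, 1], [1, 2], [1, 3]] instead of the nine expected pairs.
```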

Solution

  • Map\DoubleIndex now uses Sequence instead of Sequence\Implementation to make sure it doesn't rely on the underlying iterator, as it may result in the iterator cursor not being correctly positioned after a method call.
  • Set\Implementation::iterator() has been removed to make sure it doesn't rely on the underlying iterator of sequences.
  • Missing calls to ->rewind() have been added before using iterators (such as in Aggregate). Their absence wasn't problematic because the iterators were already in the correct state; the change is mainly for consistency throughout the implementation.
  • All iterators inside the Sequence\ namespace are now driven explicitly via their methods instead of implicitly via the foreach statement. This makes what happens explicit and allows introducing the cleanup below.
  • Sequence\Implementation::iterator() now returns a new Iterator class that has a cleanup method to correctly leave the iterator in a good state when the loop it's in is partially iterated over.
    • When an iterator is on loaded data (in an array), the cleanup does nothing (same state as if it had been completely iterated over)
    • When an iterator is lazy, the registered cleanup function is called (this is a behaviour change, as the function wasn't called before in this case)
    • When an iterator is deferred, it re-positions the cursor to its previous position on the cache before entering a new loop (signaled via the rewind call)
      • except when the underlying generator isn't, or won't be, used by any other Set/Sequence; in this case nothing happens
  • Fixes a bug in the optimisation introduced by Improve deferred sequence #27, where a deferred Sequence doesn't hold all intermediary values if it's not used elsewhere, to avoid keeping too many values in memory. The optimisation didn't take into account that the source monads may no longer exist in memory by the time one or more of the monads that rely on them are consumed.
    • Instead of trying to optimise at capture time which data source to use, it's done at detonation time.
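
The idea of driving an iterator through its methods and restoring the cursor when a loop exits can be sketched as follows. This is a minimal illustration under stated assumptions, not the Iterator class added by this pull request: SharedCursor, position(), seek() and walk() are hypothetical names, and PHP 8 is assumed.

```php
<?php

/**
 * Hypothetical cache with a single cursor shared by all loops.
 */
final class SharedCursor
{
    private int $position = 0;

    /** @param list<int> $values */
    public function __construct(private array $values)
    {
    }

    public function position(): int
    {
        return $this->position;
    }

    public function seek(int $position): void
    {
        $this->position = $position;
    }

    public function rewind(): void
    {
        $this->position = 0;
    }

    public function valid(): bool
    {
        return \array_key_exists($this->position, $this->values);
    }

    public function current(): int
    {
        return $this->values[$this->position];
    }

    public function next(): void
    {
        ++$this->position;
    }
}

/**
 * Hypothetical loop helper: calls the cursor methods explicitly (no foreach)
 * and, on exit, puts the cursor back where it was when the loop was entered.
 */
function walk(SharedCursor $cursor, callable $body): void
{
    $previous = $cursor->position();
    $cursor->rewind();

    try {
        while ($cursor->valid()) {
            $body($cursor->current());
            $cursor->next();
        }
    } finally {
        // Cleanup: runs on normal exit and when $body throws.
        $cursor->seek($previous);
    }
}

$cursor = new SharedCursor([1, 2, 3]);
$pairs = [];

walk($cursor, function (int $i) use ($cursor, &$pairs): void {
    walk($cursor, function (int $j) use ($i, &$pairs): void {
        $pairs[] = [$i, $j];
    });
});

// $pairs now holds the nine pairs from [1, 1] to [3, 3].
```

Because the inner walk() restores the position it found on entry, the outer loop resumes from its own element, which is the behaviour plain arrays give for free.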

Note

To simplify the implementation, the iterators are decorated by two objects each time (Iterator and Iterator\Lazy|Defer|Primitive). This makes stack traces more complex and has a minor impact on memory.

This should be negligible but is worth mentioning.

@Baptouuuu Baptouuuu added the bug Something isn't working label Feb 21, 2025
@Baptouuuu Baptouuuu self-assigned this Feb 21, 2025

codecov bot commented Feb 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.40%. Comparing base (fb87cc3) to head (6184b11).
Report is 32 commits behind head on develop.

Additional details and impacted files
@@              Coverage Diff              @@
##             develop      #37      +/-   ##
=============================================
+ Coverage      97.92%   98.40%   +0.48%     
- Complexity      1050     1084      +34     
=============================================
  Files             72       76       +4     
  Lines           4235     4768     +533     
=============================================
+ Hits            4147     4692     +545     
+ Misses            88       76      -12     


@Baptouuuu Baptouuuu marked this pull request as ready for review February 26, 2025 13:00
@Baptouuuu Baptouuuu merged commit 6ebe96d into develop Feb 26, 2025
33 checks passed
@Baptouuuu Baptouuuu deleted the fix-bad-cursor-memory-in-accumulate branch February 26, 2025 13:00