Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add from and to parameters to Array* iterators #100

Open
emilbayes opened this issue Aug 2, 2017 · 21 comments
Open

Add from and to parameters to Array* iterators #100

emilbayes opened this issue Aug 2, 2017 · 21 comments

Comments

@emilbayes
Copy link

emilbayes commented Aug 2, 2017

The correct way in Pony to iterate over items seems to be through the for control structure which uses the Iterator interface. However if I want to iterate over a subset, I have to use Range from collections and use that to index into the array.

Changing .keys(), .values() and .pairs() to take (from: USize, to: USize) would make this a lot easier. The change seems quite small, but where I'm unsure is how to handle from > .size() and to < from. Some might say that these should cause an error (making these functions partial) and some might argue that they should cause an empty set / iterator.

I propose that the respective Iterator classes should change their default constructor to:

  new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
    _array = array
    _i = from
    _end = to.min(array.size())

Another issue is arrays that somehow become shorter than the iterator, while it is in use, since the signature of the array iterators are eg. ArrayKeys[A, B: Array[A] #read], where only #read is required.

@emilbayes
Copy link
Author

Oh yeah, I'd be happy to write the PR to ponyc and the RFC too

@jemc
Copy link
Member

jemc commented Aug 2, 2017

This makes sense in general to me - we could talk about the details in the RFC.

Note that we could apply this convention not just to Array but also other random-access sequential collections.

@emilbayes
Copy link
Author

emilbayes commented Aug 2, 2017

Returning an "empty" iterator would also be very simple. If from is larger than to the internal _i counter would start out larger than _end in my example and we could do the following:

fun has_next(): Bool =>
    _i < _end

Then there wouldn't be any breaking changes from iterator constructor becoming a partial function. But then the constructor would probably have _end = to.min(array.size()), although the clamping would be "silent".

What about an array becoming smaller than the iterator while it is being looped over?

@emilbayes
Copy link
Author

emilbayes commented Aug 2, 2017

Currently the only example of a linked-list style iterator, for a class that has Seq in it's inheritance (?) tree is List, so we'd need to figure out if Seq and ReadSeq should demand fun values() or that should be another trait.

Here's a diff of what I've had to change for the proposed changes to compile:

diff --git a/packages/builtin/array.pony b/packages/builtin/array.pony
index 4026e2e8f..13f78fcd2 100644
--- a/packages/builtin/array.pony
+++ b/packages/builtin/array.pony
@@ -528,34 +528,36 @@ class Array[A] is Seq[A]
       end
     end
 
-  fun keys(): ArrayKeys[A, this->Array[A]]^ =>
+  fun keys(from: USize = 0, to: USize = USize.max_value()): ArrayKeys[A, this->Array[A]]^ =>
     """
     Return an iterator over the indices in the array.
     """
-    ArrayKeys[A, this->Array[A]](this)
+    ArrayKeys[A, this->Array[A]](this, from, to)
 
-  fun values(): ArrayValues[A, this->Array[A]]^ =>
+  fun values(from: USize = 0, to: USize = USize.max_value()): ArrayValues[A, this->Array[A]]^ =>
     """
     Return an iterator over the values in the array.
     """
-    ArrayValues[A, this->Array[A]](this)
+    ArrayValues[A, this->Array[A]](this, from, to)
 
-  fun pairs(): ArrayPairs[A, this->Array[A]]^ =>
+  fun pairs(from: USize = 0, to: USize = USize.max_value()): ArrayPairs[A, this->Array[A]]^ =>
     """
     Return an iterator over the (index, value) pairs in the array.
     """
-    ArrayPairs[A, this->Array[A]](this)
+    ArrayPairs[A, this->Array[A]](this, from, to)
 
 class ArrayKeys[A, B: Array[A] #read] is Iterator[USize]
   let _array: B
   var _i: USize
+  let _end: USize
 
-  new create(array: B) =>
+  new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
     _array = array
-    _i = 0
+    _i = from
+    _end = to.min(array.size())
 
   fun has_next(): Bool =>
-    _i < _array.size()
+    _i < _end
 
   fun ref next(): USize =>
     if _i < _array.size() then
@@ -567,13 +569,15 @@ class ArrayKeys[A, B: Array[A] #read] is Iterator[USize]
 class ArrayValues[A, B: Array[A] #read] is Iterator[B->A]
   let _array: B
   var _i: USize
+  let _end: USize
 
-  new create(array: B) =>
+  new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
     _array = array
-    _i = 0
+    _i = from
+    _end = to.min(array.size())
 
   fun has_next(): Bool =>
-    _i < _array.size()
+    _i < _end
 
   fun ref next(): B->A ? =>
     _array(_i = _i + 1)?
@@ -585,13 +589,15 @@ class ArrayValues[A, B: Array[A] #read] is Iterator[B->A]
 class ArrayPairs[A, B: Array[A] #read] is Iterator[(USize, B->A)]
   let _array: B
   var _i: USize
+  let _end: USize
 
-  new create(array: B) =>
+  new create(array: B, from: USize = 0, to: USize = USize.max_value()) =>
     _array = array
-    _i = 0
+    _i = from
+    _end = to.min(array.size())
 
   fun has_next(): Bool =>
-    _i < _array.size()
+    _i < _end
 
   fun ref next(): (USize, B->A) ? =>
     (_i, _array(_i = _i + 1)?)
diff --git a/packages/builtin/read_seq.pony b/packages/builtin/read_seq.pony
index c273fab65..8e7b03f95 100644
--- a/packages/builtin/read_seq.pony
+++ b/packages/builtin/read_seq.pony
@@ -13,7 +13,7 @@ interface box ReadSeq[A]
     is out of bounds. Note that this returns A^, not this->A.
     """
 
-  fun values(): Iterator[this->A]^
+  fun values(from: USize = 0, to: USize = USize.max_value()): Iterator[this->A]^
     """
     Returns an iterator over the elements of the sequence. Note that this
     iterates over A^, not this->A.
diff --git a/packages/builtin/seq.pony b/packages/builtin/seq.pony
index 233e7d5a3..c2bb821bc 100644
--- a/packages/builtin/seq.pony
+++ b/packages/builtin/seq.pony
@@ -75,7 +75,7 @@ interface Seq[A]
     If the sequence is already smaller than len, do nothing.
     """
 
-  fun values(): Iterator[this->A]^
+  fun values(from: USize = 0, to: USize = USize.max_value()): Iterator[this->A]^
     """
     Returns an iterator over the elements of the sequence.
     """
diff --git a/packages/builtin/string.pony b/packages/builtin/string.pony
index 10dde5eaf..35cae2d43 100644
--- a/packages/builtin/string.pony
+++ b/packages/builtin/string.pony
@@ -1438,17 +1438,17 @@ actor Main
   fun string(): String iso^ =>
     clone()
 
-  fun values(): StringBytes^ =>
+  fun values(from: USize = 0, to: USize = USize.max_value()): StringBytes^ =>
     """
     Return an iterator over the bytes in the string.
     """
-    StringBytes(this)
+    StringBytes(this, from, to)
 
-  fun runes(): StringRunes^ =>
+  fun runes(from: USize = 0, to: USize = USize.max_value()): StringRunes^ =>
     """
     Return an iterator over the codepoints in the string.
     """
-    StringRunes(this)
+    StringRunes(this, from, to)
 
   fun ref _set(i: USize, value: U8): U8 =>
     """
@@ -1459,13 +1459,15 @@ actor Main
 class StringBytes is Iterator[U8]
   let _string: String box
   var _i: USize
+  let _end: USize
 
-  new create(string: String box) =>
+  new create(string: String box, from: USize = 0, to: USize = USize.max_value()) =>
     _string = string
-    _i = 0
+    _i = from
+    _end = to.min(string.size())
 
   fun has_next(): Bool =>
-    _i < _string.size()
+    _i < _end
 
   fun ref next(): U8 ? =>
     _string(_i = _i + 1)?
@@ -1473,13 +1475,15 @@ class StringBytes is Iterator[U8]
 class StringRunes is Iterator[U32]
   let _string: String box
   var _i: USize
+  let _end: USize
 
-  new create(string: String box) =>
+  new create(string: String box, from: USize = 0, to: USize = USize.max_value()) =>
     _string = string
-    _i = 0
+    _i = from
+    _end = to.min(string.size())
 
   fun has_next(): Bool =>
-    _i < _string.size()
+    _i < _end
 
   fun ref next(): U32 ? =>
     (let rune, let len) = _string.utf32(_i.isize())?
diff --git a/packages/collections/list.pony b/packages/collections/list.pony
index 1672bfc72..ed2903a31 100644
--- a/packages/collections/list.pony
+++ b/packages/collections/list.pony
@@ -530,7 +530,7 @@ class List[A] is Seq[A]
     """
     ListNodes[A, this->ListNode[A]](_head, true)
 
-  fun values(): ListValues[A, this->ListNode[A]]^ =>
+  fun values(a: USize = 0, b: USize = USize.max_value()): ListValues[A, this->ListNode[A]]^ =>
     """
     Return an iterator on the values in the list.
     """

@jemc
Copy link
Member

jemc commented Aug 2, 2017

What about an array becoming smaller than the iterator while it is being looped over?

IIRC we already don't have any behavioural guarantees for mutating a collection that you're iterating over, so I think the answer that "behaviour in such a case is undefined" is the best we can say.

@jemc
Copy link
Member

jemc commented Aug 2, 2017

@emilbayes Regarding the question of whether List (and other linked-lists) should accept from and to parameters in the values method, I'd say that because they already support indexed random-access (via the apply and update methods), adding from and to constraints to their iterators should be fair game.

@jemc
Copy link
Member

jemc commented Aug 2, 2017

Also, I notice that your diff is missing the collections/persistent subpackage - you probably want to spruce those up as well.

@emilbayes
Copy link
Author

emilbayes commented Aug 2, 2017

Thanks for the input @jemc

Re: List: Should the "price" of forwarding to from be paid in the iterator constructor? Is that intuitive? (ofc it should be documented)

Re: collections/persistent: Will do! I guess these didn't show up as they're not part of any of the traits I changed. They also look more tricky at first glance (I haven't used them before)

@jemc
Copy link
Member

jemc commented Aug 2, 2017

Re: List: Should the "price" of forwarding to from be paid in the iterator constructor? Is that intuitive? (ofc it should be documented)

Seems intuitive to me, but maybe I'm not understanding you correctly.

@emilbayes
Copy link
Author

I just want it to be unsurprising that a large from with Listwill incur some kind of cost, unlike with Array :)

@jemc
Copy link
Member

jemc commented Aug 2, 2017

Makes sense - I think documentation is enough for that. As I said above, if it's already allowing random access via apply, it's not like you're adding something new in terms of jump cost, as I see it.

@SeanTAllen
Copy link
Member

@emilbayes are you still interested in writing an RFC for this?

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Feb 4, 2022
@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Feb 8, 2022
@SeanTAllen
Copy link
Member

@jemc are you still in favor of this, if yes, I can take on writing an RFC as it seems relatively straightforward to write.

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Jan 2, 2023
@jemc
Copy link
Member

jemc commented Jan 3, 2023

Yes, I'm still in favor of it.

@SeanTAllen
Copy link
Member

Some thoughts/discussion points...

If we have some functions (existing or new) that has from and to parameters that result in "can't compute" it would make sense to have the function(s) be partial. I don't like the idea of "empty set iterator" for "you gave incorrect input". That's a nice way to hide a bug.

  • I don't want to make the existing methods partial. Ewwww...
  • I don't much like the idea of adding additional methods for "a range".

Perhaps... ArrayKeys et al are modified and keys, values, etc methods are left as is. Something like this...

  new free_candy(array: B, from: Usize, to: USize) ? =>
    ...
    
  new free_candy_from(x: ArrayKeys[A, B], from: USize, to: USize) ? => 

Where you can either.. create an ArrayKeys directly that is constrained to a subset of the Array or you can create from an existing ArrayKeys.

Thoughts Joe? Technically, people can do these on their own without it being incorporated into the standard library. There's nothing to stop someone from creating these iterators on their own. The only issue is "is there a method to get directly from Array et al". I'm quite ok with Array et al have keys, values etc only return "most commonly used iterator". Modifying Array to support a new iterator type is opening up to continual churn to support others in the future.

I do think that modifying ArrayKeys etc to support some kind of "range" seems completely fine though and would be handy to a wide enough audience to include as a non-breaking change into the standard library.

If folks are generally in agreement with this approach, I can write an RFC that includes what classes would be augmented and with an exemplar implementation.

@jemc
Copy link
Member

jemc commented Jan 5, 2023

I don't like the idea of "empty set iterator" for "you gave incorrect input". That's a nice way to hide a bug.

Personally, I'm in the camp of returning an empty iterator for such cases. Which would be consistent with how we already handle out-of-bound range-restricting methods.

All of the range-restricting methods I could find in builtin use the pattern of silently clipping the requested range into bounds:

  • Array.slice
  • Array.trim
  • Array.trim_in_place
  • String.substring
  • String.trim
  • String.trim_in_place
  • String.cut
  • String.cut_in_place
  • String.codepoints

To me it makes perfect sense for the following two expressions to have the same result:

my_array.slice(start, finish).values()
my_array.values(start, finish)

@SeanTAllen
Copy link
Member

I'm not personally inclined to change values without there being a compelling performance case given that you can do your first example easily enough. I doubt that people would be doing this in hot paths, so the existing way of using slice, seems fine to me. Given that and a general lack of agreement, I don't think I will move forward with writing an RFC for this.

@jemc
Copy link
Member

jemc commented Jan 6, 2023

I doubt that people would be doing this in hot paths, so the existing way of using slice, seems fine to me.

I may be missing something obvious, but I'm curious: why would you doubt that people would use this mechanism in a hot path?

@SeanTAllen
Copy link
Member

If I was in a hot path, I wouldn't want to repeatedly create iterator objects, I'd index directly. Save repeated object creation in the hot path. And if I'm not in a hot path then the extra cost of slice then iterator it pretty trivial to me.

@jasoncarr0
Copy link

One more issue here:
As is, Pony does not treat default arguments as not present, when matching interfaces. This would cause the Array class to not be some sequence interfaces that require values()

@jemc
Copy link
Member

jemc commented Jan 10, 2023

Good point - I think to move this forward we'd also want to update the other values iterators (and interfaces of them) as well.

@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Jan 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants