Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combinations doesn't handle duplicate elements #12540

Closed
arminsumo opened this issue Feb 24, 2022 · 8 comments · Fixed by scala/scala#9947
Closed

Combinations doesn't handle duplicate elements #12540

arminsumo opened this issue Feb 24, 2022 · 8 comments · Fixed by scala/scala#9947

Comments

@arminsumo
Copy link

reproduction steps

using Scala 2.13,

"aba".combinations(3).toSeq // returns Seq("aab"), should return Seq("aba")
"abaa".combinations(3).toSeq // returns Seq("aaa", "aab"), should return Seq("aaa", "aba")

https://scastie.scala-lang.org/V3sDnjcJTTSjVq3nfKXLzA

problem

Combinations method does not seem to gracefully handle input with duplicated elements. The output does not represent an ordered combination of the input, as the docs suggest it should do

Iterates over combinations. A _combination_ of length n is a subsequence of the original sequence, with the elements taken in order.
@SethTisue
Copy link
Member

does scala/scala#9942 shed any light on this for you? cc @counter2015

@som-snytt
Copy link

Doc at https://www.scala-lang.org/api/current/scala/collection/StringOps.html#combinations(n:Int):Iterator[String]

with implementation at https://www.scala-lang.org/api/current/scala/collection/Seq.html#combinations(n:Int):Iterator[C]

It appears that, of the many uses of underscore, emphasis is not one.

"aba" has 2 * a and 1 * b, and index of a < index of b. Selecting 3 means taking 2 a then 1 b.

@som-snytt
Copy link

I see @SethTisue is up to his usual hijinks, with the emphasis on jinx.

The recent update didn't touch ArrayOps. The issue of how examples are presented is worth nailing down.

@arminsumo
Copy link
Author

arminsumo commented Feb 24, 2022

The updated docs makes it more in line with the current behavior, but I wonder if this is the most intuitive implementation? It's a rather surprising behavior, that only the first occurrence is considered for ordering purposes.

Ex, python implementation is more intuitive imo

import itertools
print(list(itertools.combinations('aba', 3))) // returns ['aba']

@som-snytt
Copy link

@arminsumo thanks for the heads up about expectations.

I'll submit a doc tweak before I forget what I just learned.

@SethTisue
Copy link
Member

SethTisue commented Feb 25, 2022

PR that documents the status quo: scala/scala#9947

That still leaves the question of whether the behavior should change. Not sure we can change the behavior in a 2.13.x point release, since it would be a design change and not merely fixing a "bug"?

@som-snytt
Copy link

som-snytt commented Feb 25, 2022

If only there were a repo where contributors could add functionality.

Oh I see people do add issues at https://github.com/scala/scala-library-next/issues

Offhand, I can imagine 3 flavors:

  • status quo (arbitrary sequences)
  • multiset result
  • filter of current sequence (so if I want 2x, 1y, "xyx" yields "xyx" not "xxy")

scala/scala-library-next#117

@counter2015
Copy link

I prefer multiset result. Mathematically speaking, combinations should not care about order.

Order is the thing that permutation should care.

"abaa".combinations(3) =
Set(
  Set('a', 'a', 'a'),
  Set('b', 'a', 'a')
)

However, in real world, when we compute a large number's combinations, it's better to use iterator rather than set which This will introduce the concept of order.

And the multiset result is not as easy to read or use than sequence result.

@SethTisue SethTisue added this to the 3.x milestone Mar 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants