Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure Repeatable Results by Preserving Insert Order in Unique Array Generation Methods #4586

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Jason5Lee
Copy link

Context:

The methods ArrayOfUniqueValues and ArrayOfUniqueStrings aim to generate repeatable sequences of unique values or strings. The repeatability is achieved using a fixed Seed (as noted in the comment on line 16). However, the current implementation relies on the traversal order of HashSet, which does not guarantee a consistent order.

Issue:

The assumption that a fixed insertion sequence into a HashSet ensures a repeatable traversal order is flawed:

  1. HashSet does not provide any guarantees about order semantics in its documentation.
  2. Some standard libraries in other programming languages, or even future implementations of .NET, may introduce variations in Set behavior where traversal order is inconsistent even with a fixed insertion sequence.

This discrepancy could lead to non-repeatable results, breaking the method's intended behavior.

Solution:

  • Instead of relying on HashSet.CopyTo for transferring elements to the result array, we now directly assign values to the array in the order they are added to the HashSet. This ensures the output matches the insertion order and remains repeatable with a fixed seed.
  • Updated both ArrayOfUniqueValues and ArrayOfUniqueStrings methods to reflect this approach.

Changes:

  1. Modified ArrayOfUniqueValues:
    • Removed reliance on HashSet.CopyTo.
    • Assigned values to the result array during insertion to preserve order.
  2. Modified ArrayOfUniqueStrings:
    • Similarly, ensured the assignment to the array occurs during insertion into the HashSet.

Benefits:

  • Guarantees repeatable results consistent with the fixed seed.
  • Eliminates reliance on undocumented behavior of HashSet traversal.

Copy link
Contributor

@e-kharion e-kharion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good catch.

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that HashSet does not provide any guarantees about order semantics. But on the other hand, I would expect the current implementation to always result in the same order. And when we take this change, the order can change. And if this method is used by some benchmarks for which the order is important (like sorting for example), it could affect the time reported by those benchmarks. And this is something that we try to avoid, as it could be detected as a regression/improvement by our automation. More: https://github.com/dotnet/performance/blob/main/docs/microbenchmark-design-guidelines.md#benchmarks-are-immutable

Having said that, I am hesitant to take this change, as it could affect existing benchmarks.

If the current owners of the repo are fine with it, I would recommend ensuring that the affected methods are not used by benchmarks where execution time is dependent on the order of inputs.

Thank you for your contribution @Jason5Lee !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants